Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-30 Thread Colin McCabe
The "git" way of doing things would be to rebase the feature branch on
master (trunk) and then commit the patch stack.

Squashing the entire feature into a 10 MB megapatch is the "svn" way of
doing things.

The svn workflow evolved because merging feature branches back to trunk
was really painful in svn.  So people preferred to essentially do an
rsync from a checkout of the feature branch, edit it a bit to make
sure they weren't overwriting something that had happened in trunk, and then
do an "svn commit" that did not tie back to the feature branch.

P.S. Merges were so painful in SVN that some of the organizations I
worked for maintained spreadsheets of patches which had been merged to
each branch, since svn was of so little help with merging.

P.P.S. svn eventually got slightly smarter about merges -- I think in
svn 1.6 or something?

Colin


On Wed, Aug 30, 2017, at 14:37, Sangjin Lee wrote:
> I recall this discussion about a couple of years ago:
> https://lists.apache.org/thread.html/43cd65c6b6c3c0e8ac2b3c76afd9eff1f78b177fabe9c4a96d9b3d0b@1440189889@%3Ccommon-dev.hadoop.apache.org%3E
> 
> On Wed, Aug 30, 2017 at 2:32 PM, Steve Loughran 
> wrote:
> 
> > I'd have assumed it would have gone in as one single patch, rather than a
> > full history. I don't see why the trunk needs all the evolutionary history
> > of a build.
> >
> > What should our policy/process be here?
> >
> > I do currently plan to merge the s3guard in as one single squashed patch;
> > just getting HADOOP-14809 sorted first.
> >
> >
> > > On 30 Aug 2017, at 07:09, Vrushali C  wrote:
> > >
> > > I'm adding my +1 (binding) to conclude the vote.
> > >
> > > With 13 +1's (11 binding) and no -1's, the vote passes. We'll get on with
> > > the merge to trunk shortly. Thanks everyone!
> > >
> > > Regards
> > > Vrushali
> > >
> > >
> > > On Tue, Aug 29, 2017 at 10:54 AM, varunsax...@apache.org <
> > > varun.saxena.apa...@gmail.com> wrote:
> > >
> > >> +1 (binding).
> > >>
> > >> Kudos to all the team members for their great work!
> > >>
> > >> Being part of the ATSv2 team, I have been involved with either
> > development
> > >> or review of most of the JIRAs'.
> > >> Tested ATSv2 in both secure and non-secure mode. Also verified that
> > there
> > >> is no impact when ATSv2 is turned off.
> > >>
> > >> Regards,
> > >> Varun Saxena.
> > >>
> > >> On Tue, Aug 22, 2017 at 12:02 PM, Vrushali Channapattan <
> > >> vrushalic2...@gmail.com> wrote:
> > >>
> > >>> Hi folks,
> > >>>
> > >>> Per earlier discussion [1], I'd like to start a formal vote to merge
> > >>> feature branch YARN-5355 [2] (Timeline Service v.2) to trunk. The vote
> > >>> will
> > >>> run for 7 days, and will end August 29 11:00 PM PDT.
> > >>>
> > >>> We have previously completed one merge onto trunk [3] and Timeline
> > Service
> > >>> v2 has been part of Hadoop release 3.0.0-alpha1.
> > >>>
> > >>> Since then, we have been working on extending the capabilities of
> > Timeline
> > >>> Service v2 in a feature branch [2] for a while, and we are reasonably
> > >>> confident that the state of the feature meets the criteria to be merged
> > >>> onto trunk and we'd love folks to get their hands on it in a test
> > capacity
> > >>> and provide valuable feedback so that we can make it production-ready.
> > >>>
> > >>> In a nutshell, Timeline Service v.2 delivers significant scalability
> > and
> > >>> usability improvements based on a new architecture. What we would like
> > to
> > >>> merge to trunk is termed "alpha 2" (milestone 2). The feature has a
> > >>> complete end-to-end read/write flow with security and read level
> > >>> authorization via whitelists. You should be able to start setting it up
> > >>> and
> > >>> testing it.
> > >>>
> > >>> At a high level, the following are the key features that have been
> > >>> implemented since alpha1:
> > >>> - Security via Kerberos Authentication and delegation tokens
> > >>> - Read side simple authorization via whitelist
> > >>> - Client configurable entity sort ordering
> > >>> - Richer REST APIs for apps, app attempts, containers, fetching
> > metrics by
> > >>> timerange, pagination, sub-app entities
> > >>> - Support for storing sub-application entities (entities that exist
> > >>> outside
> > >>> the scope of an application)
> > >>> - Configurable TTLs (time-to-live) for tables, configurable table
> > >>> prefixes,
> > >>> configurable hbase cluster
> > >>> - Flow level aggregations done as dynamic (table level) coprocessors
> > >>> - Uses latest stable HBase release 1.2.6
> > >>>
> > >>> There are a total of 82 subtasks that were completed as part of this
> > >>> effort.
> > >>>
> > >>> We paid close attention to ensure that Timeline Service v.2 does not
> > >>> impact existing functionality when it is disabled (which is the default).
> > >>>
> > >>> Special thanks to a team of folks who worked hard and contributed
> > towards
> > >>> this effort with patches, reviews and guidance: Rohith Sharma K S,
> > Varun
> > >>> Saxena, Haib

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Colin McCabe
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> 
> > On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> > 
> > I think this gets back to the "if it's worth committing" part.
> 
>   This brings us back to my original question:
> 
>   "Doesn't this place an undue burden on the contributor with the first 
> incompatible patch to prove worthiness?  What happens if it is decided that 
> it's not good enough?"

I feel like this line of argument is flawed by definition.  "What
happens if the patch isn't worth breaking compatibility over"?  Then we
shouldn't break compatibility over it.  We all know that most
compatibility breaks are avoidable with enough effort.  And it's an
effort we should make, for the good of our users.

Most useful features can be implemented without compatibility breaks. 
And for the few that truly can't, the community should surely agree that
it's worth breaking compatibility before we do it.  If it's a really
cool feature, that approval will surely not be hard to get (I'm tempted
to quote your earlier email about how much we love features...)

> 
>   The answer, if I understand your position, is then at least a maybe 
> leaning towards yes: a patch that prior to this branching policy change that  
> would have gone in without any notice now has a higher burden (i.e., major 
> feature) to prove worthiness ... and in the process eliminates a whole class 
> of contributors and empowers others. Thus my concern ...
> 
> > As you mentioned, people are already breaking compatibility left and right 
> > as it is, which is why I wondered if it was really any better in practice.  
> > Personally I'd rather find out about a major breakage sooner than later, 
> > since if trunk remains an active area of development at all times it's more 
> > likely the community will sit up and take notice when something crazy goes 
> > in.  In the past, trunk was not really an actively deployed area for over 5 
> > years, and all sorts of stuff went in without people really being aware of 
> > it.
> 
>   Given the general acknowledgement that the compatibility guidelines are 
> mostly useless in reality, maybe the answer is really that we're doing 
> releases all wrong.  Would it necessarily be a bad thing if we moved to a 
> model where incompatible changes gradually released instead of one big one 
> every seven?

I haven't seen anyone "acknowledge that... compatibility guidelines are
mostly useless"... even you.  Reading your posts from the past, I don't
get that impression.  On the contrary, you are often upset about
compatibility breakages.

What would be positive about allowing compatibility breaks in minor
releases?  Can you give a specific example of what would be improved?

> 
>   Yes, I lived through the "walking on glass" days at Yahoo! and realize 
> what I'm saying.  But I also think the rate of incompatible changes has 
> slowed tremendously.  Entire groups of APIs aren't getting tossed out every 
> week anymore.
> 
> > It sounds like we agree on that part but disagree on the specifics of how 
> > to help trunk remain active.
> 
>   Yup, and there is nothing wrong with that. ;)
> 
> >  Given that historically trunk has languished for years I was hoping this 
> > proposal would help reduce the likelihood of it happening again.  If we 
> > eventually decide that cutting branch-3 now makes more sense then I'll do 
> > what I can to make that work well, but it would be good to see concrete 
> > proposals on how to avoid the problems we had with it over the last 6 years.
> 
> 
>   Yup, agree. But proposals rarely seem to get much actual traction. 
> (It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
> old [VOTE] threads to realize how much stuff doesn't actually happen despite 
> everyone generally agree that abc is a good idea.)  To circle back a bit, I 
> do also agree that automation has a role to play
> 
>Before anyone can accuse or imply me of being a hypocrite (and I'm 
> sure someone eventually will privately if not publicly), I'm sure some folks 
> don't realize I've been working on this set of problems from a different 
> angle for the past few years.
> 
>   There are a handful of people that know I was going to attempt to do a 
> 3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
> into the release process.  What a mess.  Way too much manual work, lots of 
> undocumented bits, violation of ASF rules(!) , etc, etc.  We've all heard the 
> complaints.
> 
>   My hypothesis:  if the release process itself is easier, then getting a 
> release based on trunk is easier too. The more we automate, the more 
> non-vendors ("non traditional release managers"?) will be willing to roll 
> releases.  The more people that feel comfortable rolling a release, the more 
> likelihood releases will happen.  The more likelihood of releases happening, 
> the greater chance trunk had of getting out the door.

Re: inotify

2016-07-05 Thread Colin McCabe
I think it makes sense to have an AddBlockEvent.  It seems like we could
provide something like the block ID, block pool ID, and genstamp, as
well as the inode ID and path of the file which the block was added to. 
Clearly, we cannot provide the length, since we don't know how many
bytes the client will write.  Would you mind filing a JIRA for this?
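
To make the consumer side concrete, here is a rough sketch of how such an
event could be picked up with the existing inotify API.  AddBlockEvent and
an ADD_BLOCK event type do not exist yet -- they are purely hypothetical
here, as is the NameNode URI:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyTail {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://nn1:8020"), new Configuration());
    DFSInotifyEventInputStream stream = admin.getInotifyEventStream();
    while (true) {
      EventBatch batch = stream.take();     // blocks until new edits arrive
      for (Event event : batch.getEvents()) {
        switch (event.getEventType()) {
          case APPEND:   // file reopened for append
          case CLOSE:    // file completed; today replication must wait for this
            break;
          // a hypothetical ADD_BLOCK case would fire once per OP_ADD_BLOCK,
          // letting a replicator ship each block as soon as it is allocated
          default:
            break;
        }
      }
    }
  }
}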

We should also set up inotify events for the snapshot operations at some
point.

regards,
Colin

On Fri, Jul 1, 2016, at 10:28, rahul gidwani wrote:
> Hello kind folks,
> 
> I was wondering if anyone would be interested in adding a AddBlock Event
> to
> the inotify pipeline?  We were thinking about using Inotify for hdfs
> replication to another datacenter.
> 
> Right now the problem is with the appends.  We get a notification when an
> append starts with an AppendEvent and then we know the file is complete
> when we get a CloseEvent.  This would increase our latency considerably.
> 
> So if we could add an AddBlockEvent whenever we get an OP_ADD_BLOCK that
> would give us the ability to ship blocks.
> 
> This would basically give us an ExtendedBlock and then we could ask the
> namenode to give us the corresponding LocatedBlock which we could just
> get
> a stream of bytes ready to ship to our destination cluster.
> 
> Is this something the community would be interested in?
> 
> Thank you
> rahul




Re: HDFS Block compression

2016-07-05 Thread Colin McCabe
We have discussed this in the past.  I think the single biggest issue is
that HDFS doesn't understand the schema of the data which is stored in
it.  So it may not be aware of what compression scheme would be most
appropriate for the application and data.

While it is true that HDFS doesn't allow random writes, it does allow
random reads.  In fact, HDFS currently supports a very low-cost seek
operation while reading an input stream.  Compression would increase the
cost of seeking greatly.  Of course, the cost increase would depend on
the kind of compression used-- "chunk-based" schemes where there was
some kind of side index could be more efficient at seeking.
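
For illustration only (this is not an existing HDFS structure, and the names
and chunk size are made up), a chunk-based scheme would keep a small
per-block side index so that a seek only costs inflating one chunk:

class ChunkIndex {
  // compressedOffset[i] = byte offset of compressed chunk i within the block
  private final long[] compressedOffset;
  private final int chunkSize;   // uncompressed bytes per chunk, e.g. 64 KB

  ChunkIndex(long[] compressedOffset, int chunkSize) {
    this.compressedOffset = compressedOffset;
    this.chunkSize = chunkSize;
  }

  // Translate an uncompressed position into the compressed offset to start
  // reading at, plus the uncompressed bytes to discard after inflating.
  long[] locate(long uncompressedPos) {
    int chunk = (int) (uncompressedPos / chunkSize);
    return new long[] { compressedOffset[chunk], uncompressedPos % chunkSize };
  }
}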

Because HDFS transparent encryption is client-side, it will not work
unless compression is client-side as well.  The reason is that
compressing encrypted data provides no space savings.  But a client-side
scheme loses some of the benefits of doing compression in HDFS, like the
ability to cache the uncompressed data in the DataNode.

I think any project along these lines should start with a careful
analysis of what the goals are and what advantages the scheme has over
the current client-side compression.

best,
Colin


On Mon, Jul 4, 2016, at 07:16, Robert James wrote:
> A lot of work in Hadoop concerns splittable compression.  Could this
> be solved by offerring compression at the HDFS block (ie 64 MB) level,
> just like many OS filesystems do?
> 
> http://stackoverflow.com/questions/6511255/why-cant-hadoop-split-up-a-large-text-file-and-then-compress-the-splits-using-g?rq=1
> discusses this and suggests the issue is separation of concerns.
> However, if the compression is done at the *HDFS block* level (with
> perhaps a single flag indicating such), this would be totally
> transparent to readers and writers.  This is the exact way, for
> example, NTFS compression works; apps need no knowledge of the
> compression.  HDFS, since it doesn't allow random reads and writes,
> but only streaming, is a perfect candidate for this.
> 
> Thoughts?
> 



Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
On Mon, Jun 13, 2016, at 12:41, Anu Engineer wrote:
> Hi Colin,
> 
> >Even if everyone used branches for all development, person X might merge
> >their branch before person Y, forcing person Y to do a rebase or merge. 
> >It is not the presence or absence of branches that causes the need to
> >merge or rebase, but the presence or absence of "churn."
> 
> You are perfectly right on this technically. The issue is when a 
> branch developer gets caught in Commit, Revert, let-us-commit-again, 
> oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle. 
> 
> I was hoping that branches will be exposed to less of this if everyone 
> had private branches and got some time to test and bake the feature 
> instead of just directly committing to trunk and then test.

To be fair to developers, when something becomes problematic after it
gets committed, it's usually because of something that didn't show up in
testing on a private branch.  For example, maybe unit tests fail
occasionally with JDK7 instead of JDK8 (but the developer wasn't using
JDK7, so how would he know?)  Maybe there's a flaky test that shows up
when the test machine is overloaded (but the developer's machine wasn't
overloaded, so how would he see this?)  Maybe there's some interaction
with a new feature that just got added in trunk.  And so on.

> Once again, I agree with your point that in a perfect world, merges
> should
> be about the churn, but trunk is often treated as development branch, 
> So my point is that it gets unnecessary churn. I really appreciate the 
> thought in the thread - that is - let us be more responsible about how we
> treat trunk.

I think assuming that we will catch all bugs before branch merge is the
"perfect world" view, and accepting that some of them will get through
is the realistic view.  Feature branch code will receive fewer test runs
since it's not tested in every precommit build like trunk code is.  I do
agree that good and well-thought-out tests should be a precondition of
merging any big feature branch.  But we have to expect that merges will
be destabilizing in Hadoop, just like in every other software project
out there.

Trunk *is* a development branch, and should be treated as such.  Not
everything that hits trunk needs to immediately hit the stable branches.
 It's OK for there to be some experimentation, as long as developers
make a strong effort to test things thoroughly and avoid flaky or
time-dependent tests.

> 
> > I thought the feature branch merge voting period had been shortened to 5
> >days rather than 7?  We should probably spell this out on
> >https://hadoop.apache.org/bylaws.html 
> 
> Thanks for the link, right now it says 7 days. That is why I assumed it
> is 7. 
> Would you be kind enough to point me to a thread that says it is 5 days
> for a merge Vote? 
> I did a google search, but was not able to find a thread like that.
> Thanks in advance.

Hmm, perhaps I was thinking of the release vote process.  Can anyone
confirm?  It would be nice if this information could appear on the
bylaws page...

best,
Colin


> 
> Thanks
> Anu
> 
> 
> On 6/13/16, 11:51 AM, "Colin McCabe"  wrote:
> 
> >On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
> >> > On 10 Jun 2016, at 20:37, Anu Engineer  wrote:
> >> > 
> >> > I actively work on two branches (Diskbalancer and ozone) and I agree 
> >> > with most of what Sangjin said. 
> >> > There is an overhead in working with branches, there are both technical 
> >> > costs and administrative issues 
> >> > which discourages developers from using branches.
> >> > 
> >> > I think the biggest issue with branch based development is that fact 
> >> > that other developers do not use a branch.
> >> > If a small feature appears as a series of commits to 
> >> > "datanode.java", the branch based developer ends 
> >> > up rebasing 
> >> > and paying this price of rebasing many times. If everyone followed a 
> >> > model of branch + Pull request, other branches
> >> > would not have to deal with continues rebasing to trunk commits. If we 
> >> > are moving to a branch based 
> >
> >Even if everyone used branches for all development, person X might merge
> >their branch before person Y, forcing person Y to do a rebase or merge. 
> >It is not the presence or absence of branches that causes the need to
> >merge or rebase, but the presence or absence of "churn."
> >
> >We try to minimize "churn" in many ways.  For example, we discourage
> >people from making trivial whit

Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
> > On 10 Jun 2016, at 20:37, Anu Engineer  wrote:
> > 
> > I actively work on two branches (Diskbalancer and ozone) and I agree with 
> > most of what Sangjin said. 
> > There is an overhead in working with branches, there are both technical 
> > costs and administrative issues 
> > which discourages developers from using branches.
> > 
> > I think the biggest issue with branch based development is that fact that 
> > other developers do not use a branch.
> > If a small feature appears as a series of commits to 
> > "datanode.java", the branch based developer ends up rebasing 
> > and paying this price of rebasing many times. If everyone followed a model 
> > of branch + Pull request, other branches
> > would not have to deal with continues rebasing to trunk commits. If we are 
> > moving to a branch based 

Even if everyone used branches for all development, person X might merge
their branch before person Y, forcing person Y to do a rebase or merge. 
It is not the presence or absence of branches that causes the need to
merge or rebase, but the presence or absence of "churn."

We try to minimize "churn" in many ways.  For example, we discourage
people from making trivial whitespace changes to parts of the code
they're not modifying in their patch.  Or doing things like letting
their editor change the line ending of files from LF to CR/LF.  However,
in the final analysis, churn will always exist because development
exists.

> > development, we should probably move to that model for most development to 
> > avoid this tax on people who
> > actually end up working in the branches.
> > 
> > I do have a question in my mind though: What is being proposed is that we 
> > move active development to branches 
> > if the feature is small or incomplete, however keep the trunk open for 
> > check-ins. One of the biggest reason why we 
> > check-in into trunk and not to branch-2 is because it is a change that will 
> > break backward compatibility. So do we 
> > have an expectation of backward compatibility thru the 3.0-alpha series (I 
> > personally vote No, since 3.0 is experimental 
> > at this stage), but if we decide to support some sort of backward-compact 
> > then willy-nilly committing to trunk 
> > and still maintaining the expectation we can release Alphas from 3.0 does 
> > not look possible.
> > 
> > And then comes the question, once 3.0 becomes official, where do we 
> > check-in a change,  if that would break something? 
> > so this will lead us back to trunk being the unstable – 3.0 being the new 
> > “branch-2”.

I'm not sure I really understand the goal of the "trunk-incompat"
proposal.  Like Karthik asked earlier in this thread, isn't it really
just a rename of the existing trunk branch?
It sounds like the policy is going to be exactly the same as now:
incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
etc.

I think we should just create branch-3 and follow the same policy we
followed with branch-2 and branch-1.  Switching around the names doesn't
really change the policy, and it creates confusion since it's
inconsistent with what we did earlier.

I think one of the big frustrations with trunk is that features sat
there a while without being released because they weren't compatible
with branch-2-- the shell script rewrite, for example.  However, this
reflects a fundamental tradeoff-- either incompatible features can't be
developed at all in the lifetime of Hadoop 3.x, or we will need
somewhere to put them.  The trunk-incompat proposal is like saying that
you've solved the prison overcrowding problem by renaming all prisons to
"correctional facilities."

> > 
> > One more point: If we are moving to use a branch always – then we are 
> > looking at a model similar to using a git + pull 
> > request model. If that is so would it make sense to modify the rules to 
> > make these branches easier to merge?
> > Say for example, if all commits in a branch has followed review and 
> > checking policy – just like trunk and commits 
> > have been made only after a sign off from a committer, would it be possible 
> > to merge with a 3-day voting period 
> > instead of 7, or treat it just like today’s commit to trunk – but with 
> > 2 people signing-off? 

I thought the feature branch merge voting period had been shortened to 5
days rather than 7?  We should probably spell this out on
https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
believe that *all* development should be on feature branches, just
biggish stuff that is likely to be controversial and/or disruptive.  The
suggestion I made earlier is that if 3 people ask you for a branch, you
should definitely strongly consider a branch.

I do think we should shorten the voting period for adding new branch
committers... making it 3 or 4 days would be fine.  After all, the work
of br

Re: Compile proto

2016-05-10 Thread Colin McCabe
Hi Kun Ren,

You have to add your new proto file to the relevant pom.xml file.

best,
Colin

On Fri, May 6, 2016, at 13:04, Kun Ren wrote:
> Hi Genius,
> 
> I added a new proto into the
> HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto,
> 
> however, every time I run the following Maven commands:
> 
>mvn install -DskipTests
>mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=
> true
> 
> It only compiles all the other protos, but doesn't compile my newly added
> proto. Do you know why, and how I can configure it? Otherwise I have to
> compile the new proto by hand.
> 
> Thanks a lot for your help.




Re: Another thought on client-side support of HDFS federation

2016-05-02 Thread Colin McCabe
Hi Tianyi HE,

Thanks for sharing this!  This reminds me of the httpfs daemon.  This
daemon basically sits in front of an HDFS cluster and accepts requests,
which it serves by forwarding them to the underlying HDFS instance. 
There is some documentation about it here:
https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html

Since httpfs uses an org.apache.hadoop.fs.FileSystem instance, it seems
like you could plug in the org.apache.hadoop.fs.viewfs.ViewFileSystem class
and be up and running with federation.  I haven't tried this, but I
would expect that it would work, unless there are bugs in ViewFS itself.
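
For reference, a minimal client-side mount table looks roughly like the
sketch below (the mount table name and NameNode addresses are placeholders);
httpfs would just need the same keys in its configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://federated/");
    conf.set("fs.viewfs.mounttable.federated.link./user",
        "hdfs://nn1.example.com:8020/user");
    conf.set("fs.viewfs.mounttable.federated.link./data",
        "hdfs://nn2.example.com:8020/data");

    // FileSystem.get resolves to ViewFileSystem: /user goes to nn1,
    // /data goes to nn2, and the client sees a single namespace.
    FileSystem fs = FileSystem.get(conf);
    fs.listStatus(new Path("/user"));
  }
}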

The big advantage of httpfs is that it provides a webhdfs-style REST
interface.  As you said, this kind of interface makes it simple to use
any language with REST bindings, without worrying about using a thick
client.

The big disadvantage of httpfs is that you must move both metadata and
data operations through the httpfs daemon.  This could become a
performance bottleneck.  It seems like you are concerned about this
bottleneck.

We also have webhdfs.  Unlike httpfs, webhdfs doesn't require all the
data to move through its daemon.  With webhdfs, the client talks to
DataNodes directly.

I wonder if extending httpfs or webhdfs would be a better approach than
starting from scratch.  There is a maintenance burden for adding new
services and daemons.  This was our motivation for removing hftp, for
example.  It's certainly something to think about.

best,
Colin


On Thu, Apr 28, 2016, at 17:55, 何天一 wrote:
> Hey guys,
> 
> My associates have investigated HDFS federation recently, which, turns
> out
> to be a quite good solution for improving scalability on
> NameNode/DataNode
> side.
> 
> However, we encountered some problem on client-side. Since:
> A) For historical reason, we use clients in multiple languages to access
> HDFS, (i.e. python-snakebite, or perhaps libhdfs++). So we either
> implement
> multiple versions of ViewFS or we give up the consistency view (which can
> be confusing to user).
> B) We have hadoop client configuration deployed on client nodes, which we
> do not have control over . Also, releasing new configuration could be a
> real heavy operation because it needs to be pushed to several thousand of
> nodes, as well as maintaining consistency (say a node is down throughout
> the operation, then come back online. it could still possess a stale
> version of configuration).
> 
> So we intended to explore another solution to these problems, and came up
> with a proxy model.
> That is, build a RPC proxy in front of NameNodes.
> All clients talk to proxy when they need to consult NameNode, then proxy
> decide which NameNode should the request go to according to mount table.
> This solved our problem. All clients are seamlessly upgraded with
> federation support.
> We open sourced the proxy recently: https://github.com/bytedance/nnproxy
> (BTW, all kinds of feedbacks are welcomed)
> 
> But there are still a few issues. For example, several modifications
> needs
> to be done inside hadoop ipc to support rpc forwarding. We released patch
> according to which with nnproxy project (
> https://github.com/bytedance/nnproxy/tree/master/hadoop-patches). But it
> could be better to have these merged to apache trunk. Does someone think
> it's worth?
> 
> 
> -- 
> Cheers,
> Tianyi HE
> (+86) 185 0042 4096




Re: 2.7.3 release plan

2016-04-04 Thread Colin McCabe
I agree that HDFS-8578 should be a prerequisite for backporting
HDFS-8791.

I think we are overestimating the number of people affected by
HDFS-8791, and underestimating the disruption that would be caused by a
layout version upgrade in a dot release.  As Andrew, Sean, and others in
the thread pointed out, this could reduce people's trust in the
stabilization branches.

The maintenance branches were never intended to live forever. 
Eventually, people should start using newer releases.  A more efficient
DataNode layout is just one more motivation to upgrade.

best,
Colin


On Fri, Apr 1, 2016, at 14:54, Chris Trezzo wrote:
> A few thoughts:
> 
> 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
> for data nodes depending on your setup.
> 
> 2. We have already deployed this patch internally so, with my Twitter hat
> on, I would be perfectly happy as long as it makes it into trunk and 2.8.
> That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
> releases on a large production cluster that has a diverse set of block
> ids
> without this patch, especially if your data nodes have a large number of
> disks or you are using federation. To be clear though: this highly
> depends
> on your setup and at a minimum you should verify that this regression
> will
> not affect you. The current block-id based layout in 2.6.x and 2.7.2 has
> a
> performance regression that gets worse over time. When you see it
> happening
> on a live cluster, it is one of the harder issues to identify a root
> cause
> and debug. I do understand that this is currently only affecting a
> smaller
> number of users, but I also think this number has potential to increase
> as
> time goes on. Maybe we can issue a warning in the release notes for
> future
> 2.7.x and 2.6.x releases?
> 
> 3. One option (this was suggested on HDFS-8791 and I think Sean alluded
> to
> this proposal on this thread) would be to cut a 2.8 release off of the
> 2.7.3 release with the new layout. What people currently think of as 2.8
> would then become 2.9. This would give customers a stable release that
> they
> could deploy with the new layout and would not break upgrade and
> downgrade
> expectations.
> 
> On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell 
> wrote:
> 
> > As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
> > patch the release to revert HDFS-8791 before pushing it out to production.
> > For what it's worth.
> >
> >
> > On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang 
> > wrote:
> >
> > > One other thing I wanted to bring up regarding HDFS-8791, we haven't
> > > backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
> > > HDFS-8578 is a very important related fix since otherwise upgrade will be
> > > very slow.
> > >
> > > On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang 
> > > wrote:
> > >
> > > > As I expressed on HDFS-8791, I do not want to include this JIRA in a
> > > > maintenance release. I've only seen it crop up on a handful of our
> > > > customer's clusters, and large users like Twitter and Yahoo that seem
> > to
> > > be
> > > > more affected are also the most able to patch this change in
> > themselves.
> > > >
> > > > Layout upgrades are quite disruptive, and I don't think it's worth
> > > > breaking upgrade and downgrade expectations when it doesn't affect the
> > > (in
> > > > my experience) vast majority of users.
> > > >
> > > > Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
> > > > will let him elaborate.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey 
> > > wrote:
> > > >
> > > >> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
> > > >> if *any* of them end up introducing a regression the inclusion of
> > > >> HDFS-8791 means that folks will have cluster downtime in order to back
> > > >> things out. If that happens to any substantial number of downstream
> > > >> folks, or any particularly vocal downstream folks, then it is very
> > > >> likely we'll lose the remaining trust of operators for rolling out
> > > >> maintenance releases. That's a pretty steep cost.
> > > >>
> > > >> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
> > > >> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> > > >> unreasonable burden.
> > > >>
> > > >> I agree that this fix is important, I just think we should either cut
> > > >> a version of 2.8 that includes it or find a way to do it that gives an
> > > >> operational path for rolling downgrade.
> > > >>
> > > >> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du 
> > > wrote:
> > > >> > Thanks for bringing up this topic, Sean.
> > > >> > When I released our latest Hadoop release 2.6.4, the patch of
> > > HDFS-8791
> > > >> haven't been committed in so that's why we didn't discuss this
> > earlier.
> > > >> > I remember in JIRA dis

Re: Revive HADOOP-2705?

2015-12-18 Thread Colin McCabe
Reading files from HDFS has different performance characteristics than
reading local files.  For one thing, HDFS does a few megabytes of
readahead internally by default.  If you are going to make a
performance improvement suggestion, I would strongly encourage you to
test it first.
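
As a starting point, even something as simple as the sketch below, run
against a real cluster, would be more convincing than a local-filesystem
result (the file path and buffer sizes are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BufferSizeBench {
  public static void main(String[] args) throws Exception {
    Path file = new Path("/benchmarks/input.dat");   // placeholder path
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] scratch = new byte[1 << 20];

    for (int bufferSize : new int[] { 4096, 8192, 65536, 1 << 20 }) {
      long start = System.nanoTime();
      try (FSDataInputStream in = fs.open(file, bufferSize)) {
        while (in.read(scratch) != -1) {
          // just drain the stream
        }
      }
      System.out.printf("bufferSize=%d took %d ms%n",
          bufferSize, (System.nanoTime() - start) / 1000000);
    }
  }
}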

cheers,
Colin


On Tue, Dec 15, 2015 at 2:22 PM, dam6923 .  wrote:
> Here was the justification from 2004:
>
> https://bugs.openjdk.java.net/browse/JDK-4953311
>
>
> Also, some research into the matter (not my own):
>
> http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
>
> One of the conclusions:
>
> "Minimize I/O operations by reading an array at a time, not a byte at
> a time. An 8Kbyte array is a good size."
>
>
> On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe  wrote:
>> Hi David,
>>
>> Do you have benchmarks to justify changing this configuration?
>>
>> best,
>> Colin
>>
>> On Wed, Dec 9, 2015 at 8:05 AM, dam6923 .  wrote:
>>> Hello!
>>>
>>> A while back, in Java 1.6, the size of the internal file-reading
>>> buffers was bumped up to 8192 bytes.
>>>
>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>>>
>>> Perhaps it's time to update Hadoop to at least this default level too. :)
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-2705
>>>
>>> Thanks,
>>> David


Re: Revive HADOOP-2705?

2015-12-15 Thread Colin McCabe
Hi David,

Do you have benchmarks to justify changing this configuration?

best,
Colin

On Wed, Dec 9, 2015 at 8:05 AM, dam6923 .  wrote:
> Hello!
>
> A while back, in Java 1.6, the size of the internal file-reading
> buffers was bumped up to 8192 bytes.
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>
> Perhaps it's time to update Hadoop to at least this default level too. :)
>
> https://issues.apache.org/jira/browse/HADOOP-2705
>
> Thanks,
> David


Re: DISCUSS: is the order in FS.listStatus() required to be sorted?

2015-06-16 Thread Colin McCabe
On Tue, Jun 16, 2015 at 3:02 AM, Steve Loughran  wrote:
>
>> On 15 Jun 2015, at 21:22, Colin P. McCabe  wrote:
>>
>> One possibility is that we could randomize the order of returned
>> results in HDFS (at least within a given batch of results returned
>> from the NN).  This is similar to how the Go programming language
>> randomizes the order of iteration over hash table keys, to avoid code
>> being written which relies on a specific implementation-defined
>> ordering.
>>
>> Regardless of whether we do that, though, there is a bunch of code
>> even in Hadoop common that doesn't properly deal with unsorted
>> listStatus / globStatus... such as "hadoop fs -ls"
>
> something we could make an option for tests...be fun to see what happens. I 
> wouldn't inflict it on production, as people would only hate us for breaking 
> things. Again

Well, we do inflict it on production.  LocalFileSystem has always
returned unsorted results.  And most stuff that works with HDFS is
capable of running against LocalFileSystem.
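
For callers that do care about ordering, the defensive fix is cheap: sort the
returned array yourself instead of relying on the FileSystem implementation.
A minimal sketch:

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SortedListing {
  public static FileStatus[] listSorted(FileSystem fs, Path dir)
      throws IOException {
    FileStatus[] statuses = fs.listStatus(dir);   // order is not guaranteed
    Arrays.sort(statuses, new Comparator<FileStatus>() {
      @Override
      public int compare(FileStatus a, FileStatus b) {
        return a.getPath().compareTo(b.getPath());
      }
    });
    return statuses;
  }
}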

Colin


Re: HDFS audit log

2015-05-05 Thread Colin McCabe
I think HDFS INotify is a better choice if you need:
* guaranteed backwards compatibility
* rapid and unambiguous parsing (via protobuf)
* clear Java API for retrieving the data (i.e., not rsync on a text file)
* ability to resume reading at a given point if the consumer process fails

We are using it in production for this purpose, via Cloudera Manager.  It
would work well with Kafka or Flume or whatever.
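
The resumability in particular is hard to get from a text log.  A rough
sketch of what a consumer looks like (the NameNode URI and transaction id
are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class ResumableEventFeed {
  public static void main(String[] args) throws Exception {
    long lastSeenTxid = 12345L;   // loaded from the consumer's own checkpoint
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://nn1:8020"), new Configuration());
    DFSInotifyEventInputStream stream =
        admin.getInotifyEventStream(lastSeenTxid);
    while (true) {
      EventBatch batch = stream.take();
      // ... hand batch.getEvents() to Kafka / Flume / whatever ...
      lastSeenTxid = batch.getTxid();   // persist this for the next restart
    }
  }
}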

The audit log is just a human readable log file.  Its format has never been
fixed or even formally specified.

Colin
On Apr 25, 2015 7:58 AM, "Allen Wittenauer"  wrote:

>
> I think we need to have a discussion about the HDFS audit log.
>
> The purpose of the HDFS audit log* is for operations and security
> people to keep track of actual, bits-on-disk changes to HDFS and related
> metadata changes. It is not meant as a catch-all for any and all HDFS
> operations.  It is most definitely processed by code written by people.
> Its format is meant to be fixed; specifically no new fields and all fields
> should be present on every line. It’s meant to be extremely easy to parse
> for even junior admins.
>
> For the past year, I’ve noticed an extremely disturbing trend:
>
> a) Changes to the log file which BREAKS operations people.
> Part of the problem here is that the compatibility guidelines don’t specify
> that this file is locked.  We should fix this.
>
> b) An increasing number of “we should log this random NN
> operation”.  Unless it modifies the actual data, these are not AUDIT-worthy
> events.  Ask yourself, “would a security person care?”  If the answer is
> no, then don’t put it in the HDFS audit log and just keep an entry in the
> generic namenode log.  If the answer is yes, get a second opinion from
> someone else, preferably outside your team who actually does security.
>
>
> * - if anyone wants the full history, feel free to ask …


Re: fsck output compatibility question with regard to HDFS-7281

2015-05-05 Thread Colin McCabe
How about just having a --json option for the fsck command?  That's what we
did in Ceph for some command line tools.  It would make the output easier
to consume and easier to provide compatibility for.

Colin
On Apr 28, 2015 12:32 PM, "Allen Wittenauer"  wrote:

>
> A lot of the summary information… but the key parts of “yo, these files
> are busted and here’s why” is not, IIRC.  That’s one of the key items where
> people are parsing fsck output (and worse, usually under duress.)
>
> On Apr 28, 2015, at 12:23 PM, Mai Haohui  wrote:
>
> > In terms of the monitoring, we have put a lot of information into the
> > JMX output.
> >
> > It's relatively easy to use python / ruby / node.js to write your own
> > tools to parse the information. In the longer term, it might also make
> > sense to move some of our tools to based on the JMX output instead of
> > making RPC calls to avoid duplicated code.
> >
> > ~Haohui
> >
> > On Tue, Apr 28, 2015 at 11:54 AM, Andrew Wang 
> wrote:
> >> On Tue, Apr 28, 2015 at 11:25 AM, Allen Wittenauer 
> wrote:
> >>
> >>>
> >>> On Apr 28, 2015, at 10:59 AM, Andrew Wang 
> >>> wrote:
> 
>  This is also not something typically upheld by unix-y commands. BSD
> vs.
> >>> GNU
>  already leads to incompatible flags and output. Most of these commands
>  haven't been changed in 20 years, but that doesn't constitute a compat
>  guarantee.
> >>>
> >>>One of the reasons why Solaris doesn’t officially support user
> >>> names greater than 8 characters is because of the breakage to ls and
> what
> >>> that would do with how one parses them.  So, yes, it is upheld in cases
> >>> where it would be too big of a burden on backward compatibility.
> (That’s
> >>> the easy example, I could give a lot more from my days at Sun if you’d
> >>> like.)
> >>>
> >>> Yup, agree. I know HFS and NTFS support case-insensitivity for similar
> >> reasons. Not sure changing fsck or dfsadmin is quite the same level
> though
> >> :)
> >>
> >>
>  This is something I'd like to follow for our own commands. We provide
>  different APIs for machine consumption vs. human consumption, and make
> >>> this
>  clear in the compat guide. Of course, we should still be judicious
> when
>  changing the human output, but I just don't see a good way forward
> >>> without
>  relaxing our current compat guidelines.
> >>>
> >>>I think that’s a great suggestion.
> >>>
> >>> Allen, do you have a "top 3" for shell commands that need the
> "plumbing"
> >> treatment? That'd be a good place to start. Yongjun expressed some
> interest
> >> to me in working on this, and I think it'd be a great place for new
> >> contributors too. We can probably crib ideas from what git did. Once
> that's
> >> in place, we can think about changing this part of the compat
> guidelines.
> >>
> >>
>  The other thing to consider is providing supported Java APIs for the
>  commonly-parsed shell commands. This is something we have much more
>  experience with.
> >>>
> >>>I think people forget about who the customer of some of these
> >>> interfaces actually are.  I can probably count the number of ops
> people I
> >>> know who speak Java frequently enough to be comfortable with it for
> every
> >>> day use on two hands. In this particular case, fsck is, by and far, an
> ops
> >>> tool.  Give us perl and/or python and/or ruby bindings.  That was the
> >>> promise of protobuf, right?  But Java? Yeah, no thanks, I’ll continue
> >>> processing it from stdin with a couple lines of perl than deal with the
> >>> mountains of Java cruft.
> >>
> >>
> >> My idea behind providing a Java API is for monitoring tools (i.e. CM,
> >> Ambari). I suspect some of the info available in shell commands is not
> also
> >> available through another API, which forces tools that are okay with
> Java
> >> to instead parse shell output. We still don't have a python / etc client
> >> that doesn't wrap a JVM, so alternate language bindings are tough right
> now.
> >>
> >> In terms of the ops experience, I'm hoping that "plumbing" (which will
> be
> >> more difficult to use) will meet needs for long-lived scripts, while
> >> "porcelain" will be okay for adhoc one-off usage. This makes porcelain
> more
> >> okay to break, since I rewrite my grep/cut/awk pipelines each time
> anyway.
> >>
> >> Best,
> >> Andrew
>
>


Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
On Wed, Mar 11, 2015 at 2:34 PM, Chris Nauroth  wrote:
> The only thing I'm aware of is the failOnError option:
>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors
> .html
>
>
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
>
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.

+1.  JIRA?
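
For the record, a rough sketch of the file-in-place-of-a-directory idea
(class and method names are made up): DiskChecker#checkDirAccess already
throws "Not a directory" for a plain file, and no permission bits are
touched, so there is nothing left behind that the clean plugin can't delete.

import java.io.File;
import org.apache.hadoop.fs.FileUtil;

public class SimulatedDiskFailure {
  // Swap the data directory for a plain file to simulate a failed disk.
  static void injectFailure(File dataDir) throws Exception {
    FileUtil.fullyDelete(dataDir);
    if (!dataDir.createNewFile()) {
      throw new IllegalStateException("could not create " + dataDir);
    }
  }

  // Undo it in the test's finally/teardown block.
  static void restore(File dataDir) {
    if (!dataDir.delete() || !dataDir.mkdirs()) {
      throw new IllegalStateException("could not restore " + dataDir);
    }
  }
}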

Colin

>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:10 PM, "Colin McCabe"  wrote:
>
>>Is there a maven plugin or setting we can use to simply remove
>>directories that have no executable permissions on them?  Clearly we
>>have the permission to do this from a technical point of view (since
>>we created the directories as the jenkins user), it's simply that the
>>code refuses to do it.
>>
>>Otherwise I guess we can just fix those tests...
>>
>>Colin
>>
>>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu  wrote:
>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>
>>> In HDFS-7722:
>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>TearDown().
>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>
>>> Also I ran mvn test several times on my machine and all tests passed.
>>>
>>> However, since in DiskChecker#checkDirAccess():
>>>
>>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>   if (!dir.isDirectory()) {
>>> throw new DiskErrorException("Not a directory: "
>>>  + dir.toString());
>>>   }
>>>
>>>   checkAccessByFileMethods(dir);
>>> }
>>>
>>> One potentially safer alternative is replacing data dir with a regular
>>> file to stimulate disk failures.
>>>
>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>> wrote:
>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>> TestDataNodeVolumeFailureReporting, and
>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>from
>>>> directories like the one Colin mentioned to simulate disk failures at
>>>>data
>>>> nodes.  I reviewed the code for all of those, and they all appear to be
>>>> doing the necessary work to restore executable permissions at the end
>>>>of
>>>> the test.  The only recent uncommitted patch I've seen that makes
>>>>changes
>>>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>though.  I
>>>> don't know if there are other uncommitted patches that changed these
>>>>test
>>>> suites.
>>>>
>>>> I suppose it's also possible that the JUnit process unexpectedly died
>>>> after removing executable permissions but before restoring them.  That
>>>> always would have been a weakness of these test suites, regardless of
>>>>any
>>>> recent changes.
>>>>
>>>> Chris Nauroth
>>>> Hortonworks
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers"  wrote:
>>>>
>>>>>Hey Colin,
>>>>>
>>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>>these boxes. He took a look and concluded that some perms are being
>>>>>set in
>>>>>those directories by our unit tests which are precluding those files
>>>>>from
>>>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>>>expect this to keep happening until we can fix the test in question to
>>>>>properly clean up after itself.
>>>>>
>>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>>me
>>>>>this info:
>>>>>
>>>>>"/home/jenkins/jenkins-

Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
Is there a maven plugin or setting we can use to simply remove
directories that have no executable permissions on them?  Clearly we
have the permission to do this from a technical point of view (since
we created the directories as the jenkins user), it's simply that the
code refuses to do it.

Otherwise I guess we can just fix those tests...

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu  wrote:
> Thanks a lot for looking into HDFS-7722, Chris.
>
> In HDFS-7722:
> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>
> Also I ran mvn test several times on my machine and all tests passed.
>
> However, since in DiskChecker#checkDirAccess():
>
> private static void checkDirAccess(File dir) throws DiskErrorException {
>   if (!dir.isDirectory()) {
> throw new DiskErrorException("Not a directory: "
>  + dir.toString());
>   }
>
>   checkAccessByFileMethods(dir);
> }
>
> One potentially safer alternative is replacing data dir with a regular
> file to stimulate disk failures.
>
> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth  
> wrote:
>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> TestDataNodeVolumeFailureReporting, and
>> TestDataNodeVolumeFailureToleration all remove executable permissions from
>> directories like the one Colin mentioned to simulate disk failures at data
>> nodes.  I reviewed the code for all of those, and they all appear to be
>> doing the necessary work to restore executable permissions at the end of
>> the test.  The only recent uncommitted patch I've seen that makes changes
>> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>> don't know if there are other uncommitted patches that changed these test
>> suites.
>>
>> I suppose it's also possible that the JUnit process unexpectedly died
>> after removing executable permissions but before restoring them.  That
>> always would have been a weakness of these test suites, regardless of any
>> recent changes.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/10/15, 1:47 PM, "Aaron T. Myers"  wrote:
>>
>>>Hey Colin,
>>>
>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>these boxes. He took a look and concluded that some perms are being set in
>>>those directories by our unit tests which are precluding those files from
>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>expect this to keep happening until we can fix the test in question to
>>>properly clean up after itself.
>>>
>>>To help narrow down which commit it was that started this, Andrew sent me
>>>this info:
>>>
>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>UTC
>>>on March 5th."
>>>
>>>--
>>>Aaron T. Myers
>>>Software Engineer, Cloudera
>>>
>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe 
>>>wrote:
>>>
 Hi all,

 A very quick (and not thorough) survey shows that I can't find any
 jenkins jobs that succeeded from the last 24 hours.  Most of them seem
 to be failing with some variant of this message:

 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
 on project hadoop-hdfs: Failed to clean project: Failed to delete


/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
oject/hadoop-hdfs/target/test/data/dfs/data/data3
 -> [Help 1]

 Any ideas how this happened?  Bad disk, unit test setting wrong
 permissions?

 Colin

>>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera


Re: 2.7 status

2015-02-17 Thread Colin McCabe
+1 for starting to think about releasing 2.7 soon.

Re: building Windows binaries.  Do we release binaries for all the
Linux and UNIX architectures?  I thought we didn't.  It seems a little
inconsistent to release binaries just for Windows, but not for those
other architectures and OSes.  I wonder if we can improve this
situation?

best,
Colin

On Fri, Feb 13, 2015 at 4:36 PM, Karthik Kambatla  wrote:
> 2 weeks from now (end of Feb) sounds reasonable. The one feature I would
> like for to be included is shared-cache: we are pretty close - two more
> main items to take care of.
>
> In an offline conversation, Steve mentioned building Windows binaries for
> our releases. Do we want to do that for 2.7? If so, can anyone with Windows
> expertise setup a Jenkins job to build these artifacts, and may be hook it
> up to https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/
>
>
>
> On Fri, Feb 13, 2015 at 11:07 AM, Arun Murthy  wrote:
>
>> My bad, been sorted distracted.
>>
>> I agree, we should just roll fwd a 2.7 ASAP with all the goodies.
>>
>> What sort of timing makes sense? 2 week hence?
>>
>> thanks,
>> Arun
>>
>> 
>> From: Jason Lowe 
>> Sent: Friday, February 13, 2015 8:11 AM
>> To: common-...@hadoop.apache.org
>> Subject: Re: 2.7 status
>>
>> I'd like to see a 2.7 release sooner than later.  It has been almost 3
>> months since Hadoop 2.6 was released, and there have already been 634 JIRAs
>> committed to 2.7.  That's a lot of changes waiting for an official release.
>>
>> https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
>> Jason
>>
>>   From: Sangjin Lee 
>>  To: "common-...@hadoop.apache.org" 
>>  Sent: Tuesday, February 10, 2015 1:30 PM
>>  Subject: 2.7 status
>>
>> Folks,
>>
>> What is the current status of the 2.7 release? I know initially it started
>> out as a "java-7" only release, but looking at the JIRAs that is very much
>> not the case.
>>
>> Do we have a certain timeframe for 2.7 or is it time to discuss it?
>>
>> Thanks,
>> Sangjin
>>
>>
>>
>
>
> --
> Karthik Kambatla
> Software Engineer, Cloudera Inc.
> 
> http://five.sentenc.es


Re: max concurrent connection to HDFS name node

2015-02-12 Thread Colin McCabe
The NN can do somewhere around 30,000 - 50,000 RPCs per second
currently, depending on configuration.  In general you do not want to
have extremely high NN RPC traffic, because it will slow things down.
You might consider re-architecting your application to do more DN
traffic and less NN traffic, if possible.  Hope that helps.

best,
Colin

On Tue, Feb 10, 2015 at 4:29 PM, Demai Ni  wrote:
> hi, folks,
>
> Is there a max limit of concurrent connection to a name node? or whether
> there is a best practice?
>
> My scenario is simple. Client(java/c++) program will open a connection
> through hdfs api call, and then open a few hdfs files, maybe read a bit
> data, then close the connection. In some case, the number of clients may
> be  50,000~100,000 concurrently. Is the number of connection acceptable?
>
> Thanks.
>
> Demai


Re: NFSv3 Filesystem Connector

2015-01-14 Thread Colin McCabe
Why not just use LocalFileSystem with an NFS mount (or several)?  I read
through the README but I didn't see that question answered anywhere.
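
In other words, something like the sketch below already works today with
plain LocalFileSystem (the mount point and path are made up), so it would
help to spell out what the connector adds over this:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NfsViaLocalFs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // /mnt/nfs1 is an ordinary NFSv3 mount managed by the OS.
    FileSystem fs = FileSystem.get(URI.create("file:///"), conf);
    // Jobs can then read and write file:/mnt/nfs1/... paths directly.
    fs.listStatus(new Path("/mnt/nfs1/warehouse"));
  }
}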

best,
Colin

On Tue, Jan 13, 2015 at 1:35 PM, Gokul Soundararajan  wrote:

> Hi,
>
> We (Jingxin Feng, Xing Lin, and I) have been working on providing a
> FileSystem implementation that allows Hadoop to utilize a NFSv3 storage
> server as a filesystem. It leverages code from hadoop-nfs project for all
> the request/response handling. We would like your help to add it as part of
> hadoop tools (similar to the way hadoop-aws and hadoop-azure).
>
> In more detail, the Hadoop NFS Connector allows Apache Hadoop (2.2+) and
> Apache Spark (1.2+) to use a NFSv3 storage server as a storage endpoint.
> The NFS Connector can be run in two modes: (1) secondary filesystem - where
> Hadoop/Spark runs using HDFS as its primary storage and can use NFS as a
> second storage endpoint, and (2) primary filesystem - where Hadoop/Spark
> runs entirely on a NFSv3 storage server.
>
> The code is written in a way such that existing applications do not have to
> change. All one has to do is to copy the connector jar into the lib/
> directory of Hadoop/Spark. Then, modify core-site.xml to provide the
> necessary details.
>
> The current version can be seen at:
> https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
>
> It is my first time contributing to the Hadoop codebase. It would be great
> if someone on the Hadoop team can guide us through this process. I'm
> willing to make the necessary changes to integrate the code. What are the
> next steps? Should I create a JIRA entry?
>
> Thanks,
>
> Gokul
>


Re: Symbolic links disablement

2014-12-31 Thread Colin McCabe
As far as I know, nobody is working on this at the moment.  There are
a lot of issues that would need to be worked through before we could
enable symlinks in production.

We never quite agreed on the semantics of how symlinks should work...
for example, some people advocated that listing a directory should
list the resolved names of all symlinks in it, while others argued
that this would impose too great a performance load on clients listing
directories with symlinks.  Similarly, some people argued that
cross-filesystem symlinks should be banned, partly because they can't
be optimized very effectively.

Then there were a bunch of security issues.  Basically any
higher-level software that is relying on path-based access will have
problems with symlinks.  For example, Hive assumes that if you limit a
user's access to just things under /home/username, then you have
effectively sandboxed that person.  But if you can create a symlink
from /home/username/foo to /foo, then you've effectively broken out of
Hive's sandbox.  Since Hive often runs with elevated permissions, and
is willing access files under /home/username with those permissions,
this would be disastrous.  Hive is just one example, of course...
basically we'd have to audit all software using HDFS for this kind of
problem before enabling symlinks.
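
To make the Hive example concrete, the escape is a one-liner once symlink
creation is allowed (the paths and NameNode URI are illustrative; on current
releases the call just throws UnsupportedOperationException):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class SymlinkEscape {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(
        URI.create("hdfs://nn1:8020"), new Configuration());
    // A link inside the "sandbox" pointing at a target outside of it.
    fc.createSymlink(new Path("/foo"),                // target
                     new Path("/home/username/foo"),  // link
                     false /* createParent */);
  }
}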

You can see a list of all these issues and more at:
https://issues.apache.org/jira/browse/HADOOP-10019.

best,
Colin

On Thu, Dec 25, 2014 at 12:30 PM, Ananth Gundabattula
 wrote:
> Hello All,
>
> Happy holidays.
>
>
> I was wondering if Symbolic links would be re-enabled anytime in the near
> future ? https://issues.apache.org/jira/browse/HADOOP-10020
>
> I am using a CDH VM, and the moment I try to use
> FileContext.createSymlink() I get an error stating "Symbolic links not
> supported" (an UnsupportedOperationException). The release notes
> from CDH state that HADOOP-10020 is currently in the binary release.
>
> Is there any expected timeline for this feature to be put back into the
> main trunk? If it is already there, could anyone point me to the Hadoop
> release in which symbolic links have been re-enabled?
>
> Thanks for your time.
>
>
> Regards,
> Ananth


Re: Switching to Java 7

2014-12-08 Thread Colin McCabe
On Mon, Dec 8, 2014 at 7:46 AM, Steve Loughran  wrote:
> On 8 December 2014 at 14:58, Ted Yu  wrote:
>
>> Looks like there was still OutOfMemoryError :
>>
>>
>> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1964/testReport/junit/org.apache.hadoop.hdfs.server.namenode.snapshot/TestRenameWithSnapshots/testRenameDirAcrossSnapshottableDirs/
>>
>
> Well, I'm going to ignore that for now as it's a Java 8 problem, surfacing
> this weekend once the builds were actually switched to Java 8. Memory size
> tuning can continue.
>
> I have now committed the Java 7+ only patch to branch-2 and up: new code
> does not have to worry about Java 6 compatibility unless it is going to be
> backported to Hadoop 2.6 or earlier. Having written some Java 7 code, the <>
> (diamond) constructor for typed classes is a convenience; the multi-catch
> clauses are more useful, as they eliminate duplicate code in exception
> handling.
>
> Getting this patch in has revealed that the Jenkins builds of hadoop are
> (a) a bit of a mess and (b) prone to race conditions related to the m2
> repository if >1 project builds simultaneously. The way the nightly builds
> are staggered means this doesn't usually surface, but it may show up during
> precommit/postcommit builds.

It would be nice if we could have a separate .m2 directory per test executor.

It seems like that would eliminate these race conditions once and for
all, at the cost of storing a few extra jars (proportional to the # of
simultaneous executors).

best,
Colin


>
> The switch to Java 7 as the underlying JDK appears to be triggering
> failures, these are things that the projects themselves are going to have
> to look at.
>
>
> This then, is where we are with builds right now. This is not a consequence
> of the changes to the POM; this list predates that patch. This is Jenkins
> running Hadoop builds and tests with Java 7u55
>
>
> *Working: *
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-branch2/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk/
>
> *failing tests*
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Common-2-Build/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Hdfs-trunk/
> https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Mapreduce-trunk/
>
> *failing tests on Java 8 (may include OOM)*
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-common-trunk-Java8/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Hdfs-trunk-Java8/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Mapreduce-trunk-Java8/
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/
>
>
> *failing with maven internal dependency problems*
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-trunk-Commit/
>
>
> *failing even though it appears to work in the logs*
>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Common-trunk/
>


Re: Thinking ahead to hadoop-2.7

2014-12-08 Thread Colin McCabe
On Fri, Dec 5, 2014 at 11:15 AM, Karthik Kambatla  wrote:
> It would be nice to cut the branch for the next "feature" release (not just
> Java 7) in the first week of January, so we can get the RC out by the end
> of the month?
>
> Yesterday, this came up in an offline discussion on ATS. Given people can
> run 2.6 on Java 7, is there merit to doing 2.7 with the exact same bits
> targeting Java 7? I am okay with going through with it, as long as it
> doesn't delay the next feature release.
>
> Thoughts?

That's a good point.  I think it's important to point out that most of
our users are already on JDK7.  We shouldn't think of this decision as
adding support for something new, we should think about it as taking
something away (JDK6 support).  I think it's good that we are finally
moving away from supporting JDK6, but I'm not completely sure that we
need to devote a whole release to that.  Are there a lot of open JDK7
issues that would require a release to straighten out?

best,
Colin


>
> On Wed, Dec 3, 2014 at 8:59 AM, Sangjin Lee  wrote:
>
>> Late January sounds fine to me. I think we should be able to wrap it up
>> much earlier than that (hopefully).
>>
>> Thanks,
>> Sangjin
>>
>> On Tue, Dec 2, 2014 at 5:19 PM, Arun C Murthy  wrote:
>>
>> > Sangjin/Karthik,
>> >
>> >  How about planning on hadoop-2.8 by late Jan? Thoughts?
>> >
>> > thanks,
>> > Arun
>> >
>> > On Dec 2, 2014, at 11:09 AM, Sangjin Lee  wrote:
>> >
>> > > If 2.7 is being positioned as the JDK7-only release, then it would be
>> > good
>> > > to know how 2.8 lines up in terms of timing. Our interest is landing
>> the
>> > > shared cache feature (YARN-1492)... Thanks.
>> > >
>> > > Sangjin
>> > >
>> > > On Mon, Dec 1, 2014 at 2:55 PM, Karthik Kambatla 
>> > wrote:
>> > >
>> > >> Thanks for starting this thread, Arun.
>> > >>
>> > >> Your proposal seems reasonable to me. I suppose we would like new
>> > features
>> > >> and improvements to go into 2.8 then? If yes, what time frame are we
>> > >> looking at for 2.8? Looking at YARN, it would be nice to get a release
>> > with
>> > >> shared-cache and a stable version of reservation work. I believe they
>> > are
>> > >> well under way and should be ready in a few weeks.
>> > >>
>> > >> Regarding 2.7 release specifics, do you plan to create a branch off of
>> > >> current branch-2.6 and update all issues marked fixed for 2.7 to be
>> > fixed
>> > >> for 2.8?
>> > >>
>> > >> Thanks
>> > >> Karthik
>> > >>
>> > >> On Mon, Dec 1, 2014 at 2:42 PM, Arun Murthy 
>> > wrote:
>> > >>
>> > >>> Folks,
>> > >>>
>> > >>> With hadoop-2.6 out it's time to think ahead.
>> > >>>
>> > >>> As we've discussed in the past, 2.6 was the last release which
>> supports
>> > >>> JDK6.
>> > >>>
>> > >>> I'm thinking it's best to try get 2.7 out in a few weeks (maybe by
>> the
>> > >>> holidays) with just the switch to JDK7 (HADOOP-10530) and possibly
>> > >>> support for JDK-1.8 (as a runtime) via HADOOP-11090.
>> > >>>
>> > >>> This way we can start with the stable base of 2.6 and switch over to
>> > >>> JDK7 to allow our downstream projects to use either for a short time
>> > >>> (hadoop-2.6 or hadoop-2.7).
>> > >>>
>> > >>> I'll update the Roadmap wiki accordingly.
>> > >>>
>> > >>> Thoughts?
>> > >>>
>> > >>> thanks,
>> > >>> Arun
>> > >>>
>> > >>>
>> > >>
>> >
>> > --
>> > Arun C. Murthy
>> > Hortonworks Inc.
>> > http://hortonworks.com/hdp/
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> -- Karthik Kambatla, Software Engineer, Cloudera
> 
> Q: Why is this email five sentences or less?
> A: http://five.sentenc.es


Re: Why do reads take as long as replicated writes?

2014-11-10 Thread Colin McCabe
I strongly suggest benchmarking a modern version of Hadoop rather than
Hadoop 1.x.  The native CRC stuff from HDFS-3528 greatly reduces CPU
consumption on the read path.  I wrote about some other read path
optimizations in Hadoop 2.x here:
http://www.club.cc.cmu.edu/~cmccabe/d/2014.04_ApacheCon_HDFS_read_path_optimization_presentation.pdf
. I agree with Andrew that Teragen and Teravalidate are probably a
better choice for you.  Look for the bottleneck in your system.

best,
Colin

On Wed, Nov 5, 2014 at 4:10 PM, Eitan Rosenfeld  wrote:
> Daemeon - Indeed, I neglected to mention that I am clearing the caches
> throughout my cluster before running the read benchmark. My expectation
> was to ideally get results that were proportionate to disk I/O, given
> that replicated writes perform twice the disk I/O relative to reads. I've
> verified the I/O with iostat. However, as I mentioned earlier, reads and
> writes converge as the number of files in the workload increases, despite
> the constant ratio of write I/O to read I/O.
>
> Andrew - I've verified that the network is not the bottleneck. (All of the
> links are 10Gb). As you'll see, I suspect that the lack of data-locality
> causes the slowdown because a given node can be responsible for
> serving multiple remote block reads all at once.
>
> I hope my understanding of writes and reads can be confirmed:
>
> Write pipelining allows a node to write, replicate, and receive replicated
> data in parallel. If node A is writing its own data while receiving
> replicated data from node B, node B does not wait for node A to finish
> writing B's replicated data to disk. Rather, node B can begin writing its
> next local block immediately.  Thus, pipelining helps replicated writes
> have good performance.
>
> In contrast, let's assume node A is currently reading a block. If node A
> receives an additional read request from node B, A will take longer to
> serve the block to B because of A's pre-existing read. Because node B
> waits longer for the block to be served from A, there is a delay on node B
> before it attempts to read the next block in the file. Multiple read
> requests from different nodes are a consequence of having no built-in
> data locality with TestDFSIO. Finally, as the number of concurrent tasks
> throughout the cluster increases, the wait time for reads increases.
>
> Is my understanding of these read and write mechanisms correct?
>
> Thank you,
> Eitan


Re: Guava

2014-11-10 Thread Colin McCabe
I'm usually an advocate for getting rid of unnecessary dependencies
(cough, jetty, cough), but a lot of the things in Guava are really
useful.

Immutable collections, BiMap, Multisets, Arrays#asList, the stuff for
writing hashCode() and equals(), String#Joiner, the list goes on.  We
particularly use the Cache/CacheBuilder stuff a lot in HDFS to get
maps with LRU eviction without writing a lot of boilerplate.  The QJM
stuff uses ListenableFuture a lot, although perhaps we could come up
with our own equivalent for that.
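
As a rough illustration of the boilerplate CacheBuilder saves (a generic
sketch, not any particular HDFS class):

  import java.util.concurrent.TimeUnit;
  import com.google.common.cache.Cache;
  import com.google.common.cache.CacheBuilder;

  public class LruMapSketch {
    // A bounded map with LRU-style eviction and idle expiry in a few lines;
    // hand-rolling this means subclassing LinkedHashMap, adding locking,
    // expiry timers, and so on.
    private final Cache<Long, String> cache = CacheBuilder.newBuilder()
        .maximumSize(1024)                        // evict least-recently-used entries
        .expireAfterAccess(10, TimeUnit.MINUTES)  // drop idle entries
        .build();

    void put(long key, String value) {
      cache.put(key, value);
    }

    String get(long key) {
      return cache.getIfPresent(key);  // null if absent or already evicted
    }
  }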

On Mon, Nov 10, 2014 at 9:26 AM, Alejandro Abdelnur  wrote:
> IMO we should:
>
> 1* have a clean and thin client API JAR (which does not drag any 3rd party
> dependencies, or a well defined small set -i.e. slf4j & log4j-)
> 2* have a client implementation that uses a classloader to isolate client
> impl 3rd party deps from app dependencies.
>
> #2 can be done using a stock URLClassLoader (i would just subclass it to
> forbid packages in the API JAR and exposed 3rd parties to be loaded from
> the app JAR)
>
> #1 is the tricky thing as our current API modules don't have a clean
> API/impl separation.
>
> thx
> PS: If folks are interested in pursing this, I can put together a prototype
> of how  #2 would work (I don't think it will be more than 200 lines of code)

Absolutely, I agree that we should not be using Guava types in public
APIs.  Guava has not been very responsible with backwards
compatibility, that much is clear.

A client / server jar separation is an interesting idea.  But then we
still have to get rid of Guava and other library deps in the client
jars.  I think it would be more work than it seems.  For example, the
HDFS client uses Guava Cache a lot, so we'd have to write our own
version of this.

Can't we just shade this stuff?  Has anyone tried shading Hadoop's Guava?

best,
Colin


>
>
> On Mon, Nov 10, 2014 at 5:18 AM, Steve Loughran 
> wrote:
>
>> Yes, Guava is a constant pain; there are lots of open JIRAs related to it,
>> as it's the one dependency we can't seamlessly upgrade, not unless we do our
>> own fork and reinsert the missing classes.
>>
>> The most common uses in the code are
>>
>> @VisibleForTesting (easily replicated)
>> and the Precondition.check() operations
>>
>> The latter is also easily swapped out, and we could even add the check they
>> forgot:
>> Preconditions.checkArgNotNull(argname, arg)
>>
>>
>> These are easy; its the more complex data structures that matter more.
>>
>> I think for Hadoop 2.7 & java 7 we need to look at this problem and do
>> something. Even if we continue to ship Guava 11 so that the HBase team
>> don't send any (more) death threats, we can/should rework Hadoop to build
>> and run against Guava 16+ too. That's needed to fix some of the recent java
>> 7/8+ changes.
>>
>> - Everything in v11 that was dropped from v16 MUST be re-implemented with
>> our own versions.
>> - Anything tagged as deprecated in 11+ SHOULD be replaced by newer stuff,
>> wherever possible.
>>
>> I think for 2.7+ we should add some new profiles to the POM, for Java 8 and
>> 9 alongside the new baseline java 7. For those later versions we could
>> perhaps mandate Guava 16.
>>
>>
>>
>> On 10 November 2014 00:42, Arun C Murthy  wrote:
>>
>> > … has been a constant pain w.r.t compatibility etc.
>> >
>> > Should we consider adopting a policy to not use guava in
>> Common/HDFS/YARN?
>> >
>> > MR doesn't matter too much since it's application-side issue, it does
>> hurt
>> > end-users though since they still might want a newer guava-version, but
>> at
>> > least they can modify MR.
>> >
>> > Thoughts?
>> >
>> > thanks,
>> > Arun
>> >
>> >
>>
>>


Re: builds failing on H9 with "cannot access java.lang.Runnable"

2014-10-03 Thread Colin McCabe
Thanks, Andrew and Giridharan.

Colin

On Fri, Oct 3, 2014 at 1:20 PM, Andrew Bayer  wrote:
> Yeah, the other possibility is that an ansible run borks running
> slaves. If this happens again, let us know.
>
> A.
>
> On Fri, Oct 3, 2014 at 1:15 PM, Giridharan Kesavan
>  wrote:
>> all the slaves are getting re-booted give it some more time
>>
>> -giri
>>
>> On Fri, Oct 3, 2014 at 1:13 PM, Ted Yu  wrote:
>>
>>> Adding builds@
>>>
>>> On Fri, Oct 3, 2014 at 1:07 PM, Colin McCabe 
>>> wrote:
>>>
>>> > It looks like builds are failing on the H9 host with "cannot access
>>> > java.lang.Runnable"
>>> >
>>> > Example from
>>> >
>>> https://builds.apache.org/job/PreCommit-HDFS-Build/8313/artifact/patchprocess/trunkJavacWarnings.txt
>>> > :
>>> >
>>> > [INFO]
>>> > 
>>> > [INFO] BUILD FAILURE
>>> > [INFO]
>>> > 
>>> > [INFO] Total time: 03:13 min
>>> > [INFO] Finished at: 2014-10-03T18:04:35+00:00
>>> > [INFO] Final Memory: 57M/839M
>>> > [INFO]
>>> > 
>>> > [ERROR] Failed to execute goal
>>> > org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile
>>> > (default-testCompile) on project hadoop-mapreduce-client-app:
>>> > Compilation failure
>>> > [ERROR]
>>> >
>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java:[189,-1]
>>> > cannot access java.lang.Runnable
>>> > [ERROR] bad class file:
>>> java/lang/Runnable.class(java/lang:Runnable.class)
>>> >
>>> > I don't have shell access to this, does anyone know what's going on on
>>> H9?
>>> >
>>> > best,
>>> > Colin
>>> >
>>>
>>


builds failing on H9 with "cannot access java.lang.Runnable"

2014-10-03 Thread Colin McCabe
It looks like builds are failing on the H9 host with "cannot access
java.lang.Runnable"

Example from 
https://builds.apache.org/job/PreCommit-HDFS-Build/8313/artifact/patchprocess/trunkJavacWarnings.txt
:

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 03:13 min
[INFO] Finished at: 2014-10-03T18:04:35+00:00
[INFO] Final Memory: 57M/839M
[INFO] 
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile
(default-testCompile) on project hadoop-mapreduce-client-app:
Compilation failure
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java:[189,-1]
cannot access java.lang.Runnable
[ERROR] bad class file: java/lang/Runnable.class(java/lang:Runnable.class)

I don't have shell access to this, does anyone know what's going on on H9?

best,
Colin


Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
On Wed, Sep 24, 2014 at 4:03 PM, Arpit Agarwal  wrote:
>> I would appreciate in the future if benchmarks were posted
>> a day or two before a merge vote of a performance improvement was
>> suggested.
>> I feel that it would have been better to float the idea of a
>> merge on the JIRA before actually calling it,
>
> Colin, I noted my intention to start the merge vote on the Jira last week.
> In response you asked for verification that read performance will not
> regressed. Read results were posted prior to starting the vote. I agree
> that the write numbers were not posted until one day after the merge vote
> started. If we delay the vote expiration by a day will it help address any
> remaining timing concerns?

Thank you for the offer.  I am OK with the current vote expiration
time, now that we've seen more benchmarks and discussed more potential
issues.

I am -0 on the merge vote.

But I do want to note that I consider HDFS-7141 to be a blocker for
merging to branch-2.6.

>> There are two areas where I think we need more clarification.  The
>> first is whether other eviction strategies can be implemented besides
>> LRU.
>
> If 2Q or another scheme requires more hooks we can certainly add to the
> pluggable interface. It is not a public interface that is set in stone. It
> is similar to BlockPlacementPolicy and VolumeChoosingPolicy. Both are
> HDFS-internal interfaces with alternative implementations that we rev
> frequently.

I implemented a modified 2Q today on HDFS-7142... I'd appreciate a
review.  Although I haven't had time to do a lot of testing on this, I
think this removes this as a blocker in my mind.
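
For anyone not following the JIRA, the general shape of a 2Q-style policy
is roughly the sketch below.  This is only illustrative (it leaves out the
"ghost" queue of recently evicted keys that full 2Q keeps, size/weight
accounting, and so on) and is not the actual HDFS-7142 patch:

  import java.util.LinkedHashMap;

  // Entries touched once sit in a small FIFO "probation" queue; only
  // entries touched a second time are promoted to the main LRU queue.
  // A large one-pass scan churns the probation queue but cannot flush
  // the main queue, which is exactly the property plain LRU lacks.
  public class TwoQueueSketch<K, V> {
    private final int probationCap;
    private final int mainCap;
    private final LinkedHashMap<K, V> probation =
        new LinkedHashMap<K, V>(16, 0.75f, false);  // insertion order = FIFO
    private final LinkedHashMap<K, V> main =
        new LinkedHashMap<K, V>(16, 0.75f, true);   // access order = LRU

    public TwoQueueSketch(int probationCap, int mainCap) {
      this.probationCap = probationCap;
      this.mainCap = mainCap;
    }

    public synchronized V get(K key) {
      V v = main.get(key);            // a hit also refreshes the LRU position
      if (v != null) {
        return v;
      }
      v = probation.remove(key);
      if (v != null) {
        promote(key, v);              // second touch: move to the main queue
      }
      return v;
    }

    public synchronized void put(K key, V value) {
      if (main.containsKey(key)) {
        main.put(key, value);
        return;
      }
      probation.put(key, value);      // first touch: probation only
      if (probation.size() > probationCap) {
        evictEldest(probation);       // a scan only pushes out other
      }                               // once-touched entries
    }

    private void promote(K key, V value) {
      main.put(key, value);
      if (main.size() > mainCap) {
        evictEldest(main);            // evict the true LRU entry
      }
    }

    private void evictEldest(LinkedHashMap<K, V> map) {
      K eldest = map.keySet().iterator().next();
      map.remove(eldest);
    }
  }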

>> The other area is how the memory management for HDFS-6581 fits in with
>> the memory management for HDFS-4949.  I am very concerned that this
>> has been handwaved away as future work. If we create a configuration
>> mess for system administrators as a result, I will be sad.
>
> You asked about memory management on 8/21. I responded the same day stating
> that we are not introducing any configuration settings since mounting the
> disk is done by an administrator. Your response did not indicate any
> concern with this approach. You can see the comment history on HDFS-6581. So
> I am surprised you raise it as a blocker one month later. We have not
> introduced new limits as part of this change so there is no concern of
> future config incompatibilities. The Cache Executor and Lazy Writer can be
> updated to be aware of the memory usage of each other in a compatible way.
>
> What we are voting on here is merging to trunk. If you have additional and
> reasonable concerns that you would like to see addressed prior to 2.6 we
> can have a separate discussion about 2.6 blockers.

I think it's fine to address HDFS-7141 after the merge to trunk.  But
as I noted above, I think we absolutely need to address it before the
merge to 2.6.  We are starting to see a lot of users of HDFS-4949, and
I want to make sure that there is a reasonable story for using both
features at the same time.  Let's continue this discussion on
HDFS-6919 and HDFS-6988 and see if we can come up with a solution that
works for everyone.

best,
Colin


>
> Regards,
> Arpit
>
>
> On Wed, Sep 24, 2014 at 2:19 PM, Colin McCabe 
> wrote:
>
>> On Wed, Sep 24, 2014 at 11:12 AM, Suresh Srinivas
>>  wrote:
>> > Features are done in a separate branch for two reasons: 1) During a
>> feature
>> > development the branch may be not functional 2) The high level approach
>> and
>> > design is not very clear and development can continue while that is being
>> > sorted out.
>>
>> Up until the last two days, we had no benchmarks at all, so we didn't
>> have any way to evaluate whether this performance improvement was
>> functional.  Andrew commented on this as well in this thread, and we
>> also raised the issue on the JIRA.  I am glad that a few benchmarks
>> have finally been posted.  I would appreciate in the future if
>> benchmarks were posted a day or two before a merge vote of a
>> performance improvement was suggested.  As it is, it feels like we are
>> racing the clock right now to figure out how well this works, and it
>> puts the reviewers in an unpleasant position.
>>
>> >In case of this feature, clearly (2) is not an issue. We have
>> > had enough discussion about the approach. I also think this branch is
>> ready
>> > for merge without rendering trunk not functional.
>>
>> I agree that this can be merged without rendering trunk
>> non-functional.  I don't agree that we have achieved consensus on all
>> the high-level approach and design.
>>
>>

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
has gone into it, and given it is a very
> important feature that allows experimentation to use memory tier, I would
> like to see this available in release 2.6.

Just to reiterate, I am -0 provided we can address HDFS-7141 and
HDFS-7142 before this gets set in stone.  I would hate to -1 this,
because it would mean that you could not call another vote for a week.
But I feel that it would have been better to float the idea of a merge
on the JIRA before actually calling it, to avoid having discussions
like this where we are racing the clock.

thanks,
Colin

>
> On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe 
> wrote:
>
>> This seems like a really aggressive timeframe for a merge.  We still
>> haven't implemented:
>>
>> * Checksum skipping on read and write from lazy persisted replicas.
>> * Allowing mmaped reads from the lazy persisted data.
>> * Any eviction strategy other than LRU.
>> * Integration with cache pool limits (how do HDFS-4949 and lazy
>> persist replicas share memory)?
>> * Eviction from RAM disk via truncation (HDFS-6918)
>> * Metrics
>> * System testing to find out how useful this is, and what the best
>> eviction strategy is.
>>
>> I see why we might want to defer checksum skipping, metrics, allowing
>> mmap, eviction via truncation, and so forth until later.  But I feel
>> like we need to figure out how this will integrate with the memory
>> used by HDFS-4949 before we merge.  I also would like to see another
>> eviction strategy other than LRU, which is a very poor eviction
>> strategy for scanning workloads.  I mentioned this a few times on the
>> JIRA.
>>
>> I'd also like to get some idea of how much testing this has received
>> in a multi-node cluster.  What makes us confident that this is the
>> right time to merge, rather than in a week or two?
>>
>> best,
>> Colin
>>
>>
>> On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal 
>> wrote:
>> > I have posted write benchmark results to the Jira.
>> >
>> > On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal > >
>> > wrote:
>> >
>> >> Hi Andrew, I said "it is not going to be a substantial fraction of
>> memory
>> >> bandwidth". That is certainly not the same as saying it won't be good or
>> >> there won't be any improvement.
>> >>
>> >> Any time you have transfers over RPC or the network stack you will not
>> get
>> >> close to the memory bandwidth even for intra-host transfers.
>> >>
>> >> I'll add some micro-benchmark results to the Jira shortly.
>> >>
>> >> Thanks,
>> >> Arpit
>> >>
>> >> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang 
>> >> wrote:
>> >>
>> >>> Hi Arpit,
>> >>>
>> >>> Here is the comment. It was certainly not my intention to misquote
>> anyone.
>> >>>
>> >>>
>> >>>
>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
>> >>>
>> >>> Quote:
>> >>>
>> >>> It would be nice to see that we could get a substantial fraction of
>> >>> memory bandwidth when writing to a single replica in-memory.
>> >>>
>> >>> The comparison will be interesting but I can tell you without
>> measurement
>> >>> it is not going to be a substantial fraction of memory bandwidth. We
>> are
>> >>> still going through DataTransferProtocol with all the copies and
>> overhead
>> >>> that involves.
>> >>>
>> >>> When the goal is in-memory writes and we are unable to achieve a
>> >>> substantial fraction of memory bandwidth, to me that is "not good
>> >>> performance."
>> >>>
>> >>> I also looked through the subtasks, and AFAICT the only one related to
>> >>> improving this is deferring checksum computation. The benchmarking we
>> did
>> >>> on HDFS-4949 showed that this only really helps when you're down to
>> single
>> >>> copy or zero copies with SCR/ZCR. DTP reads didn't see much of an
>> >>> improvement, so I'd guess the same would be true for DTP writes.
>> >>>
>> >>> I think my above three questions are still open, as well as my question
>> >>> about why we're merging now, as opposed to

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-23 Thread Colin McCabe
This seems like a really aggressive timeframe for a merge.  We still
haven't implemented:

* Checksum skipping on read and write from lazy persisted replicas.
* Allowing mmaped reads from the lazy persisted data.
* Any eviction strategy other than LRU.
* Integration with cache pool limits (how do HDFS-4949 and lazy
persist replicas share memory)?
* Eviction from RAM disk via truncation (HDFS-6918)
* Metrics
* System testing to find out how useful this is, and what the best
eviction strategy is.

I see why we might want to defer checksum skipping, metrics, allowing
mmap, eviction via truncation, and so forth until later.  But I feel
like we need to figure out how this will integrate with the memory
used by HDFS-4949 before we merge.  I also would like to see another
eviction strategy other than LRU, which is a very poor eviction
strategy for scanning workloads.  I mentioned this a few times on the
JIRA.

I'd also like to get some idea of how much testing this has received
in a multi-node cluster.  What makes us confident that this is the
right time to merge, rather than in a week or two?

best,
Colin


On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal  wrote:
> I have posted write benchmark results to the Jira.
>
> On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal 
> wrote:
>
>> Hi Andrew, I said "it is not going to be a substantial fraction of memory
>> bandwidth". That is certainly not the same as saying it won't be good or
>> there won't be any improvement.
>>
>> Any time you have transfers over RPC or the network stack you will not get
>> close to the memory bandwidth even for intra-host transfers.
>>
>> I'll add some micro-benchmark results to the Jira shortly.
>>
>> Thanks,
>> Arpit
>>
>> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang 
>> wrote:
>>
>>> Hi Arpit,
>>>
>>> Here is the comment. It was certainly not my intention to misquote anyone.
>>>
>>>
>>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
>>>
>>> Quote:
>>>
>>> It would be nice to see that we could get a substantial fraction of
>>> memory bandwidth when writing to a single replica in-memory.
>>>
>>> The comparison will be interesting but I can tell you without measurement
>>> it is not going to be a substantial fraction of memory bandwidth. We are
>>> still going through DataTransferProtocol with all the copies and overhead
>>> that involves.
>>>
>>> When the goal is in-memory writes and we are unable to achieve a
>>> substantial fraction of memory bandwidth, to me that is "not good
>>> performance."
>>>
>>> I also looked through the subtasks, and AFAICT the only one related to
>>> improving this is deferring checksum computation. The benchmarking we did
>>> on HDFS-4949 showed that this only really helps when you're down to single
>>> copy or zero copies with SCR/ZCR. DTP reads didn't see much of an
>>> improvement, so I'd guess the same would be true for DTP writes.
>>>
>>> I think my above three questions are still open, as well as my question
>>> about why we're merging now, as opposed to when the performance of the
>>> branch is proven out.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Tue, Sep 23, 2014 at 2:10 PM, Arpit Agarwal 
>>> wrote:
>>>
>>> > Andrew, don't misquote me. Can you link the comment where I said
>>> > performance wasn't going to be good?
>>> >
>>> > I will add some add some preliminary write results to the Jira later
>>> today.
>>> >
>>> > > What's the plan to improve write performance?
>>> > I described this in response to your and Colin's comments on the Jira.
>>> >
>>> > For the benefit of folks not following the Jira, the immediate task we'd
>>> > like to get done post-merge is moving checksum computation off the write
>>> > path. Also see open subtasks of HDFS-6581 for other planned perf
>>> > improvements.
>>> >
>>> > Thanks,
>>> > Arpit
>>> >
>>> >
>>> > On Tue, Sep 23, 2014 at 1:07 PM, Andrew Wang 
>>> > wrote:
>>> >
>>> > > Hi Arpit,
>>> > >
>>> > > On HDFS-6581, I asked for write benchmarks on Sep 19th, and you
>>> responded
>>> > > that the performance wasn't going to be good. However, I thought the
>>> > > primary goal of this JIRA was to improve write performance, and write
>>> > > performance is listed as the first feature requirement in the design
>>> doc.
>>> > >
>>> > > So, this leads me to a few questions, which I also asked last week on
>>> the
>>> > > JIRA (I believe still unanswered):
>>> > >
>>> > > - What's the plan to improve write performance?
>>> > > - What kind of performance can we expect after the plan is completed?
>>> > > - Can this expected performance be validated with a prototype?
>>> > >
>>> > > Even with these questions answered, I don't understand the need to
>>> merge
>>> > > this before the write optimization work is completed. Write perf is
>>> > listed
>>> > > as a feature requirement, so the branch can reasonably be called not
>>> > > feature complete until it's shown to be 

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Fri, Sep 19, 2014 at 9:41 AM, Vinayakumar B  wrote:
> Thanks Colin for the detailed explanation.
>
> On Fri, Sep 19, 2014 at 9:38 PM, Colin McCabe 
> wrote:
>>
>> On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B 
> wrote:
>> > bq. I don't know about the merits of this, but I do know that native
>> > filesystems
>> > implement this by not raising the EOF exception on the seek() but only
> on
>> > the read ... some of the non-HDFS filesystems Hadoop support work this
> way.
>>
>> Pretty much all of them should.  POSIX specifies that seeking past the
>> end of a file is not an error.  Reading past the end of the file gives
>> an EOF, but the seek always succeeds.
>>
>> It would be nice if HDFS had this behavior as well.  It seems like
>> this would have to be a 3.0 thing, since it's a potential
>> incompatibility.
>>
>> > I agree with you steve. read only will throw EOF. But when we know that
>> > file is being written  and it can have fresh data, then polling can be
> done
>> > by calling available(). later we can continue read or call seek.
>>
>
> Yes, I too agree that, if we are changing seek() behaviour, then definitely
> that is a 3.0 thing.
>
>> InputStream#available() has a really specific function in Java:
>> telling you approximately how much data is currently buffered by the
>> stream.
>>
>> As a side note, InputStream#available seems to be one of the most
>> misunderstood APIs in Java.  It's pretty common for people to assume
>> that it means "how much data is left in the stream" or something like
>> that.  I think I made that mistake at least once when getting started
>> with Java.  I guess the JavaDoc is kind of vague-- it specifies that
>> available returns "an estimate of the number of bytes that can be read
>> (or skipped over) from this input stream without blocking."  But in
>> practice, that means how much is buffered (for a file-backed stream,
>> to pull more bytes from the OS would require a syscall, which is
>> "blocking."  Similarly for network-backed streams.)
>
> Yes, the InputStream#available() javadoc says it's the data which can be
> read without blocking.
> It also says impls can choose to return the total number of bytes available
> in the stream, which is what DFSInputStream does.

I think DFSInputStream#available would be a lot more useful if it told
users how much data could be read without doing network I/O.  Right
now, this is something that users have no way of figuring out.

Plus, available() returns an int, and HDFS files are often longer than
2 gigs.  We have an API for getting file length and current offset
(DFSInputStream#getFileLength)... we don't need to make available() do
that stuff.
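
Roughly what I mean, from the caller's side.  This is only a sketch:
DFSInputStream is an internal class rather than stable public API, and
the namenode URI and path below are made up:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.DFSInputStream;

  public class RemainingBytesSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(URI.create("hdfs://nn.example.com:8020"), conf);
      try (FSDataInputStream in = fs.open(new Path("/user/demo/big.dat"))) {
        DFSInputStream dfsIn = (DFSInputStream) in.getWrappedStream();
        // 64-bit arithmetic: how much of the file lies past our position.
        long remaining = dfsIn.getFileLength() - in.getPos();
        // in.available() would cap the same quantity at Integer.MAX_VALUE.
        System.out.println("remaining bytes: " + remaining);
      }
    }
  }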

>
>> In any case, we certainly could create a new API to refresh
>> inputstream data.  I guess the idea would be to check if the last
>> block we knew about had reached full length-- if so, we would ask the
>> NameNode for any new block locations.  So it would be a DN operation
>> in most cases, but sometimes a NN operation.
>
> Correct, we can anyway have a new API to refresh. But if clients use just
> the InputStream interface, then IMO it's better to do this in available()
> itself. This would be in line with the native FileInputStream.
>  If the file is closed, then we can choose to return -1; if no new data is
> available, we can return 0 as it does now.
> As you mentioned, refresh can be done only from the DNs, and if the block is
> full, then refresh from the NN again. But we also need to think about how to
> handle this if the proposed "variable length blocks" come to HDFS.

Returning the complete file length minus the current position might
technically be within the bounds of the JavaDoc (although a poor
implementation, I would argue), but going over the network and
contacting the NameNode is definitely way outside it.  In C++ terms,
available() is intended to be const... it's not supposed to mutate the
state of the stream.  In my opinion, at least...

>
>> Have you looked at https://issues.apache.org/jira/browse/HDFS-6633:
>> Support reading new data in a being written file until the file is
>> closed?  That patch seems to take the approach of turning reading past
>> the end of the file into an operation that blocks until there is new
>> data.  (when dfs.client.read.tail-follow is set.)  I think I prefer
>> the idea of a new refresh API, just because it puts more control in
>> the hands of the user.
>
> Just now saw the Jira. Intention of the Jira is same as this discussion.
> Before seeing this mail, I had raised
> https://issues.apache.org/jira/browse/HDF

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B  wrote:
> bq. I don't know about the merits of this, but I do know that native
> filesystems
> implement this by not raising the EOF exception on the seek() but only on
> the read ... some of the non-HDFS filesystems Hadoop support work this way.

Pretty much all of them should.  POSIX specifies that seeking past the
end of a file is not an error.  Reading past the end of the file gives
an EOF, but the seek always succeeds.

It would be nice if HDFS had this behavior as well.  It seems like
this would have to be a 3.0 thing, since it's a potential
incompatibility.
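
A quick local-filesystem illustration of that behavior, using nothing
Hadoop-specific:

  import java.io.File;
  import java.io.RandomAccessFile;

  public class SeekPastEof {
    public static void main(String[] args) throws Exception {
      File f = File.createTempFile("seek", ".dat");  // empty file
      try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        raf.seek(1024 * 1024);       // far past EOF: succeeds, no exception
        int b = raf.read();          // only the read reports EOF
        System.out.println(b);       // prints -1
      } finally {
        f.delete();
      }
    }
  }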

> I agree with you steve. read only will throw EOF. But when we know that
> file is being written  and it can have fresh data, then polling can be done
> by calling available(). later we can continue read or call seek.

InputStream#available() has a really specific function in Java:
telling you approximately how much data is currently buffered by the
stream.

As a side note, InputStream#available seems to be one of the most
misunderstood APIs in Java.  It's pretty common for people to assume
that it means "how much data is left in the stream" or something like
that.  I think I made that mistake at least once when getting started
with Java.  I guess the JavaDoc is kind of vague-- it specifies that
available returns "an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking."  But in
practice, that means how much is buffered (for a file-backed stream,
to pull more bytes from the OS would require a syscall, which is
"blocking."  Similarly for network-backed streams.)

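A small example of the misuse I mean, in plain JDK terms (the first
method is the anti-pattern, not something to copy):

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.InputStream;

  public class AvailableMisuse {
    // WRONG: assumes available() == "bytes left in the stream".  For socket
    // streams and many wrapped streams it only reports what is already
    // buffered, so this can silently truncate the data.
    static byte[] readAllWrong(InputStream in) throws IOException {
      byte[] buf = new byte[in.available()];
      in.read(buf);
      return buf;
    }

    // Right: keep reading until the stream actually reports EOF.
    static byte[] readAllRight(InputStream in) throws IOException {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }
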
In any case, we certainly could create a new API to refresh
inputstream data.  I guess the idea would be to check if the last
block we knew about had reached full length-- if so, we would ask the
NameNode for any new block locations.  So it would be a DN operation
in most cases, but sometimes a NN operation.
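
From the caller's side I'm picturing something like the sketch below.
To be clear, refresh() is a hypothetical method here -- nothing like it
exists in DFSInputStream today; the point is just that the re-check is
explicit and the polling rate stays in the caller's hands:

  import java.io.IOException;

  /** Hypothetical interface -- no such API exists in HDFS today. */
  interface RefreshableInput {
    int read(byte[] buf) throws IOException;
    /** Re-check the file length / block locations (DN in most cases, NN if
        the last known block was full) so later reads can see new data. */
    void refresh() throws IOException;
  }

  class TailSketch {
    static void tail(RefreshableInput in, long pollMillis) throws Exception {
      byte[] buf = new byte[8192];
      while (true) {
        int n = in.read(buf);
        if (n > 0) {
          System.out.write(buf, 0, n);  // hand the new bytes to the caller
        } else {
          in.refresh();                 // explicit, caller-driven re-check
          Thread.sleep(pollMillis);     // poll at whatever rate the caller wants
        }
      }
    }
  }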

Have you looked at https://issues.apache.org/jira/browse/HDFS-6633:
Support reading new data in a being written file until the file is
closed?  That patch seems to take the approach of turning reading past
the end of the file into an operation that blocks until there is new
data.  (when dfs.client.read.tail-follow is set.)  I think I prefer
the idea of a new refresh API, just because it puts more control in
the hands of the user.

Another thing to consider is how this all interacts with the proposed
HDFS truncate operation (see HDFS-3107).

best,
Colin


>
> One simple example use case is tailing a file.
>
> Regards,
> Vinay
>
> On Thu, Sep 18, 2014 at 3:35 PM, Steve Loughran 
> wrote:
>
>> I don't know about the merits of this, but I do know that native
>> filesystems implement this by not raising the EOF exception on the seek()
>> but only on the read ... some of the non-HDFS filesystems Hadoop support
>> work this way.
>>
>> -I haven't ever looked to see what code assumes that it is the seek that
>> fails, not the read.
>> -PositionedReadable had better handle this too, even if it isn't done via a
>> seek()-read()-seek() sequence
>>
>>
>> On 18 September 2014 08:48, Vinayakumar B  wrote:
>>
>> > Hi all,
>> >
>> > Currently *DFSInputStream* doesn't allow reading a write-in-progress
>> > file once all the bytes that had been written by the time the input
>> > stream was opened have been read.
>> >
>> > To read any further updates to the same file, one has to open another
>> > stream to the same file.
>> >
>> > Instead, how about refreshing the length of such open files if the
>> > current position is at the earlier EOF?
>> >
>> > Maybe this could be done in the *available()* method, so that clients who
>> > know that the original writer has not yet closed the file can
>> > continuously poll for new data using the same stream?
>> >
>> > PS: This is possible for local disk reads using FileInputStream
>> > PS: This is possible in local disk read using FileInputStream
>> >
>> > Regards,
>> > Vinay
>> >
>>
>>


Re: In hindsight... Re: Thinking ahead to hadoop-2.6

2014-09-15 Thread Colin McCabe
On Mon, Sep 15, 2014 at 10:48 AM, Allen Wittenauer  wrote:
>
> It’s now September.  With the passage of time, I have a lot of doubts 
> about this plan and where that trajectory takes us.
>
> * The list of changes that are already in branch-2 scares the crap out of any
> risk-averse person (Hello to my fellow operations people!). Not only is the
> number of changes extremely high, but in addition there are a lot of major,
> blockbuster features in what is supposed to be a minor release.  Combined
> with the fact that we’ve had to do some micro releases, it seems to hint that
> branch-2 is getting less stable over time.

I don't see what is so scary about 2.6, can you be more concrete?  It
seems like a pretty normal release to me and most of the new features
are optional.

I also don't see why you think that "branch-2 is getting less stable
over time."  Actually, I think that branch-2 has gotten more stable
over time as people have finally gotten around to upgrading from 1.x
or earlier, and contributed their efforts to addressing regressions in
branch-2.

> *  One of the plans talked about was rolling a 2.7 release that drops JDK6
> and makes JDK7 the standard.  If 2.7 comes after 2.6 in October, date-wise
> that puts it somewhere around January 2015.  JDK7 EOLs in April 2015.  So
> we’ll have a viable JDK7 release for exactly 3 months.  Frankly, it is too
> late for us to talk about JDK7 and we need to start thinking about JDK8.
>
> * trunk is currently sitting at 3 years old.  There is a lot of stuff that
> has been hanging around that really needs to get into people's hands so that
> we can start stabilizing it for a “real” release.

We have been pretty careful about minimizing trunk's divergence from
branch-2.  I can't think of an example of anything in trunk that
"really needs to get into people's hands"-- did I forget something?

>
>
> To me this all says one thing:
>
> Drop the 2.6.0 release, branch trunk, and start rolling a 3.0.0-alpha 
> with JDK8 as the minimum.  2.5.1 becomes the base for all sustaining work.  
> This gives the rest of the community time to move to JDK8 if they haven’t 
> already.  For downstream vendors, it gives a roadmap for their customers who 
> will be asking about JDK8 sooner rather than later.  By the time 3.0 
> stabilizes, we’re probably looking at April, which is perfect timing.
>
> One of the issues I’ve heard mention is that 3.0 doesn’t have 
> anything “compelling” in it.  Well, dropping 2.6 makes the feature list the 
> carrot, JDK8 support is obviously the stick.
>
> Thoughts?

As we've discussed before, supporting JDK8 is very different from
forcing people to use JDK8.  branch-2 and Hadoop 2.6 most certainly
should support JDK8, and most certainly NOT force people to use JDK8.
Cloudera has been using JDK7 internally for a long time, and
recommending it to customers too.  Some developers are using JDK8 as
well.  It works fine (although I'm sure there will be bugs and
workarounds that get reported and fixed as more people migrate).  I
don't see this particular issue as a reason to change the schedule.

best,
Colin


>
>
>
>
> On Aug 15, 2014, at 6:07 PM, Subramaniam Krishnan  wrote:
>
>> Thanks for initiating the thread Arun.
>>
>> Can we add YARN-1051  to
>> the list? We have most of the patches for the sub-JIRAs under review and
>> have committed a couple.
>>
>> -Subru
>>
>> -- Forwarded message --
>>
>> From: Arun C Murthy 
>>
>> Date: Tue, Aug 12, 2014 at 1:34 PM
>>
>> Subject: Thinking ahead to hadoop-2.6
>>
>> To: "common-...@hadoop.apache.org" , "
>> hdfs-dev@hadoop.apache.org" , "
>> mapreduce-...@hadoop.apache.org" ,
>>
>> "yarn-...@hadoop.apache.org" 
>>
>>
>>
>>
>>
>> Folks,
>>
>>
>>
>> With hadoop-2.5 nearly done, it's time to start thinking ahead to
>> hadoop-2.6.
>>
>>
>>
>> Currently, here is the Roadmap per the wiki:
>>
>>
>>
>>• HADOOP
>>
>>• Credential provider HADOOP-10607
>>
>>• HDFS
>>
>>• Heterogeneous storage (Phase 2) - Support APIs for using
>> storage tiers by the applications HDFS-5682
>>
>>• Memory as storage tier HDFS-5851
>>
>>• YARN
>>
>>• Dynamic Resource Configuration YARN-291
>>
>>• NodeManager Restart YARN-1336
>>
>>• ResourceManager HA Phase 2 YARN-556
>>
>>• Support for admin-specified labels in YARN YARN-796
>>
>>• Support for automatic, shared cache for YARN application
>> artifacts YARN-1492
>>
>>• Support NodeGroup layer topology on YARN YARN-18
>>
>>• Support for Docker containers in YARN YARN-1964
>>
>>• YARN service registry YARN-913
>>
>>
>>
>> My suspicion is, as is normal, some will make the cut and some won't.
>>
>> Please do add/subtract from the list as appropriate. Ideally, it would be
>> good to ship hadoop-2.6 in

Re: Updates on migration to git

2014-08-27 Thread Colin McCabe
Thanks for making this happen, Karthik and Daniel.  Great job.

best,
Colin

On Tue, Aug 26, 2014 at 5:59 PM, Karthik Kambatla  wrote:
> Yes, we have requested for force-push disabled on trunk and branch-*
> branches. I didn't test it though :P, it is not writable yet.
>
>
> On Tue, Aug 26, 2014 at 5:48 PM, Todd Lipcon  wrote:
>
>> Hey Karthik,
>>
>> Just to confirm, have we disabled force-push support on the repo?
>>
>> In my experience, especially when a project has committers new to git,
>> force-push support causes more trouble than it's worth.
>>
>> -Todd
>>
>>
>> On Tue, Aug 26, 2014 at 4:39 PM, Karthik Kambatla 
>> wrote:
>>
>> > Looks like our git repo is good to go.
>> >
>> > On INFRA-8195, I am asking Daniel to enable writing to it. In case you
>> find
>> > any issues, please comment on the JIRA.
>> >
>> > Thanks
>> > Karthik
>> >
>> >
>> > On Tue, Aug 26, 2014 at 3:28 PM, Arpit Agarwal > >
>> > wrote:
>> >
>> > > I cloned the new repo, built trunk and branch-2, verified all the
>> > branches
>> > > are present. Also checked a few branches and the recent commit history
>> > > matches our existing repo. Everything looks good so far.
>> > >
>> > >
>> > > On Tue, Aug 26, 2014 at 1:19 PM, Karthik Kambatla 
>> > > wrote:
>> > >
>> > > > The git repository is now ready for inspection. I ll take a look
>> > shortly,
>> > > > but it would be great if a few others could too.
>> > > >
>> > > > Once we are okay with it, we can ask it to be writable.
>> > > >
>> > > > On Tuesday, August 26, 2014, Karthik Kambatla 
>> > > wrote:
>> > > >
>> > > > > Hi Suresh
>> > > > >
>> > > > > There was one vote thread on whether to migrate to git, and the
>> > > > > implications to the commit process for individual patches and
>> feature
>> > > > > branches -
>> > > > >
>> > >
>> https://www.mail-archive.com/common-dev@hadoop.apache.org/msg13447.html
>> > > > .
>> > > > > Prior to that, there was a discuss thread on the same topic.
>> > > > >
>> > > > > As INFRA handles the actual migration from subversion to git, the
>> > vote
>> > > > > didn't include those specifics. The migration is going on as we
>> speak
>> > > > (See
>> > > > > INFRA-8195). The initial expectation was that the migration would
>> be
>> > > done
>> > > > > in a few hours, but it has been several hours and the last I heard
>> > the
>> > > > > import was still running.
>> > > > >
>> > > > > I have elaborated on the points in the vote thread and drafted up a
>> > > wiki
>> > > > > page on how-to-commit -
>> > > > https://wiki.apache.org/hadoop/HowToCommitWithGit
>> > > > > . We can work on improving this further and call a vote thread on
>> > those
>> > > > > items if need be.
>> > > > >
>> > > > > Thanks
>> > > > > Karthik
>> > > > >
>> > > > >
>> > > > > On Tue, Aug 26, 2014 at 11:41 AM, Suresh Srinivas <
>> > > > sur...@hortonworks.com
>> > > > > > wrote:
>> > > > >
>> > > > >> Karthik,
>> > > > >>
>> > > > >> I would like to see detailed information on how this migration
>> will
>> > be
>> > > > >> done, how it will affect the existing project and commit process.
>> > This
>> > > > >> should be done in a document that can be reviewed instead of in an
>> > > email
>> > > > >> thread on an ad-hoc basis. Was there any voting on this in PMC and
>> > > > should
>> > > > >> we have a vote to ensure everyone is one the same page on doing
>> this
>> > > and
>> > > > >> how to go about it?
>> > > > >>
>> > > > >> Regards,
>> > > > >> Suresh
>> > > > >>
>> > > > >>
>> > > > >> On Tue, Aug 26, 2014 at 9:17 AM, Karthik Kambatla <
>> > ka...@cloudera.com
>> > > > >> >
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Last I heard, the import is still going on and appears closer to
>> > > > getting
>> > > > >> > done. Thanks for your patience with the migration.
>> > > > >> >
>> > > > >> > I ll update you as and when there is something. Eventually, the
>> > git
>> > > > repo
>> > > > >> > should be at the location in the wiki.
>> > > > >> >
>> > > > >> >
>> > > > >> > On Mon, Aug 25, 2014 at 3:45 PM, Karthik Kambatla <
>> > > ka...@cloudera.com
>> > > > >> >
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > Thanks for bringing these points up, Zhijie.
>> > > > >> > >
>> > > > >> > > By the way, a revised How-to-commit wiki is at:
>> > > > >> > > https://wiki.apache.org/hadoop/HowToCommitWithGit . Please
>> feel
>> > > > free
>> > > > >> to
>> > > > >> > > make changes and improve it.
>> > > > >> > >
>> > > > >> > > On Mon, Aug 25, 2014 at 11:00 AM, Zhijie Shen <
>> > > > zs...@hortonworks.com
>> > > > >> >
>> > > > >> > > wrote:
>> > > > >> > >
>> > > > >> > >> Do we have any convention about "user.name" and
>> "user.email"?
>> > > For
>> > > > >> > >> example,
>> > > > >> > >> we'd like to use @apache.org for the email.
>> > > > >> > >>
>> > > > >> > >
>> > > > >> > > May be, we can ask people to use project-specific configs here
>> > and
>> > > > use
>> > > > >> > > their real name and @apache.org address.
>> > > > >> > >
>> > > > >> > > Is there any downside to le

Re: HDFS-6902 FileWriter should be closed in finally block in BlockReceiver#receiveBlock()

2014-08-25 Thread Colin McCabe
Let's discuss this on the JIRA.  I think Tsuyoshi OZAWA's solution is good.
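
For anyone skimming the thread: the usual shape of the fix for this
pattern is try-with-resources, so the writer is closed on every path even
when write() or flush() throws.  A sketch of that reshaping of the quoted
snippet (not necessarily the exact patch that lands on HDFS-6902):

  try (FileWriter out = new FileWriter(restartMeta)) {
    // write out the current time.
    out.write(Long.toString(Time.now() + restartBudget));
    out.flush();
  } catch (IOException ioe) {
    // same handling as before; any exception close() itself throws during
    // cleanup is attached to ioe as a suppressed exception
  }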

Colin


On Thu, Aug 21, 2014 at 7:08 AM, Ted Yu  wrote:
> bq. else there is a memory leak
>
> Moving call of close() would prevent the leak.
>
> bq. but then this code snippet could be java and can be messy
>
> The code is in Java.
>
> Cheers
>
> On Wed, Aug 20, 2014 at 10:00 PM, vlab  wrote:
>
>> Unless you need 'out' later, have this statement.
>> FileWriter out(restartMeta);
>> then when exiting the try block, 'out' will go out of scope
>>
>> I assume this FileWriter that is created is deleted elsewhere
>> (else there is a memory leak).   {But then this code snippet could be Java,
>> and that can be messy.}
>>
>>
>> On 8/20/2014 8:50 PM, Ted Yu (JIRA) wrote:
>>
>>> Ted Yu created HDFS-6902:
>>> 
>>>
>>>   Summary: FileWriter should be closed in finally block in
>>> BlockReceiver#receiveBlock()
>>>   Key: HDFS-6902
>>>   URL: https://issues.apache.org/jira/browse/HDFS-6902
>>>   Project: Hadoop HDFS
>>>Issue Type: Bug
>>>  Reporter: Ted Yu
>>>  Priority: Minor
>>>
>>>
>>> Here is code starting from line 828:
>>> {code}
>>>  try {
>>>FileWriter out = new FileWriter(restartMeta);
>>>// write out the current time.
>>>out.write(Long.toString(Time.now() + restartBudget));
>>>out.flush();
>>>out.close();
>>>  } catch (IOException ioe) {
>>> {code}
>>> If write() or flush() call throws IOException, out wouldn't be closed.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
>>>
>>>
>>


Re: [DISCUSS] Switch to log4j 2

2014-08-18 Thread Colin McCabe
On Fri, Aug 15, 2014 at 8:50 AM, Aaron T. Myers  wrote:
> Not necessarily opposed to switching logging frameworks, but I believe we
> can actually support async logging with today's logging system if we wanted
> to, e.g. as was done for the HDFS audit logger in this JIRA:
>
> https://issues.apache.org/jira/browse/HDFS-5241

Yes, this is a great example of making something async without
switching logging frameworks.  +1 for doing that where it is
appropriate.
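
For reference, the shape of that approach is roughly the following, using
the log4j 1.2 API.  This is a simplified sketch of the idea, not the exact
HDFS-5241 code:

  import java.util.Collections;
  import java.util.Enumeration;
  import java.util.List;
  import org.apache.log4j.Appender;
  import org.apache.log4j.AsyncAppender;
  import org.apache.log4j.Logger;

  public class AsyncAuditLogSketch {
    // Wrap only the audit logger's appenders in an AsyncAppender, so the
    // hot audit path stops blocking on I/O while all other logging stays
    // synchronous and ordered.
    static void makeAsync(String loggerName) {
      Logger audit = Logger.getLogger(loggerName);
      AsyncAppender async = new AsyncAppender();
      async.setBlocking(false);   // drop rather than stall under extreme load
      async.setBufferSize(8192);
      @SuppressWarnings("unchecked")
      Enumeration<Appender> e = (Enumeration<Appender>) audit.getAllAppenders();
      List<Appender> existing = Collections.list(e);
      for (Appender a : existing) {
        async.addAppender(a);     // the old appenders now run behind a queue
      }
      audit.removeAllAppenders();
      audit.addAppender(async);
    }
  }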

>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
> On Fri, Aug 15, 2014 at 5:44 AM, Steve Loughran 
> wrote:
>
>> moving to SLF4J as an API is independent —it's just a better API for
>> logging than commons-logging, was already a dependency and doesn't force
>> anyone to switch to a new log back end.

Interesting idea.  Did anyone do a performance comparison and/or API
comparison with SLF4j on Hadoop?

>>
>>
>> On 15 August 2014 03:34, Tsuyoshi OZAWA  wrote:
>>
>> > Hi,
>> >
>> > Steve has started discussion titled "use SLF4J APIs in new modules?"
>> > as a related topic.
>> >
>> >
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E
>> >
>> > It sounds good to me to use asynchronous logging when we log INFO. One

-1.  Async logging for everything will make a lot of failures
un-debuggable.  Just to give one example, what if you get a JVM out of
memory crash?  You'll lose the last few log messages which could have
told you what was going on.  Even if the JVM doesn't terminate, log
messages will be out of order, which is annoying, and will make
debugging harder.

The kernel already buffers the log files in memory.  Not every log
message generates a disk seek.  But on the other hand, if the JVM
process crashes, you've got everything.  In other words, we've already
got as much buffering and asynchronicity as we need!

If the problem is that the noisy logs are overloading the disk
bandwidth, that problem can't be solved by adding Java-level async.
You need more bandwidth.  A simple way of doing this is putting the
log partition on /dev/shm.  We could also look into stripping some of
the boilerplate from log messages-- there are a lot of super-long log
messages that could be much more concise.  Other Java logging
frameworks might have less overhead (I'm not an expert on this, but
maybe someone could post some numbers?)

best,
Colin


>> > concern is that asynchronous logging makes debugging difficult - I
>> > don't know log4j 2 well, but I suspect that ordering of logging can be
>> > changed even if WARN or  FATAL are logged with synchronous logger.
>> >
>> > Thanks,
>> > - Tsuyoshi
>> >
>> > On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal > >
>> > wrote:
>> > > I don't recall whether this was discussed before.
>> > >
>> > > I often find our INFO logging to be too sparse for useful diagnosis. A
>> > high
>> > > performance logging framework will encourage us to log more.
>> > Specifically,
>> > > Asynchronous Loggers look interesting.
>> > > https://logging.apache.org/log4j/2.x/manual/async.html#Performance
>> > >
>> > > What does the community think of switching to log4j 2 in a Hadoop 2.x
>> > > release?
>> > >
>> >
>>
>>


Re: [VOTE] Migration from subversion to git for version control

2014-08-11 Thread Colin McCabe
+1.

best,
Colin

On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla  wrote:
> I have put together this proposal based on recent discussion on this topic.
>
> Please vote on the proposal. The vote runs for 7 days.
>
>1. Migrate from subversion to git for version control.
>2. Force-push to be disabled on trunk and branch-* branches. Applying
>changes from any of trunk/branch-* to any of branch-* should be through
>"git cherry-pick -x".
>3. Force-push on feature-branches is allowed. Before pulling in a
>feature, the feature-branch should be rebased on latest trunk and the
>changes applied to trunk through "git rebase --onto" or "git cherry-pick
>".
>4. Every time a feature branch is rebased on trunk, a tag that
>identifies the state before the rebase needs to be created (e.g.
>tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
>the feature is pulled into trunk and the tags are no longer useful.
>5. The relevance/use of tags stay the same after the migration.
>
> Thanks
> Karthik
>
> PS: Per Andrew Wang, this should be a "Adoption of New Codebase" kind of
> vote and will be Lazy 2/3 majority of PMC members.


Re: Finding file size during block placement

2014-07-25 Thread Colin McCabe
On Wed, Jul 23, 2014 at 8:15 AM, Arjun  wrote:

> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account. Something like what is done in CoHadoop or BEEMR
> paper. I have the following questions:
>
>
Hadoop uses a stream metaphor.  So at the time you're deciding what blocks
to use for a DFSOutputStream, you don't know how many bytes the user code
is going to write.  It could be terabytes, or nothing.

You could potentially start placing the later replicas differently, once
the first few blocks had been written.  You would probably need to modify
the BlockPlacementPolicy interface to supply this information.  I could be
wrong, but as far as I can see, there's no way to access that with the
current API.
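To make the stream metaphor concrete, a tiny sketch from the client's point of view (the path and sizes are made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamMetaphorExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // The namenode chooses datanodes for each block as the client writes it,
    // before it can know how much more data will follow.
    FSDataOutputStream out = fs.create(new Path("/tmp/placement-example"));
    out.write(new byte[64 * 1024]);   // the client could stop here...
    // ...or keep writing for hours; the final length is only known at close().
    out.close();
  }
}
{code}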

cheers,
Colin



> 1- Is srcPath in chooseTarget the path to the original un-chunked file, or
> it is a path to a single block?
>
> 2- Will a simple new File(srcPath) will do?
>
> 3- I've spent time looking at hadoop source code. I can't find a way to go
> from srcPath in chooseTarget to a file size. Every function I think can do
> it, in FSNamesystem, FSDirectory, etc., is either non-public, or cannot be
> called from inside the blockmanagement package or blockplacement class.
>
> How do I go from srcPath in blockplacement class to size of the file being
> placed?
>
> Thank you,
>
> AB
>


Re: [DISCUSS] Assume Private-Unstable for classes that are not annotated

2014-07-25 Thread Colin McCabe
+1.

Colin


On Tue, Jul 22, 2014 at 2:54 PM, Karthik Kambatla 
wrote:

> Hi devs
>
> As you might have noticed, we have several classes and methods in them that
> are not annotated at all. This is seldom intentional. Avoiding incompatible
> changes to all these classes can be considerable baggage.
>
> I was wondering if we should add an explicit disclaimer in our
> compatibility guide that says, "Classes without annotations are to
> considered @Private"
>
> For methods, is it reasonable to say - "Class members without specific
> annotations inherit the annotations of the class"?
>
> Thanks
> Karthik
>
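For readers who haven't used them, a small sketch of an explicitly annotated class next to an unannotated one (class names are made up; the annotations live in org.apache.hadoop.classification):

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Explicitly annotated: downstream users can rely on the stated contract.
@InterfaceAudience.Public
@InterfaceStability.Stable
public class AnnotatedUtility {
  public static String normalize(String s) {
    return s == null ? "" : s.trim();
  }
}

// No annotations: under the proposed rule this would be read as
// @InterfaceAudience.Private (and its members as Unstable), i.e. free to change.
class UnannotatedHelper {
  static int internalCounter;
}
{code}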


Re: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-20 Thread Colin McCabe
Great job, guys.  +1.

I don't think we need to finish libhdfs support before we merge (unless you
want to).

Colin


On Wed, May 14, 2014 at 5:47 AM, Gangumalla, Uma
wrote:

> Hello HDFS Devs,
>   I would like to call for a vote to merge the HDFS Extended Attributes
> (XAttrs) feature from the HDFS-2006 branch to the trunk.
>   XAttrs are already widely supported on many operating systems, including
> Linux, Windows, and Mac OS. This will allow storing attributes for HDFS
> file/directory.
>   XAttr consist of a name and a value and exist in one of 4 namespaces:
> user, trusted, security, and system. An XAttr name is prefixed with one of
> these namespaces, so for example, "user.myxattr".
>   Consistent with ongoing awareness of Namenode memory usage, the maximum
> number and size of XAttrs on a file/directory are limited by a
> configuration parameter.
>   The design document contains more details and can be found here:
> https://issues.apache.org/jira/secure/attachment/12644341/HDFS-XAttrs-Design-3.pdf
>   Development of this feature has been tracked in JIRA HDFS-2006:
> https://issues.apache.org/jira/browse/HDFS-2006
>   All of the development work for the feature is contained in the
> "HDFS-2006" branch:
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2006
>  As last tasks, we are working to support XAttrs via libhdfs, webhdfs as
> well as other minor improvements.
>   We intend to finish those enhancements before the vote completes and
> otherwise we could move them to top-level JIRAs as they can be tracked
> independently. User document is also ready for this feature.
>   Here the doc attached in JIRA:
> https://issues.apache.org/jira/secure/attachment/12644787/ExtendedAttributes.html
>  The XAttrs feature is backwards-compatible and enabled by default. A
> cluster administrator can disable it.
> Testing:
>  We've developed more than 70 new tests which cover the XAttrs get, set
> and remove APIs through DistributedFileSystem and WebHdfsFileSystem, the
> new XAttr CLI commands, HA, XAttr persistence in the fsimage and related.
>   Additional  testing plans are documented in:
> https://issues.apache.org/jira/secure/attachment/12644342/Test-Plan-for-Extended-Attributes-1.pdf
>   Thanks a lot to the contributors who have helped and participated in the
> branch development.
>   Code contributors are Yi Liu, Charles Lamb, Andrew Wang and Uma
> Maheswara Rao G.
>  The design document incorporates feedback from many community members:
> Chris Nauroth, Andrew Purtell, Tianyou Li, Avik Dey, Charles Lamb,
> Alejandro, Andrew Wang, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.
>  Code reviewers on individual patches include Chris Nauroth, Alejandro,
> Andrew Wang, Charles Lamb, Tsz Wo Nicholas Sze and Uma Maheswara Rao G.
>
>   Also thanks to Dhruba for bringing up this JIRA and thanks to others who
> participated for discussions.
> This vote will run for a week and close on 5/21/2014 at 06:16 pm IST.
>
> Here is my +1 to start with.
> Regards,
> Uma
> (umamah...@apache.org)
>
>
>
>
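For a quick feel of the user-facing API described in the vote, a sketch against FileSystem (the path and attribute value are made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class XAttrExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/data/report.csv");

    // XAttr names carry a namespace prefix, e.g. "user." as in the vote text.
    fs.setXAttr(p, "user.myxattr", "some-value".getBytes("UTF-8"));
    byte[] value = fs.getXAttr(p, "user.myxattr");
    System.out.println(new String(value, "UTF-8"));
    fs.removeXAttr(p, "user.myxattr");
  }
}
{code}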


Re: In-Memory Reference FS implementations

2014-03-06 Thread Colin McCabe
NetFlix's Apache-licensed S3mper system provides consistency for an
S3-backed store.
http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html

It would be nice to see this or something like it integrated with
Hadoop.  I fear that a lot of applications are not ready for eventual
consistency, and may never be, leading to the feeling that Hadoop on
S3 is buggy.

Colin

On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas  wrote:
> do you consider that native S3 FS  a real "reference implementation" for
> blob stores? or just something that , by mere chance, we are able to use as
> a ref. impl.


Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-11 Thread Colin McCabe
Looks good.

+1, also non-binding.

I downloaded the source tarball, checked md5, built, ran some unit
tests, ran an HDFS cluster.

cheers,
Colin

On Tue, Feb 11, 2014 at 6:53 PM, Andrew Wang  wrote:
> Thanks for putting this together Arun.
>
> +1 non-binding
>
> Downloaded source tarball
> Verified signature and digests
> Ran apache-rat:check
> Built with "mvn clean package install -Pdist -Dtar"
> Started a one-node cluster and ran wordcount
>
> Best,
> Andrew
>
> On Tue, Feb 11, 2014 at 5:40 PM, Ravi Prakash  wrote:
>
>> Thanks Arun for another release!
>>
>> +1 non-binding
>>
>> Verified signatures, deployed a single node cluster and ran sleep and
>> wordcount. Everything looks fine.
>>
>>
>> Regards
>> Ravi
>>
>>
>>
>>
>> On Tuesday, February 11, 2014 5:36 PM, Travis Thompson <
>> tthomp...@linkedin.com> wrote:
>>
>> Everything looks good so far, running on 100 nodes with security enabled.
>>
>> I've found two minor issues I've found with the new Namenode UI so far and
>> will work on them over the next few days:
>>
>> HDFS-5934
>> HDFS-5935
>>
>> Thanks,
>>
>> Travis
>>
>>
>> On Feb 11, 2014, at 4:53 PM, Mohammad Islam 
>> wrote:
>>
>> > Thanks Arun for the initiative.
>> >
>> > +1 non-binding.
>> >
>> >
>> > I tested the followings:
>> > 1. Build package from the source tar.
>> > 2. Verified with md5sum
>> > 3. Verified with gpg
>> > 4. Basic testing
>> >
>> > Overall, good to go.
>> >
>> > Regards,
>> > Mohammad
>> >
>> >
>> >
>> >
>> > On Tuesday, February 11, 2014 2:07 PM, Chen He 
>> wrote:
>> >
>> > +1, non-binding
>> > successful compiled on MacOS 10.7
>> > deployed to Fedora 7 and run test job without any problem.
>> >
>> >
>> >
>> > On Tue, Feb 11, 2014 at 8:49 AM, Arun C Murthy 
>> wrote:
>> >
>> >> Folks,
>> >>
>> >> I've created a release candidate (rc0) for hadoop-2.3.0 that I would
>> like
>> >> to get released.
>> >>
>> >> The RC is available at:
>> >> http://people.apache.org/~acmurthy/hadoop-2.3.0-rc0
>> >> The RC tag in svn is here:
>> >> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.3.0-rc0
>> >>
>> >> The maven artifacts are available via repository.apache.org.
>> >>
>> >> Please try the release and vote; the vote will run for the usual 7 days.
>> >>
>> >> thanks,
>> >> Arun
>> >>
>> >> PS: Thanks to Andrew, Vinod & Alejandro for all their help in various
>> >> release activities.
>>


Re: Is there a way to get a Block through block id?

2014-01-29 Thread Colin McCabe
You could look at the BlocksMap.  That is where blocks should reside.  It
depends on what you're trying to do.

cheers,
Colin


On Tue, Jan 21, 2014 at 10:00 PM, Yu Li  wrote:

> Hi Colin,
>
> Thanks for the reply. I guess you're referring to the below methods? If so,
> I'm afraid it can only get an empty block but not the real block I really
> want to move.
> =
>   public ExtendedBlock(final String poolId, final long blockId) {
> this(poolId, blockId, 0, 0);
>   }
>   public ExtendedBlock(final String poolId, final long blkid, final long
> len,
>   final long genstamp) {
> this.poolId = poolId;
> block = new Block(blkid, len, genstamp);
>   }
> =
>
> I'm now working around this by adding an option to directly move the block
> during fsck (NamenodeFsck). But still, if there's any method to directly
> get the Block instance by blockid through any api, it would be great to
> know. :-)
>
> On 21 January 2014 03:46, Colin McCabe  wrote:
>
> > In order to uniquely identify a block in hadoop 2.2, you are going to
> need
> > both a block and a block pool ID.  You can construct a Block object with
> > those two items.
> >
> > On Wed, Jan 15, 2014 at 8:46 AM, Yu Li  wrote:
> >
> > > Dear all,
> > >
> > > As titled, I actually have two questions here:
> > >
> > > 1. In current releases like hadoop-2.2.0, is block id unique and able
> to
> > > locate a Block in HDFS? I'm asking because I could see HDFS-4645 is
> > trying
> > > to resolve the uniqueness issue. However, from the code comment it
> seems
> > > block id is expected to be unique
> > >
> > >
> > It's expected to be unique within a block pool.  You can get the block
> pool
> > ID you are using from your FSNamesystem object in 2.2.
> >
> > best,
> > Colin
> >
> > 2. It seems there's no method to get a Block object through a block id.
> > > However, there's some scenarios need such method, like if I use the
> > > FavouredNode feature to create logical datanode group and planned to
> put
> > > some data within a group, then I might need to periodically check
> whether
> > > there's block somehow placed outside the group, and move it back. In
> such
> > > scenario, I would need to first get block ids, then move. But to move
> > them,
> > > it seems we need a Block instance to initiate the ExtendedBlock object.
> > >
> > > Any suggestion, or reference to existing JIRA would be highly
> > appreciated,
> > > and thanks in advance!
> > >
> > > --
> > > Best Regards,
> > > Li Yu
> > >
> >
>
>
>
> --
> Best Regards,
> Li Yu
>


Re: Is there a way to get a Block through block id?

2014-01-20 Thread Colin McCabe
In order to uniquely identify a block in hadoop 2.2, you are going to need
both a block and a block pool ID.  You can construct a Block object with
those two items.
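A tiny sketch of that, assuming code running inside the namenode with an FSNamesystem handle; note that this only names the block, since length and generation stamp default to zero:

{code}
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

public class BlockIdExample {
  // Identifies a block by (block pool id, block id). Length and genstamp are
  // left at 0, so this names the block without describing its current state.
  static ExtendedBlock toExtendedBlock(FSNamesystem fsn, long blockId) {
    return new ExtendedBlock(fsn.getBlockPoolId(), blockId);
  }
}
{code}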

On Wed, Jan 15, 2014 at 8:46 AM, Yu Li  wrote:

> Dear all,
>
> As titled, I actually have two questions here:
>
> 1. In current releases like hadoop-2.2.0, is block id unique and able to
> locate a Block in HDFS? I'm asking because I could see HDFS-4645 is trying
> to resolve the uniqueness issue. However, from the code comment it seems
> block id is expected to be unique
>
>
It's expected to be unique within a block pool.  You can get the block pool
ID you are using from your FSNamesystem object in 2.2.

best,
Colin

2. It seems there's no method to get a Block object through a block id.
> However, there's some scenarios need such method, like if I use the
> FavouredNode feature to create logical datanode group and planned to put
> some data within a group, then I might need to periodically check whether
> there's block somehow placed outside the group, and move it back. In such
> scenario, I would need to first get block ids, then move. But to move them,
> it seems we need a Block instance to initiate the ExtendedBlock object.
>
> Any suggestion, or reference to existing JIRA would be highly appreciated,
> and thanks in advance!
>
> --
> Best Regards,
> Li Yu
>


Re: deadNodes in DFSInputStream

2013-12-31 Thread Colin McCabe
Take a look at HDFS-4273, which fixes some issues with the read retry logic.

cheers,
Colin

On Tue, Dec 31, 2013 at 1:25 AM, lei liu  wrote:
> I use Hbase-0.94 and CDH-4.3.1
> When the RegionServer reads data from the local datanode and that datanode
> dies, the local datanode is added to deadNodes and the RegionServer reads
> the data from a remote datanode instead. But when the local datanode comes
> back to life, the RegionServer still reads from the remote datanode, which
> reduces RegionServer performance. We need a way to remove the local datanode
> from deadNodes once it becomes live again.
>
> I can do this work; I'd appreciate everybody's advice.
>
>
> Thanks,
>
> LiuLei


Re: ByteBuffer-based read API for pread

2013-12-31 Thread Colin McCabe
It's true that HDFS (and Hadoop generally) doesn't currently have a
ByteBuffer-based pread API.  There is a JIRA open for this issue,
HDFS-3246.

I do not know if implementing a ByteBuffer API for pread would be as
big of a performance gain as implementing it for regular read.  One
issue is that when you do a pread, you always destroy the old
BlockReader object and create a new one.  This overhead may tend to
make the overhead of doing a single buffer copy less significant in
terms of total cost.  I suppose it partly depends on how big the
buffer is that is being copied... a really large pread would certainly
benefit from avoiding the copy into a byte array.
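To make the API gap concrete, a sketch of the two read paths as they stand (the file name and buffer sizes are made up):

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadApiExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path("/data/blockfile"))) {
      // Sequential read: a ByteBuffer variant exists, so data can land
      // directly in the caller's buffer.
      ByteBuffer buf = ByteBuffer.allocate(1 << 20);
      int n = in.read(buf);

      // Positional read (pread): only the byte[] form exists today (HDFS-3246),
      // so the client goes through an intermediate byte array.
      byte[] arr = new byte[1 << 20];
      int m = in.read(0L, arr, 0, arr.length);
      System.out.println(n + " / " + m);
    }
  }
}
{code}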

cheers,
Colin

On Tue, Dec 31, 2013 at 1:01 AM, lei liu  wrote:
> There is a ByteBuffer read API for sequential reads in CDH 4.3.1, for
> example: public synchronized int read(final ByteBuffer buf) throws
> IOException. But there is no ByteBuffer read API for pread.
>
> Why isn't a ByteBuffer read API supported for pread in CDH 4.3.1?
>
> Thanks,
>
> LiuLei


Re: Next releases

2013-12-06 Thread Colin McCabe
If 2.4 is released in January, I think it's very unlikely to include
symlinks.  There is still a lot of work to be done before they're
usable.  You can look at the progress on HADOOP-10019.  For some of
the subtasks, it will require some community discussion before any
code can be written.

For better or worse, symlinks have not been requested by users as
often as features like NFS export, HDFS caching, ACLs, etc, so effort
has been focused on those instead.

For now, I think we should put the symlinks-disabling patches
(HADOOP-10020, etc) into branch-2, so that they will be part of the
next releases without additional effort.

I would like to see HDFS caching make it into 2.4.  The APIs and
implementation are beginning to stabilize, and around January it
should be ok to backport to a stable branch.

best,
Colin

On Thu, Dec 5, 2013 at 3:57 PM, Arun C Murthy  wrote:
> Ok, I've updated https://wiki.apache.org/hadoop/Roadmap with an initial
> strawman list for hadoop-2.4 which I feel we can get out in Jan.
>
> What else would folks like to see? Please keep timeframe in mind.
>
> thanks,
> Arun
>
> On Dec 2, 2013, at 10:55 AM, Arun C Murthy  wrote:
>
>>
>> On Nov 13, 2013, at 1:55 PM, Jason Lowe  wrote:
>>>
>>>
>>> +1 to limiting checkins of patch releases to Blockers/Criticals.  If 
>>> necessary committers check into trunk/branch-2 only and defer to the patch 
>>> release manager for the patch release merge.  Then there should be fewer 
>>> surprises for everyone what ended up in a patch release and less likely the 
>>> patch release becomes destabilized from the sheer amount of code churn.  
>>> Maybe this won't be necessary if everyone understands that the patch 
>>> release isn't the only way to get a change out in timely manner.
>>
>> I've updated https://wiki.apache.org/hadoop/Roadmap to reflect that we only 
>> put in Blocker/Critical bugs into Point Releases.
>>
>> Committers, from now, please exercise extreme caution when committing to a 
>> point release: they should only be limited to Blocker bugs.
>>
>> thanks,
>> Arun
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


Re: Deprecate BackupNode

2013-12-05 Thread Colin McCabe
+1

Colin
On Dec 4, 2013 3:07 PM, "Suresh Srinivas"  wrote:

> It has been almost a year since a jira proposed deprecating the backup node -
> https://issues.apache.org/jira/browse/HDFS-4114.
>
> Maintaining it adds unnecessary work. As an example, when I added support
> for the retry cache there were a bunch of code paths related to the backup
> node that needed extra handling. I do not know of anyone who is using this.
>
> If there are no objections, I want to deprecate that code in 2.3 release
> and remove it from trunk. I will start this work next week.
>
> Regards,
>
> Suresh
>
>
> --
> http://hortonworks.com/download/
>
>


Re: Next releases

2013-11-14 Thread Colin McCabe
On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy  wrote:
>
> On Nov 12, 2013, at 1:54 PM, Todd Lipcon  wrote:
>
>> On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe wrote:
>>
>>> To be honest, I'm not aware of anything in 2.2.1 that shouldn't be
>>> there.  However, I have only been following the HDFS and common side
>>> of things so I may not have the full picture.  Arun, can you give a
>>> specific example of something you'd like to "blow away"?
>
> There are a bunch of issues in YARN/MapReduce which clearly aren't *critical*;
> similarly, in HDFS a cursory glance turned up some
> *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a
> patch release, plus things like:
>
> HADOOP-9623
> Update jets3t dependency to 0.9.0

I'm fine with reverting HADOOP-9623 from branch-2.2 and pushing it out
to branch-2.3.  It does bring in httpcore, a dependency that wasn't
there before.

Colin

>
> Here is a straw man proposal:
>
> 
> A patch (third version) release should only include *blocker* bugs which are
> critical from an operational, security, or data-integrity standpoint.
>
> This way, we can ensure that a minor series release (2.2.x or 2.3.x or 2.4.x) 
> is always release-able, and more importantly, deploy-able at any point in 
> time.
>
> 
>
> Sandy did bring up a related point about timing of releases and the urge for 
> everyone to cram features/fixes into a dot release.
>
> So, we could remedy that situation by doing a release every 4-6 weeks (2.3, 
> 2.4 etc.) and keep the patch releases limited to blocker bugs.
>
> Thoughts?
>
> thanks,
> Arun
>
>
>
>
>


Re: Next releases

2013-11-11 Thread Colin McCabe
HADOOP-10020 is a JIRA that disables symlinks temporarily.  They will
be disabled in 2.2.1 as well, if the plan is to have only minor fixes
in that branch.

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be
there.  However, I have only been following the HDFS and common side
of things so I may not have the full picture.  Arun, can you give a
specific example of something you'd like to "blow away"?

Colin

On Mon, Nov 11, 2013 at 10:06 AM, Hari Mankude  wrote:
> Hi Arun,
>
> Another feature that would be relevant and got deferred was the symlink
> work (HADOOP-10020) that Colin and Andrew were working on. Can we include
> this in hadoop-2.3.0 also?
>
> thanks
> hari
>
>
> On Sun, Nov 10, 2013 at 2:07 PM, Alejandro Abdelnur wrote:
>
>> Arun, thanks for jumping on this.
>>
>> On hadoop branch-2.2: I've quickly scanned the commit logs starting from
>> the 2.2.0 release and I've found around 20 JIRAs that I'd like to see in
>> 2.2.1. Not all of them are bugs, but they don't shake anything and they
>> improve usability.
>>
>> I presume others will have their own laundry lists as well, and I wonder
>> how much the union of all of them adds up to out of the current 81 commits.
>>
>> How about splitting the JIRAs among a few contributors to assert there is
>> nothing risky in there? And if there is, discuss getting rid of those
>> commits for 2.2.1. IMO doing that would be cheaper than selectively
>> applying commits on a fresh branch.
>>
>> That said, I think we should get 2.2.1 out of the door before switching
>> main efforts to 2.3.0. I volunteer to drive a 2.2.1 release ASAP if you
>> don't have the bandwidth for it at the moment.
>>
>> Cheers.
>>
>> Alejandro
>>
>>
>> 
>> Commits in branch-2.2 that I'd like to be in the 2.2.1 release:
>>
>> The ones prefixed with '*' technically are not bugs.
>>
>>  YARN-1284. LCE: Race condition leaves dangling cgroups entries for killed
>> containers. (Alejandro Abdelnur via Sandy Ryza)
>>  YARN-1265. Fair Scheduler chokes on unhealthy node reconnect (Sandy Ryza)
>>  YARN-1044. used/min/max resources do not display info in the scheduler
>> page (Sangjin Lee via Sandy Ryza)
>>  YARN-305. Fair scheduler logs too many "Node offered to app" messages.
>> (Lohit Vijayarenu via Sandy Ryza)
>> *MAPREDUCE-5463. Deprecate SLOTS_MILLIS counters. (Tzuyoshi Ozawa via Sandy
>> Ryza)
>>  YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps
>> switched. (Robert Kanter via Sandy Ryza)
>>  YARN-1295. In UnixLocalWrapperScriptBuilder, using bash -c can cause Text
>> file busy errors. (Sandy Ryza)
>> *MAPREDUCE-5457. Add a KeyOnlyTextOutputReader to enable streaming to write
>> out text files without separators (Sandy Ryza)
>> *YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza)
>> *YARN-1288. Make Fair Scheduler ACLs more user friendly (Sandy Ryza)
>>  YARN-1330. Fair Scheduler: defaultQueueSchedulingPolicy does not take
>> effect (Sandy Ryza)
>>  HDFS-5403. WebHdfs client cannot communicate with older WebHdfs servers
>> post HDFS-5306. Contributed by Aaron T. Myers.
>> *YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp
>> into SchedulerApplication (Sandy Ryza)
>> *YARN-1333. Support blacklisting in the Fair Scheduler (Tsuyoshi Ozawa via
>> Sandy Ryza)
>> *MAPREDUCE-4680. Job history cleaner should only check timestamps of files
>> in old enough directories (Robert Kanter via Sandy Ryza)
>>  YARN-1109. Demote NodeManager "Sending out status for container" logs to
>> debug (haosdent via Sandy Ryza)
>> *YARN-1321. Changed NMTokenCache to support both singleton and an instance
>> usage. Contributed by Alejandro Abdelnur
>>  YARN-1343. NodeManagers additions/restarts are not reported as node
>> updates in AllocateResponse responses to AMs. (tucu)
>>  YARN-1381. Same relaxLocality appears twice in exception message of
>> AMRMClientImpl#checkLocalityRelaxationConflict() (Ted Yu via Sandy Ryza)
>>  HADOOP-9898. Set SO_KEEPALIVE on all our sockets. Contributed by Todd
>> Lipcon.
>>  YARN-1388. Fair Scheduler page always displays blank fair share (Liyin
>> Liang via Sandy Ryza)
>>
>>
>>
>> On Fri, Nov 8, 2013 at 10:35 PM, Chris Nauroth > >wrote:
>>
>> > Arun, what are your thoughts on test-only patches?  I know I've been
>> > merging a lot of Windows test stabilization patches down to branch-2.2.
>> >  These can't rightly be called blockers, but they do improve dev
>> > experience, and there is no risk to product code.
>> >
>> > Chris Nauroth
>> > Hortonworks
>> > http://hortonworks.com/
>> >
>> >
>> >
>> > On Fri, Nov 8, 2013 at 1:30 AM, Steve Loughran > > >wrote:
>> >
>> > > On 8 November 2013 02:42, Arun C Murthy  wrote:
>> > >
>> > > > Gang,
>> > > >
>> > > >  Thinking through the next couple of releases here, appreciate f/b.
>> > > >
>> > > >  # hadoop-2.2.1
>> > > >
>> > > >  I was looking through commit logs and there is a *lot* of conte

Re: HDFS single datanode cluster issues

2013-11-07 Thread Colin McCabe
First of all, HDFS isn't really the right choice for single-node
environments.  I would recommend using LocalFileSystem in this case.
If you're evaluating HDFS and only have one computer, it will really
be better to run several VMs to see how it works, rather than running
just one Datanode.

You are correct that there are some issues with the pipeline recovery
code on small clusters.  In a lot of cases, pipeline recovery can make
the whole output stream fail, when it is unable to find enough nodes.
We filed HDFS-5131 to address those issues.

In the meantime, you can set
dfs.client.block.write.replace-datanode-on-failure.enable to false,
and dfs.client.block.write.replace-datanode-on-failure.policy to
NEVER.  Based on your comment, it seems like you are already trying to
do this.  Make sure you are setting this configuration on the client
side as well as on the server side-- then you should not see error
messages about pipeline recovery.
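For example, a client writing through the Java API would carry the setting like this; this is only a sketch, and the same keys can equally live in the client's hdfs-site.xml:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SingleNodeWriterConfig {
  static FileSystem clientFileSystem() throws Exception {
    Configuration conf = new Configuration();
    // Must be visible to the *client* JVM doing the writes, not just the datanodes.
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", false);
    // Alternatively, leave the feature enabled but never look for a replacement:
    // conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
    return FileSystem.get(conf);
  }
}
{code}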

best,
Colin

On Wed, Oct 30, 2013 at 8:06 AM, David Mankellow
 wrote:
> We are mapping a 1:1 replication.
>
> We have tried setting
> dfs.client.block.write.replace-datanode-on-failure.enable to NEVER but it
> seems to be ignored.
>
> We have tried the following:
> ===
>   <property>
>     <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
>     <value>NEVER</value>
>   </property>
> ===
>   <property>
>     <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
>     <value>NEVER</value>
>   </property>
> ===
>   <property>
>     <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
>     <value>true</value>
>   </property>
>
> Any more help would be greatly appreciated.
>
> Thanks,
> Dave
>
>
>
>
> On 30/10/2013 10:50, Allan Wilson wrote:
>>
>> Hi David
>>
>> How does your block replica count compare to the number of datanodes in
>> your cluster?
>>
>> Anyway...I found this in the online doc.  You may want to use the NEVER
>> policy.
>>
>> dfs.client.block.write.replace-datanode-on-failure.enable (default: true)
>>   If there is a datanode/network failure in the write pipeline, DFSClient
>>   will try to remove the failed datanode from the pipeline and then continue
>>   writing with the remaining datanodes. As a result, the number of datanodes
>>   in the pipeline is decreased. The feature is to add new datanodes to the
>>   pipeline. This is a site-wide property to enable/disable the feature. When
>>   the cluster size is extremely small, e.g. 3 nodes or less, cluster
>>   administrators may want to set the policy to NEVER in the default
>>   configuration file or disable this feature. Otherwise, users may experience
>>   an unusually high rate of pipeline failures since it is impossible to find
>>   new datanodes for replacement. See also
>>   dfs.client.block.write.replace-datanode-on-failure.policy.
>>
>> dfs.client.block.write.replace-datanode-on-failure.policy (default: DEFAULT)
>>   This property is used only if the value of
>>   dfs.client.block.write.replace-datanode-on-failure.enable is true.
>>   ALWAYS: always add a new datanode when an existing datanode is removed.
>>   NEVER: never add a new datanode.
>>   DEFAULT: Let r be the replication number. Let n be the number of existing
>>   datanodes. Add a new datanode only if r is greater than or equal to 3 and
>>   either (1) floor(r/2) is greater than or equal to n; or (2) r is greater
>>   than n and the block is hflushed/appended.
>>
>> Allan
>> On Oct 30, 2013 5:52 AM, "David Mankellow" 
>> wrote:
>>
>>> Hi all,
>>>
>>> We are having difficulty writing any logs to a HDFS cluster of less than
>>> 3
>>> nodes. This has been since the update between cdh4.2 and 4.3 (4.4 is also
>>> the same). Has anything changed that may make this occur and is there
>>> anything that can be done to rectify the situation, so we can use a
>>> single
>>> datanode once more?
>>>
>>> The error log contains errors about "lease recovery" and "Failed to add a
>>> datanode".
>>>
>>> Here is an example stack trace:
>>>
>>> java.io.IOException: Failed to add a datanode.  User may turn off this
>>> feature by setting dfs.client.block.write.replace-datanode-on-failure.policy
>>> in configuration, where the current policy is DEFAULT.  (Nodes: current=[
>>> 5.9.130.139:50010, 5.9.130.140:50010], original=[5.9.130.139:50010,
>>> 5.9.130.140:50010])
>>>  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
>>>  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
>>>  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
>>>  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
>>> FSDataOutputStream#close error:
>>> java.io.IOException: Failed to add a datanode.  User may turn off this
>>> feature by setting dfs.client.block.write.replace-datanode-on-failure.policy
>>> in configu

Re: Replacing the JSP web UIs to HTML 5 applications

2013-11-01 Thread Colin McCabe
> built
>> > > > in JS web ui that consumes JSON and Hue has built an external web UI
>> > > > that also consumes JSON. In the case of Hue UI, Oozie didn't have to
>> > > > change anything to get that UI and improvements on the Hue UI don't
>> > > > require changes in Oozie unless it is to produce additional
>> > information.
>> > > >
>> > > > hope this clarifies.
>> > > >
>> > > > Thx
>> > > >
>> > > >
>> > > > On Mon, Oct 28, 2013 at 4:06 PM, Haohui Mai 
>> > > wrote:
>> > > >
>> > > > > Echo my comments on HDFS-5402:
>> > > > >
>> > > > > bq. If we're going to remove the old web UI, I think the new web
>> > > > > UI has to have the same level of unit testing. We shouldn't go
>> > > > > backwards in terms of unit testing.
>> > > > >
>> > > > > I take a look at TestNamenodeJspHelper / TestDatanodeJspHelper /
>> > > > > TestClusterJspHelper. It seems to me that we can merge these tests
>> > > > > with
>> > > > the
>> > > > > unit tests on JMX.
>> > > > >
>> > > > > bq. If we are going to
>> > > > > remove this capability, we need to add some other command-line
>> > > > > tools to get the same functionality. These tools could use REST if
>> > > > > we have that, or JMX, but they need to exist before we can
>> > > > > consider removing the old UI.
>> > > > >
>> > > > > This is a good point. Since all information are available through
>> > > > > JMX,
>> > > > the
>> > > > > easiest way to approach it is to write some scripts using Node.js.
>> > > > > The architecture of the new Web UIs is ready for this.
>> > > > >
>> > > > >
>> > > > > On Mon, Oct 28, 2013 at 3:57 PM, Alejandro Abdelnur
>> > > > > > > > > > >wrote:
>> > > > >
>> > > > > > Producing JSON would be great. Agree with Colin that we should
>> > > > > > leave
>> > > > for
>> > > > > > now the current JSP based web ui.
>> > > > > >
>> > > > > > thx
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Oct 28, 2013 at 11:16 AM, Colin McCabe <
>> > > cmcc...@alumni.cmu.edu
>> > > > > > >wrote:
>> > > > > >
>> > > > > > > This is a really interesting project, Haohui.  I think it will
>> > > > > > > make our web UI much nicer.
>> > > > > > >
>> > > > > > > I have a few concerns about removing the old web UI, however:
>> > > > > > >
>> > > > > > > * If we're going to remove the old web UI, I think the new web
>> > > > > > > UI
>> > > has
>> > > > > > > to have the same level of unit testing.  We shouldn't go
>> > > > > > > backwards
>> > > in
>> > > > > > > terms of unit testing.
>> > > > > > >
>> > > > > > > * Most of the deployments of elinks and links out there don't
>> > > support
>> > > > > > > Javascript.  This is just a reality of life when using CentOS
>> > > > > > > 5 or
>> > > 6,
>> > > > > > > which many users are still using.  I have used "links" to
>> > > > > > > diagnose problems through the web UI in the past, in systems
>> > > > > > > where access to the cluster was available only through telnet.
>> > > > > > > If we are going to remove this capability, we need to add some
>> > > > > > > other command-line
>> > > tools
>> > > > > > > to get the same functionality.  These tools could use REST if
>> > > > > > > we
>> > > have
>> > > > > > > that, or JMX, but they need to exist before we can consider
>> > > removing
>> > > > > > > the old UI.
>> > > > > > >
>> > > > > > > best,
>>

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-28 Thread Colin McCabe
With 3 +1s, the vote passes.  Thanks, all.

best,
Colin

On Fri, Oct 25, 2013 at 4:01 PM, Colin McCabe  wrote:
> On Fri, Oct 25, 2013 at 10:07 AM, Suresh Srinivas
>  wrote:
>> I posted a comment in the other thread about feature branch merges.
>>
>> My preference is to make sure the requirements we have for regular patches
>> be applied to feature branch patch as well (3 +1s is the only exception).
>> Also
>> adding details about what functionality is missing (I posted a comment on
>> HDFS-4949)
>> and the changes that deferred that will be done post merge to trunk would
>> be good.
>>
>> It would be better to start the merge vote  when the work is ready instead
>> of
>> trying to optimize 1 week by doing the required work for merging in
>> parallel with
>> the vote.
>
> OK.
>
>>
>> If all the requirements for merging have been met, I am +1 on the merge,
>> without
>> the need for restarting the vote.
>>
>
> I think the requirements are all in place right now.  I'll create a
> JIRA detailing the post-merge subtasks just to make it clearer what
> the plan is from here.
>
> If there are no more comments, I'll commit later tonight.
>
> I wouldn't mind waiting a week if there was a feature someone
> absolutely felt we needed pre-merge, but I also feel like it would be
> two weeks, due to Hadoop Summit next week.
>
> best,
> Colin
>
>
>>
>> On Thu, Oct 24, 2013 at 11:29 PM, Aaron T. Myers  wrote:
>>
>>> I don't necessarily disagree with the general questions about the
>>> procedural issues of merge votes. Thanks for bringing that up in the other
>>> thread you mentioned. To some extent it seems like much of this has been
>>> based on custom, and if folks feel that more precisely defining the merge
>>> vote process is warranted, then I think we should take that up over on that
>>> thread.
>>>
>>> With regard to this particular merge vote, I've spoken with Chris offline
>>> about his feelings on this. He said that he is not dead-set on restarting
>>> the vote, though he suspects that others may be. It seems to me the
>>> remaining unfinished asks (e.g. updating the design doc) can reasonably be
>>> done either after this vote but before the merge to trunk proper, or could
>>> even reasonably be done after merging to trunk.
>>>
>>> Given that, I'll lend my +1 to this merge. I've been reviewing the branch
>>> pretty consistently since work started on it, and have personally
>>> run/tested several builds of it along the way. I've also reviewed the
>>> design thoroughly. The implementation, overall design, and API seem to me
>>> plenty stable enough to be merged into trunk. I know that there remains a
>>> handful of javac warnings in the branch that aren't in trunk, but I trust
>>> those will be taken care of before the merge.
>>>
>>> If anyone out there does feel strongly that this merge vote should be
>>> restarted, then please speak up soon. Again, we can restart the vote if
>>> need be, but I honestly think we'll gain very little by doing so.
>>>
>>> Best,
>>> Aaron
>>>
>>>
>>> On Fri, Oct 25, 2013 at 5:45 AM, Chris Nauroth >> >wrote:
>>>
>>> > Hi Andrew,
>>> >
>>> > I've come to the conclusion that I'm very confused about merge votes.
>>>  :-)
>>> >  It's not just about HDFS-4949.  I'm confused about all merge votes.
>>> >  Rather than muddy the waters here, I've started a separate discussion on
>>> > common-dev.
>>> >
>>> > I do agree with the general plan outlined here, and I will comment
>>> directly
>>> > on the HDFS-4949 jira with a binding +1 when I see that we've completed
>>> > that plan.
>>> >
>>> > Chris Nauroth
>>> > Hortonworks
>>> > http://hortonworks.com/
>>> >
>>> >
>>> >
>>> > On Wed, Oct 23, 2013 at 2:18 PM, Andrew Wang >> > >wrote:
>>> >
>>> > > Hey Chris,
>>> > >
>>> > > Right now we're on track to have all of those things done by tomorrow.
>>> > > Since the remaining issues are either not technical or do not involve
>>> > major
>>> > > changes, I was hoping we could +1 this merge vote in the spirit of "+1
>>> > > pending jenkins". We've gotten clean 

Re: libhdfs portability

2013-10-28 Thread Colin McCabe
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe
 wrote:
> I have written a WebHDFSClient and I do not believe that reusing
> connections is enough to noticeably speed up transfers in my case. I did
> some tests and on average it took roughly 14 minutes to transfer a 3.6 GB
> file to an HDFS on my local network (I tried the same operation using cURL,
> with similar results). I tried transferring the exact same file with the
> hdfs->dfs->copyFromLocal command, and it took on average 40 seconds. I need
> to be able to reliably transfer files that are in the 250 GB - 1TB range,
> and I really need the speed afforded by the "direct" transferring method
> that libhdfs uses. Does libhdfs work with Hadoop 2.2.0 (if I use it in
> Linux)?

libhdfs is the basis of a lot of software built on top of HDFS, such
as Impala and fuse_dfs, and yes, it works.

Patches that improve portability are welcome.  However, rather than
#ifdefs, I would rather see platform-specific files that implement
whatever functionality is platform-specific.

Another option for you is to use the new NFS v3 gateway included in
Hadoop 2.  I have heard that newer version of Windows finally include
some kind of NFS support.  (However, older versions, such as Windows
XP, do not have this support).

best,
Colin


>
> --
> Kyle Sletmoe
>
> *Urban Robotics Inc.**
> *Software Engineer
>
> 33 NW First Avenue, Suite 200 | Portland, OR 97209
> c: (541) 621-7516 | e: kyle.slet...@urbanrobotics.net
>
> http://www.urbanrobotics.net
>
>
> On Mon, Oct 28, 2013 at 4:14 PM, Haohui Mai  wrote:
>
>> I believe that the WebHDFS API is your best bet for now. The current
>> implementation of WebHDFSClient does not reuse the HTTP connections, which
>> leads to a large part of the performance penalty.
>>
>> You might want to implement your own version that reuses HTTP connection to
>> see whether it meets your performance requirements.
>>
>> Thanks,
>> Haohui
>>
>>
>> On Mon, Oct 28, 2013 at 3:38 PM, Kyle Sletmoe <
>> kyle.slet...@urbanrobotics.net> wrote:
>>
>> > Now that Hadoop 2.2.0 is Windows compatible, is there going to be work on
>> > creating a portable version of libhdfs for C/C++ interaction with HDFS? I
>> > know I can use the WebHDFS REST API, but the data transfer rates are
>> > abysmally slow compared to the direct interaction via libhdfs.
>> >
>> > Regards,
>> > --
>> > Kyle Sletmoe
>> >
>> > *Urban Robotics Inc.**
>> > *Software Engineer
>> >
>> > 33 NW First Avenue, Suite 200 | Portland, OR 97209
>> > c: (541) 621-7516 | e: kyle.slet...@urbanrobotics.net
>> >
>> > http://www.urbanrobotics.net
>> >
>> >
>>
>>
>


Re: Replacing the JSP web UIs to HTML 5 applications

2013-10-28 Thread Colin McCabe
This is a really interesting project, Haohui.  I think it will make
our web UI much nicer.

I have a few concerns about removing the old web UI, however:

* If we're going to remove the old web UI, I think the new web UI has
to have the same level of unit testing.  We shouldn't go backwards in
terms of unit testing.

* Most of the deployments of elinks and links out there don't support
Javascript.  This is just a reality of life when using CentOS 5 or 6,
which many users are still using.  I have used "links" to diagnose
problems through the web UI in the past, in systems where access to
the cluster was available only through telnet.  If we are going to
remove this capability, we need to add some other command-line tools
to get the same functionality.  These tools could use REST if we have
that, or JMX, but they need to exist before we can consider removing
the old UI.
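As a strawman for that kind of command-line tool, something as small as the following would cover many of the text-browser use cases; the URL, port, and bean name are illustrative, and it assumes the stock /jmx servlet with its qry filter:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class NameNodeJmxDump {
  public static void main(String[] args) throws Exception {
    // Illustrative defaults: NameNode HTTP address and a well-known MBean query.
    String base = args.length > 0 ? args[0] : "http://localhost:50070";
    String qry = "Hadoop:service=NameNode,name=NameNodeInfo";
    URL url = new URL(base + "/jmx?qry=" + qry);
    try (BufferedReader r =
             new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
      String line;
      while ((line = r.readLine()) != null) {
        System.out.println(line);   // JSON output; parse or grep as needed
      }
    }
  }
}
{code}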

best,
Colin

On Fri, Oct 25, 2013 at 7:31 PM, Haohui Mai  wrote:
> Thanks for the reply, Luke. Here I just echo my response from the jira:
>
> bq. this client-side js only approach, which is less secure than a
> progressively enhanced hybrid approach used by YARN. The recent gmail
> XSS fiasco highlights the issue.
>
> I'm presenting an informal security analysis to compare the security of the
> old and the new web UIs.
>
> An attacker launches an XSS attack by injecting malicious code which are
> usually HTML or JavaScript fragments into the web page, so that the
> malicious code can have the same privileges of the web page.
>
> First, in the scope of XSS attacks, note that the threat models of
> launching XSS attacks on Internet sites Gmail/Linkedin and the one of the
> Hadoop UIs are different. They have fundamental different sets of external
> inputs that the attackers have control to. Internet sites have little
> control of these inputs. In the case of Gmail / Linkedin, an attack can
> send you a crafted e-mail, or put malicious description in his /
> her Linkedin profile. The sets of external inputs are *restricted* in
> Hadoop UIs. The new web UIs take JMX and WebHDFS as inputs. The
> attacker has to launch a XSS attack by:
>
> * Compromise the jars so that the output of JMX / WebHDFS have the
> malicious code.
> * Replace the web UIs completely to include the malicious code.
>
> In either case *the attacker has to compromise the hadoop core or the
> namenode*. That means the new web UIs are at least as secure as the hadoop
> core, and the namenode machine.
>
> Second, I argue that using client-side templates are more secure than the
> current JSP-based server-side templates. To defend against XSS
> attacks, both techniques have to filter the external inputs at *every*
> possible execution path. Several facts must be taken into
> account when evaluating the security of both approaches in real-world
> environments:
>
> * The JavaScript libraries used in the new web UIs have survived in
> extremely large-scale production tests. jQuery is used by Google and
>  Microsoft, bootstrap is used by Twitter, and dust.js is used by Linkedin.
> All libraries survived from hundreds of thousands of
>  attack attempts on a daily basis. I agree that the libraries might still
> be imperfect, but there's no way that we can test the JSP web
>  UIs to achieve the same level of assurances given the amount of resources
> the community has.
> * Client-side templates consolidate all filtering logic in one central
> place. Recall that the goal is to filter all external inputs at every
>  execution path; this is a much more systematic approach compared to the
> server-side templates we have today. It is difficult (if not
>  impossible) to do it in a JSP/ASP/PHP application, since such filtering
> can be only achieved via ad-hoc approaches ([1] shows some
>  empirical data). Also, HDFS-4901 recently describes a XSS vulnerability in
> browseDirectory.jsp.
>
> bq. You'd require proper SSL (not self signed) setup to avoid JS
> injection
>
> Commodity browsers enforce Same-Origin Policy to defend against code
> injections. It has nothing to do with what kinds of SSL certificates
> you hold.
>
> bq.  I also have concerns that we commit these changes without matching
> unit tests
>
> The JavaScript code can be automatically tested. The same code can be run
> by node.js and the test can compared with pre-defined
> results. It is also possible to write an adapter to use Rhino to accomplish
> the same task. We can discuss how to integrate them into
> the maven test routines in a different thread.
>
> bq. Client side rendering completely breaks the workflows for ops who rely
> on text based terminal/emacs/vim browsers (no js support) to
> monitor component UI.
>
> links / elinks (http://elinks.or.cz/) are text-based web browsers that
> support JavaScript.
>
> bq. The priority/requirements for UI in core Hadoop should be security and
> correctness, which client side templating cannot address properly
> so far.
>
> I agree that we should focus on security and correctness. T

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-25 Thread Colin McCabe
 only real result will be delaying the merge by a week.
>> > >
>> > > Thanks,
>> > > Andrew
>> > >
>> > >
>> > > On Wed, Oct 23, 2013 at 1:03 PM, Chris Nauroth <
>> cnaur...@hortonworks.com
>> > > >wrote:
>> > >
>> > > > I've received some feedback that we haven't handled this merge vote
>> the
>> > > > same as other comparable merge votes, and that the vote should be
>> reset
>> > > > because of this.
>> > > >
>> > > > The recent custom is that we only call for the merge vote after all
>> > > > pre-requisites have been satisfied.  This would include committing to
>> > the
>> > > > feature branch all patches that the devs deem necessary before the
>> code
>> > > > lands in trunk, posting a test plan, posting an updated design doc in
>> > > case
>> > > > implementation choices diverged from the original design doc, and
>> > > getting a
>> > > > good test-patch run from Jenkins on the merge patch.  This was the
>> > > process
>> > > > followed for other recent major features like HDFS-2802 (snapshots),
>> > > > HDFS-347 (short-circuit reads via sharing file descriptors), and
>> > > > HADOOP-8562 (Windows compatibility).  In this thread, we've diverged
>> > from
>> > > > that process by calling for a vote on a branch that hasn't yet
>> > completed
>> > > > the pre-requisites and stating a plan for work to be done before the
>> > > merge.
>> > > >
>> > > > I still support this work, but can we please restart the vote after
>> the
>> > > > pre-requisites have landed in the branch?
>> > > >
>> > > > Chris Nauroth
>> > > > Hortonworks
>> > > > http://hortonworks.com/
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Oct 18, 2013 at 1:37 PM, Chris Nauroth <
>> > cnaur...@hortonworks.com
>> > > > >wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > Sounds great!
>> > > > >
>> > > > > Regarding testing caching+federation, this is another thing that I
>> > had
>> > > > > intended to pick up as part of HDFS-5149.  I'm not sure if I can
>> get
>> > > this
>> > > > > done in the next 7 days, so I'll keep you posted.
>> > > > >
>> > > > > Chris Nauroth
>> > > > > Hortonworks
>> > > > > http://hortonworks.com/
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Oct 18, 2013 at 11:15 AM, Colin McCabe <
>> > cmcc...@alumni.cmu.edu
>> > > > >wrote:
>> > > > >
>> > > > >> Hi Chris,
>> > > > >>
>> > > > >> I think it's feasible to complete those tasks in the next 7 days.
>> > > > >> Andrew is on HDFS-5386.
>> > > > >>
>> > > > >> The test plan document is a great idea.  We'll try to get that up
>> > > > >> early next week.  We have a lot of unit tests now, clearly, but
>> some
>> > > > >> manual testing is important too.
>> > > > >>
>> > > > >> If we discover any issues during testing, then we can push out the
>> > > > >> merge timeframe.  For example, one area that probably needs more
>> > > > >> testing is caching+federation.
>> > > > >>
>> > > > >> I would like to get HDFS-5378 and HDFS-5366 in as well.
>> > > > >>
>> > > > >> The other subtasks are "nice to have" but not really critical,
>> and I
>> > > > >> think it would be just as easy to do them in trunk.  We're hoping
>> > that
>> > > > >> having this in trunk will make it easier for us to collaborate on
>> > > > >> HDFS-2832 and other ongoing work.
>> > > > >>
>> > > > >> > Also, I want to confirm that this vote only covers trunk.
>> > > > >> > I don't see branch-2 mentioned, so I assume that we're
>> > > > >> > not voting on merge to branch-2 yet.

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-24 Thread Colin McCabe
On Thu, Oct 24, 2013 at 1:45 PM, Chris Nauroth  wrote:
> Hi Andrew,
>
> I've come to the conclusion that I'm very confused about merge votes.  :-)
>  It's not just about HDFS-4949.  I'm confused about all merge votes.
>  Rather than muddy the waters here, I've started a separate discussion on
> common-dev.
>
> I do agree with the general plan outlined here, and I will comment directly
> on the HDFS-4949 jira with a binding +1 when I see that we've completed
> that plan.

Thanks, Chris.  Andrew posted a merge patch to HDFS-4949.

We're happy that this code is getting closer to getting into trunk,
since it will make it easier to integrate with the other features in
trunk (like HDFS-2832).  There are still some follow-up tasks, but we
feel that it's easier to do those in trunk.

I'm going to update the design doc in just a moment so be sure to
check it out.  Are there any other things we should do today prior to
merging?

Colin


>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Wed, Oct 23, 2013 at 2:18 PM, Andrew Wang wrote:
>
>> Hey Chris,
>>
>> Right now we're on track to have all of those things done by tomorrow.
>> Since the remaining issues are either not technical or do not involve major
>> changes, I was hoping we could +1 this merge vote in the spirit of "+1
>> pending jenkins". We've gotten clean unit test runs on upstream Jenkins as
>> well, so the only fixups we should need for test-patch.sh are findbugs and
>> javac (which are normally pretty trivial to clean up). Of course, all of
>> your listed prereqs and test-patch would be taken care of before actually
>> merging to trunk.
>>
>> So, we can reset the vote if you feel strongly about this, but it seems
>> like the only real result will be delaying the merge by a week.
>>
>> Thanks,
>> Andrew
>>
>>
>> > On Wed, Oct 23, 2013 at 1:03 PM, Chris Nauroth wrote:
>>
>> > I've received some feedback that we haven't handled this merge vote the
>> > same as other comparable merge votes, and that the vote should be reset
>> > because of this.
>> >
>> > The recent custom is that we only call for the merge vote after all
>> > pre-requisites have been satisfied.  This would include committing to the
>> > feature branch all patches that the devs deem necessary before the code
>> > lands in trunk, posting a test plan, posting an updated design doc in
>> case
>> > implementation choices diverged from the original design doc, and
>> getting a
>> > good test-patch run from Jenkins on the merge patch.  This was the
>> process
>> > followed for other recent major features like HDFS-2802 (snapshots),
>> > HDFS-347 (short-circuit reads via sharing file descriptors), and
>> > HADOOP-8562 (Windows compatibility).  In this thread, we've diverged from
>> > that process by calling for a vote on a branch that hasn't yet completed
>> > the pre-requisites and stating a plan for work to be done before the
>> merge.
>> >
>> > I still support this work, but can we please restart the vote after the
>> > pre-requisites have landed in the branch?
>> >
>> > Chris Nauroth
>> > Hortonworks
>> > http://hortonworks.com/
>> >
>> >
>> >
>> > > On Fri, Oct 18, 2013 at 1:37 PM, Chris Nauroth wrote:
>> >
>> > > +1
>> > >
>> > > Sounds great!
>> > >
>> > > Regarding testing caching+federation, this is another thing that I had
>> > > intended to pick up as part of HDFS-5149.  I'm not sure if I can get
>> this
>> > > done in the next 7 days, so I'll keep you posted.
>> > >
>> > > Chris Nauroth
>> > > Hortonworks
>> > > http://hortonworks.com/
>> > >
>> > >
>> > >
>> > > On Fri, Oct 18, 2013 at 11:15 AM, Colin McCabe wrote:
>> > >
>> > >> Hi Chris,
>> > >>
>> > >> I think it's feasible to complete those tasks in the next 7 days.
>> > >> Andrew is on HDFS-5386.
>> > >>
>> > >> The test plan document is a great idea.  We'll try to get that up
>> > >> early next week.  We have a lot of unit tests now, clearly, but some
>> > >> manual testing is important too.
>> > >>
>> > >> If we discover any issues during testing, then we can push out the
>> > >> 

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-18 Thread Colin McCabe
Hi Chris,

I think it's feasible to complete those tasks in the next 7 days.
Andrew is on HDFS-5386.

The test plan document is a great idea.  We'll try to get that up
early next week.  We have a lot of unit tests now, clearly, but some
manual testing is important too.

If we discover any issues during testing, then we can push out the
merge timeframe.  For example, one area that probably needs more
testing is caching+federation.

I would like to get HDFS-5378 and HDFS-5366 in as well.

The other subtasks are "nice to have" but not really critical, and I
think it would be just as easy to do them in trunk.  We're hoping that
having this in trunk will make it easier for us to collaborate on
HDFS-2832 and other ongoing work.

> Also, I want to confirm that this vote only covers trunk.
> I don't see branch-2 mentioned, so I assume that we're
> not voting on merge to branch-2 yet.

Yeah, this vote is only to merge to trunk.

cheers.
Colin

On Fri, Oct 18, 2013 at 10:48 AM, Chris Nauroth
 wrote:
> I agree that the code has reached a stable point.  Colin and Andrew, thank
> you for your contributions and collaboration.
>
> Throughout development, I've watched the feature grow by running daily
> builds in a pseudo-distributed deployment.  As of this week, the full
> feature set is working end-to-end.  I also think we've reached a point of
> API stability for clients who want to control caching programmatically.
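
A hedged sketch of programmatic cache control as the client API later stabilized in Hadoop 2.x releases; the exact class and method names may differ from the branch under vote, and the pool name and path below are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
    import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

    public class CacheDirectiveExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at an HDFS cluster with caching enabled.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        dfs.addCachePool(new CachePoolInfo("example-pool"));
        long id = dfs.addCacheDirective(
            new CacheDirectiveInfo.Builder()
                .setPath(new Path("/example/hot-data"))
                .setPool("example-pool")
                .setReplication((short) 1)
                .build());
        System.out.println("Added cache directive " + id);
      }
    }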
>
> There are several things that I'd like to see completed before the merge as
> pre-requisites:
>
> - HDFS-5203: Concurrent clients that add a cache directive on the same path
> may prematurely uncache from each other.
> - HDFS-5385: Caching RPCs are AtMostOnce, but do not persist client ID and
> call ID to edit log.
> - HDFS-5386: Add feature documentation for datanode caching.
> - Standard clean-ups to satisfy Jenkins pre-commit on the merge patch.
>  (For example, I know we've introduced some Javadoc warnings.)
> - Full test suite run on Windows.  (The feature is not yet implemented on
> Windows.  This is just intended to catch regressions.)
> - Test plan posted to HDFS-4949, similar in scope to the snapshot test plan
> that was posted to HDFS-2802.  For my own part, I've run the new unit
> tests, and I've tested end-to-end in a pseudo-distributed deployment.  It's
> unlikely that I'll get a chance to test fully distributed before the vote
> closes, so I'm curious to hear if you've done this on your side yet.
>
> Also, I want to confirm that this vote only covers trunk.  I don't see
> branch-2 mentioned, so I assume that we're not voting on merge to branch-2
> yet.
>
> Before I cast my vote, can you please discuss whether or not it's feasible
> to complete all of the above in the next 7 days?  For the issues assigned
> to me, I do expect to complete them.
>
> Thanks again for all of your hard work!
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, Oct 17, 2013 at 3:07 PM, Colin McCabe wrote:
>
>> +1.  Thanks, guys.
>>
>> best,
>> Colin
>>
>> On Thu, Oct 17, 2013 at 3:01 PM, Andrew Wang 
>> wrote:
>> > Hello all,
>> >
>> > I'd like to call a vote to merge the HDFS-4949 branch (in-memory caching)
>> > to trunk. Colin McCabe and I have been hard at work the last 3.5 months
>> > implementing this feature, and feel that it's reached a level of
>> stability
>> > and utility where it's ready for broader testing and integration.
>> >
>> > I'd also like to thank Chris Nauroth at Hortonworks for code reviews and
>> > bug fixes, and everyone who's reviewed the HDFS-4949 design doc and left
>> > comments.
>> >
>> > Obviously, I am +1 for the merge. The vote will run the standard 7 days,
>> > closing on October 24 at 11:59PM.
>> >
>> > Thanks,
>> > Andrew
>>
>


Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-17 Thread Colin McCabe
+1.  Thanks, guys.

best,
Colin

On Thu, Oct 17, 2013 at 3:01 PM, Andrew Wang  wrote:
> Hello all,
>
> I'd like to call a vote to merge the HDFS-4949 branch (in-memory caching)
> to trunk. Colin McCabe and I have been hard at work the last 3.5 months
> implementing this feature, and feel that it's reached a level of stability
> and utility where it's ready for broader testing and integration.
>
> I'd also like to thank Chris Nauroth at Hortonworks for code reviews and
> bug fixes, and everyone who's reviewed the HDFS-4949 design doc and left
> comments.
>
> Obviously, I am +1 for the merge. The vote will run the standard 7 days,
> closing on October 24 at 11:59PM.
>
> Thanks,
> Andrew


Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
Sorry for the noise.  I posted to the wrong list.

best,
Colin

On Wed, Oct 16, 2013 at 9:13 AM, Colin McCabe  wrote:
> This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759
>
> Probably need to take a look at this test to see why it's not managing
> its threads correctly.
>
> Colin
>
> On Tue, Oct 15, 2013 at 8:37 AM, Jenkins  wrote:
>> I offer a cookie, to whoever fixes me. See 
>> <http://golden.jenkins.cloudera.com/job/CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7/21/>
>>
>> FAILED TESTS
>> 
>> 1 tests failed.
>> REGRESSION:  org.apache.hadoop.ipc.TestRPC.testStopsAllThreads
>>
>> Error Message:
>> Expect no Reader threads running before test expected:<0> but was:<1>
>>
>> Stack Trace:
>> java.lang.AssertionError: Expect no Reader threads running before test 
>> expected:<0> but was:<1>
>> at org.junit.Assert.fail(Assert.java:93)
>> at org.junit.Assert.failNotEquals(Assert.java:647)
>> at org.junit.Assert.assertEquals(Assert.java:128)
>> at org.junit.Assert.assertEquals(Assert.java:472)
>> at 
>> org.apache.hadoop.ipc.TestRPC.testStopsAllThreads(TestRPC.java:777)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at 
>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>> at 
>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>> at 
>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>> at 
>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>> at 
>> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>> at 
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>> at 
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>> at 
>> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
>> at 
>> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
>> at 
>> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at 
>> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>> at 
>> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>> at 
>> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>> at 
>> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
>> at 
>> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
>>
>>
>>
>>
>> CHANGES
>> 
>>
>>
>> BUILD LOG
>> 
>> [...truncated 204383 lines...]
>>
>> main:
>> [mkdir] Skipping 
>> /var/lib/jenkins/workspace/CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7/hadoop-common-project/target/test-dir
>>  because it already exists.
>> [mkdir] Skipping 
>> /var/lib/jenkins/workspace/CDH5beta1-Hadoop-Common-2.1.0-

Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759

Probably need to take a look at this test to see why it's not managing
its threads correctly.

Colin

On Tue, Oct 15, 2013 at 8:37 AM, Jenkins  wrote:
> I offer a cookie, to whoever fixes me. See 
> 
>
> FAILED TESTS
> 
> 1 tests failed.
> REGRESSION:  org.apache.hadoop.ipc.TestRPC.testStopsAllThreads
>
> Error Message:
> Expect no Reader threads running before test expected:<0> but was:<1>
>
> Stack Trace:
> java.lang.AssertionError: Expect no Reader threads running before test 
> expected:<0> but was:<1>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.apache.hadoop.ipc.TestRPC.testStopsAllThreads(TestRPC.java:777)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
>
>
>
>
> CHANGES
> 
>
>
> BUILD LOG
> 
> [...truncated 204383 lines...]
>
> main:
> [mkdir] Skipping 
> /var/lib/jenkins/workspace/CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7/hadoop-common-project/target/test-dir
>  because it already exists.
> [mkdir] Skipping 
> /var/lib/jenkins/workspace/CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7/hadoop-common-project/target/test-dir
>  because it already exists.
> [INFO] Executed tasks
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hadoop Annotations . SUCCESS [14.290s]
> [INFO] Apache Hadoop Auth  SUCCESS [15.448s]
> [INFO] Apache Hadoop Auth Examples ... SUCCESS [6.434s]
> [INFO] Apache Hadoop Common .. SUCCESS 
> [15:04.997s]
> [INFO] Apache Hadoop NFS . SUCCESS [19.836s]
> [INFO] Apache Hadoop Common Project .. SUCCESS [0.067s]
> [INFO] 
> 
> [INF

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
I don't think HADOOP-9972 is a must-do for the next Apache release,
whatever version number it ends up having.  It's just adding a new
API, not changing any existing ones, and it can be done entirely in
generic code.  (The globber doesn't involve FileSystem or AFS
subclasses).

My understanding is that GA is about stabilizing APIs rather than
achieving feature completeness on symlinks.

Colin

On Wed, Oct 2, 2013 at 6:01 PM, Andrew Wang  wrote:
> If we're serious about not breaking compatibility after GA, then we need to
> slow down and make sure we get these new APIs right, or can add them in a
> compatible fashion.
>
> HADOOP-9984 ended up being a bigger change than initially expected, and we
> need to break compatibility with out-of-tree FileSystems to do it properly.
> I would like to see HADOOP-9972 in as well (globLinkStatus), and there are
> open questions on HADOOP-9984 about changing PathFilter and
> FileStatus.getPath() semantics (which would be incompatible). Yes, we could
> just +1 HADOOP-9984 and stamp 2.2.0 on it, but I think it looks bad to then
> immediately turn around and release an incompatible 2.3.
>
> My preference is still for a 2.1.2 with the above API questions resolved,
> then an actual API-stable 2.2.0 GA. This is already punting out all the
> other related FS/tooling changes that we think can be done compatibly but
> are still pretty crucial: shell, distcp, webhdfs, hftp; it'd be great to
> get help on any of these.
>
> Thanks,
> Andrew
>
>
> On Wed, Oct 2, 2013 at 2:56 PM, Roman Shaposhnik  wrote:
>
>> On Tue, Oct 1, 2013 at 5:15 PM, Vinod Kumar Vavilapalli
>>  wrote:
>> > +1. We should get an RC as soon as possible so that we can get all the
>> downstream components to sign off.
>> > The earlier the better.
>>
>> On this very note -- would there be any interest in joining efforts
>> with the Bigtop integration aimed at Hadoop 2.2.x based release
>> of all the Hadoop ecosystem projects?
>>
>> Our current plan is to release Bigtop 0.7.0 within a couple of weeks.
>> That will be the last stable 2.0.x-based release. Bigtop 0.8.0 is supposed
>> to
>> be based on Hadoop 2.x that gets us (Bigtop community) as close as possible
>> to the Hadoop's GA. Here's more on what we'll be doing with Bigtop 0.8.0:
>>
>> http://comments.gmane.org/gmane.comp.apache.incubator.bigtop.devel/10769
>>
>> Of course, on the Bigtop side of things we're stuck with all the necessary
>> integration work anyway, but if there's anything at all that folks are
>> willing
>> to help us and the bigger Hadoop community with that would be very
>> much appreciated. I think both communities will benefit from this type
>> of collaboration.
>>
>> On a practical side of things, as soon as the branch for 2.2.0 gets cut
>> Bigtop can start publishing a complete set of Hadoop ecosystem
>> artifacts built against that particular version and easily install-able
>> on all of our supported systems. We can also start publishing VMs
>> so that folks on OSes other than Linux can help us with testing.
>>
>> Thanks,
>> Roman.
>>


Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
On Tue, Oct 1, 2013 at 8:59 PM, Arun C Murthy  wrote:
> Yes, sorry if it wasn't clear.
>
> As others seem to agree, I think we'll be better getting a protocol/api 
> stable GA done and then iterating on bugs etc.
>
> I'm not super worried about HADOOP-9984 since symlinks just made it to 
> branch-2.1 recently.
>
> Currently we only have 2 blockers: HADOOP-9984 & MAPREDUCE-5530. Both of 
> which are PA and I've reviewed MR-5530 and is good to go (thanks Robert). 
> Hopefully we can finish up HADOOP-9984 asap and we'll be good.

We've had several reviews for HADOOP-9984 and are currently just
waiting on a +1.

Sorry if this is a dumb question, but will the 2.2 release be made
from branch-2 or what is currently named branch-2.1-beta?  If it is
the latter, we have a few backports we'll need to do.

Colin


>
> thanks,
> Arun
>
> On Oct 1, 2013, at 4:53 PM, Alejandro Abdelnur  wrote:
>
>> Arun,
>>
>> Does this mean that you want to skip a beta release and go straight to GA
>> with the next release?
>>
>> thx
>>
>>
>> On Tue, Oct 1, 2013 at 4:15 PM, Arun C Murthy  wrote:
>>
>>> Guys,
>>>
>>> I took a look at the content in 2.1.2-beta so far, other than the
>>> critical fixes such as HADOOP-9984 (symlinks) and few others in YARN/MR,
>>> there is fairly little content (unit tests fixes etc.)
>>>
>>> Furthermore, it's standing up well in testing too. Plus, the protocols
>>> look good for now (I wrote a gohadoop to try convince myself), let's lock
>>> them in.
>>>
>>> Given that, I'm thinking we can just go ahead rename it 2.2.0 rather than
>>> make another 2.1.x release.
>>>
>>> This will drop a short-lived release (2.1.2) and help us move forward on
>>> 2.3 which has a fair bunch of content already...
>>>
>>> Thoughts?
>>>
>>> thanks,
>>> Arun
>>>
>>>
>>> On Sep 24, 2013, at 4:24 PM, Zhijie Shen  wrote:
>>>
 I've added MAPREDUCE-5531 to the blocker list. - Zhijie


 On Tue, Sep 24, 2013 at 3:41 PM, Arun C Murthy 
>>> wrote:

> With 4 +1s (3 binding) and no -1s the vote passes. I'll push it out…
>>> I'll
> make it clear on the release page, that there are some known issues and
> that we will follow up very shortly with another release.
>
> Meanwhile, let's fix the remaining blockers (please mark them as such
>>> with
> Target Version 2.1.2-beta).
> The current blockers are here:
> http://s.apache.org/hadoop-2.1.2-beta-blockers
>
> thanks,
> Arun
>
> On Sep 16, 2013, at 11:38 PM, Arun C Murthy 
>>> wrote:
>
>> Folks,
>>
>> I've created a release candidate (rc0) for hadoop-2.1.1-beta that I
> would like to get released - this release fixes a number of bugs on top
>>> of
> hadoop-2.1.0-beta as a result of significant amounts of testing.
>>
>> If things go well, this might be the last of the *beta* releases of
> hadoop-2.x.
>>
>> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
>> The RC tag in svn is here:
>
>>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7
>>> days.
>>
>> thanks,
>> Arun
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
>



 --
 Zhijie Shen
 Hortonworks Inc.
 http://hortonworks.com/

>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>

Re: symlink support in Hadoop 2 GA

2013-09-19 Thread Colin McCabe
What we're trying to get to here is a consensus on whether
FileSystem#listStatus and FileSystem#globStatus should return symlinks
__as_symlinks__.  If 2.1-beta goes out with these semantics, I think
we are not going to be able to change them later.  That is what will
happen in the "do nothing" scenario.

Also see Jason Lowe's comment here:
https://issues.apache.org/jira/browse/HADOOP-9912?focusedCommentId=13772002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13772002

Colin


On Wed, Sep 18, 2013 at 5:11 PM, J. Rottinghuis  wrote:
> However painful protobuf version changes are at build time for Hadoop
> developers, at runtime with multiple clusters and many Hadoop users this is
> a total nightmare.
> Even upgrading clusters from one protobuf version to the next is going to
> be very difficult. The same users will run jobs on, and/or read&write to
> multiple clusters. That means that they will have to fork their code, run
> multiple instances? Or in the very least they have to do an update to their
> applications. All in sync with Hadoop cluster changes. And these are not
> doable in a rolling fashion.
> All Hadoop and HBase clusters will all upgrade at the same time, or we'll
> have to have our users fork / roll multiple versions ?
> My point is that these things are much harder than just fix the (Jenkins)
> build and we're done. These changes are massively disruptive.
>
> There is a similar situation with symlinks. Having an API that lets users
> create symlinks is very problematic. Some users create symlinks and as Eli
> pointed out, somebody else (or automated process) tries to copy to / from
> another (Hadoop 1.x?) cluster over hftp. What will happen ?
> Having an API that people should not use is also a nightmare. We
> experienced this with append. For a while it was there, but users were "not
> allowed to use it" (or else there were large #'s of corrupt blocks). If
> there is an API to create a symlink, then some of our users are going to
> use it and others are going to trip over those symlinks. We already know
> that Pig does not work with symlinks yet, and as Steve pointed out, there
> is tons of other code out there that assumes that !isDir() means isFile().
>
> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.
>
> Thanks,
>
> Joep
>
>
>
>
> On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy  wrote:
>
>>
>> On Sep 16, 2013, at 6:49 PM, Andrew Wang  wrote:
>>
>> > Hi all,
>> >
>> > I wanted to broadcast plans for putting the FileSystem symlinks work
>> > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
>> think
>> > it's pretty important we get it in since it's not a compatible change; if
>> > it misses the GA train, we're not going to have symlinks until the next
>> > major release.
>>
>> Just catching up, is this an incompatible change, or not? The above reads
>> 'not an incompatible change'.
>>
>> Arun
>>
>> >
>> > However, we're still dealing with ongoing issues revealed via testing.
>> > There's user-code out there that only handles files and directories and
>> > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> > for a nice example where globStatus returning symlinks broke Pig; some of
>> > us had a conference call to talk it through, and one definite conclusion
>> > was that this wasn't solvable in a generally compatible manner.
>> >
>> > There are also still some gaps in symlink support right now. For example,
>> > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> > resolution, and tooling like the FsShell and Distcp still need to be
>> > updated as well.
>> >
>> > So, there's definitely work to be done, but there are a lot of users
>> > interested in the feature, and symlinks really should be in GA. Would
>> > appreciate any thoughts/input on the matter.
>> >
>> > Thanks,
>> > Andrew
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>>


Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
The issue is not modifying existing APIs.  The issue is that code has
been written that makes assumptions that are incompatible with the
existence of things that are not files or directories.  For example,
there is a lot of code out there that looks at FileStatus#isFile, and
if it returns false, assumes that what it is looking at is a
directory.  In the case of a symlink, this assumption is incorrect.
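
A small Java sketch makes the hazard concrete; the symlink-aware handling below is illustrative only, assuming the FileStatus#isSymlink/getSymlink accessors:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SymlinkAwareListing {
      // The fragile pattern is "if (!st.isFile()) { /* must be a directory */ }".
      // Once listings can contain symlinks, callers need to check for them:
      public static void list(FileSystem fs, Path dir) throws IOException {
        for (FileStatus st : fs.listStatus(dir)) {
          if (st.isSymlink()) {
            System.out.println(st.getPath() + " -> " + st.getSymlink());
          } else if (st.isDirectory()) {
            System.out.println(st.getPath() + " (directory)");
          } else {
            System.out.println(st.getPath() + " (file)");
          }
        }
      }
    }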

Faced with this, we have considered making the default behavior of
listStatus and globStatus to be fully resolving symlinks, and simply
not listing dangling symlinks. Code which is prepared to deal with symlinks
can use newer versions of the listStatus and globStatus functions
which do return symlinks as symlinks.

We might consider defaulting FileSystem#listStatus and
FileSystem#globStatus to "fully resolving symlinks by default" and
defaulting FileContext#listStatus and FileContext#Util#globStatus to
the opposite.  This seems like the maximally compatible solution that
we're going to get.  I think this makes sense.

The alternative is kicking the can down the road to Hadoop 3, and
letting vendors of alternative (including some proprietary
alternative) systems continue to claim that "Hadoop doesn't support
symlinks yet" (with some justice).

P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
more appropriate.

sincerely,
Colin

On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas  wrote:
> I agree that this is an important change. However, 2.2.0 GA is getting
> ready to rollout in weeks. I am concerned that these changes will add not
> only incompatible changes late in the game, but also possibly instability.
> Java API incompatibility is some thing we have avoided for the most part
> and I am concerned that this is adding such incompatibility in FileSystem
> APIs. We should find work arounds by adding possibly newer APIs and leaving
> existing APIs as is. If this can be done, my vote is to enable this feature
> in 2.3. Even if it cannot be done, I am concerned that this is coming quite
> late and we should see if could allow some incompatible changes into 2.3
> for this feature.
>
>
> On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang wrote:
>
>> Hi all,
>>
>> I wanted to broadcast plans for putting the FileSystem symlinks work
>> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
>> it's pretty important we get it in since it's not a compatible change; if
>> it misses the GA train, we're not going to have symlinks until the next
>> major release.
>>
>> However, we're still dealing with ongoing issues revealed via testing.
>> There's user-code out there that only handles files and directories and
>> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> for a nice example where globStatus returning symlinks broke Pig; some of
>> us had a conference call to talk it through, and one definite conclusion
>> was that this wasn't solvable in a generally compatible manner.
>>
>> There are also still some gaps in symlink support right now. For example,
>> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> resolution, and tooling like the FsShell and Distcp still need to be
>> updated as well.
>>
>> So, there's definitely work to be done, but there are a lot of users
>> interested in the feature, and symlinks really should be in GA. Would
>> appreciate any thoughts/input on the matter.
>>
>> Thanks,
>> Andrew
>>
>
>
>
> --
> http://hortonworks.com/download/
>


Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
I think it makes sense to finish symlinks support in the Hadoop 2 GA release.

Colin

On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang  wrote:
> Hi all,
>
> I wanted to broadcast plans for putting the FileSystem symlinks work
> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
> it's pretty important we get it in since it's not a compatible change; if
> it misses the GA train, we're not going to have symlinks until the next
> major release.
>
> However, we're still dealing with ongoing issues revealed via testing.
> There's user-code out there that only handles files and directories and
> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
> for a nice example where globStatus returning symlinks broke Pig; some of
> us had a conference call to talk it through, and one definite conclusion
> was that this wasn't solvable in a generally compatible manner.
>
> There are also still some gaps in symlink support right now. For example,
> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
> resolution, and tooling like the FsShell and Distcp still need to be
> updated as well.
>
> So, there's definitely work to be done, but there are a lot of users
> interested in the feature, and symlinks really should be in GA. Would
> appreciate any thoughts/input on the matter.
>
> Thanks,
> Andrew


Re: hdfs native build failing in trunk

2013-09-16 Thread Colin McCabe
The relevant line is:

[exec]   gcc: vfork: Resource temporarily unavailable

Looks like the build slave was overloaded and could not create new processes?

Colin

On Mon, Sep 16, 2013 at 4:43 AM, Alejandro Abdelnur  wrote:
> It seems a commit of native code in YARN has triggered a native build in
> HDFS and things are failing.
>
>
> -- Forwarded message --
> From: Apache Jenkins Server 
> Date: Mon, Sep 16, 2013 at 1:34 PM
> Subject: Hadoop-Hdfs-trunk - Build # 1524 - Still Failing
> To: hdfs-dev@hadoop.apache.org
>
>
> See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1524/
>
> ###
> ## LAST 60 LINES OF THE CONSOLE
> ###
> [...truncated 10055 lines...]
> [INFO] Building Apache Hadoop HDFS Project 3.0.0-SNAPSHOT
> [INFO]
> 
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could
> not be resolved: Failed to read artifact descriptor for
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
> [INFO]
> [INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @
> hadoop-hdfs-project ---
> [INFO] Deleting
> /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target
> [INFO]
> [INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @
> hadoop-hdfs-project ---
> [INFO] Executing tasks
>
> main:
> [mkdir] Created dir:
> /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target/test-dir
> [INFO] Executed tasks
> [INFO]
> [INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @
> hadoop-hdfs-project ---
> [INFO]
> [INFO] --- maven-source-plugin:2.1.2:test-jar-no-fork (hadoop-java-sources)
> @ hadoop-hdfs-project ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.0:enforce (dist-enforce) @
> hadoop-hdfs-project ---
> [INFO]
> [INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @
> hadoop-hdfs-project ---
> [INFO]
> [INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @
> hadoop-hdfs-project ---
> [INFO] Not executing Javadoc as the project is not a Java classpath-capable
> package
> [INFO]
> [INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @
> hadoop-hdfs-project ---
> [INFO]
> [INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @
> hadoop-hdfs-project ---
> [INFO] ** FindBugsMojo execute ***
> [INFO] canGenerate is false
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hadoop HDFS  FAILURE [39.738s]
> [INFO] Apache Hadoop HttpFS .. SKIPPED
> [INFO] Apache Hadoop HDFS BookKeeper Journal . SKIPPED
> [INFO] Apache Hadoop HDFS-NFS  SKIPPED
> [INFO] Apache Hadoop HDFS Project  SUCCESS [1.693s]
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 42.265s
> [INFO] Finished at: Mon Sep 16 11:34:09 UTC 2013
> [INFO] Final Memory: 45M/468M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project
> hadoop-hdfs: An Ant BuildException has occured: exec returned: 1 -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Updating YARN-1137
> Sending e-mails to: hdfs-dev@hadoop.apache.org
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>
> ###
> ## FAILED TESTS (if any)
> ##
> No tests ran.
>
>
>
> --
> Alejandro


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-22 Thread Colin McCabe
On Wed, Aug 21, 2013 at 3:49 PM, Stack  wrote:
> On Wed, Aug 21, 2013 at 1:25 PM, Colin McCabe wrote:
>
>> St.Ack wrote:
>>
>> > + Once I figured where the logs were, found that JAVA_HOME was not being
>> > exported (don't need this in hadoop-2.0.5 for instance).  Adding an
>> > exported JAVA_HOME to my running shell which don't seem right but it took
>> > care of it (I gave up pretty quick on messing w/
>> > yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
>> > getting anywhere)
>>
>> I thought that we were always supposed to have JAVA_HOME set when
>> running any of these commands.  At least, I do.  How else can the
>> system disambiguate between different Java installs?  I need 2
>> installs to test with JDK7.
>>
>>
>
> That is fair enough but I did not need to define this explicitly previously
> (for hadoop-2.0.5-alpha for instance) or the JAVA_HOME that was figured in
> start scripts was propagated and now is not (I have not dug in).
>
>
>
>> > + This did not seem to work for me:
>> > hadoop.security.group.mapping =
>> > org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
>>
>> We've seen this before.  I think your problem is that you have
>> java.library.path set correctly (what System.loadLibrary checks), but
>> your system library path does not include a necessary dependency of
>> libhadoop.so-- most likely, libjvm.so.  Probably, we should fix
>> NativeCodeLoader to actually make a function call in libhadoop.so
>> before it declares everything OK.
>>
>
> My expectation was that if native group lookup fails, as it does here, then
> the 'Fallback' would kick in and we'd do the Shell query.  This mechanism
> does not seem to be working.

I filed https://issues.apache.org/jira/browse/HADOOP-9895 to address this issue.

best,
Colin


>
>
> St.Ack


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-21 Thread Colin McCabe
St.Ack wrote:

> + Once I figured where the logs were, found that JAVA_HOME was not being
> exported (don't need this in hadoop-2.0.5 for instance).  Adding an
> exported JAVA_HOME to my running shell which don't seem right but it took
> care of it (I gave up pretty quick on messing w/
> yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
> getting anywhere)

I thought that we were always supposed to have JAVA_HOME set when
running any of these commands.  At least, I do.  How else can the
system disambiguate between different Java installs?  I need 2
installs to test with JDK7.

> + This did not seem to work for me:
> hadoop.security.group.mapping =
> org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.

We've seen this before.  I think your problem is that you have
java.library.path set correctly (what System.loadLibrary checks), but
your system library path does not include a necessary dependency of
libhadoop.so-- most likely, libjvm.so.  Probably, we should fix
NativeCodeLoader to actually make a function call in libhadoop.so
before it declares everything OK.
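
A hedged sketch of that probe-before-trusting idea follows; the class and interface names here are illustrative stand-ins, not the real NativeCodeLoader or group-mapping implementations:

    import java.io.IOException;
    import java.util.List;

    public class GroupMappingWithProbe {
      interface GroupMapping {
        List<String> getGroups(String user) throws IOException;
      }

      private final GroupMapping impl;

      // Probe the native-backed implementation with a real call before
      // trusting it; fall back to the shell-based one if linking fails.
      public GroupMappingWithProbe(GroupMapping nativeImpl, GroupMapping shellImpl) {
        GroupMapping chosen = shellImpl;
        try {
          nativeImpl.getGroups(System.getProperty("user.name"));
          chosen = nativeImpl;
        } catch (UnsatisfiedLinkError | IOException e) {
          // libhadoop.so (or a dependency such as libjvm.so) failed to
          // resolve at call time; keep the shell-based fallback.
        }
        this.impl = chosen;
      }

      public List<String> getGroups(String user) throws IOException {
        return impl.getGroups(user);
      }
    }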

Colin


On Tue, Aug 20, 2013 at 5:35 PM, Stack  wrote:
> On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:
>
>> Folks,
>>
>> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
>> like to get released - this fixes the bugs we saw since the last go-around
>> (rc1).
>>
>> The RC is available at:
>> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
>> The RC tag in svn is here:
>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>
> It basically works (in insecure mode), +1.
>
> + Checked signature.
> + Ran on small cluster w/ small load made using mapreduce interfaces.
> + Got the HBase full unit test suite to pass on top of it.
>
> I had the following issues getting it to all work. I don't know if they are
> known issues so will just list them here first.
>
> + I could not find documentation on how to go from tarball to running
> cluster (the bundled 'cluster' and 'standalone' doc are not about how to
> get this tarball off the ground).
> + I had a bit of a struggle putting this release in place under hbase unit
> tests.  The container would just exit w/ 127 errcode.  No logs in expected
> place.  Tripped over where minimrcluster was actually writing.  Tried to
> corral it so it played nicely w/o our general test setup but found that the
> new mini clusters have 'target' hardcoded as output dirs.
> + Once I figured where the logs were, found that JAVA_HOME was not being
> exported (don't need this in hadoop-2.0.5 for instance).  Adding an
> exported JAVA_HOME to my running shell which don't seem right but it took
> care of it (I gave up pretty quick on messing w/
> yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
> getting anywhere)
> + This did not seem to work for me:
> hadoop.security.group.mapping =
> org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
>  It just did this:
>
> Caused by: java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
> at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
> Method)
> at
> org.apache.hadoop.security.JniBasedUnixGroupsMapping.(JniBasedUnixGroupsMapping.java:49)
> at
> org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.(JniBasedUnixGroupsMappingWithFallback.java:38)
>
> ..so I replaced it
> w/ org.apache.hadoop.security.ShellBasedUnixGroupsMapping on the hbase-side
> to get my cluster up and running.
>
> + Untarring the bin dir, it undoes as hadoop-X.Y.Z-beta.  Undoing the src
> dir it undoes as hadoop-X.Y.Z-beta-src.  I'd have thought they would undo
> into the one directory overlaying each other.
>
> St.Ack


Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
Just to clarify, ext4 has the option to turn off journalling.  ext3 does
not.  Not sure about reiser.

Colin


On Tue, Aug 20, 2013 at 12:42 PM, Colin McCabe wrote:

> > If I've got the right idea about this at all?
>
> From the man page for wipe(1):
>
> "Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
> default by most Linux distributions. No secure deletion program that does
> filesystem-level calls can sanitize files on such filesystems, because
> sensitive data and metadata can be written to the journal, which cannot be
> readily accessed. Per-file secure deletion is better implemented in the
> operating system."
>
> You might be able to work around this by turning off the journal on these
> filesystems.  But even then, you've got issues like the drive remapping bad
> sectors (and leaving around the old ones), flash firmware that is unable to
> erase less than an erase block, etc.
>
> The simplest solution is probably just to use full-disk encryption.  Then
> you don't need any code changes at all.
>
> Doing something like invoking shred on the block files could improve
> security somewhat, but it's not going to work all the time.
>
> Colin
>
>
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
> matt.fell...@bespokesoftware.com> wrote:
>
>> Hi,
>> I'm looking into writing a patch for HDFS which will provide a new method
>> within HDFS which can securely delete the contents of a block on all the
>> nodes upon which it exists. By securely delete I mean, overwrite with
>> 1's/0's/random data cyclically such that the data could not be recovered
>> forensically.
>>
>> I'm not currently aware of any existing code / methods which provide
>> this, so was going to implement this myself.
>>
>> I figured the DataNode.java was probably the place to start looking into
>> how this could be done, so I've read the source for this, but it's not
>> really enlightened me a massive amount.
>>
>> I'm assuming I need to tell the NameServer that all DataNodes with a
>> particular block id would be required to be deleted, then as each DataNode
>> calls home, the DataNode would be instructed to securely delete the
>> relevant block, and it would oblige.
>>
>> Unfortunately I have no idea where to begin and was looking for some
>> pointers?
>>
>> I guess specifically I'd like to know:
>>
>> 1. Where the hdfs CLI commands are implemented
>> 2. How a DataNode identifies a block / how a NameServer could inform a
>> DataNode to delete a block
>> 3. Where the existing "delete" is implemented so I can make sure my
>> secure delete makes use of it after successfully blanking the block contents
>> 4. If I've got the right idea about this at all?
>>
>> Kind regards,
>> Matt Fellows
>>
>> --
>> First Option Software Ltd
>> Signal House
>> Jacklyns Lane
>> Alresford
>> SO24 9JJ
>> Tel: +44 (0)1962 738232
>> Mob: +44 (0)7710 160458
>> Fax: +44 (0)1962 600112
>> Web: www.bespokesoftware.com
>>
>> This is confidential, non-binding and not company endorsed - see full
>> terms at www.fosolutions.co.uk/emailpolicy.html
>>
>> First Option Software Ltd Registered No. 06340261
>> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
>>
>>
>


Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
> If I've got the right idea about this at all?

From the man page for wipe(1):

"Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
default by most Linux distributions. No secure deletion program that does
filesystem-level calls can sanitize files on such filesystems, because
sensitive data and metadata can be written to the journal, which cannot be
readily accessed. Per-file secure deletion is better implemented in the
operating system."

You might be able to work around this by turning off the journal on these
filesystems.  But even then, you've got issues like the drive remapping bad
sectors (and leaving around the old ones), flash firmware that is unable to
erase less than an erase block, etc.

The simplest solution is probably just to use full-disk encryption.  Then
you don't need any code changes at all.

Doing something like invoking shred on the block files could improve
security somewhat, but it's not going to work all the time.
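
For illustration only, a minimal Java sketch of the overwrite-then-delete idea, subject to all the caveats above (journals, remapped sectors, and flash wear-leveling can still retain copies):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.security.SecureRandom;

    public class BestEffortWipe {
      // Overwrite the file contents with random data a few times, then
      // delete it.  This only removes the easy filesystem-level copy.
      public static void wipe(Path file, int passes) throws IOException {
        SecureRandom rng = new SecureRandom();
        byte[] buf = new byte[1 << 16];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
          long size = ch.size();
          for (int pass = 0; pass < passes; pass++) {
            ch.position(0);
            long remaining = size;
            while (remaining > 0) {
              rng.nextBytes(buf);
              int n = (int) Math.min(buf.length, remaining);
              remaining -= ch.write(ByteBuffer.wrap(buf, 0, n));
            }
            ch.force(true);  // flush this pass to the device before the next
          }
        }
        Files.delete(file);
      }

      public static void main(String[] args) throws IOException {
        wipe(Paths.get(args[0]), 3);
      }
    }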

Colin


On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
matt.fell...@bespokesoftware.com> wrote:

> Hi,
> I'm looking into writing a patch for HDFS which will provide a new method
> within HDFS which can securely delete the contents of a block on all the
> nodes upon which it exists. By securely delete I mean, overwrite with
> 1's/0's/random data cyclically such that the data could not be recovered
> forensically.
>
> I'm not currently aware of any existing code / methods which provide this,
> so was going to implement this myself.
>
> I figured the DataNode.java was probably the place to start looking into
> how this could be done, so I've read the source for this, but it's not
> really enlightened me a massive amount.
>
> I'm assuming I need to tell the NameServer that all DataNodes with a
> particular block id would be required to be deleted, then as each DataNode
> calls home, the DataNode would be instructed to securely delete the
> relevant block, and it would oblige.
>
> Unfortunately I have no idea where to begin and was looking for some
> pointers?
>
> I guess specifically I'd like to know:
>
> 1. Where the hdfs CLI commands are implemented
> 2. How a DataNode identifies a block / how a NameServer could inform a
> DataNode to delete a block
> 3. Where the existing "delete" is implemented so I can make sure my secure
> delete makes use of it after successfully blanking the block contents
> 4. If I've got the right idea about this at all?
>
> Kind regards,
> Matt Fellows
>
> --
> First Option Software Ltd
> Signal House
> Jacklyns Lane
> Alresford
> SO24 9JJ
> Tel: +44 (0)1962 738232
> Mob: +44 (0)7710 160458
> Fax: +44 (0)1962 600112
> Web: www.bespokesoftware.com
>
> This is confidential, non-binding and not company endorsed - see full
> terms at www.fosolutions.co.uk/emailpolicy.html
>
> First Option Software Ltd Registered No. 06340261
> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
>
>


Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Colin McCabe
There is work underway to decouple the block layer and the namespace
layer of HDFS from each other.  Once this is done, block behaviors
like the one you describe will be easy to implement.  It's a use case
very similar to the hierarchical storage management (HSM) use case
that we've discussed before.  Check out HDFS-2832.  Hopefully there
will be a design document posted soon.

cheers,
Colin


On Thu, Aug 8, 2013 at 11:52 AM, Matevz Tadel  wrote:
> Hi everybody,
>
> I'm jumping in as Jeff is away due to an unexpected annoyance involving
> Californian wildlife.
>
>
> On 8/7/13 7:47 PM, Andrew Wang wrote:
>>
>> Blocks are supposed to be an internal abstraction within HDFS, and aren't
>> an
>> inherent part of FileSystem (the user-visible class used to access all
>> Hadoop
>> filesystems).
>
>
> Yes, but it's a really useful abstraction :) Do you really believe the
> blocks could be abandoned in the next couple of years? I mean, it's such a
> simple and effective solution ...
>
>
>> Is it possible to instead deal with files and offsets? On a read failure,
>> you
>> could open a stream to the same file on the backup filesystem, seek to the
>> old
>> file position, and retry the read. This feels like it's possible via
>> wrapping.
>
>
> As Jeff briefly mentioned, all USCMS sites export their data via XRootd (not
> all of them use HDFS!) and we developed a specialization of XRootd caching
> proxy that can fetch only requested blocks (block size is passed from our
> input stream class to XRootd client (via JNI) and on to the proxy server)
> and keep them in a local cache. This allows as to do three things:
>
> 1. the first time we notice a block is missing, a whole block is fetched
> from elsewhere and further access requests from the same process get
> fulfilled with zero latency;
>
> 2. later requests from other processes asking for this block are fulfilled
> immediately (well, after the initial 3 retries);
>
> 3. we have a list of blocks that were fetched and we could (this is what we
> want to look into in the near future) re-inject them into HDFS if the data
> loss turns out to be permanent (bad disk vs. node that was
> offline/overloaded for a while).
>
> Handling exceptions at the block level thus gives us just what we need. As
> input stream is the place where these errors become known it is, I think,
> also the easiest place to handle them.
>
> I'll understand if you find opening-up of the interfaces in the central
> repository unacceptable. We can always apply the patch at the OSG level
> where rpms for all our deployments get built.
>
> Thanks & Best regards,
> Matevz
>
>>
>> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost wrote:
>>
>> Thank you for the suggestion, but we don't see how simply wrapping a
>> FileSystem object would be sufficient in our use case.  The reason why
>> is we
>> need to catch and handle read exceptions at the block level.  There
>> aren't
>> any public methods available in the high level FileSystem abstraction
>> layer
>> that would give us the fine grained control we need at block level
>> read
>> failures.
>>
>> Perhaps if I outline the steps more clearly it will help explain what
>> we are
>> trying to do.  Without our enhancements, suppose a user opens a file
>> stream
>> and starts reading the file from Hadoop. After some time, at some
>> position
>> into the file, if there happen to be no replicas available for a
>> particular
>> block for whatever reason, datanodes have gone down due to disk
>> issues, etc.
>> the stream will throw an IOException (BlockMissingException or
>> similar) and
>> the read will fail.
>>
>> What we are doing is rather than letting the stream fail, we have
>> another
>> stream queued up that knows how to fetch the blocks elsewhere outside
>> of our
>> Hadoop cluster that couldn't be retrieved.  So we need to be able to
>> catch
>> the exception at this point, and these externally fetched bytes then
>> get
>> read into the user supplied read buffer.  Now Hadoop can proceed to
>> read in
>> the stream the next blocks in the file.
>>
>> So as you can see this method of fail over on demand allows an input
>> stream
>> to keep reading data, without having to start it all over again if a
>> failure
>> occurs (assuming the remote bytes were successfully fetched).
>>
>> As a final note I would like to mention that we will be providing our
>> failover module to the Open Science Grid.  Since we hope to provide
>> this as
>> a benefit to all OSG users running at participating T2 computing
>> clusters,
>> we will be committed to maintaining this software and any changes to
>> Hadoop
>> needed to make it work.  In other words we will be willing to maintain
>> any
>> implementation changes that may become necessary as Hadoop internals
>> change
>> in future releases.
>>
>> Thanks,
>> Jeff
>>
>>
>> On 8/7/13 11:30
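
A minimal sketch of the failover-on-read idea described above; ExternalFetcher is a hypothetical stand-in for the out-of-cluster source, and a real implementation would also need to reposition the primary stream (e.g. via Seekable) past any range recovered from the fallback:

    import java.io.IOException;
    import java.io.InputStream;

    public class FailoverInputStream extends InputStream {
      /** Hypothetical out-of-cluster source for byte ranges we fail to read. */
      public interface ExternalFetcher {
        int fetch(long position, byte[] buf, int off, int len) throws IOException;
      }

      private final InputStream primary;
      private final ExternalFetcher fallback;
      private long pos = 0;

      public FailoverInputStream(InputStream primary, ExternalFetcher fallback) {
        this.primary = primary;
        this.fallback = fallback;
      }

      @Override
      public int read(byte[] buf, int off, int len) throws IOException {
        int n;
        try {
          n = primary.read(buf, off, len);
        } catch (IOException e) {
          // e.g. a missing-block error: satisfy this range from the fallback
          // instead of failing the whole stream.
          n = fallback.fetch(pos, buf, off, len);
        }
        if (n > 0) {
          pos += n;
        }
        return n;
      }

      @Override
      public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
      }

      @Override
      public void close() throws IOException {
        primary.close();
      }
    }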

Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Colin McCabe
Andrew Wang has been working on getting this kind of Dapper-style
trace functionality in HDFS.  He is on vacation this week, but next
week he might have some ideas about how you could contribute and/or
integrate with his patch.  Doing this right with security, etc is a
pretty big project and I think he wanted to do it incrementally.

best,
Colin McCabe


On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan
 wrote:
> Hey,
>
> I was originally researching options to work on ACCUMULO-1197. Basically,
> it was a bid to pass trace functionality through the DFSClient. I discussed
> with the guys over there on implementing a Google Dapper-style trace with
> HTrace. The guys at HBase are also trying to achieve the same HTrace
> integration [HBASE-6449]
>
> But, that meant adding stuff to the RPC in HDFS. For a start, we have to add
> a 64-bit span-id to every RPC with tracing enabled. There's some more in
> the original Dapper paper and HTrace documentation.
>
> I was told by the Accumulo people to talk with and seek help from the
> experts at HDFS. I'm open to suggestions.
>
> Additionally, I'm participating in a Joint Mentoring Programme by Apache
> which is quite similar to GSoC. Luciano Resende (Community Development,
> Apache) is in charge of the programme. I'll attach a link. The last date is
> 19th July. So, I'm pretty tense without any mentors :(
>
> [1] https://issues.apache.org/jira/browse/ACCUMULO-1197
> [2] https://issues.apache.org/jira/browse/HDFS-4680
> [3] https://github.com/cloudera/htrace
> [4] http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> [5] https://issues.apache.org/jira/browse/HBASE-6449
>
> Thank you,
> Sreejith R
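For context, a minimal sketch of the Dapper-style span mentioned above (a generic illustration, not the HTrace API; the class and field names are assumptions): each unit of work gets a random 64-bit span id plus its parent's id, and that pair is what would ride along with a traced RPC.

import java.util.concurrent.ThreadLocalRandom;

// Generic sketch of a trace span; not HTrace code.
public final class TraceSpan {
  private final long spanId;     // the 64-bit id that would be carried in the RPC
  private final long parentId;   // 0 for a root span
  private final String description;
  private final long startMillis = System.currentTimeMillis();

  public TraceSpan(String description, long parentId) {
    this.spanId = ThreadLocalRandom.current().nextLong();
    this.parentId = parentId;
    this.description = description;
  }

  public long getSpanId() { return spanId; }

  /** A child span for a downstream call, e.g. a DFSClient RPC to the NameNode. */
  public TraceSpan child(String description) {
    return new TraceSpan(description, spanId);
  }
}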


Re: data loss after cluster wide power loss

2013-07-08 Thread Colin McCabe
Thanks.  Suresh and Kihwal are right-- renames are journalled, but not
necessarily durable (stored to disk).  I was getting mixed up with
HDFS semantics, in which we actually do make the journal durable
before returning success to the client.

It might be a good idea for HDFS to fsync the file descriptor of the
directories involved in the rename operation, before assuming that the
operation is durable.
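A minimal sketch of that idea, assuming a POSIX filesystem and Java 7 NIO (not HDFS code; the paths are hypothetical): after the rename, open the parent directory and force() it, which issues an fsync on the directory on Linux. Some platforms refuse to open a directory this way, so a real implementation would need a fallback.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableRename {
  public static void rename(Path src, Path dst) throws IOException {
    Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    // Sync the destination directory so the new directory entry itself
    // survives a power loss, not just the file contents.
    try (FileChannel dir = FileChannel.open(dst.getParent(), StandardOpenOption.READ)) {
      dir.force(true);
    }
  }

  public static void main(String[] args) throws IOException {
    rename(Paths.get("/data/tmp/blk_123"), Paths.get("/data/current/blk_123"));  // hypothetical paths
  }
}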

If you're using ext{2,3,4}, a quick fix would be to use mount -o
dirsync.  I haven't tested it out, but it's supposed to make these
operations synchronous.

From the man page:
   dirsync
  All directory updates within the filesystem should be done
  synchronously.  This affects the following system calls: creat,
  link, unlink, symlink, mkdir, rmdir, mknod and rename.

Colin


On Wed, Jul 3, 2013 at 10:19 AM, Suresh Srinivas  wrote:
> On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe  wrote:
>
>> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas 
>> wrote:
>> > Dave,
>> >
>> > Thanks for the detailed email. Sorry I did not read all the details you
>> had
>> > sent earlier completely (on my phone). As you said, this is not related
>> to
>> > data loss related to HBase log and hsync. I think you are right; the
>> rename
>> > operation itself might not have hit the disk. I think we should either
>> > ensure metadata operation is synced on the datanode or handle it being
>> > reported as blockBeingWritten. Let me spend some time debugging this issue.
>>
>> In theory, ext3 is journaled, so all metadata operations should be
>> durable in the case of a power outage.  It is only data operations
>> that should be possible to lose.  It is the same for ext4.  (Assuming
>> you are not using nonstandard mount options.)
>>
>
> The ext3 journal may not hit the disk right away. From what I read, if you
> do not specifically call sync, even the metadata operations do not hit disk.
>
> See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt
>
> commit=nrsec(*) Ext3 can be told to sync all its data and metadata
> every 'nrsec' seconds. The default value is 5 seconds.
> This means that if you lose your power, you will lose
> as much as the latest 5 seconds of work (your
> filesystem will not be damaged though, thanks to the
> journaling).  This default value (or any low value)
> will hurt performance, but it's good for data-safety.
> Setting it to 0 will have the same effect as leaving
> it at the default (5 seconds).
> Setting it to very large values will improve performance.


Re: data loss after cluster wide power loss

2013-07-03 Thread Colin McCabe
On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas  wrote:
> Dave,
>
> Thanks for the detailed email. Sorry I did not read all the details you had
> sent earlier completely (on my phone). As you said, this is not related to
> data loss related to HBase log and hsync. I think you are right; the rename
> operation itself might not have hit the disk. I think we should either
> ensure metadata operation is synced on the datanode or handle it being
> reported as blockBeingWritten. Let me spend some time debugging this issue.

In theory, ext3 is journaled, so all metadata operations should be
durable in the case of a power outage.  It is only data operations
that should be possible to lose.  It is the same for ext4.  (Assuming
you are not using nonstandard mount options.)

In practice, it is possible that your hard disks didn't actually
persist the data that they said they did.  Rumor has it that some
drives ignore the SATA FLUSH CACHE command in some cases, since it
makes them look bad in benchmarks.  In that case, there is nothing the
filesystem or any other software can do.

There was also a bug in older linux kernels where the kernel would not
actually send FLUSH CACHE.  Since the poster is using ext3 and
hadoop-1, it's possible he's using an antique kernel as well.  I
know for sure this affected LVM-- it used to ignore barriers until
fairly recently.

In Ceph, we used to recommend disabling the hard drive write cache if
your kernel was older than 2.6.33.  You can read the recommendation
for yourself here:
http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/
 This will have an impact on performance, however.

An uninterruptible power supply is not a bad idea.

I am curious:
what kernel version are you using?
are you using LVM?

Colin


>
> One surprising thing is, all the replicas were reported as
> blockBeingWritten.
>
> Regards,
> Suresh
>
>
> On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham  wrote:
>>
>> (Removing hbase list and adding hdfs-dev list as this is pretty internal
>> stuff).
>>
>> Reading through the code a bit:
>>
>> FSDataOutputStream.close calls
>> DFSOutputStream.close calls
>> DFSOutputStream.closeInternal
>>  - sets currentPacket.lastPacketInBlock = true
>>  - then calls
>> DFSOutputStream.flushInternal
>>  - enqueues current packet
>>  - waits for ack
>>
>> BlockReceiver.run
>>  - if (lastPacketInBlock && !receiver.finalized) calls
>> FSDataset.finalizeBlock calls
>> FSDataset.finalizeBlockInternal calls
>> FSVolume.addBlock calls
>> FSDir.addBlock calls
>> FSDir.addBlock
>>  - renames block from "blocksBeingWritten" tmp dir to "current" dest dir
>>
>> This looks to me like the synchronous chain I would expect from a DFS client
>> through to moving the file from blocksBeingWritten to the current dir, so that
>> once the file is closed the block files would be in the proper directory -
>> even if the contents of the file are still in the OS buffer rather than
>> synced to disk.  It's only after this moving of blocks that
>> NameNode.complete file is called.  There are several conditions and loops in
>> there, so without a greater understanding of the code I'm not certain this
>> chain is fully reliable in all cases.
>>
>> Could it be the case that the rename operation itself is not synced and
>> that ext3 lost the fact that the block files were moved?
>> Or is there a bug in the close file logic that for some reason the block
>> files are not always moved into place when a file is closed?
>>
>> Thanks for your patience,
>> Dave
>>
>>
>> On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham  wrote:
>>>
>>> Thanks for the response, Suresh.
>>>
>>> I'm not sure that I understand the details properly.  From my reading of
>>> HDFS-744 the hsync API would allow a client to make sure that at any point
>>> in time it's writes so far hit the disk.  For example, for HBase it could
>>> apply a fsync after adding some edits to its WAL to ensure those edits are
>>> fully durable for a file which is still open.
>>>
>>> However, in this case the dfs file was closed and even renamed.  Is it
>>> the case that even after a dfs file is closed and renamed that the data
>>> blocks would still not be synced and would still be stored by the datanode
>>> in "blocksBeingWritten" rather than in "current"?  If that is case, would it
>>> be better for the NameNode not to reject replicas that are in
>>> blocksBeingWritten, especially if it doesn't have any other replicas
>>> available?
>>>
>>> Dave
>>>
>>>
>>> On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas 
>>> wrote:

 Yes this is a known issue.

 The HDFS part of this was addressed in
 https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is
 not
 available in 1.x  release. I think HBase does not use this API yet.


 On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham  wrote:

 > We're running HBase over HDFS 1.0.2 on about 1000 nodes.  On Saturday
 > the
 > data center we were in had a total power 

Re: dfs.datanode.socket.reuse.keepalive

2013-06-17 Thread Colin McCabe
Thanks for reminding me.  I filed
https://issues.apache.org/jira/browse/HDFS-4911 for this.

4307 was about making the cache robust against programs that change
the wall-clock time.

best,
Colin


On Sun, Jun 16, 2013 at 7:29 AM, Harsh J  wrote:
> Hi Colin,
>
> Do we have a JIRA already for this? Is it
> https://issues.apache.org/jira/browse/HDFS-4307?
>
> On Mon, Jun 10, 2013 at 11:05 PM, Todd Lipcon  wrote:
>> +1 for dropping the client side expiry down to something like 1-2 seconds.
>> I'd rather do that than up the server side, since the server side resource
>> (DN threads) is likely to be more contended.
>>
>> -Todd
>>
>> On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe  wrote:
>>
>>> Hi all,
>>>
>>> HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
>>> DataXceiver worker threads in the DataNode to linger for a second or
>>> two after finishing a request, in case the client wants to send
>>> another request.  On the client side, HDFS-941 added a SocketCache, so
>>> that subsequent client requests could reuse the same socket.  Sockets
>>> were closed purely by an LRU eviction policy.
>>>
>>> Later, HDFS-3373 added a minimum expiration time to the SocketCache,
>>> and added a thread which periodically closed old sockets.
>>>
>>> However, the default timeout for SocketCache (which is now called
>>> PeerCache) is much longer than the DN would possibly keep the socket
>>> open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
>>> 60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
>>> defaults to 1000 (1 second).
>>>
>>> I'm not sure why we have such a big disparity here.  It seems like
>>> this will inevitably lead to clients trying to use sockets which have
>>> gone stale, because the server closes them way before the client
>>> expires them.  Unless I'm missing something, we should probably either
>>> lengthen the keepalive, or shorten the socket cache expiry, or both.
>>>
>>> thoughts?
>>> Colin
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
>
>
> --
> Harsh J


Re: Why is FileSystem.createNonRecursive deprecated?

2013-06-12 Thread Colin McCabe
This seems inconsistent. If the method is deprecated just because it's
in org.apache.hadoop.fs.FileSystem, shouldn't all FileSystem methods be
marked as deprecated?

On the other hand, a user opening up FileSystem.java would probably
not realize that it is deprecated.  The JavaDoc for the class itself
doesn't mention it, although the JavaDoc for a few of the methods
talks about a "transition".

It might make more sense to mark the class as a whole @deprecated if
that is the intent.

I did a quick search on JIRA, but didn't see anything answering these
questions.  Opinions?

best,
Colin


On Tue, Jun 11, 2013 at 4:09 PM, Andrew Wang  wrote:
> Hi Ravi,
>
> I wasn't around for HADOOP-6840, but I'm guessing it's deprecated for the
> same reasons as primitiveCreate: FileSystem is supposed to eventually be
> supplanted by FileContext.
>
> FileContext#create also has a more manageable number of method signatures
> through the use of flags, and in fact defaults to not creating parent
> directories. I believe MR2 also uses FileContext over FileSystem, so this
> might be your best bet.
>
> HTH,
> Andrew
>
>
> On Tue, Jun 11, 2013 at 3:18 PM, Ravi Prakash  wrote:
>
>> Hi folks,
>>
>> I am trying to fix MAPREDUCE-5317. I noticed that the only way through
>> FileSystem to NOT recursively create directories is through the deprecated
>> method
>>
>> @deprecated API only for 0.20-append
>> FileSystem.createNonRecursive.
>>
>>
>> This has been marked deprecated ever since it was put in by HADOOP-6840.
>> Do we know if we ever expect to un-deprecate this method? I am trying to
>> find the rationale behind checking it in as a deprecated method, but
>> haven't been able to find any written record. Does anyone know?
>> Thanks
>> Ravi
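A minimal sketch of the FileContext alternative Andrew describes above (the path is hypothetical): per his note, create() defaults to not creating parent directories unless CreateOpts.createParent() is passed, which is the non-recursive behaviour Ravi is after.

import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class NonRecursiveCreate {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());
    Path p = new Path("/tmp/parent-must-exist/file");   // hypothetical path
    // No CreateOpts.createParent(), so this fails if the parent directory is missing.
    try (FSDataOutputStream out = fc.create(p, EnumSet.of(CreateFlag.CREATE))) {
      out.writeUTF("hello");
    }
  }
}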


dfs.datanode.socket.reuse.keepalive

2013-06-07 Thread Colin McCabe
Hi all,

HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
DataXceiver worker threads in the DataNode to linger for a second or
two after finishing a request, in case the client wants to send
another request.  On the client side, HDFS-941 added a SocketCache, so
that subsequent client requests could reuse the same socket.  Sockets
were closed purely by an LRU eviction policy.

Later, HDFS-3373 added a minimum expiration time to the SocketCache,
and added a thread which periodically closed old sockets.

However, the default timeout for SocketCache (which is now called
PeerCache) is much longer than the DN would possibly keep the socket
open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
defaults to 1000 (1 second).

I'm not sure why we have such a big disparity here.  It seems like
this will inevitably lead to clients trying to use sockets which have
gone stale, because the server closes them way before the client
expires them.  Unless I'm missing something, we should probably either
lengthen the keepalive, or shorten the socket cache expiry, or both.

thoughts?
Colin
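For illustration, a hedged hdfs-site.xml sketch of one way to line the two sides up (the values are illustrative, not recommendations): raise the DataNode keepalive, drop the client cache expiry, or both, so the client stops trusting a cached socket before the DataNode closes it.

<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <value>4000</value>
</property>
<property>
  <name>dfs.client.socketcache.expiryMsec</name>
  <value>3000</value>
</property>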


Re: [jira] [Created] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-15 Thread Colin McCabe
Hi Shouvanik,

Why not try asking the Talend community?

Also, this question belongs on the user list.

thanks,
Colin


On Wed, May 15, 2013 at 4:20 AM, Shouvanik Haldar <
shouvanik.hal...@gmail.com> wrote:

> Hi,
>
> I am facing a problem.
>
> I am using Talend for scheduling and running a job. But I am getting an
> error. Can anybody please help?
>
> [2013-05-15 16:30:59]Deploying job to Hadoop...
> [2013-05-15 16:31:08]Deployment failed!
> [2013-05-15 16:31:08]Can not access Hadoop File System with user root!
> [2013-05-15 16:31:08]Server IPC version 7 cannot communicate with client
> version 4
>
> Regards,
> Shouvanik
>
>
> On Wed, May 15, 2013 at 3:55 AM, Henry Robinson (JIRA)  >wrote:
>
> > Henry Robinson created HDFS-4824:
> > 
> >
> >  Summary: FileInputStreamCache.close leaves dangling
> reference
> > to FileInputStreamCache.cacheCleaner
> >  Key: HDFS-4824
> >  URL: https://issues.apache.org/jira/browse/HDFS-4824
> >  Project: Hadoop HDFS
> >   Issue Type: Bug
> >   Components: hdfs-client
> > Affects Versions: 2.0.4-alpha
> > Reporter: Henry Robinson
> > Assignee: Colin Patrick McCabe
> >
> >
> > {{FileInputStreamCache}} leaves around a reference to its
> {{cacheCleaner}}
> > after {{close()}}.
> >
> > The {{cacheCleaner}} is created like this:
> >
> > {code}
> > if (cacheCleaner == null) {
> >   cacheCleaner = new CacheCleaner();
> >   executor.scheduleAtFixedRate(cacheCleaner, expiryTimeMs,
> > expiryTimeMs,
> >   TimeUnit.MILLISECONDS);
> > }
> > {code}
> >
> > and supposedly removed like this:
> >
> > {code}
> > if (cacheCleaner != null) {
> >   executor.remove(cacheCleaner);
> > }
> > {code}
> >
> > However, {{ScheduledThreadPoolExecutor.remove}} returns a success boolean
> > which should be checked. And I _think_ from a quick read of that class
> that
> > the return value of {{scheduleAtFixedRate}} should be used as the
> argument
> > to {{remove}}.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> > administrators
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
>
>
>
> --
> Thanks,
> *Shouvanik*
>
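A minimal sketch of the fix suggested in the JIRA above (the names are illustrative; this is not the FileInputStreamCache code): keep the ScheduledFuture returned by scheduleAtFixedRate and cancel it on close, rather than relying on the boolean result of executor.remove(cacheCleaner).

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class CacheCleanerLifecycle {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private ScheduledFuture<?> cleanerFuture;

  synchronized void start(Runnable cacheCleaner, long expiryTimeMs) {
    if (cleanerFuture == null) {
      cleanerFuture = executor.scheduleAtFixedRate(
          cacheCleaner, expiryTimeMs, expiryTimeMs, TimeUnit.MILLISECONDS);
    }
  }

  synchronized void close() {
    if (cleanerFuture != null) {
      cleanerFuture.cancel(false);  // stop future runs without interrupting a running one
      cleanerFuture = null;
    }
  }
}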


Re: Is Hadoop SequenceFile binary safe?

2013-05-02 Thread Colin McCabe
It seems like we could just set up an escape sequence and make it actually
binary-safe, rather than just probabilistically.  The escape sequence would
only be inserted when there would otherwise be confusion between data and a
sync marker.

best,
Colin


On Thu, May 2, 2013 at 3:26 AM, Hs  wrote:

> Hi Chris,
> thanks for your replay.
> That's to say, SequenceFile is probabilistically binary safe. I notice a
> jira issue attempting to support "append" in existing SequenceFile(
> https://issues.apache.org/jira/browse/HADOOP-7139).  It occurred to me that
> if some hacker reads the sync marker from the existing file and appends some
> elaborate data containing the sync marker to the file, the file may seem
> corrupted when calculating splits, while nothing is wrong when we read the
> SequenceFile sequentially.  However, this currently may not be a problem.
> Thank you again !
>
>
>
> 2013/4/30 Chris Douglas 
>
> > You're not missing anything, but the probability of a 16 (thought it
> > was 20?) byte collision with random bytes is vanishingly small. -C
> >
> > On Sat, Apr 27, 2013 at 4:30 AM, Hs  wrote:
> > > Hi,
> > >
> > > I am learning hadoop.  I read the SequenceFile.java in hadoop-1.0.4
> > source
> > > codes. And I find the sync(long position) method which is used to find
> a
> > > "sync marker" (a 16 bytes MD5 when generated at file creation time) in
> > > SequenceFile when splitting SequenceFile into splits in MapReduce.
> > >
> > > /** Seek to the next sync mark past a given position. */
> > > public synchronized void sync(long position) throws IOException {
> > >   if (position+SYNC_SIZE >= end) {
> > > seek(end);
> > > return;
> > >   }
> > >
> > >   try {
> > > seek(position+4); // skip escape
> > > in.readFully(syncCheck);
> > > int syncLen = sync.length;
> > > for (int i = 0; in.getPos() < end; i++) {
> > >   int j = 0;
> > >   for (; j < syncLen; j++) {
> > > if (sync[j] != syncCheck[(i+j)%syncLen])
> > >   break;
> > >   }
> > >   if (j == syncLen) {
> > > in.seek(in.getPos() - SYNC_SIZE); // position before sync
> > > return;
> > >   }
> > >   syncCheck[i%syncLen] = in.readByte();
> > > }
> > >   } catch (ChecksumException e) { // checksum failure
> > > handleChecksumException(e);
> > >   }}
> > >
> > > According to my understanding, this code simply looks for a data sequence
> > > which contains the same data as the sync marker.
> > >
> > > My doubt:
> > > Consider a situation where the data in a SequenceFile happens to contain a
> > > 16-byte data sequence identical to the sync marker; won't the code above
> > > mistakenly treat that 16-byte data as a sync marker, so that the
> > > SequenceFile won't be correctly parsed?
> > >
> > > I don't find any "escape" operation on the data or the sync marker.
> > So,
> > > how can SequenceFile be binary safe? Am I missing something? Please
> > correct
> > > me if I am wrong.
> > >
> > > Thanks!
> > >
> > > Shawn
> >
>


Re: VOTE: HDFS-347 merge

2013-04-12 Thread Colin McCabe
Hi Azuryy,

The branch adds new APT documentation which describes the new configuration
that is needed.
It's
in ./hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm

best,
Colin


On Thu, Apr 11, 2013 at 6:37 PM, Azuryy Yu  wrote:

> It's good to know HDFS-347 win the votes finally.
>
> Does there need some additional configuration to enable these features?
>
>
>
> On Fri, Apr 12, 2013 at 2:05 AM, Colin McCabe  >wrote:
>
> > The merge vote is now closed.  With three +1s, it passes.
> >
> > thanks,
> > Colin
> >
> >
> > On Wed, Apr 10, 2013 at 10:00 PM, Aaron T. Myers 
> wrote:
> >
> > > I'm +1 as well. I've reviewed much of the code as well and have
> > personally
> > > seen it running in production at several different sites. I agree with
> > Todd
> > > that it's a substantial improvement in operability.
> > >
> > > Best,
> > > Aaron
> > >
> > > On Apr 8, 2013, at 1:19 PM, Todd Lipcon  wrote:
> > >
> > > > +1 for the branch merge. I've reviewed all of the code in the branch,
> > and
> > > > we have people now running this code in production scenarios. It is
> as
> > > > functional as the old version and way easier to set up/configure.
> > > >
> > > > -Todd
> > > >
> > > > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe  >
> > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I think it's time to merge the HDFS-347 branch back to trunk.  It's
> > been
> > > >> under
> > > >> review and testing for several months, and provides both a
> performance
> > > >> advantage, and the ability to use short-circuit local reads without
> > > >> compromising system security.
> > > >>
> > > >> Previously, we tried to merge this and the objection was brought up
> > > that we
> > > >> should keep the old, insecure short-circuit local reads around so
> that
> > > >> platforms for which secure SCR had not yet been implemented could
> use
> > it
> > > >> (e.g. Windows).  This has been addressed-- see HDFS-4538 for
> details.
> > > >> Suresh has also volunteered to maintain the insecure SCR code until
> > > secure
> > > >> SCR can be implemented for Windows.
> > > >>
> > > >> Please cast your vote by EOD Monday 4/8.
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > >
> >
>


Re: VOTE: HDFS-347 merge

2013-04-11 Thread Colin McCabe
The merge vote is now closed.  With three +1s, it passes.

thanks,
Colin


On Wed, Apr 10, 2013 at 10:00 PM, Aaron T. Myers  wrote:

> I'm +1 as well. I've reviewed much of the code as well and have personally
> seen it running in production at several different sites. I agree with Todd
> that it's a substantial improvement in operability.
>
> Best,
> Aaron
>
> On Apr 8, 2013, at 1:19 PM, Todd Lipcon  wrote:
>
> > +1 for the branch merge. I've reviewed all of the code in the branch, and
> > we have people now running this code in production scenarios. It is as
> > functional as the old version and way easier to set up/configure.
> >
> > -Todd
> >
> > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe 
> wrote:
> >
> >> Hi all,
> >>
> >> I think it's time to merge the HDFS-347 branch back to trunk.  It's been
> >> under
> >> review and testing for several months, and provides both a performance
> >> advantage, and the ability to use short-circuit local reads without
> >> compromising system security.
> >>
> >> Previously, we tried to merge this and the objection was brought up
> that we
> >> should keep the old, insecure short-circuit local reads around so that
> >> platforms for which secure SCR had not yet been implemented could use it
> >> (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
> >> Suresh has also volunteered to maintain the insecure SCR code until
> secure
> >> SCR can be implemented for Windows.
> >>
> >> Please cast your vote by EOD Monday 4/8.
> >>
> >> best,
> >> Colin
> >>
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>


Re: testHDFSConf.xml

2013-04-10 Thread Colin McCabe
On Wed, Apr 10, 2013 at 10:16 AM, Jay Vyas  wrote:

> Hello HDFS brethren !
>
> I've noticed that the testHDFSConf.xml has a lot of references to
> supergroup.
>
>
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
>
> 1) I wonder why this is hardcoded in the testHDFSConf.xml
>
>
"supergroup" is the default supergroup in HDFS.  Check DFSConfigKeys.java:

  public static final String  DFS_PERMISSIONS_SUPERUSERGROUP_KEY =
"dfs.permissions.superusergroup";
  public static final String  DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT =
"supergroup";

It seems fine to use "supergroup" in a test.  after all, we do control the
configuration we pass into the test.


> 2) Also, Im wondering if there are any good ideas for extending/modifying
> this file for a extention of the FileSystem implementation.
>
>
It would be interesting to think about pulling the non-hdfs-specific
components of TestHDFSCLI into another test; perhaps one in common.
 Theoretically, what we print on the console should be really similar, no
matter whether HDFS or some other filesystem is being used.  In practice,
there may be some differences, however...

I find it a little bit challenging to modify TestHDFSCLI because the test
is really long and executes as a single unit.  Breaking it down into
multiple units would probably be another good improvement, at least in my
opinion.

best,
Colin


> Right now I'm doing some global find-and-replace statements - but was thinking
> that maybe parameterizing the file would be a good JIRA - so that people
> could use this as a base test for FileSystem implementations.
>
> Depending on feedback I'm certainly willing to submit and put in a first
> pass at a more modular version of this file.
>
> It's in many ways a very generalizable component of the hdfs trunk.
>
> Thanks!
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>


Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
Let's extend this vote by another 2 days just in case Nicholas doesn't find
time in his schedule today to comment.

He needs to withdraw his -1 before we can proceed.

Colin


On Mon, Apr 8, 2013 at 1:19 PM, Todd Lipcon  wrote:

> +1 for the branch merge. I've reviewed all of the code in the branch, and
> we have people now running this code in production scenarios. It is as
> functional as the old version and way easier to set up/configure.
>
> -Todd
>
> On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe 
> wrote:
>
> > Hi all,
> >
> > I think it's time to merge the HDFS-347 branch back to trunk.  It's been
> > under
> > review and testing for several months, and provides both a performance
> > advantage, and the ability to use short-circuit local reads without
> > compromising system security.
> >
> > Previously, we tried to merge this and the objection was brought up that
> we
> > should keep the old, insecure short-circuit local reads around so that
> > platforms for which secure SCR had not yet been implemented could use it
> > (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
> >  Suresh has also volunteered to maintain the insecure SCR code until
> secure
> > SCR can be implemented for Windows.
> >
> > Please cast your vote by EOD Monday 4/8.
> >
> > best,
> > Colin
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
Hi all,

Since it is past 2/24, this VOTE is now closed.

Colin


On Tue, Mar 5, 2013 at 3:08 PM, Tsz Wo Sze  wrote:

> Hi Colin,
>
> It is great to hear that you agree to keep HDFS-2246.  Please as well
> address my comments posted on HDFS-347 and let me know once you have posted
> a new patch on HDFS-347.  Thanks a lot!
>
> BTW, you should conclude the VOTE you started and use a separate thread
> for other discussion as Suresh mentioned earlier.  I hope you could use our
> convention on voting.  Otherwise, it is hard for others to follow.
>
> Tsz-Wo
>
>
>
>
> 
>  From: Suresh Srinivas 
> To: "hdfs-dev@hadoop.apache.org" 
> Sent: Wednesday, March 6, 2013 5:09 AM
> Subject: Re: VOTE: HDFS-347 merge
>
> Thanks Colin. Will check it out as soon as I can.
>
>
> On Tue, Mar 5, 2013 at 12:24 PM, Colin McCabe  >wrote:
>
> > On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas  >
> > wrote:
> > >>
> > >> Suresh, if you're willing to "support and maintain" HDFS-2246, do you
> > >> have cycles to propose a patch to the HDFS-347 branch reintegrating
> > >> HDFS-2246 with the simplifications you outlined? In your review, did
> > >> you find anything else you'd like to address prior to the merge, or is
> > >> this the only item? -C
> > >
> > >
> > > Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get
> > to
> > > it
> > > in a week or two, if that is okay.
> > >
> > > I have not looked at the patch closely. But I think any issues found
> > could
> > > be fixed in trunk and should not block the merge.
> >
> > Hi Suresh,
> >
> > HDFS-4538 adds the ability to use the old block reader in the HDFS-347
> > branch.  Check it out.
> >
> > Colin
> >
>
>
>
> --
> http://hortonworks.com/download/
>


Re: VOTE: HDFS-347 merge

2013-04-02 Thread Colin McCabe
On Mon, Apr 1, 2013 at 6:58 PM, Colin McCabe  wrote:

> On Mon, Apr 1, 2013 at 5:04 PM, Suresh Srinivas wrote:
>
>> Colin,
>>
>> For the record, the last email in the previous thread ended with the
>> following comment from Nicholas:
>> > It is great to hear that you agree to keep HDFS-2246.  Please as well
>> address my comments posted on HDFS-347 and let me know once you have
>> posted
>> a new patch on HDFS-347.
>>
>>
> Hi Nicholas,
>
> Can you please open a JIRA listing what you think should be fixed or
> changed, and why?
>
> Also please specify whether it is important to fix this before the merge,
> and if so, why.  If this is a minor style change, or renaming function X to
> Y, then I think we can easily do it after the merge.
>
> thanks,
> Colin
>
>
Hi Nicholas,

I opened https://issues.apache.org/jira/browse/HDFS-4661 with some of the
style fixes that you suggested in HDFS-347.  If there is anything else you
would like to see addressed before the merge, please add it to this JIRA.

thanks,
Colin



>
>
>> I did not see any response (unless I missed it). Can you please address
>> it?
>>
>> Regards,
>> Suresh
>>
>>
>> On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe 
>> wrote:
>>
>> > Hi all,
>> >
>> > I think it's time to merge the HDFS-347 branch back to trunk.  It's been
>> > under
>> > review and testing for several months, and provides both a performance
>> > advantage, and the ability to use short-circuit local reads without
>> > compromising system security.
>> >
>> > Previously, we tried to merge this and the objection was brought up
>> that we
>> > should keep the old, insecure short-circuit local reads around so that
>> > platforms for which secure SCR had not yet been implemented could use it
>> > (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
>> >  Suresh has also volunteered to maintain the insecure SCR code until
>> secure
>> > SCR can be implemented for Windows.
>> >
>> > Please cast your vote by EOD Monday 4/8.
>> >
>> > best,
>> > Colin
>> >
>>
>>
>>
>> --
>> http://hortonworks.com/download/
>>
>
>


Re: VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
On Mon, Apr 1, 2013 at 5:04 PM, Suresh Srinivas wrote:

> Colin,
>
> For the record, the last email in the previous thread ended with the
> following comment from Nicholas:
> > It is great to hear that you agree to keep HDFS-2246.  Please as well
> address my comments posted on HDFS-347 and let me know once you have posted
> a new patch on HDFS-347.
>
>
Hi Nicholas,

Can you please open a JIRA listing what you think should be fixed or
changed, and why?

Also please specify whether it is important to fix this before the merge,
and if so, why.  If this is a minor style change, or renaming function X to
Y, then I think we can easily do it after the merge.

thanks,
Colin



> I did not see any response (unless I missed it). Can you please address it?
>
> Regards,
> Suresh
>
>
> On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe 
> wrote:
>
> > Hi all,
> >
> > I think it's time to merge the HDFS-347 branch back to trunk.  It's been
> > under
> > review and testing for several months, and provides both a performance
> > advantage, and the ability to use short-circuit local reads without
> > compromising system security.
> >
> > Previously, we tried to merge this and the objection was brought up that
> we
> > should keep the old, insecure short-circuit local reads around so that
> > platforms for which secure SCR had not yet been implemented could use it
> > (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
> >  Suresh has also volunteered to maintain the insecure SCR code until
> secure
> > SCR can be implemented for Windows.
> >
> > Please cast your vote by EOD Monday 4/8.
> >
> > best,
> > Colin
> >
>
>
>
> --
> http://hortonworks.com/download/
>


VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
Hi all,

I think it's time to merge the HDFS-347 branch back to trunk.  It's been under
review and testing for several months, and provides both a performance
advantage, and the ability to use short-circuit local reads without
compromising system security.

Previously, we tried to merge this and the objection was brought up that we
should keep the old, insecure short-circuit local reads around so that
platforms for which secure SCR had not yet been implemented could use it
(e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
 Suresh has also volunteered to maintain the insecure SCR code until secure
SCR can be implemented for Windows.

Please cast your vote by EOD Monday 4/8.

best,
Colin


Re: Heartbeat interval and timeout: why 3 secs and 10 min?

2013-03-13 Thread Colin McCabe
My understanding is that the 10 minute timeout helps to avoid replication
storms, especially during startup.

You might be interested in HDFS-3703, which adds a "stale" state which
datanodes are placed into after 30 seconds of missing heartbeats.  (This is
an optional feature controlled by dfs.namenode.check.stale.datanode )
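For reference, a hedged hdfs-site.xml sketch of turning that optional check on (key name taken from the paragraph above; whether it is appropriate depends on the cluster):

<property>
  <name>dfs.namenode.check.stale.datanode</name>
  <value>true</value>
</property>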

best,
Colin


On Tue, Mar 12, 2013 at 5:29 PM, André Oriani  wrote:

> No take on this one?
>
> In Zookeeper the heartbeats happen on every third of the timeout.  If I am
> not mistaken, the recommended timeout is more than 2 minutes to avoid false
> positives.
>
> But I still cannot see the relationship on HDFS between heartbeat interval
> and timeout. Okay 10 minutes seems to be a conservative value to avoid
> false positives in a big cluster. But that means 200 heartbeats. Heartbeats
> on HDFS are not only used for liveness detection but also to send
> information about free space and load and to receive commands from
> NameNode. So they are also essential for block placement decisions and for
> ensuring the replication levels. Would that then be reason why heartbeats
> are so frequent? A lot can happen to a DataNode in just three seconds?
>
>
> Thanks,
> André Oriani
>
>
>
> On Thu, Mar 7, 2013 at 10:37 PM, André Oriani  wrote:
>
> > Hi,
> >
> > Is there any particular reason why the default heartbeat interval is 3
> > seconds and the timeout is 10 minutes? Everywhere I looked (code, Google,
> > ..) only mentions  the values but no clue on why those values were
> chosen.
> >
> >
> > Thanks in advance,
> > André Oriani
> >
>


Re: VOTE: HDFS-347 merge

2013-03-05 Thread Colin McCabe
On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas  wrote:
>>
>> Suresh, if you're willing to "support and maintain" HDFS-2246, do you
>> have cycles to propose a patch to the HDFS-347 branch reintegrating
>> HDFS-2246 with the simplifications you outlined? In your review, did
>> you find anything else you'd like to address prior to the merge, or is
>> this the only item? -C
>
>
> Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get to
> it
> in a week or two, if that is okay.
>
> I have not looked at the patch closely. But I think any issues found could
> be fixed in trunk and should not block the merge.

Hi Suresh,

HDFS-4538 adds the ability to use the old block reader in the HDFS-347
branch.  Check it out.

Colin


Re: VOTE: HDFS-347 merge

2013-02-27 Thread Colin McCabe
Here is a compromise proposal, which hopefully will satisfy both sides:
We keep the old block reader and have a configuration option that enables it.

So in addition to dfs.client.use.legacy.blockreader, which we already
have, we would have dfs.client.use.legacy.blockreader.local.
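Concretely, a hedged sketch of what that toggle might look like in hdfs-site.xml (illustrative only; the key is the one proposed above):

<property>
  <name>dfs.client.use.legacy.blockreader.local</name>
  <value>true</value>
</property>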

Does that make sense?

best,
Colin


On Wed, Feb 27, 2013 at 12:06 PM, Eli Collins  wrote:
> On Wed, Feb 27, 2013 at 11:45 AM, sanjay Radia  wrote:
>>
>> On Feb 26, 2013, at 1:51 PM, Eli Collins wrote:
>>
>>> it doesn't seem right to hold up 347 up for Windows support given that
>>> Windows support has not been merged to trunk yet, is not in any Apache
>>> release, etc. Personally I don't like establishing the precedent here
>>> that we can hold up a merge due to requirements from an unmerged
>>> branch.
>>
>> It is not being held back for the Windows port. It is being held back
>> because 2246 should not be removed as part of 347; a separate jira should
>> have been filed to remove it.
>
> This isn't about just having a separate jira though right?  We could
> easily pull the change out to two jiras (one removes 2246 and then
> next adds 347), they weren't separated because the goal for 347 was to
> be a re-write of the same feature (direct reads).  You commented on
> 2246 that it is a temporary workaround for 347, do you no longer feel
> that way?  Your reply to ATM made it seem like this was something that
> we'd be maintaining for a while (vs being a stopgap until 347 adds
> Windows support).


Re: VOTE: HDFS-347 merge

2013-02-25 Thread Colin McCabe
On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze  wrote:
> I still do not see a valid reason to remove HDFS-2246 immediately.  Some 
> users may have insecure clusters and they don't want to change their 
> configuration.
>
> BTW, is Unix Domain Socket supported by all Unix-like systems?  Can anyone
> confirm that or show some counterexamples?

UNIX domain sockets are supported on MacOS, FreeBSD, OpenBSD, NetBSD,
Linux, etc etc.  Their behavior is standardized by POSIX and
implemented by all POSIX OSes... although there are some OS-specific
behaviors which we avoid.

Colin


>
>
> Tsz-Wo
>
>
>
> 
>  From: Aaron T. Myers 
> To: "hdfs-dev@hadoop.apache.org" ; Tsz Wo Sze 
> 
> Sent: Friday, February 22, 2013 6:40 PM
> Subject: Re: VOTE: HDFS-347 merge
>
> On Fri, Feb 22, 2013 at 6:32 PM, Tsz Wo Sze  wrote:
>
>> Another substantive concern is that HDFS-347 is not as well tested as
>> HDFS-2246.  So, we should keep HDFS-2246 around for some time and remove
>> it later.  Is this the usual practice?
>>
>
> I'm proposing we do just that - keep HDFS-2246 around in branch-2 to let
> HDFS-347 soak a bit on trunk and then remove HDFS-2246 from branch-2 once
> we're confident in HDFS-347 and trunk adds Windows support. As Colin
> pointed out, this VOTE has always been about only merging this branch to
> trunk.
>
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-22 Thread Colin McCabe
On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas  wrote:
> On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers  wrote:
>> Given that the only substantive concerns with HDFS-347 seem to be about
>> Windows support for local reads, for now we only merge this branch to
>> trunk. Support for doing HDFS-2246 style local reads will be removed from
>> trunk, but retained in branch-2 for now. Only once someone adds support for
>> doing HDFS-347 style local reads which work on Windows will we consider
>> merging HDFS-347 to branch-2. This should ensure that there's no feature
>> regression on branch-2, but also means that we will not need to maintain
>> the HDFS-2246 code path alongside the HDFS-347 code path indefinitely.
>
> This seems reasonable, though retaining HDFS-2246 in branch-2 could be
> a workaround if a Windows port of HDFS-347 is not forthcoming. -C

This seems like a reasonable solution to me.

To implement this on Windows, you would use the DuplicateHandle or
WSADuplicateSocket APIs on that platform.  I think the longer we delay
the merge, the less time we will actually have to implement secure
short-circuit on Windows.  So we should not delay it any longer.

best,
Colin


VOTE: HDFS-347 merge

2013-02-17 Thread Colin McCabe
Hi all,

I would like to merge the HDFS-347 branch back to trunk.  It's been
under intensive review and testing for several months.  The branch
adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]

We have tested HDFS-347 with both random and sequential workloads. The
short-circuit case is substantially faster [2], and overall
performance looks very good.  This is especially encouraging given
that the initial goal of this work was to make security compatible
with short-circuit local reads, rather than to optimize the
short-circuit code path.  We've also stress-tested HDFS-347 on a
number of clusters.

This initial VOTE is to merge only into trunk.  Just as we have done
with our other recent merges, we will consider merging into branch-2
after the code has been in trunk for few weeks.

Please cast your vote by EOD Sunday 2/24.

best,
Colin McCabe

[1] 
https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704

[2] 
https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755


HDFS-347 (Short-circuit local reads with security)

2013-01-15 Thread Colin McCabe
Hi all,

The HDFS-347 branch implements short-circuit local reads with support for
security.  This is just a "heads up" that it is getting ready to merge, and
we will probably send out an email about that next week.

HDFS-347 has been under review for a while-- most of the code was written
in November and December and reviewed on Apache reviewboard [1].  Recently,
we split the JIRA into subtasks and created a branch for them, in order to
facilitate review (see the subtasks on HDFS-347.)

We have tested HDFS-347 with both random and sequential workloads.  The
short-circuit case is substantially faster than previously, and overall
performance looks very good.  This is especially encouraging given that the
initial goal of this work was to make security compatible with
short-circuit local reads, rather than to optimize the short-circuit code
path.

For detailed benchmarks, see Todd Lipcon's comment here: [2]

best,
Colin McCabe

[1]. https://reviews.apache.org/r/8554/

[2].
https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755


Re: Release of Decompressor resources in CodecPool

2012-12-27 Thread Colin McCabe
I think that you're right.  It looks like BuiltInGzipDecompressor,
which is marked as DoNotPool, ends up owning some JNI-managed
resources.  In this case, just relying on the GC to get around to
calling the finalizer isn't a great idea.  I think you should open a
JIRA.
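A minimal sketch of the change being discussed (simplified; not the actual CodecPool code): call end() on decompressors marked @DoNotPool when they are returned, instead of leaving cleanup to the finalizer.

import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.DoNotPool;

public class EagerDecompressorRelease {
  public static void returnDecompressor(Decompressor decompressor) {
    if (decompressor == null) {
      return;
    }
    if (decompressor.getClass().isAnnotationPresent(DoNotPool.class)) {
      // e.g. BuiltInGzipDecompressor: may hold JNI-managed state, so free it now.
      decompressor.end();
      return;
    }
    decompressor.reset();
    // ...and hand it back to the pool, as the real CodecPool does.
  }
}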

cheers,
Colin


On Mon, Dec 24, 2012 at 11:11 PM, lars hofhansl  wrote:
> In CodecPool.returnDecompressor, should we call Decompressor.end() in the 
> "DoNotPool" case?
> Otherwise end() is only called by finalize(), which is pretty terrible.
>
> -- Lars


Re: Recovering fsImage from Namenode Logs

2012-12-27 Thread Colin McCabe
On Thu, Dec 20, 2012 at 12:33 AM, ishan chhabra  wrote:
> Unfortunately, the checkpoint image that I have has the deletes recorded. I
> cannot use it. I do have an image that is 15 days old, which I am currently
> running.
>
> I looked at my logs and I have the filename, the block allocated, and the
> generation stamp. Can you explain to me the importance of the generation
> stamp here? Since my hdfs cluster is operational with the old image and I
> am writing new data to it, the generation stamp must have been incremented
> beyond what it was 15 days ago. If I try to restore a block that was
> written, let's say, 13 days ago, there can be a generation stamp collision.
> So, if I stop my cluster and make the new entries with generation stamp
> increments after what is currently in the namenode, will it be ok? Is the
> generation stamp stored somewhere in the datanode or the block stored in
> the datanode?

The generation stamp is stored by the datanode in the block directory,
as part of the .meta filename.

For example, if block -8546336708468389550 has genstamp 1002, you
would see something like this:

cmccabe@keter:/h> ls -l
/r/data1/current/BP-380817083-127.0.0.1-1356638793552/current/finalized/*8546336708468389550*
total 8
-rw-r--r-- 1 cmccabe users 2025 Dec 27 12:38 blk_-8546336708468389550
-rw-r--r-- 1 cmccabe users   23 Dec 27 12:38 blk_-8546336708468389550_1002.meta

cheers,
Colin


>
> Thanks for the clarifications.
>
> On Wed, Dec 19, 2012 at 10:40 PM, Harsh J  wrote:
>
>> proper INode entries of them to append/recreate your fsimage. They are
>
>
>
>
> --
> Thanks.
>
> Regards,
> Ishan Chhabra


Re: FSDataInputStream.read returns -1 with growing file and never continues reading

2012-12-27 Thread Colin McCabe
Also, read() returning -1 is not an error, it's EOF.  This is the same
as for the regular Java InputStream.
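A minimal sketch of the reopen-and-seek loop that the workaround and the "fs -tail" utility (referenced below) both rely on, assuming the file keeps growing (hypothetical class, not the actual Tail.java):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleTail {
  public static void tail(FileSystem fs, Path path) throws IOException, InterruptedException {
    byte[] buf = new byte[8192];
    long pos = 0;
    while (true) {
      // Re-open so the new file length (from the writer's hflush/sync) is visible.
      try (FSDataInputStream in = fs.open(path)) {
        in.seek(pos);
        int n;
        while ((n = in.read(buf)) > 0) {   // -1 here means EOF, not an error
          System.out.write(buf, 0, n);
          pos += n;
        }
      }
      Thread.sleep(5000);                  // wait for the writer to append more data
    }
  }
}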

best,
Colin


On Thu, Dec 20, 2012 at 10:32 AM, Christoph Rupp  wrote:
> Thank you, Harsh. I appreciate it.
>
> 2012/12/20 Harsh J 
>
>> Hi Christoph,
>>
>> If you use sync/hflush/hsync, the new length of data is only seen by a
>> new reader, not an existent reader. The "workaround" you've done
>> exactly how we've implemented the "fs -tail " utility. See code
>> for that at
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Tail.java?view=markup
>> (Note the looping at ~74).
>>
>> On Thu, Dec 20, 2012 at 5:51 PM, Christoph Rupp  wrote:
>> > Hi,
>> >
>> > I am experiencing an unexpected situation where FSDataInputStream.read()
>> > returns -1 while reading data from a file that another process still
>> appends
>> > to. According to the documentation read() should never return -1 but
>> throw
>> > Exceptions on errors. In addition, there's more data available, and
>> read()
>> > definitely should not fail.
>> >
>> > The problem gets worse because the FSDataInputStream is not able to
>> recover
>> > from this. If it once returns -1 then it will always return -1, even if
>> the
>> > file continues growing.
>> >
>> > If, at the same time, other Java processes read other HDFS files, they
>> will
>> > also return -1 immediately after opening the file. It smells like this
>> error
>> > gets propagated to other client processes as well.
>> >
>> > I found a workaround: close the FSDataInputStream, open it again and then
>> > seek to the previous position. And then reading works fine.
>> >
>> > Another problem that i have seen is that the FSDataInputStream returns -1
>> > when reaching EOF. It will never return 0 (which i would expect when
>> > reaching EOF).
>> >
>> > I use CDH 4.1.2, but also saw this with CDH 3u5. I have attached samples
>> to
>> > reproduce this.
>> >
>> > My cluster consists of 4 machines; 1 namenode and 3 datanodes. I run my
>> > tests on the namenode machine. there are no other HDFS users, and the
>> load
>> > that is generated by my tests is fairly low, i would say.
>> >
>> > One process writes to 6 files simultaneously, but with a 5 sec sleep
>> between
>> > each write. It uses an FSDataOutputStream, and after writing data it
>> calls
>> > sync(). Each write() appends 8 mb; it stops when the file grows to 100
>> mb.
>> >
>> > Six processes read files; each process reads one file. At first each
>> reader
>> > loops till the file exists. If it does then it opens the
>> FSDataInputStream
>> > and starts reading. Usually the first process returns the first 8 MB in
>> the
>> > file before it starts returning -1. But the other processes immediately
>> > return -1 without reading any data. I start the 6 reader processes
>> before i
>> > start the writer.
>> >
>> > Search HdfsReader.java for "WORKAROUND" and remove the comments; this
>> will
>> > reopen the FSDataInputStream after -1 is returned, and then everything
>> > works.
>> >
>> > Sources are attached.
>> >
>> > This is a very basic scenario and i wonder if i'm doing anything wrong
>> or if
>> > i found an HDFS bug.
>> >
>> > bye
>> > Christoph
>> >
>>
>>
>>
>> --
>> Harsh J
>>


Re: How to speedup test case running?

2012-10-22 Thread Colin McCabe
Hi,

You can run a specific test with mvn test -Dtest=<name of the test class>.

I find that junit tests start more quickly when run within Eclipse.
If you're interested, you can find instructions on setting up eclipse
here:
http://wiki.apache.org/hadoop/EclipseEnvironment

cheers,
Colin


On Sun, Oct 21, 2012 at 7:00 PM, 谢良  wrote:
> Hi devs,
>
> are there any tips or parameters to pass to "mvn test" to make the run more
> aggressive?  It costs me almost two hours for a "mvn test" run under the
> "hadoop-hdfs" directory.  I tried the magic from the HBase community (ramdisk
> & surefire.secondPartThreadCount), but it didn't seem to work here :)
>
> Best,
> Liang


Re: MiniDFSCluster

2012-09-05 Thread Colin McCabe
Hi Vlad,

I think you might be on to something.  File a JIRA?

It should be a simple improvement, I think.

cheers,
Colin


On Wed, Sep 5, 2012 at 10:42 AM, Vladimir Rozov  wrote:
> There are a few methods on the MiniDFSCluster class that are declared static
> (getBlockFile, getStorageDirPath), though as long as MiniDFSCluster is not a
> singleton they should be instance methods, not class methods. In my tests I
> see that starting a second instance of MiniDFSCluster invalidates the first
> instance if I don’t change the cluster base directory (the existing data
> directory is fully deleted), but at the same time the static declaration of
> getBlockFile and getStorageDirPath does not allow the base directory to be
> changed without affecting functionality.
>
>
>
> Thank you,
>
>
>
> Vlad


Re: validating user IDs

2012-06-11 Thread Colin McCabe
Sure.  We could also find the current user ID and bake that into the
test as an "acceptable" UID.  If that makes sense.

Colin


On Mon, Jun 11, 2012 at 4:12 PM, Alejandro Abdelnur  wrote:
> Colin,
>
> Would be possible using some kind of cmake config magic to set a macro to
> the current OS limit? Even if this means detecting the OS version and
> assuming its default limit.
>
> thx
>
> On Mon, Jun 11, 2012 at 3:57 PM, Colin McCabe wrote:
>
>> Hi all,
>>
>> I recently pulled the latest source, and ran a full build.  The
>> command line was this:
>> mvn compile -Pnative
>>
>> I was confronted with this:
>>
>> [INFO] Requested user cmccabe has id 500, which is below the minimum
>> allowed 1000
>> [INFO] FAIL: test-container-executor
>> [INFO] 
>> [INFO] 1 of 1 test failed
>> [INFO] Please report to mapreduce-...@hadoop.apache.org
>> [INFO] 
>> [INFO] make[1]: *** [check-TESTS] Error 1
>> [INFO] make[1]: Leaving directory
>>
>> `/home/cmccabe/hadoop4/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/container-executor'
>>
>> Needless to say, it didn't do much to improve my mood.  I was even
>> less happy when I discovered that -DskipTests has no effect on native
>> tests (they always run.)  See HADOOP-8480.
>>
>> Unfortunately, it seems like this problem is popping up more and more
>> in our native code.  It first appeared in test-task-controller (see
>> MAPREDUCE-2376) and then later in test-container-executor
>> (HADOOP-8499).  The basic problem seems to be the hardcoded assumption
>> that all user IDs below 1000 are system IDs.
>>
>> It is true that there are configuration files that can be changed to
>> alter the minimum user ID, but unfortunately these configuration files
>> are not used by the unit tests.  So anyone developing on a platform
>> where the user IDs start at 500 is now a second-class citizen, unable
>> to run unit tests.  This includes anyone running Red Hat, MacOS,
>> Fedora, etc.
>>
>> Personally, I can change my user ID.  It's a time-consuming process,
>> because I need to re-uid all files, but I can do it.  This luxury may
>> not be available to everyone, though-- developers who don't have root
>> on their machines, or are using a pre-assigned user ID to connect to
>> NFS come to mind.
>>
>> It's true that we could hack around this with environment variables.
>> It might even be possible to have Maven set these environment
>> variables automatically from the current user ID.  However, the larger
>> question I have here is whether this UID validation scheme even makes
>> any sense.  I have a user named "nobody" whose user ID is 65534.
>> Surely I should not be able to run map-reduce jobs as this user?  Yet,
>> under the current system, I can do exactly that.  The root of the
>> problem seems to be that there is both a default minimum and a default
>> maximum for "automatic" user IDs.  This configuration seems to be
>> stored in /etc/login.defs.
>>
>> On my system, it has:
>> SYSTEM_UID_MIN            100
>> SYSTEM_UID_MAX            499
>> UID_MIN                  500
>> UID_MAX                 6
>>
>> So that means that anything over 6 (like nobody) is not considered
>> a valid user ID for regular users.
>> We could potentially read this file (at least on Linux) and get more
>> sensible defaults.
>>
>> I am also curious if we could simply check whether the user we're
>> trying to run the job as has a valid login shell.  System users are
>> almost always set to have a login shell of /bin/false or
>> /sbin/nologin.
>>
>> Thoughts?
>> Colin
>>
>
>
>
> --
> Alejandro
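A minimal Java sketch of the login-shell check floated above (illustrative only; the real container-executor is native code, and a production check would go through getpwnam() so LDAP/NIS users are covered too):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LoginShellCheck {
  private static final Set<String> NOLOGIN_SHELLS = new HashSet<>(Arrays.asList(
      "/bin/false", "/sbin/nologin", "/usr/sbin/nologin"));

  /** Returns true if the user has an /etc/passwd entry with a real login shell. */
  public static boolean hasLoginShell(String user) throws IOException {
    for (String line : Files.readAllLines(Paths.get("/etc/passwd"), StandardCharsets.UTF_8)) {
      String[] fields = line.split(":");
      if (fields.length >= 7 && fields[0].equals(user)) {
        return !NOLOGIN_SHELLS.contains(fields[6]);
      }
    }
    return false;  // unknown user
  }
}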

