Re: Apache Hadoop 3.0.1 Release plan

2018-02-01 Thread Aaron T. Myers
Hey Anu,

My feeling on HDFS-12990 is that we've discussed it quite a bit already and
it doesn't seem at this point like either side is going to budge. I'm
certainly happy to have a phone call about it, but I don't expect that we'd
make much progress.

My suggestion is that we simply include the patch posted to HDFS-12990 in
the 3.0.1 RC and call this issue out clearly in the subsequent VOTE thread
for the 3.0.1 release. Eddy, are you up for that?

Best,
Aaron

On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu  wrote:

> +Xiao
>
> My understanding is that we will have this for 3.0.1.   Xiao, could
> you give your inputs here?
>
> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer 
> wrote:
> > Hi Eddy,
> >
> > Thanks for driving this release. Just a quick question: do we have time
> to close this issue?
> > https://issues.apache.org/jira/browse/HDFS-12990
> >
> > or are we abandoning it? I believe that this is the last window for us
> to fix this issue.
> >
> > Should we have a call and get this resolved one way or another?
> >
> > Thanks
> > Anu
> >
> > On 2/1/18, 10:51 AM, "Lei Xu"  wrote:
> >
> > Hi, All
> >
> > I just cut branch-3.0.1 from branch-3.0.  Please make sure all patches
> > targeted to 3.0.1 are checked in to both branch-3.0 and branch-3.0.1.
> >
> > Thanks!
> > Eddy
> >
> > On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu  wrote:
> > > Hi, All
> > >
> > > We released Apache Hadoop 3.0.0 in December [1]. To further improve
> > > the quality of the release line, we plan to cut branch-3.0.1 tomorrow
> > > in preparation for the Apache Hadoop 3.0.1 release. The focus of 3.0.1
> > > will be fixing blockers (3), critical bugs (1), and bug fixes [2]. No
> > > new features or improvements should be included.
> > >
> > > We plan to cut branch-3.0.1 tomorrow (Jan 10th) and vote for RC on
> Feb
> > > 1st, targeting for Feb 9th release.
> > >
> > > Please feel free to share your insights.
> > >
> > > [1] https://www.mail-archive.com/general@hadoop.apache.org/
> msg07757.html
> > > [2] https://issues.apache.org/jira/issues/?filter=12342842
> > >
> > > Best,
> > > --
> > > Lei (Eddy) Xu
> > > Software Engineer, Cloudera
> >
> >
> >
> > --
> > Lei (Eddy) Xu
> > Software Engineer, Cloudera
> >
> > 
> >
> >
> >
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Aaron T. Myers
+1 (binding)

- downloaded the src tarball and built the source (-Pdist -Pnative)
- verified the checksum
- brought up a secure pseudo distributed cluster
- did some basic file system operations (mkdir, list, put, cat) and
confirmed that everything was working
- confirmed that the web UI worked
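
For anyone repeating this against a future RC, the checklist above boils
down to roughly the following commands. This is a sketch only: the artifact
file names and the use of sha256sum are assumptions, not details taken from
this vote thread.

# Fetch the source artifact from the RC directory and check its digest.
wget http://home.apache.org/~wang/3.0.0-RC1/hadoop-3.0.0-src.tar.gz
sha256sum hadoop-3.0.0-src.tar.gz    # compare against the published checksum

# Build a native distribution from the source tarball.
tar xzf hadoop-3.0.0-src.tar.gz && cd hadoop-3.0.0-src
mvn package -Pdist -Pnative -DskipTests -Dtar

# Smoke-test basic file system operations against a running cluster.
cd hadoop-dist/target/hadoop-3.0.0
echo hello > hello.txt
bin/hdfs dfs -mkdir -p /tmp/rc-test
bin/hdfs dfs -put hello.txt /tmp/rc-test/
bin/hdfs dfs -ls /tmp/rc-test
bin/hdfs dfs -cat /tmp/rc-test/hello.txt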

Best,
Aaron

On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang 
wrote:

> Hi all,
>
> Let me start, as always, by thanking the efforts of all the contributors
> who contributed to this release, especially those who jumped on the issues
> found in RC0.
>
> I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
> fixed JIRAs since the previous 3.0.0-beta1 release.
>
> You can find the artifacts here:
>
> http://home.apache.org/~wang/3.0.0-RC1/
>
> I've done the traditional testing of building from the source tarball and
> running a Pi job on a single node cluster. I also verified that the shaded
> jars are not empty.
>
> I found one issue: create-release (probably due to the mvn deploy change)
> didn't sign the artifacts, but I fixed that by calling mvn one more time.
> Available here:
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1075/
>
> This vote will run the standard 5 days, closing on Dec 13th at 12:31pm
> Pacific. My +1 to start.
>
> Best,
> Andrew
>


[jira] [Created] (HDFS-11441) Add escaping to error messages in web UIs

2017-02-22 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-11441:
-

 Summary: Add escaping to error messages in web UIs
 Key: HDFS-11441
 URL: https://issues.apache.org/jira/browse/HDFS-11441
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.8.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


There are a handful of places where the web UIs don't escape error messages. 
We should add escaping in these places.






Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-31 Thread Aaron T. Myers
+1 (binding) from me. Downloaded the source, built from source, set up a
pseudo cluster, and ran a few of the sample jobs.

Thanks a lot for doing all this release work, Andrew.

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Aug 30, 2016 at 8:51 AM, Andrew Wang 
wrote:

> Hi all,
>
> Thanks to the combined work of many, many contributors, here's an RC0 for
> 3.0.0-alpha1:
>
> http://home.apache.org/~wang/3.0.0-alpha1-RC0/
>
> alpha1 is the first in a series of planned alpha releases leading up to GA.
> The objective is to get an artifact out to downstreams for testing and to
> iterate quickly based on their feedback. So, please keep that in mind when
> voting; hopefully most issues can be addressed by future alphas rather than
> future RCs.
>
> Sorry for getting this out on a Tuesday, but I'd still like this vote to
> run the normal 5 days, thus ending Saturday (9/3) at 9AM PDT. I'll extend
> if we lack the votes.
>
> Please try it out and let me know what you think.
>
> Best,
> Andrew
>


Re: Why there are so many revert operations on trunk?

2016-06-06 Thread Aaron T. Myers
Junping,

All of this is being discussed on HDFS-9924. Suggest you follow the
conversation there.

--
Aaron T. Myers
Software Engineer, Cloudera

On Mon, Jun 6, 2016 at 7:20 AM, Junping Du  wrote:

> Hi Andrew,
>
>  I just noticed you reverted 8 commits on trunk last Friday:
>
> HADOOP-13226
>
> HDFS-10430
>
> HDFS-10431
>
> HDFS-10390
>
> HADOOP-13168
>
> HDFS-10390
>
> HADOOP-13168
>
> HDFS-10346
>
> HADOOP-12957
>
> HDFS-10224
>
>    And I didn't see any comments from you on JIRA or in the email discussion
> before you did this. I don't think we are legally allowed to do this even
> as a committer/PMC member. Can you explain your intention in doing this?
>
>    BTW, thanks to Nicolas for reverting all these "illegal" revert operations.
>
>
>
> Thanks,
>
>
> Junping
>


Re: 'Target Version' field missing in Jira

2016-02-16 Thread Aaron T. Myers
Great news. Thanks a lot for looking into this, Vinod.

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Feb 16, 2016 at 11:42 AM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> The data is still intact, as I can see that I can continue to search on
> JIRA against target-version, though only in advanced search.
>
> +Vinod
>
> > On Feb 16, 2016, at 10:23 AM, Aaron T. Myers  wrote:
> >
> > I worry
> > that all the data for the target versions has disappeared as well...
>
>


Re: 'Target Version' field missing in Jira

2016-02-16 Thread Aaron T. Myers
Has anyone followed up with ASF Infra about getting this addressed? I worry
that all the data for the target versions has disappeared as well...

--
Aaron T. Myers
Software Engineer, Cloudera

On Fri, Feb 12, 2016 at 1:35 PM, Kihwal Lee 
wrote:

> It's still here:
> https://issues.apache.org/jira/plugins/servlet/project-config/HDFS/fields
> But somehow not showing up on pages.
> Kihwal
>
>
>   From: Arpit Agarwal 
>  To: "common-...@hadoop.apache.org" ; "
> hdfs-dev@hadoop.apache.org" 
>  Sent: Friday, February 12, 2016 1:52 PM
>  Subject: 'Target Version' field missing in Jira
>
> Is it just me or has the Target Version/s field gone missing from Apache
> Hadoop Jira? I don't recall any recent discussion about it.
>
>
>
>


Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-28 Thread Aaron T. Myers
On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley  wrote:

> I believe encryption is becoming a core part of Hadoop. I think that moving
> core components out of Hadoop is bad from a project management perspective.
>

Although it's certainly true that encryption capabilities (in HDFS, YARN,
etc.) are becoming core to Hadoop, I don't think that should really
influence whether or not the non-Hadoop-specific encryption routines should
be part of the Hadoop code base, or part of the code base of another
project that Hadoop depends on. If Chimera had existed as a library hosted
at ASF when HDFS encryption was first developed, HDFS probably would have
just added that as a dependency and been done with it. I don't think we
would've copy/pasted the code for Chimera into the Hadoop code base.


> To put it another way, a bug in the encryption routines will likely become
> a security problem that security@hadoop needs to hear about. I don't think
> adding a separate project in the middle of that communication chain is a
> good idea. The same applies to data corruption problems, and so on...
>

Isn't the same true of all the libraries that Hadoop currently depends
upon? If the commons-httpclient library (or commons-codec, or commons-io,
or guava, or...) has a security vulnerability, we need to know about it so
that we can update our dependency to a fixed version. This case doesn't
seem materially different than that.


>
>
> > It may be good to keep it in a generalized place (as in the
> > discussion, we thought that place could be Apache Commons).
>
>
> Apache Commons is a collection of *Java* projects, so Chimera as a
> JNI-based library isn't a natural fit.
>

Could very well be that Apache Commons's charter would preclude Chimera.
You probably know better than I do about that.


> Furthermore, Apache Commons doesn't
> have its own security list so problems will go to the generic
> secur...@apache.org.
>

That seems easy enough to remedy, if they wanted to, and besides I'm not
sure why that would influence this discussion. In my experience projects
that don't have a separate security@project.a.o mailing list tend to just
handle security issues on their private@project.a.o mailing list, which
seems fine to me.


>
> Why do you think that Apache Commons is a better home than Hadoop?
>

I'm certainly not at all wedded to Apache Commons, that just seemed like a
natural place to put it to me. Could be that a brand new TLP might make
more sense.

I *do* think that if other non-Hadoop projects want to make use of Chimera,
which as I understand it is the goal which started this thread, then
Chimera should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on
Chimera, which has nothing Hadoop-specific in there.

b) The Hadoop project doesn't have to export/maintain/concern itself with
yet another publicly-consumed interface.

c) Chimera can have its own (presumably much faster) release cadence
completely separate from Hadoop.

--
Aaron T. Myers
Software Engineer, Cloudera


Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-20 Thread Aaron T. Myers
+1 for Hadoop depending upon Chimera, assuming Chimera can get
hosted/released under some Apache project umbrella. If that's Apache
Commons (which makes a lot of sense to me) then I'm also a big +1 on
Andrew's suggestion that we make it a separate module.

Uma, would you be up for approaching the Apache Commons folks saying that
you'd like to contribute Chimera? I'd recommend saying that Hadoop and
Spark are both on board to depend on this.

--
Aaron T. Myers
Software Engineer, Cloudera

On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang 
wrote:

> Thanks Uma for putting together this proposal. Overall sounds good to me,
> +1 for these improvements. A few comments/questions:
>
> * If it becomes part of Apache Commons, could we make Chimera a separate
> JAR? We have real difficulties bumping dependency versions right now, so
> ideally we don't need to bump our existing Commons dependencies to use
> Chimera.
> * With this refactoring, do we have confidence that we can get our desired
> changes merged and released in a timely fashion? e.g. if we find another
> bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a
> new Chimera release, then bump Hadoop's Chimera dependency. This also
> relates to the previous point, it's easier to do this dependency bump if
> Chimera is a separate JAR.
>
> Best,
> Andrew
>
> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <
> uma.ganguma...@intel.com>
> wrote:
>
> > Hi Devs,
> >
> >   Some of our Hadoop developers have been working with the Spark community
> > to implement shuffle encryption. While implementing that, they realized
> > that some/most of the Hadoop encryption code and their implementation code
> > would have to be duplicated. This led to the idea of creating a separate
> > library, named Chimera (https://github.com/intel-hadoop/chimera). It is an
> > optimized cryptographic library. It provides a Java API at both the cipher
> > level and the Java stream level to help developers implement
> > high-performance AES encryption/decryption with minimal code and effort.
> > Chimera was originally based on the Hadoop crypto code but was improved
> > and generalized a lot to support a wider scope of data encryption needs
> > for more components in the community.
> >
> > So, now the team is thinking of making this library an open source
> > project via Apache incubation. The proposal is for Chimera to join Apache
> > as an incubating project, or Apache Commons, to facilitate its adoption.
> >
> > In general this will bring the following advantages:
> > 1. As Chimera embeds the native code in its jar (similar to snappy-java),
> > it solves the current issue in Hadoop that an HDFS client has to depend
> > on libhadoop.so if it needs to read an encryption zone in HDFS, which
> > means an HDFS client may have to depend on a Hadoop installation on the
> > local machine. For example, HBase depends on the HDFS client jar rather
> > than a Hadoop installation and so has no access to libhadoop.so, meaning
> > HBase cannot use an encryption zone without errors.
> > 2. Apache Spark shuffle and spill encryption could be another example
> > where we can use Chimera. We see that the stream encryption for Spark
> > shuffle and spill doesn't require a stream cipher like AES/CTR, although
> > the code shares the common characteristics of a stream-style API. We also
> > see the need for an optimized cipher for non-stream-style use cases such
> > as network encryption (e.g., RPC). These improvements can actually be
> > shared by more projects that need them.
> >
> > 3. Simplified code in Hadoop through use of a dedicated library, which
> > also drives more improvements. For example, currently the Hadoop crypto
> > code API is totally based on AES/CTR although it has cipher suite
> > configurations.
> >
> > AES/CTR is for HDFS data encryption at rest, but it doesn't necessarily
> > have to be AES/CTR for all cases, such as data transfer encryption and
> > intermediate file encryption.
> >
> >
> >
> >  So, we wanted to check with the Hadoop community about this proposal.
> > Please provide your feedback on it.
> >
> > Regards,
> > Uma
> >
>


Re: INotify stability

2015-09-16 Thread Aaron T. Myers
Hey Mohammad,

Ravi's suggestion of getting a heap dump would almost certainly let you get
to the bottom of this, but I'll just throw out there that it sounds like
you may have been hitting https://issues.apache.org/jira/browse/HDFS-8965,
which indeed can be triggered by use of inotify. That's fixed now, and in
addition to HDFS-8964 should fully address that issue.

--
Aaron T. Myers
Software Engineer, Cloudera

On Wed, Sep 16, 2015 at 12:41 PM, Ravi Prakash  wrote:

> Hi Mohammad!
>
> Thanks for reporting the issue. Could you please take a heap dump of the
> NN and analyze it to see where the memory is being spent?
>
> Thanks
> Ravi
>
>
>
> On Tuesday, September 15, 2015 11:53 AM, Mohammad Islam <
> misla...@yahoo.com> wrote:
>
>
> Hi,
> We were using the INotify feature in one of our internal services. It looks
> like it creates a lot of memory pressure on the NN. Memory usage goes very
> high and stays there, causing expensive GC.
>
> Has anyone used this feature in any service? Is there any conf to set up? We
> are using the latest CDH.
>
> Regards,
> Mohammad
>
>
>
>
>


[jira] [Resolved] (HDFS-2433) TestFileAppend4 fails intermittently

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-2433.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a long, long time. Going to close this 
out. Please reopen if you disagree.

> TestFileAppend4 fails intermittently
> 
>
> Key: HDFS-2433
> URL: https://issues.apache.org/jira/browse/HDFS-2433
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.20.205.0, 1.0.0
>Reporter: Robert Joseph Evans
>Priority: Critical
> Attachments: failed.tar.bz2
>
>
> A Jenkins build we have running failed twice in a row with issues from 
> TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the 
> error, I ran TestFileAppend4 in a loop overnight, saving the results away 
> (no clean was done in between test runs).
> When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] 
> tests fail about 10% of the time (14 times out of 130 tries). They all fail 
> with something like the following. Often it is only one of the tests that 
> fails, but I have seen as many as two fail in one run.
> {noformat}
> Testcase: testAppendSyncReplication2 took 32.198 sec
> FAILED
> Should have 2 replicas for that block, not 1
> junit.framework.AssertionFailedError: Should have 2 replicas for that block, 
> not 1
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
> {noformat}
> I also saw several other tests that are a part of TestFileAppend4 fail during 
> this experiment. They may all be related to one another, so I am filing them 
> in the same JIRA. If it turns out that they are not related, then they can be 
> split up later.
> testAppendSyncBlockPlusBbw failed 6 out of the 130 times, or about 5% of the 
> time
> {noformat}
> Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
> FAILED
> unexpected file size! received=0 , expected=1024
> junit.framework.AssertionFailedError: unexpected file size! received=0 , 
> expected=1024
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
> {noformat}
> testAppendSyncChecksum[012] failed 2 out of the 130 times or about 1.5% of 
> the time
> {noformat}
> Testcase: testAppendSyncChecksum1 took 32.385 sec
> FAILED
> Should have 1 replica for that block, not 2
> junit.framework.AssertionFailedError: Should have 1 replica for that block, 
> not 2
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
> {noformat}
> I will attach logs for all of the failures. Be aware that I did change some 
> of the logging messages in this test so I could better see when 
> testAppendSyncReplication started and ended. Other than that, the code is 
> stock 0.20.205 RC2.





[jira] [Resolved] (HDFS-3811) TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-3811.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a very long time. Going to resolve this. 
Please reopen if you disagree.

> TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
> -
>
> Key: HDFS-3811
> URL: https://issues.apache.org/jira/browse/HDFS-3811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Andrew Wang
>Assignee: Todd Lipcon
> Attachments: stacktrace, testfail-editlog.log, testfail.log, 
> testpersistblocks.txt
>
>
> This test failed on a recent Jenkins build, but passes for me locally. Seems 
> flaky.
> See:
> https://builds.apache.org/job/PreCommit-HDFS-Build/3021//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/





[jira] [Resolved] (HDFS-3660) TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-3660.
--
  Resolution: Cannot Reproduce
Target Version/s:   (was: )

This is an ancient/stale flaky test JIRA. Resolving.

> TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out   
> 
>
> Key: HDFS-3660
> URL: https://issues.apache.org/jira/browse/HDFS-3660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Priority: Minor
>
> Saw this on a recent jenkins run.





[jira] [Resolved] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-4001.
--
Resolution: Fixed

Haven't seen this fail in a very long time. Closing this out. Feel free to 
reopen if you disagree.

> TestSafeMode#testInitializeReplQueuesEarly may time out
> ---
>
> Key: HDFS-4001
> URL: https://issues.apache.org/jira/browse/HDFS-4001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
> Attachments: timeout.txt.gz
>
>
> Saw this failure on a recent branch-2 jenkins run, has also been seen on 
> trunk.
> {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107)
>   at 
> org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191)
> {noformat}





[jira] [Resolved] (HDFS-3532) TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-3532.
--
Resolution: Cannot Reproduce

This is an ancient/stale flaky test JIRA. Resolving.

> TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out
> -
>
> Key: HDFS-3532
> URL: https://issues.apache.org/jira/browse/HDFS-3532
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>
> I've seen this test time out on recent trunk jenkins test patch runs even 
> though HDFS-3266 was put in a couple weeks ago.





[jira] [Created] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages

2015-04-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-8194:


 Summary: Add administrative tool to be able to examine the NN's 
view of DN storages
 Key: HDFS-8194
 URL: https://issues.apache.org/jira/browse/HDFS-8194
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe


The NN has long had facilities to list all of the DNs that are registered 
with it. It would be great if there were an administrative tool able to list 
all of the individual storages that the NN is tracking.





[jira] [Created] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-8193:


 Summary: Add the ability to delay replica deletion for a period of 
time
 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang


When doing maintenance on an HDFS cluster, users may be concerned about the 
possibility of administrative mistakes or software bugs deleting replicas of 
blocks that cannot easily be restored. It would be handy if HDFS could be made 
to optionally not delete any replicas for a configurable period of time.





Re: upstream jenkins build broken?

2015-03-10 Thread Aaron T. Myers
Hey Colin,

I asked Andrew Bayer, who works with Apache Infra, what's going on with
these boxes. He took a look and concluded that some perms are being set in
those directories by our unit tests which are precluding those files from
getting deleted. He's going to clean up the boxes for us, but we should
expect this to keep happening until we can fix the test in question to
properly clean up after itself.

To help narrow down which commit it was that started this, Andrew sent me
this info:

"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC
on March 5th."
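
For the record, the manual cleanup on a slave looks something like the
following. This is a sketch, assuming shell access to the build machine;
only the workspace path comes from the error message above.

WS=/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build
# List test data directories left without owner write permission (e.g. mode 500).
find "$WS" -type d ! -perm -u+w -ls
# Restore owner rwx so maven-clean-plugin can delete them, then clean up.
find "$WS" -type d ! -perm -u+w -exec chmod u+rwx {} +
rm -rf "$WS"/hadoop-hdfs-project/hadoop-hdfs/target/test/data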

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe  wrote:

> Hi all,
>
> A very quick (and not thorough) survey shows that I can't find any
> jenkins jobs that succeeded in the last 24 hours.  Most of them seem
> to be failing with some variant of this message:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> on project hadoop-hdfs: Failed to clean project: Failed to delete
>
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> -> [Help 1]
>
> Any ideas how this happened?  Bad disk, unit test setting wrong
> permissions?
>
> Colin
>


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Aaron T. Myers
+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>


[jira] [Resolved] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2015-01-23 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-7421.
--
Resolution: Duplicate

> Move processing of postponed over-replicated blocks to a background task
> 
>
> Key: HDFS-7421
> URL: https://issues.apache.org/jira/browse/HDFS-7421
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 2.6.0
>    Reporter: Aaron T. Myers
>    Assignee: Aaron T. Myers
>
> In an HA environment, we postpone sending block invalidates to DNs until all 
> DNs holding a given block have done at least one block report to the NN after 
> it became active. When that first block report after becoming active does 
> occur, we attempt to reprocess all postponed misreplicated blocks inline with 
> the block report RPC. In the case where there are many postponed 
> misreplicated blocks, this can cause block report RPCs to take an 
> inordinately long time to complete, sometimes on the order of minutes, which 
> has the potential to tie up RPC handlers, block incoming RPCs, etc. There's 
> no need to hurriedly process all postponed misreplicated blocks so that we 
> can quickly send invalidate commands back to DNs, so let's move this 
> processing outside of the RPC handler context and into a background thread.





[jira] [Created] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2014-11-21 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-7421:


 Summary: Move processing of postponed over-replicated blocks to a 
background task
 Key: HDFS-7421
 URL: https://issues.apache.org/jira/browse/HDFS-7421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.6.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In an HA environment, we postpone sending block invalidates to DNs until all 
DNs holding a given block have done at least one block report to the NN after 
it became active. When that first block report after becoming active does 
occur, we attempt to reprocess all postponed misreplicated blocks inline with 
the block report RPC. In the case where there are many postponed misreplicated 
blocks, this can cause block report RPCs to take an inordinately long time to 
complete, sometimes on the order of minutes, which has the potential to tie up 
RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process 
all postponed misreplicated blocks so that we can quickly send invalidate 
commands back to DNs, so let's move this processing outside of the RPC handler 
context and into a background thread.





Re: [DISCUSS] Switch to log4j 2

2014-08-15 Thread Aaron T. Myers
Not necessarily opposed to switching logging frameworks, but I believe we
can actually support async logging with today's logging system if we wanted
to, e.g. as was done for the HDFS audit logger in this JIRA:

https://issues.apache.org/jira/browse/HDFS-5241
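
If I recall correctly, HDFS-5241 gated that behind a NameNode setting along
the following lines; treat the exact key name as an assumption and check
hdfs-default.xml for your version:

# Add inside <configuration> in the NameNode's hdfs-site.xml
# (key name per my recollection of HDFS-5241, not quoted from the JIRA):
#   <property>
#     <name>dfs.namenode.audit.log.async</name>
#     <value>true</value>
#   </property>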

--
Aaron T. Myers
Software Engineer, Cloudera


On Fri, Aug 15, 2014 at 5:44 AM, Steve Loughran 
wrote:

> moving to SLF4J as an API is independent: it's just a better API for
> logging than commons-logging, was already a dependency, and doesn't force
> anyone to switch to a new log back end.
>
>
> On 15 August 2014 03:34, Tsuyoshi OZAWA  wrote:
>
> > Hi,
> >
> > Steve has started discussion titled "use SLF4J APIs in new modules?"
> > as a related topic.
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E
> >
> > It sounds good to me to use asynchronous logging when we log at INFO. One
> > concern is that asynchronous logging makes debugging difficult - I
> > don't know log4j 2 well, but I suspect that the ordering of log messages
> > can change even if WARN or FATAL are logged with a synchronous logger.
> >
> > Thanks,
> > - Tsuyoshi
> >
> > On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal  >
> > wrote:
> > > I don't recall whether this was discussed before.
> > >
> > > I often find our INFO logging to be too sparse for useful diagnosis. A
> > high
> > > performance logging framework will encourage us to log more.
> > Specifically,
> > > Asynchronous Loggers look interesting.
> > > https://logging.apache.org/log4j/2.x/manual/async.html#Performance
> > >
> > > What does the community think of switching to log4j 2 in a Hadoop 2.x
> > > release?
> > >


Re: [VOTE] Migration from subversion to git for version control

2014-08-11 Thread Aaron T. Myers
+1 (binding)

Thanks for driving this, Karthik.

--
Aaron T. Myers
Software Engineer, Cloudera


On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla  wrote:

> I have put together this proposal based on recent discussion on this topic.
>
> Please vote on the proposal. The vote runs for 7 days.
>
>1. Migrate from subversion to git for version control.
>2. Force-push to be disabled on trunk and branch-* branches. Applying
>changes from any of trunk/branch-* to any of branch-* should be through
>"git cherry-pick -x".
>3. Force-push on feature-branches is allowed. Before pulling in a
>feature, the feature-branch should be rebased on latest trunk and the
>changes applied to trunk through "git rebase --onto" or "git cherry-pick
>".
>4. Every time a feature branch is rebased on trunk, a tag that
>identifies the state before the rebase needs to be created (e.g.
>tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
>the feature is pulled into trunk and the tags are no longer useful.
>5. The relevance/use of tags stay the same after the migration.
>
> Thanks
> Karthik
>
> PS: Per Andrew Wang, this should be a "Adoption of New Codebase" kind of
> vote and will be Lazy 2/3 majority of PMC members.
>
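
To illustrate items 2-4 of the proposal quoted above, the day-to-day
commands would look roughly like the following. This is a sketch: the
branch and commit names are made up, and only the tag naming scheme comes
from the proposal itself.

# Item 2: apply a trunk change to a release branch, recording provenance.
git checkout branch-2
git cherry-pick -x <commit-on-trunk>

# Item 4: before rebasing a feature branch, tag its current state.
git checkout feature-JIRA-2454
git tag tag_feature_JIRA-2454_2014-08-07_rebase

# Item 3: rebase the feature branch on the latest trunk, then apply the
# feature commits to trunk (cherry-pick shown; git rebase --onto also works).
git rebase trunk
git checkout trunk
git cherry-pick trunk..feature-JIRA-2454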


[jira] [Created] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot

2014-07-09 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6647:


 Summary: Edit log corruption when pipeline recovery occurs for 
deleted file present in snapshot
 Key: HDFS-6647
 URL: https://issues.apache.org/jira/browse/HDFS-6647
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.4.1
Reporter: Aaron T. Myers


I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit 
log for a file after an OP_DELETE has previously been logged for that file. 
Such an edit log sequence cannot then be successfully read by the NameNode.

More details in the first comment.





Re: [VOTE] Release Apache Hadoop 2.4.1

2014-06-27 Thread Aaron T. Myers
That's fine by me. Like I said, assuming that rc1 does indeed include the fix 
in HDFS-6527, and not the revert, then rc1 should be functionally correct. 
What's in branch-2.4.1 doesn't currently match what's in this RC, but if that 
doesn't bother anyone else then I won't lose any sleep over it. 

--
Aaron T. Myers
Software Engineer, Cloudera

> On Jun 27, 2014, at 3:04 PM, "Arun C. Murthy"  wrote:
> 
> Aaron,
> 
> Since the amend was just to the test, I'll keep this RC as-is.
> 
> I'll also comment on jira.
> 
> thanks,
> Arun
> 
> 
> 
>> On Jun 27, 2014, at 2:40 PM, "Aaron T. Myers"  wrote:
>> 
>> I'm -0 on rc1.
>> 
>> Note the latest discussion on HDFS-6527 which first resulted in that patch
>> being reverted from branch-2.4.1 because it was believed it wasn't
>> necessary, and then some more discussion which indicates that in fact the
>> patch for HDFS-6527 should be included in 2.4.1, but with a slightly
>> different test case.
>> 
>> I believe that rc1 was actually created after the first backport of
>> HDFS-6527, but before the revert, so rc1 should be functionally correct,
>> but the test case is not quite correct in rc1, and I believe that rc1 does
>> not currently reflect the actual tip of branch-2.4.1. I'm not going to
>> consider this a deal-breaker, but seems like we should probably clean it up.
>> 
>> To get this all sorted out properly, if we wanted to, I believe we should
>> do another backport of HDFS-6527 to branch-2.4.1 including only the amended
>> test case, and create a new RC from that point.
>> 
>> Best,
>> Aaron
>> 
>> --
>> Aaron T. Myers
>> Software Engineer, Cloudera
>> 
>> 
>>> On Fri, Jun 20, 2014 at 11:51 PM, Arun C Murthy  
>>> wrote:
>>> 
>>> Folks,
>>> 
>>> I've created another release candidate (rc1) for hadoop-2.4.1 based on the
>>> feedback that I would like to push out.
>>> 
>>> The RC is available at:
>>> http://people.apache.org/~acmurthy/hadoop-2.4.1-rc1
>>> The RC tag in svn is here:
>>> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.1-rc1
>>> 
>>> The maven artifacts are available via repository.apache.org.
>>> 
>>> Please try the release and vote; the vote will run for the usual 7 days.
>>> 
>>> thanks,
>>> Arun
>>> 
>>> 
>>> 
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/hdp/


Re: [VOTE] Release Apache Hadoop 2.4.1

2014-06-27 Thread Aaron T. Myers
I'm -0 on rc1.

Note the latest discussion on HDFS-6527 which first resulted in that patch
being reverted from branch-2.4.1 because it was believed it wasn't
necessary, and then some more discussion which indicates that in fact the
patch for HDFS-6527 should be included in 2.4.1, but with a slightly
different test case.

I believe that rc1 was actually created after the first backport of
HDFS-6527, but before the revert, so rc1 should be functionally correct,
but the test case is not quite correct in rc1, and I believe that rc1 does
not currently reflect the actual tip of branch-2.4.1. I'm not going to
consider this a deal-breaker, but seems like we should probably clean it up.

To get this all sorted out properly, if we wanted to, I believe we should
do another backport of HDFS-6527 to branch-2.4.1 including only the amended
test case, and create a new RC from that point.

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera


On Fri, Jun 20, 2014 at 11:51 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created another release candidate (rc1) for hadoop-2.4.1 based on the
> feedback that I would like to push out.
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.4.1-rc1
> The RC tag in svn is here:
> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.1-rc1
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/hdp/
>


Re: [VOTE] Change by-laws on release votes: 5 days instead of 7

2014-06-24 Thread Aaron T. Myers
+1 (binding)

--
Aaron T. Myers
Software Engineer, Cloudera


On Tue, Jun 24, 2014 at 1:53 AM, Arun C Murthy  wrote:

> Folks,
>
>  As discussed, I'd like to call a vote on changing our by-laws to change
> release votes from 7 days to 5.
>
>  I've attached the change to by-laws I'm proposing.
>
>  Please vote, the vote will the usual period of 7 days.
>
> thanks,
> Arun
>
> 
>
> [main]$ svn diff
> Index: author/src/documentation/content/xdocs/bylaws.xml
> ===
> --- author/src/documentation/content/xdocs/bylaws.xml   (revision 1605015)
> +++ author/src/documentation/content/xdocs/bylaws.xml   (working copy)
> @@ -344,7 +344,16 @@
>  Votes are open for a period of 7 days to allow all active
>  voters time to consider the vote. Votes relating to code
>  changes are not subject to a strict timetable but should be
> -made as timely as possible.
> +made as timely as possible.
> +
> + 
> +  Product Release - Vote Timeframe
> +   Release votes, alone, run for a period of 5 days. All other
> + votes are subject to the above timeframe of 7 days.
> + 
> +   
> +   
> +
> 
> 
>  
>


[jira] [Created] (HDFS-6563) NameNode cannot save fsimage in certain circumstances when snapshots are in use

2014-06-18 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6563:


 Summary: NameNode cannot save fsimage in certain circumstances 
when snapshots are in use
 Key: HDFS-6563
 URL: https://issues.apache.org/jira/browse/HDFS-6563
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


Checkpoints will start to fail and the NameNode will not be able to manually 
saveNamespace if the following set of steps occurs:

# A zero-length file appears in a snapshot
# That file is later lengthened to include at least one block
# That file is subsequently deleted from the present file system but remains in 
the snapshot

More details in the first comment.





[jira] [Created] (HDFS-6463) Incorrect permission can be created after setting ACLs

2014-05-28 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6463:


 Summary: Incorrect permission can be created after setting ACLs
 Key: HDFS-6463
 URL: https://issues.apache.org/jira/browse/HDFS-6463
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Aaron T. Myers


When setting ACLs for a file or directory, it's possible for the resulting 
FsPermission object's group entry to be set incorrectly; in particular, it 
will be set to the mask entry. More details in the first comment of this JIRA.

Thanks to Szehon Ho for identifying this issue.
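
A minimal way to observe the fields involved, assuming a scratch file on a
cluster with dfs.namenode.acls.enabled=true (the user name here is made up):

hdfs dfs -touchz /tmp/acl-demo
hdfs dfs -setfacl -m user:szehon:rwx /tmp/acl-demo
# Once an extended ACL is present, the group slot of the permission bits
# shown by -ls displays the mask entry, so a bug like this one shows up
# when comparing the output of these two commands.
hdfs dfs -ls /tmp/acl-demo
hdfs dfs -getfacl /tmp/acl-demo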





[jira] [Created] (HDFS-6435) Add support for specifying a static uid/gid mapping for the NFS gateway

2014-05-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6435:


 Summary: Add support for specifying a static uid/gid mapping for 
the NFS gateway
 Key: HDFS-6435
 URL: https://issues.apache.org/jira/browse/HDFS-6435
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


It's quite reasonable that folks will want to access the HDFS NFS Gateway from 
client machines where the UIDs/GIDs do not line up with those on the NFS 
Gateway itself. We should provide a way to map these UIDs/GIDs between the 
systems.





[jira] [Created] (HDFS-6406) Add capability for NFS gateway to reject connections from unprivileged ports

2014-05-15 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6406:


 Summary: Add capability for NFS gateway to reject connections from 
unprivileged ports
 Key: HDFS-6406
 URL: https://issues.apache.org/jira/browse/HDFS-6406
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Many NFS servers have the ability to only accept client connections originating 
from privileged ports. It would be nice if the HDFS NFS gateway had the same 
feature.





[jira] [Created] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist

2014-04-25 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6289:


 Summary: HA failover can fail if there are pending DN messages for 
DNs which no longer exist
 Key: HDFS-6289
 URL: https://issues.apache.org/jira/browse/HDFS-6289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


In an HA setup, the standby NN may receive messages from DNs for blocks which 
the standby NN is not yet aware of. It queues up these messages and replays 
them when it next reads from the edit log or fails over. On a failover, all of 
these pending DN messages must be processed successfully in order for the 
failover to succeed. If one of these pending DN messages refers to a DN 
storageId that no longer exists (because the DN with that transfer address has 
been reformatted and has re-registered with the same transfer address) then on 
transition to active the NN will not be able to process this DN message and 
will suicide with an error like the following:

{noformat}
2014-04-25 14:23:17,922 FATAL namenode.NameNode 
(NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN 
shutdown. Shutting down immediately.
java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) 
as corrupt because datanode 127.0.0.1:33324 does not exist
{noformat}





[jira] [Created] (HDFS-6281) Provide option to use the NFS Gateway without having to use the Hadoop portmapper

2014-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6281:


 Summary: Provide option to use the NFS Gateway without having to 
use the Hadoop portmapper
 Key: HDFS-6281
 URL: https://issues.apache.org/jira/browse/HDFS-6281
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In order to use the NFS Gateway on operating systems with the rpcbind 
privileged registration bug, we currently require users to shut down and 
discontinue use of the system-provided portmap daemon, and instead use the 
portmap daemon provided by Hadoop. Alternately, we can work around this bug if 
we tweak the NFS Gateway to perform its port registration from a privileged 
port, and still let users use the system portmap daemon.





[jira] [Created] (HDFS-6280) Provide option to

2014-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6280:


 Summary: Provide option to 
 Key: HDFS-6280
 URL: https://issues.apache.org/jira/browse/HDFS-6280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Aaron T. Myers








[jira] [Resolved] (HDFS-6280) Provide option to

2014-04-24 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-6280.
--

Resolution: Invalid

Accidentally hit "create" too soon. :)

> Provide option to 
> --
>
> Key: HDFS-6280
> URL: https://issues.apache.org/jira/browse/HDFS-6280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>    Reporter: Aaron T. Myers
>






[jira] [Created] (HDFS-6112) NFS Gateway docs are incorrect for allowed hosts configuration

2014-03-17 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6112:


 Summary: NFS Gateway docs are incorrect for allowed hosts 
configuration
 Key: HDFS-6112
 URL: https://issues.apache.org/jira/browse/HDFS-6112
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The NFS gateway export configuration docs say that the machine name 
configuration can be "wildcards" and provide the example 
"{{host*.example.com}}". The term "wildcard" and this example might imply 
typical globbing semantics, but in fact what it actually supports is Java 
regular expressions. I think we should change the docs to make this clearer.
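
To make the distinction concrete: read as a Java regex, "host*.example.com"
means "hos", zero or more "t"s, then any character, "example", any
character, "com", rather than "any host name beginning with host". A quick
demonstration using POSIX extended regexes, which agree with Java's syntax
for this particular pattern:

# The glob intent "host<anything>.example.com", written as a regex.
printf 'host1.example.com\nhostfoo.example.com\nhostXexampleYcom\n' \
  | grep -E '^host.*\.example\.com$'
# Prints only the first two names; the escaped dots are what keep
# "hostXexampleYcom" from matching.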





[jira] [Created] (HDFS-6056) Clean up NFS config settings

2014-03-04 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6056:


 Summary: Clean up NFS config settings
 Key: HDFS-6056
 URL: https://issues.apache.org/jira/browse/HDFS-6056
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
Reporter: Aaron T. Myers
Assignee: Brandon Li


As discussed on HDFS-6050, there are a few opportunities to improve the config 
settings related to NFS. This JIRA is to implement those changes.





[jira] [Resolved] (HDFS-6048) DFSClient fails if native library doesn't exist

2014-03-03 Thread Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-6048.
--

Resolution: Duplicate

Hi Akira, I think this will be addressed by HDFS-6040, which should be 
committed shortly.

> DFSClient fails if native library doesn't exist
> ---
>
> Key: HDFS-6048
> URL: https://issues.apache.org/jira/browse/HDFS-6048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Akira AJISAKA
>Priority: Blocker
>
> When I executed FSShell commands (such as hdfs dfs -ls, -mkdir, -cat) in 
> trunk, {{UnsupportedOperationException}} occurred in 
> {{o.a.h.net.unix.DomainSocketWatcher}} and the commands failed.





[jira] [Created] (HDFS-6033) PBImageXmlWriter incorrectly handles processing cache directives

2014-02-27 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6033:


 Summary: PBImageXmlWriter incorrectly handles processing cache 
directives
 Key: HDFS-6033
 URL: https://issues.apache.org/jira/browse/HDFS-6033
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When attempting to process cache directives in 
{{PBImageXmlWriter#dumpCacheManagerSection}}, we incorrectly loop over the 
number of cache _pools_, not directives.





Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-12 Thread Aaron T. Myers
I don't think any of what you describe below is a regression in behavior
from earlier releases. The fs.defaultFS has defaulted to file:/// for a long
time, and you've similarly had to set up your YARN configs. Given that, I
don't think this warrants a new RC.
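
For reference, the overrides being discussed are the standard
pseudo-distributed setup, roughly as follows. This is a sketch; the 8020
port and the file layout are the usual defaults, not quotes from the docs.

# Point the default filesystem at a local HDFS rather than file:///.
cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

# Run MapReduce jobs on YARN rather than the local runner.
cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF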

--
Aaron T. Myers
Software Engineer, Cloudera


On Wed, Feb 12, 2014 at 5:37 PM, Alejandro Abdelnur wrote:

> Running a pseudo cluster out of the box (expanding the binary tar, or
> building from source) does not work; you have to go and set the MR framework
> to yarn, the default FS URI to hdfs://localhost:8020, and so on.
>
> While I don't see this as a showstopper (for the knowledgeable user), it will
> make many users fail miserably.
>
> Plus, running an example MR job out of the box uses the local runner. If
> the user does not pay attention to the output, they will think the job ran
> in the cluster.
>
> Should we do a new RC fixing this?
>
> Thanks.
>
>
>
> On Wed, Feb 12, 2014 at 5:10 PM, Zhijie Shen 
> wrote:
>
> > +1 (non-binding)
> >
> > I downloaded the source tar ball, built from it, ran a number of MR jobs
> > on a single-node cluster, and checked the job history from the job
> > history server.
> >
> >
> > On Wed, Feb 12, 2014 at 2:47 PM, Jian He  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Built from source. Ran a few MR sample jobs on a pseudo cluster.
> > > Everything works fine.
> > >
> > > Jian
> > >
> > >
> > > On Wed, Feb 12, 2014 at 2:32 PM, Aaron T. Myers 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > I downloaded the source tar ball, checked signatures, built from the
> > > > source, ran a few of the sample jobs on a pseudo cluster. Everything
> > was
> > > as
> > > > expected.
> > > >
> > > > --
> > > > Aaron T. Myers
> > > > Software Engineer, Cloudera
> > > >
> > > >
> > > > On Tue, Feb 11, 2014 at 6:49 AM, Arun C Murthy 
> > > > wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > I've created a release candidate (rc0) for hadoop-2.3.0 that I
> would
> > > like
> > > > > to get released.
> > > > >
> > > > > The RC is available at:
> > > > > http://people.apache.org/~acmurthy/hadoop-2.3.0-rc0
> > > > > The RC tag in svn is here:
> > > > >
> > https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.3.0-rc0
> > > > >
> > > > > The maven artifacts are available via repository.apache.org.
> > > > >
> > > > > Please try the release and vote; the vote will run for the usual 7
> > > days.
> > > > >
> > > > > thanks,
> > > > > Arun
> > > > >
> > > > > PS: Thanks to Andrew, Vinod & Alejandro for all their help in
> various
> > > > > release activities.
> > > >
> > >
> >
> >
> >
> > --
> > Zhijie Shen
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
>
>
>
> --
> Alejandro
>


Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-12 Thread Aaron T. Myers
+1 (binding)

I downloaded the source tar ball, checked signatures, built from the
source, ran a few of the sample jobs on a pseudo cluster. Everything was as
expected.

--
Aaron T. Myers
Software Engineer, Cloudera


On Tue, Feb 11, 2014 at 6:49 AM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.3.0 that I would like
> to get released.
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.3.0-rc0
> The RC tag in svn is here:
> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.3.0-rc0
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> PS: Thanks to Andrew, Vinod & Alejandro for all their help in various
> release activities.


Re: Re-swizzle 2.3

2014-02-10 Thread Aaron T. Myers
There's still ongoing discussion on HDFS-4858 and I don't think we should
hold up 2.3.0 for that. IMO we should target that for 2.3.1 or 2.4.0.

--
Aaron T. Myers
Software Engineer, Cloudera


On Mon, Feb 10, 2014 at 5:53 PM, Konstantin Shvachko
wrote:

> Sorry for the last minute request.
> Can we add HDFS-4858 to the release, please?
> It solves pretty important bug related to failover.
> I can commit momentarily if there are no objections.
>
> Thanks,
> --Konstantin
>
>
> On Mon, Feb 10, 2014 at 4:50 PM, Aaron T. Myers  wrote:
>
> > Just committed a fix for HDFS-5921 to branch-2.3.
> >
> > Fire away.
> >
> > --
> > Aaron T. Myers
> > Software Engineer, Cloudera
> >
> >
> > On Mon, Feb 10, 2014 at 1:34 PM, Aaron T. Myers 
> wrote:
> >
> > > OK. I think I should be able to get it in by 6pm PT, thanks to a quick
> +1
> > > from Andrew, but certainly don't let it hold up the train if for some
> > > reason it takes longer than that.
> > >
> > > --
> > > Aaron T. Myers
> > > Software Engineer, Cloudera
> > >
> > >
> > > On Mon, Feb 10, 2014 at 12:04 PM, Arun C Murthy  > >wrote:
> > >
> > >> Looks like we are down to 0 blockers; I'll create rc0 tonight.
> > >>
> > >> ATM - Your call, you have until 6pm tonight to get this in.
> > >>
> > >> thanks,
> > >> Arun
> > >>
> > >> On Feb 10, 2014, at 11:44 AM, "Aaron T. Myers" 
> > wrote:
> > >>
> > >> > I just filed an issue for the fact that browsing the FS from the NN
> is
> > >> > broken if you have a directory with the sticky bit set:
> > >> >
> > >> > https://issues.apache.org/jira/browse/HDFS-5921
> > >> >
> > >> > I didn't set this to be targeted for 2.3 because it doesn't seem
> like
> > a
> > >> > _blocker_ to me, but if we're not going to get 2.3 out today anyway,
> > I'd
> > >> > like to put this in. It's a small fix, and since many people have
> the
> > >> > sticky bit set on /tmp, they won't be able to browse any of the FS
> > >> > hierarchy from the NN without this fix.
> > >> >
> > >> > --
> > >> > Aaron T. Myers
> > >> > Software Engineer, Cloudera
> > >> >
> > >> >
> > >> > On Fri, Feb 7, 2014 at 12:45 PM, Vinod Kumar Vavilapalli <
> > >> vino...@apache.org
> > >> >> wrote:
> > >> >
> > >> >> Here's what I've done:
> > >> >> - Reverted YARN-1493,YARN-1490,YARN-1041,
> > >> >> YARN-1166,YARN-1566,YARN-1689,YARN-1661 from branch-2.3.
> > >> >> - Updated YARN's CHANGES.txt in trunk, branch-2 and branch-2.3.
> > >> >> - Updated these JIRAs to have 2.4 as the fix-version.
> > >> >> - Compiled branch-2.3.
> > >> >>
> > >> >> Let me know if you run into any issues caused by this revert.
> > >> >>
> > >> >> Thanks,
> > >> >> +Vinod
> > >> >>
> > >> >>
> > >> >> On Fri, Feb 7, 2014 at 11:41 AM, Vinod Kumar Vavilapalli <
> > >> >> vino...@apache.org
> > >> >>> wrote:
> > >> >>
> > >> >>> Haven't heard back from Jian. Reverting the set from branch-2.3
> > >> (only).
> > >> >> Tx
> > >> >>> for the offline list.
> > >> >>>
> > >> >>> +Vinod
> > >> >>>
> > >> >>>
> > >> >>> On Fri, Feb 7, 2014 at 9:08 AM, Alejandro Abdelnur <
> > t...@cloudera.com
> > >> >>> wrote:
> > >> >>>
> > >> >>>> Vinod, I have the patches to revert most of the JIRAs, the first
> > >> batch,
> > >> >>>> I'll send them off line to you.
> > >> >>>>
> > >> >>>> Thanks.
> > >> >>>>
> > >> >>>>
> > >> >>>> On Thu, Feb 6, 2014 at 8:56 PM, Vinod Kumar Vavilapalli
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>>>
> >> >>>>> Thanks. Please post your findings; Jian wrote this part of the code
> >> >>>>> and between him/me, we can take care of those issues.

Re: Re-swizzle 2.3

2014-02-10 Thread Aaron T. Myers
Just committed a fix for HDFS-5921 to branch-2.3.

Fire away.

--
Aaron T. Myers
Software Engineer, Cloudera


On Mon, Feb 10, 2014 at 1:34 PM, Aaron T. Myers  wrote:

> OK. I think I should be able to get it in by 6pm PT, thanks to a quick +1
> from Andrew, but certainly don't let it hold up the train if for some
> reason it takes longer than that.
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
> On Mon, Feb 10, 2014 at 12:04 PM, Arun C Murthy wrote:
>
>> Looks like we are down to 0 blockers; I'll create rc0 tonight.
>>
>> ATM - Your call, you have until 6pm tonight to get this in.
>>
>> thanks,
>> Arun
>>
>> On Feb 10, 2014, at 11:44 AM, "Aaron T. Myers"  wrote:
>>
>> > I just filed an issue for the fact that browsing the FS from the NN is
>> > broken if you have a directory with the sticky bit set:
>> >
>> > https://issues.apache.org/jira/browse/HDFS-5921
>> >
>> > I didn't set this to be targeted for 2.3 because it doesn't seem like a
>> > _blocker_ to me, but if we're not going to get 2.3 out today anyway, I'd
>> > like to put this in. It's a small fix, and since many people have the
>> > sticky bit set on /tmp, they won't be able to browse any of the FS
>> > hierarchy from the NN without this fix.
>> >
>> > --
>> > Aaron T. Myers
>> > Software Engineer, Cloudera
>> >
>> >
>> > On Fri, Feb 7, 2014 at 12:45 PM, Vinod Kumar Vavilapalli <
>> vino...@apache.org
>> >> wrote:
>> >
>> >> Here's what I've done:
>> >> - Reverted YARN-1493,YARN-1490,YARN-1041,
>> >> YARN-1166,YARN-1566,YARN-1689,YARN-1661 from branch-2.3.
>> >> - Updated YARN's CHANGES.txt in trunk, branch-2 and branch-2.3.
>> >> - Updated these JIRAs to have 2.4 as the fix-version.
>> >> - Compiled branch-2.3.
>> >>
>> >> Let me know if you run into any issues caused by this revert.
>> >>
>> >> Thanks,
>> >> +Vinod
>> >>
>> >>
>> >> On Fri, Feb 7, 2014 at 11:41 AM, Vinod Kumar Vavilapalli <
>> >> vino...@apache.org
>> >>> wrote:
>> >>
>> >>> Haven't heard back from Jian. Reverting the set from branch-2.3
>> (only).
>> >> Tx
>> >>> for the offline list.
>> >>>
>> >>> +Vinod
>> >>>
>> >>>
>> >>> On Fri, Feb 7, 2014 at 9:08 AM, Alejandro Abdelnur > >>> wrote:
>> >>>
>> >>>> Vinod, I have the patches to revert most of the JIRAs, the first
>> batch,
>> >>>> I'll send them off line to you.
>> >>>>
>> >>>> Thanks.
>> >>>>
>> >>>>
>> >>>> On Thu, Feb 6, 2014 at 8:56 PM, Vinod Kumar Vavilapalli
>> >>>> wrote:
>> >>>>
>> >>>>>
> >> >>>>> Thanks. Please post your findings; Jian wrote this part of the code
> >> >>>>> and between him/me, we can take care of those issues.
>> >>>>>
>> >>>>> +1 for going ahead with the revert on branch-2.3. I'll go do that
>> >>>> tomorrow
>> >>>>> morning unless I hear otherwise from Jian.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> +Vinod
>> >>>>>
>> >>>>>
>> >>>>> On Feb 6, 2014, at 8:28 PM, Alejandro Abdelnur 
>> >>>> wrote:
>> >>>>>
>> >>>>>> Hi Vinod,
>> >>>>>>
>> >>>>>> Nothing confidential,
>> >>>>>>
> >> >>>>>> * With unmanaged AMs I'm seeing the trace I've posted a couple of
> >> >>>>>> days ago
> >> >>>>>> in YARN-1577 (
>> >>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://issues.apache.org/jira/browse/YARN-1577?focusedCommentId=13891853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13891853
>> >>>>>> ).
>> >>>>>>
> >> >>>>>> * Also, Robert has been digging into Oozie testcases failing/getting
> >> >>>>>> stuck with several token renewer threads; these failures happened
> >> >>>>>> consistently at different places around the same testcases (like some
> >> >>>>>> file descriptors leaking out), and reverting YARN-1490 fixes the problem.

[jira] [Created] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5922:


 Summary: DN heartbeat thread can get stuck in tight loop
 Key: HDFS-5922
 URL: https://issues.apache.org/jira/browse/HDFS-5922
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers


We saw an issue recently on a test cluster where one of the DN threads was 
consuming 100% of a single CPU. Running jstack indicated that it was the DN 
heartbeat thread. I believe I've tracked down the cause to a bug in the 
accounting around the value of {{pendingReceivedRequests}}.

More details in the first comment.
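A schematic sketch of the failure mode (this is not the real BPServiceActor loop, just 
its shape; all names here are hypothetical):

{code:java}
// The heartbeat thread waits only when it believes no incremental block
// reports are pending. If pendingReceivedRequests is over-counted and can
// never return to zero, the wait() is skipped on every pass and the loop
// spins, pinning one core: the jstack signature described above.
private void offerServiceSketch() throws InterruptedException, IOException {
  while (shouldRun()) {
    synchronized (pendingIncrementalBR) {
      if (pendingReceivedRequests == 0) {
        pendingIncrementalBR.wait(heartbeatIntervalMs);
      }
    }
    sendHeartbeatIfDue();
    reportReceivedDeletedBlocksIfAny(); // must keep the counter in sync
  }
}
{code}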





Re: Re-swizzle 2.3

2014-02-10 Thread Aaron T. Myers
OK. I think I should be able to get it in by 6pm PT, thanks to a quick +1
from Andrew, but certainly don't let it hold up the train if for some
reason it takes longer than that.

--
Aaron T. Myers
Software Engineer, Cloudera


On Mon, Feb 10, 2014 at 12:04 PM, Arun C Murthy  wrote:

> Looks like we are down to 0 blockers; I'll create rc0 tonight.
>
> ATM - Your call, you have until 6pm tonight to get this in.
>
> thanks,
> Arun
>
> On Feb 10, 2014, at 11:44 AM, "Aaron T. Myers"  wrote:
>
> > I just filed an issue for the fact that browsing the FS from the NN is
> > broken if you have a directory with the sticky bit set:
> >
> > https://issues.apache.org/jira/browse/HDFS-5921
> >
> > I didn't set this to be targeted for 2.3 because it doesn't seem like a
> > _blocker_ to me, but if we're not going to get 2.3 out today anyway, I'd
> > like to put this in. It's a small fix, and since many people have the
> > sticky bit set on /tmp, they won't be able to browse any of the FS
> > hierarchy from the NN without this fix.
> >
> > --
> > Aaron T. Myers
> > Software Engineer, Cloudera
> >
> >
> > On Fri, Feb 7, 2014 at 12:45 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org
> >> wrote:
> >
> >> Here's what I've done:
> >> - Reverted YARN-1493,YARN-1490,YARN-1041,
> >> YARN-1166,YARN-1566,YARN-1689,YARN-1661 from branch-2.3.
> >> - Updated YARN's CHANGES.txt in trunk, branch-2 and branch-2.3.
> >> - Updated these JIRAs to have 2.4 as the fix-version.
> >> - Compiled branch-2.3.
> >>
> >> Let me know if you run into any issues caused by this revert.
> >>
> >> Thanks,
> >> +Vinod
> >>
> >>
> >> On Fri, Feb 7, 2014 at 11:41 AM, Vinod Kumar Vavilapalli <
> >> vino...@apache.org
> >>> wrote:
> >>
> >>> Haven't heard back from Jian. Reverting the set from branch-2.3 (only).
> >> Tx
> >>> for the offline list.
> >>>
> >>> +Vinod
> >>>
> >>>
> >>> On Fri, Feb 7, 2014 at 9:08 AM, Alejandro Abdelnur  >>> wrote:
> >>>
> >>>> Vinod, I have the patches to revert most of the JIRAs, the first
> batch,
> >>>> I'll send them off line to you.
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>> On Thu, Feb 6, 2014 at 8:56 PM, Vinod Kumar Vavilapalli
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> Thanks. Please post your findings; Jian wrote this part of the code
> >>>>> and between him/me, we can take care of those issues.
> >>>>>
> >>>>> +1 for going ahead with the revert on branch-2.3. I'll go do that
> >>>> tomorrow
> >>>>> morning unless I hear otherwise from Jian.
> >>>>>
> >>>>> Thanks,
> >>>>> +Vinod
> >>>>>
> >>>>>
> >>>>> On Feb 6, 2014, at 8:28 PM, Alejandro Abdelnur 
> >>>> wrote:
> >>>>>
> >>>>>> Hi Vinod,
> >>>>>>
> >>>>>> Nothing confidential,
> >>>>>>
> >>>>>> * With unmanaged AMs I'm seeing the trace I've posted a couple of
> >>>>>> days ago
> >>>>>> in YARN-1577 (
> >>>>>>
> >>>>>
> >>>>
> >>
> https://issues.apache.org/jira/browse/YARN-1577?focusedCommentId=13891853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13891853
> >>>>>> ).
> >>>>>>
> >>>>>> * Also, Robert has been digging into Oozie testcases failing/getting
> >>>>>> stuck with several token renewer threads; these failures happened
> >>>>>> consistently at different places around the same testcases (like some
> >>>>>> file descriptors leaking out), and reverting YARN-1490 fixes the
> >>>>>> problem. The potential issue with this is that a long-running client
> >>>>>> (Oozie) may run into this situation, thus becoming unstable.
> >>>>>>
> >>>>>> *Robert,* mind posting to YARN-1490 the jvm thread dump at the time of
> >>>>>> test hanging?

Re: Re-swizzle 2.3

2014-02-10 Thread Aaron T. Myers
I just filed an issue for the fact that browsing the FS from the NN is
broken if you have a directory with the sticky bit set:

https://issues.apache.org/jira/browse/HDFS-5921

I didn't set this to be targeted for 2.3 because it doesn't seem like a
_blocker_ to me, but if we're not going to get 2.3 out today anyway, I'd
like to put this in. It's a small fix, and since many people have the
sticky bit set on /tmp, they won't be able to browse any of the FS
hierarchy from the NN without this fix.

--
Aaron T. Myers
Software Engineer, Cloudera


On Fri, Feb 7, 2014 at 12:45 PM, Vinod Kumar Vavilapalli  wrote:

> Here's what I've done:
>  - Reverted YARN-1493,YARN-1490,YARN-1041,
> YARN-1166,YARN-1566,YARN-1689,YARN-1661 from branch-2.3.
>  - Updated YARN's CHANGES.txt in trunk, branch-2 and branch-2.3.
>  - Updated these JIRAs to have 2.4 as the fix-version.
>  - Compiled branch-2.3.
>
> Let me know if you run into any issues caused by this revert.
>
> Thanks,
> +Vinod
>
>
> On Fri, Feb 7, 2014 at 11:41 AM, Vinod Kumar Vavilapalli <
> vino...@apache.org
> > wrote:
>
> > Haven't heard back from Jian. Reverting the set from branch-2.3 (only).
> Tx
> > for the offline list.
> >
> > +Vinod
> >
> >
> > On Fri, Feb 7, 2014 at 9:08 AM, Alejandro Abdelnur  >wrote:
> >
> >> Vinod, I have the patches to revert most of the JIRAs, the first batch,
> >> I'll send them off line to you.
> >>
> >> Thanks.
> >>
> >>
> >> On Thu, Feb 6, 2014 at 8:56 PM, Vinod Kumar Vavilapalli
> >> wrote:
> >>
> >> >
> >> > Thanks. Please post your findings; Jian wrote this part of the code
> >> > and between him/me, we can take care of those issues.
> >> >
> >> > +1 for going ahead with the revert on branch-2.3. I'll go do that
> >> tomorrow
> >> > morning unless I hear otherwise from Jian.
> >> >
> >> > Thanks,
> >> > +Vinod
> >> >
> >> >
> >> > On Feb 6, 2014, at 8:28 PM, Alejandro Abdelnur 
> >> wrote:
> >> >
> >> > > Hi Vinod,
> >> > >
> >> > > Nothing confidential,
> >> > >
> >> > > * With unmanaged AMs I'm seeing the trace I've posted a couple of
> >> > > days ago
> >> > > in YARN-1577 (
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/browse/YARN-1577?focusedCommentId=13891853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13891853
> >> > > ).
> >> > >
> >> > > * Also, Robert has been digging into Oozie testcases failing/getting
> >> > > stuck with several token renewer threads; these failures happened
> >> > > consistently at different places around the same testcases (like some
> >> > > file descriptors leaking out), and reverting YARN-1490 fixes the
> >> > > problem. The potential issue with this is that a long-running client
> >> > > (Oozie) may run into this situation, thus becoming unstable.
> >> > >
> >> > > *Robert,* mind posting to YARN-1490 the jvm thread dump at the time of
> >> > > test hanging?
> >> > >
> >> > > After YARN-1493 & YARN-1490 we have a couple of JIRAs trying to fix
> >> > > issues introduced by them, and we still didn't get them right.
> >> > >
> >> > > Because of this, the improvements driven by YARN-1493 & YARN-1490 seem
> >> > > to require more work before being stable.
> >> > >
> >> > > IMO, being conservative, we should do 2.3 without them and roll them
> >> > > with 2.4. If we want to do regular releases we will have to make this
> >> > > kind of call, or else we will start dragging out the releases.
> >> > >
> >> > > Sounds like a plan?
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Feb 6, 2014 at 6:27 PM, Vinod Kumar Vavilapalli
> >> > > wrote:
> >> > >
> >> > >> Hey
> >> > >>
> >> > >> I am not against removing them from 2.3 if that is helpful for
> >> > >> progress.

[jira] [Created] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5921:


 Summary: Cannot browse file system via NN web UI if any directory 
has the sticky bit set
 Key: HDFS-5921
 URL: https://issues.apache.org/jira/browse/HDFS-5921
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


You'll see an error like this in the JS console if any directory has the sticky 
bit set:

{noformat}
'helper_to_permission': function(chunk, ctx, bodies, params) {

var exec = ((parms.perm % 10) & 1) == 1;
Uncaught ReferenceError: parms is not defined
{noformat}
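The fix is presumably just the spelling: the helper's parameter is {{params}}, so the 
offending line becomes:

{noformat}
var exec = ((params.perm % 10) & 1) == 1;
{noformat}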





Re: Re-swizzle 2.3

2014-01-30 Thread Aaron T. Myers
I just committed HADOOP-10310 to branch-2.3, so we're good to go there.
(Thanks to Andrew and Daryn for the prompt reviews.)

--
Aaron T. Myers
Software Engineer, Cloudera


On Wed, Jan 29, 2014 at 6:52 PM, Aaron T. Myers  wrote:

> I just filed this JIRA as a blocker for 2.3:
> https://issues.apache.org/jira/browse/HADOOP-10310
>
> The tl;dr is that JNs will not work with security enabled without this
> fix. If others don't think that supporting QJM with security enabled
> warrants a blocker for 2.3, then we can certainly lower the priority, but
> it seems pretty important to me.
>
> Best,
> Aaron
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
> On Wed, Jan 29, 2014 at 6:24 PM, Andrew Wang wrote:
>
>> I just finished tuning up branch-2.3 and fixing up the HDFS and Common
>> CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a few
>> JIRAs committed between the swizzle and now where the fix version was 2.3
>> but weren't in branch-2.3.
>>
>> I think the only two HDFS and Common JIRAs that are marked for 2.4 are
>> these:
>>
>> HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi and a
>> doAs on a secure cluster
>> HDFS-5781 Use an array to record the mapping between FSEditLogOpCode and
>> the corresponding byte value
>>
>> Jing, these both look safe to me if you want to merge them back, or I can
>> just do it.
>>
>> Thanks,
>> Andrew
>>
>> On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting  wrote:
>> >
>> > On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe 
>> wrote:
>> > >  It is a bit concerning that the JIRA history showed that the target
>> version
>> > > was set at some point in the past but no record of it being cleared.
>> >
>> > Perhaps the version itself was renamed?
>> >
>> > Doug
>>
>
>


Re: Re-swizzle 2.3

2014-01-29 Thread Aaron T. Myers
I just filed this JIRA as a blocker for 2.3:
https://issues.apache.org/jira/browse/HADOOP-10310

The tl;dr is that JNs will not work with security enabled without this fix.
If others don't think that supporting QJM with security enabled warrants a
blocker for 2.3, then we can certainly lower the priority, but it seems
pretty important to me.

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera


On Wed, Jan 29, 2014 at 6:24 PM, Andrew Wang wrote:

> I just finished tuning up branch-2.3 and fixing up the HDFS and Common
> CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a few
> JIRAs committed between the swizzle and now where the fix version was 2.3
> but weren't in branch-2.3.
>
> I think the only two HDFS and Common JIRAs that are marked for 2.4 are
> these:
>
> HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi and a
> doAs on a secure cluster
> HDFS-5781 Use an array to record the mapping between FSEditLogOpCode and
> the corresponding byte value
>
> Jing, these both look safe to me if you want to merge them back, or I can
> just do it.
>
> Thanks,
> Andrew
>
> On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting  wrote:
> >
> > On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe 
> wrote:
> > >  It is a bit concerning that the JIRA history showed that the target
> version
> > > was set at some point in the past but no record of it being cleared.
> >
> > Perhaps the version itself was renamed?
> >
> > Doug
>


[jira] [Created] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-01-27 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5840:


 Summary: Follow-up to HDFS-5138 to improve error handling during 
partial upgrade failures
 Key: HDFS-5840
 URL: https://issues.apache.org/jira/browse/HDFS-5840
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 3.0.0


Suresh posted some good comments in HDFS-5138 after that patch had already been 
committed to trunk. This JIRA is to address those. See the first comment of 
this JIRA for the full content of the review.





[jira] [Created] (HDFS-5517) Lower the default maximum number of blocks per file

2013-11-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5517:


 Summary: Lower the default maximum number of blocks per file
 Key: HDFS-5517
 URL: https://issues.apache.org/jira/browse/HDFS-5517
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


We introduced the maximum number of blocks per file in HDFS-4305, but we set 
the default to 1MM. In practice this limit is so high as to never be hit, 
whereas we know that an individual file with 10s of thousands of blocks can 
cause problems. We should lower the default value, in my opinion to 10k.
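For anyone who wants the lower limit today without waiting on the default change, a 
hedged example (the property name is the one introduced by HDFS-4305; the value is the 
proposal above):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: override the per-file block cap on the NN; the same property can
// be set in hdfs-site.xml as dfs.namenode.fs-limits.max-blocks-per-file.
Configuration conf = new Configuration();
conf.setInt("dfs.namenode.fs-limits.max-blocks-per-file", 10000);
{code}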





[jira] [Created] (HDFS-5433) When reloading fsimage during checkpointing, we should clear existing snapshottable directories

2013-10-25 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5433:


 Summary: When reloading fsimage during checkpointing, we should 
clear existing snapshottable directories
 Key: HDFS-5433
 URL: https://issues.apache.org/jira/browse/HDFS-5433
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


The complete set of snapshottable directories are referenced both via the file 
system tree and in the SnapshotManager class. It's possible that when the 2NN 
performs a checkpoint, it will reload its in-memory state based on a new 
fsimage from the NN, but will not clear the set of snapshottable directories 
referenced by the SnapshotManager. In this case, the 2NN will write out an 
fsimage that cannot be loaded, since the integer written to the fsimage 
indicating the number of snapshottable directories will be out of sync with the 
actual number of snapshottable directories serialized to the fsimage.

This is basically the same as HDFS-3835, but for snapshottable directories 
instead of delegation tokens.
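A hedged sketch of the invariant the fix needs to restore (the method names here are 
hypothetical):

{code:java}
// Schematic 2NN reload path: any side table keyed off the namespace
// (snapshottable dirs here, delegation tokens in HDFS-3835) must be reset
// before a replacement fsimage is loaded, or the next checkpoint the 2NN
// writes will disagree with its own snapshottable-directory count.
void reloadFromActiveNN(File downloadedImage) throws IOException {
  namesystem.clear();                        // drop the old FS tree
  snapshotManager.clearSnapshottableDirs();  // the reset that was missing
  fsImage.load(downloadedImage);
}
{code}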





Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-24 Thread Aaron T. Myers
I don't necessarily disagree with the general questions about the
procedural issues of merge votes. Thanks for bringing that up in the other
thread you mentioned. To some extent it seems like much of this has been
based on custom, and if folks feel that more precisely defining the merge
vote process is warranted, then I think we should take that up over on that
thread.

With regard to this particular merge vote, I've spoken with Chris offline
about his feelings on this. He said that he is not dead-set on restarting
the vote, though he suspects that others may be. It seems to me the
remaining unfinished asks (e.g. updating the design doc) can reasonably be
done either after this vote but before the merge to trunk proper, or even
after merging to trunk.

Given that, I'll lend my +1 to this merge. I've been reviewing the branch
pretty consistently since work started on it, and have personally
run/tested several builds of it along the way. I've also reviewed the
design thoroughly. The implementation, overall design, and API seem to me
plenty stable enough to be merged into trunk. I know that there remains a
handful of javac warnings in the branch that aren't in trunk, but I trust
those will be taken care of before the merge.

If anyone out there does feel strongly that this merge vote should be
restarted, then please speak up soon. Again, we can restart the vote if
need be, but I honestly think we'll gain very little by doing so.

Best,
Aaron


On Fri, Oct 25, 2013 at 5:45 AM, Chris Nauroth wrote:

> Hi Andrew,
>
> I've come to the conclusion that I'm very confused about merge votes.  :-)
>  It's not just about HDFS-4949.  I'm confused about all merge votes.
>  Rather than muddy the waters here, I've started a separate discussion on
> common-dev.
>
> I do agree with the general plan outlined here, and I will comment directly
> on the HDFS-4949 jira with a binding +1 when I see that we've completed
> that plan.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Wed, Oct 23, 2013 at 2:18 PM, Andrew Wang  >wrote:
>
> > Hey Chris,
> >
> > Right now we're on track to have all of those things done by tomorrow.
> > Since the remaining issues are either not technical or do not involve
> major
> > changes, I was hoping we could +1 this merge vote in the spirit of "+1
> > pending jenkins". We've gotten clean unit test runs on upstream Jenkins
> as
> > well, so the only fixups we should need for test-patch.sh are findbugs
> and
> > javac (which are normally pretty trivial to clean up). Of course, all of
> > your listed prereqs and test-patch would be taken care of before actually
> > merging to trunk.
> >
> > So, we can reset the vote if you feel strongly about this, but it seems
> > like the only real result will be delaying the merge by a week.
> >
> > Thanks,
> > Andrew
> >
> >
> > On Wed, Oct 23, 2013 at 1:03 PM, Chris Nauroth  > >wrote:
> >
> > > I've received some feedback that we haven't handled this merge vote the
> > > same as other comparable merge votes, and that the vote should be reset
> > > because of this.
> > >
> > > The recent custom is that we only call for the merge vote after all
> > > pre-requisites have been satisfied.  This would include committing to
> the
> > > feature branch all patches that the devs deem necessary before the code
> > > lands in trunk, posting a test plan, posting an updated design doc in
> > case
> > > implementation choices diverged from the original design doc, and
> > getting a
> > > good test-patch run from Jenkins on the merge patch.  This was the
> > process
> > > followed for other recent major features like HDFS-2802 (snapshots),
> > > HDFS-347 (short-circuit reads via sharing file descriptors), and
> > > HADOOP-8562 (Windows compatibility).  In this thread, we've diverged
> from
> > > that process by calling for a vote on a branch that hasn't yet
> completed
> > > the pre-requisites and stating a plan for work to be done before the
> > merge.
> > >
> > > I still support this work, but can we please restart the vote after the
> > > pre-requisites have landed in the branch?
> > >
> > > Chris Nauroth
> > > Hortonworks
> > > http://hortonworks.com/
> > >
> > >
> > >
> > > On Fri, Oct 18, 2013 at 1:37 PM, Chris Nauroth <
> cnaur...@hortonworks.com
> > > >wrote:
> > >
> > > > +1
> > > >
> > > > Sounds great!
> > > >
> > > > Regarding testing caching+federation, this is another thing that I
> had
> > > > intended to pick up as part of HDFS-5149.  I'm not sure if I can get
> > this
> > > > done in the next 7 days, so I'll keep you posted.
> > > >
> > > > Chris Nauroth
> > > > Hortonworks
> > > > http://hortonworks.com/
> > > >
> > > >
> > > >
> > > > On Fri, Oct 18, 2013 at 11:15 AM, Colin McCabe <
> cmcc...@alumni.cmu.edu
> > > >wrote:
> > > >
> > > >> Hi Chris,
> > > >>
> > > >> I think it's feasible to complete those tasks in the next 7 days.
> > > >> Andrew is on HDFS-5386.
> > > >>
> > > >> The test plan document is a great idea.

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-24 Thread Aaron T. Myers
On Thu, Oct 24, 2013 at 6:18 AM, Andrew Wang wrote:

> Right now we're on track to have all of those things done by tomorrow.
> Since the remaining issues are either not technical or do not involve major
> changes, I was hoping we could +1 this merge vote in the spirit of "+1
> pending jenkins". We've gotten clean unit test runs on upstream Jenkins as
> well, so the only fixups we should need for test-patch.sh are findbugs and
> javac (which are normally pretty trivial to clean up). Of course, all of
> your listed prereqs and test-patch would be taken care of before actually
> merging to trunk.
>
> So, we can reset the vote if you feel strongly about this, but it seems
> like the only real result will be delaying the merge by a week.
>

I agree with this. Chris raised some concerns 6 days ago, but it seems like
these have all been addressed since then. Resetting the vote would seem to
serve little purpose except to delay the merge by another week. If the
merge vote were to be restarted, I'd expect that we'd quickly see the
requisite three +1s be cast, and then we'd wait around for 7 days.

Chris, does this make sense to you? Appreciate a prompt response since I
believe this vote is supposed to close at midnight tonight.

Thanks folks.

Best,
Aaron


[jira] [Created] (HDFS-5408) Optional fields in PB definitions should be optional in HTTP deserialization as well

2013-10-23 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5408:


 Summary: Optional fields in PB definitions should be optional in 
HTTP deserialization as well
 Key: HDFS-5408
 URL: https://issues.apache.org/jira/browse/HDFS-5408
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
Reporter: Aaron T. Myers


As was pointed out by [~andrew.wang] in HDFS-5403, there are a few fields which 
are marked as optional in PB definitions but then assumed to always exist in 
the JSON deserialization of the WebHdfs client. We should handle the 
possibility that these fields will not be present in WebHdfs as well.





[jira] [Created] (HDFS-5403) WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306

2013-10-22 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5403:


 Summary: WebHdfs client cannot communicate with older WebHdfs 
servers post HDFS-5306
 Key: HDFS-5403
 URL: https://issues.apache.org/jira/browse/HDFS-5403
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


HDFS-5306 introduced the field infoSecurePort to the DatanodeIDProto PB 
definition and made it optional for compatibility purposes. However, we don't 
correctly handle the case where this field is not present when deserializing 
the response from a WebHdfs request. This results in an NPE at the client when 
this occurs.
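A hedged sketch of the defensive read (WebHdfs's JSON layer works over maps like this, 
but the helper below is hypothetical):

{code:java}
import java.util.Map;

// Optional on the PB side must mean optional on the JSON side: an absent
// key yields the field's default rather than an NPE at the client.
static int getInt(Map<?, ?> m, String key, int defaultValue) {
  Object value = m.get(key);
  return (value == null) ? defaultValue : ((Number) value).intValue();
}

// e.g.: int infoSecurePort = getInt(m, "infoSecurePort", 0);
{code}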





Re: [VOTE] Release Apache Hadoop 2.2.0

2013-10-13 Thread Aaron T. Myers
+1 (binding)

Downloaded the release, built from tarball, tested a single node cluster.
Everything worked as expected.

--
Aaron T. Myers
Software Engineer, Cloudera


On Mon, Oct 7, 2013 at 12:00 AM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.2.0 that I would like
> to get released - this release fixes a small number of bugs and some
> protocol/api issues which should ensure they are now stable and will not
> change in hadoop-2.x.
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.2.0-rc0
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0-rc0
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> P.S.: Thanks to Colin, Andrew, Daryn, Chris and others for helping nail
> down the symlinks-related issues. I'll release note the fact that we have
> disabled it in 2.2. Also, thanks to Vinod for some heavy-lifting on the
> YARN side in the last couple of weeks.
>
>
>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


[jira] [Resolved] (HDFS-3958) Integrate upgrade/finalize/rollback with external journals

2013-10-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3958.
--

Resolution: Duplicate

Resolving as a duplicate of HDFS-5138, which has a lot more discussion about 
how best to do this.

> Integrate upgrade/finalize/rollback with external journals
> --
>
> Key: HDFS-3958
> URL: https://issues.apache.org/jira/browse/HDFS-3958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>
> Currently the NameNode upgrade/rollback/finalize framework only supports 
> local storage. With edits being stored in pluggable Journals, this could 
> create certain difficulties - in particular, rollback wouldn't actually 
> rollback the external storage to the old state.
> We should look at how to expose the right hooks to the external journal 
> storage to snapshot/rollback/finalize.





[jira] [Resolved] (HDFS-3133) Add support for DFS upgrade with HA enabled

2013-10-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3133.
--

Resolution: Duplicate

Resolving as a duplicate of HDFS-5138, which has a lot more discussion about 
how best to do this.

> Add support for DFS upgrade with HA enabled
> ---
>
> Key: HDFS-3133
> URL: https://issues.apache.org/jira/browse/HDFS-3133
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 2.0.0-alpha
>    Reporter: Aaron T. Myers
>
> For the first implementation of HA NameNode, we punted on allowing DFS 
> upgrade with HA enabled, which makes doing a DFS upgrade on an HA-enabled 
> cluster quite cumbersome and error-prone. We should add better support for 
> this.





[jira] [Created] (HDFS-5289) Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure

2013-10-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5289:


 Summary: Race condition in TestRetryCacheWithHA#testCreateSymlink 
causes spurious test failure
 Key: HDFS-5289
 URL: https://issues.apache.org/jira/browse/HDFS-5289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The code to check if the operation has been completed on the active NN can 
potentially execute before the thread actually doing the operation has run. In 
this case the checking code will retry the check if the result of the check is 
null. However, the test operation does not in fact return null, instead 
throwing an exception if the file doesn't exist yet. We need to catch the 
exception and retry.
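The de-flaking pattern described above, as a hedged sketch (the helper is illustrative; 
{{getFileLinkStatus}} is the real FileSystem call a symlink check would use):

{code:java}
import java.io.FileNotFoundException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Poll for the symlink instead of assuming the operation thread already
// ran: "not there yet" is treated the same as a null result, i.e. retry.
static FileStatus waitForSymlink(FileSystem fs, Path link) throws Exception {
  for (int i = 0; i < 10; i++) {
    try {
      return fs.getFileLinkStatus(link);
    } catch (FileNotFoundException e) {
      Thread.sleep(1000); // operation thread hasn't created it yet
    }
  }
  throw new AssertionError("symlink never appeared: " + link);
}
{code}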





[jira] [Created] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5223:


 Summary: Allow edit log/fsimage format changes without changing 
layout version
 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers


Currently all HDFS on-disk formats are versioned by a single layout version. 
This means that even for changes which might be backward compatible, like the 
addition of a new edit log opcode, we must go through the full `namenode 
-upgrade' process, which requires coordination with DNs, etc. HDFS should 
support a lighter-weight alternative.
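One hedged way to picture such an alternative (a sketch of the idea only, not of any 
committed design; the opcodes and version numbers are illustrative):

{code:java}
// Sketch: tag each edit-log opcode with the layout version that introduced
// it. A reader at layout version V accepts any op at least as old as V,
// so adding a compatible opcode no longer forces a global version bump.
// (HDFS layout versions are negative and decrease as features are added.)
enum EditOpSketch {
  OP_ADD(-1), OP_DELETE(-1), OP_HYPOTHETICAL_NEW(-48);

  final int introducedAt;
  EditOpSketch(int v) { introducedAt = v; }

  boolean readableAt(int readerLayoutVersion) {
    return readerLayoutVersion <= introducedAt;
  }
}
{code}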



[jira] [Resolved] (HDFS-4299) WebHDFS Should Support HA Configuration

2013-09-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4299.
--

Resolution: Duplicate
  Assignee: (was: Haohui Mai)

Thanks, Daisuke. Closing this one out.

> WebHDFS Should Support HA Configuration
> ---
>
> Key: HDFS-4299
> URL: https://issues.apache.org/jira/browse/HDFS-4299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Daisuke Kobayashi
>
> WebHDFS clients connect directly to NameNodes rather than using a Hadoop 
> client, so there is no failover capability. Though a workaround is available 
> to use HttpFS with an HA client, WebHDFS should also support HA configuration.
> Please see also: https://issues.cloudera.org/browse/DISTRO-403



[jira] [Created] (HDFS-5159) Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint

2013-09-03 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5159:


 Summary: Secondary NameNode fails to checkpoint if error occurs 
downloading edits on first checkpoint
 Key: HDFS-5159
 URL: https://issues.apache.org/jira/browse/HDFS-5159
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The 2NN will avoid downloading/loading a new fsimage if its local copy of 
fsimage is the same as the version on the NN. However, the decision to *load* 
the fsimage from disk into memory is based only on the on-disk fsimage version. 
If an error occurs between downloading and loading the fsimage on the first 
checkpoint attempt, the 2NN will never load the fsimage, and then on subsequent 
checkpoint attempts it will not load the on-disk fsimage and thus will never 
checkpoint successfully.

Example error message in the first comment of this ticket.
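A hedged sketch of the missing bookkeeping (the flag and method names are hypothetical):

{code:java}
// Schematic checkpoint loop: remember that a downloaded image still needs
// to be loaded, so a failure between download and load leaves the flag set
// and the next attempt retries the load instead of skipping it forever.
private boolean imageNeedsReload = false;

void doCheckpointSketch() throws IOException {
  if (downloadImageIfNewer()) {
    imageNeedsReload = true;
  }
  if (imageNeedsReload) {
    loadDownloadedImage();  // may throw; flag stays set for the next try
    imageNeedsReload = false;
  }
  mergeEditsAndSaveImage();
}
{code}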



Re: [VOTE] Release Apache Hadoop 2.0.6-alpha (RC1)

2013-08-21 Thread Aaron T. Myers
+1 (binding)

I downloaded the bits, set up a 4-node cluster, and ran some example jobs.
Looks good to me.


--
Aaron T. Myers
Software Engineer, Cloudera


On Thu, Aug 15, 2013 at 10:29 PM, Konstantin Boudnik  wrote:

> All,
>
> I have created a release candidate (rc1) for hadoop-2.0.6-alpha that I
> would
> like to release.
>
> This is a stabilization release that includes fixed for a couple a of
> issues
> as outlined on the security list.
>
> The RC is available at:
> http://people.apache.org/~cos/hadoop-2.0.6-alpha-rc1/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha-rc1
>
> The maven artifacts are available via repository.apache.org.
>
> The only difference between rc0 and rc1 is ASL added to releasenotes.html
> and
> updated release dates in CHANGES.txt files.
>
> Please try the release bits and vote; the vote will run for the usual 7
> days.
>
> Thanks for your voting
>   Cos
>
>


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Aaron T. Myers
I was evaluating the release bits when I noticed that the change done in
HDFS-4763 to add support for starting the HDFS NFSv3 gateway, which is
marked with a "fix version" of 2.1.0-beta and included in the release notes
of RC2, is not in fact included in the RC2 release bits. It looks to me
like the change is included in branch-2.1-beta, but not branch-2.1.0-beta.

Particularly since the release notes in RC2 are incorrect in claiming that
this change is in this release, it seems like a pretty serious
issue. Ordinarily I'd say that this issue should result in a new RC, and I
would vote -1 on RC2. But, given the previous discussion that folks are
interested in releasing 2.1.0-beta with several fairly substantial bugs
that we already know about, I'll withhold my vote. If RC2 ends up getting
released as-is, we should be sure to change the fix version field on that
JIRA to be correct.

--
Aaron T. Myers
Software Engineer, Cloudera


On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
> like to get released - this fixes the bugs we saw since the last go-around
> (rc1).
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


[jira] [Created] (HDFS-5102) Snapshot names should not be allowed to contain slash characters

2013-08-15 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5102:


 Summary: Snapshot names should not be allowed to contain slash 
characters
 Key: HDFS-5102
 URL: https://issues.apache.org/jira/browse/HDFS-5102
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers


Snapshots of a snapshottable directory are allowed to have arbitrary names. 
Presently, if you create a snapshot with a snapshot name that begins with a "/" 
character, this will be allowed, but later attempts to access this snapshot 
will fail because of the way the {{Path}} class deals with consecutive "/" 
characters. I suggest we disallow "/" from appearing in snapshot names.

An example of this is in the first comment on this JIRA.
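The guard itself is small; a hedged sketch ({{Path.SEPARATOR}} is the real constant, the 
method is illustrative):

{code:java}
import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.fs.Path;

// Reject "/" in snapshot names up front, before Path normalization gets a
// chance to silently collapse the consecutive separators at lookup time.
static void verifySnapshotName(String snapshotName) {
  if (snapshotName.contains(Path.SEPARATOR)) {
    throw new HadoopIllegalArgumentException(
        "Snapshot name cannot contain \"" + Path.SEPARATOR + "\": "
        + snapshotName);
  }
}
{code}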



[jira] [Created] (HDFS-5097) TestDoAsEffectiveUser can fail on JDK 7

2013-08-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5097:


 Summary: TestDoAsEffectiveUser can fail on JDK 7
 Key: HDFS-5097
 URL: https://issues.apache.org/jira/browse/HDFS-5097
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


Another issue with the test method execution order changing between JDK 6 and 7.



[jira] [Created] (HDFS-5064) Standby checkpoints should not block concurrent readers

2013-08-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5064:


 Summary: Standby checkpoints should not block concurrent readers
 Key: HDFS-5064
 URL: https://issues.apache.org/jira/browse/HDFS-5064
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


We've observed an issue which causes fetches of the {{/jmx}} page of the NN to 
take a long time to load when the standby is in the process of creating a 
checkpoint.

Even though both creating the checkpoint and gathering the statistics for 
{{/jmx}} take only the FSNS read lock, the issue is that since the FSNS uses a 
_fair_ RW lock, a single writer attempting to get the lock will block all 
threads attempting to get only the read lock for the duration of the 
checkpoint. This will cause {{/jmx}}, and really any thread only attempting to 
get the read lock, to block for the duration of the checkpoint, even though 
they should be able to proceed concurrently with the checkpointing thread.
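The fair-lock behavior is easy to reproduce in isolation; a self-contained sketch:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair

    lock.readLock().lock(); // the long-running "checkpoint" reader

    Thread writer = new Thread(() -> {
      lock.writeLock().lock();  // parks behind the checkpoint reader
      lock.writeLock().unlock();
    });
    writer.start();
    Thread.sleep(200); // let the writer join the wait queue

    // With a fair lock, this new reader now queues behind the waiting
    // writer even though the lock is only read-held: this is the /jmx
    // fetch stalling for the whole duration of the checkpoint.
    Thread jmxReader = new Thread(() -> {
      lock.readLock().lock();
      lock.readLock().unlock();
    });
    jmxReader.start();
    Thread.sleep(200);

    System.out.println("second reader blocked: " + jmxReader.isAlive());
    lock.readLock().unlock(); // "checkpoint" finishes; everyone proceeds
    writer.join();
    jmxReader.join();
  }
}
{code}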



[jira] [Created] (HDFS-5060) NN should proactively perform a saveNamespace if it has a huge number of outstanding uncheckpointed transactions

2013-08-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5060:


 Summary: NN should proactively perform a saveNamespace if it has a 
huge number of outstanding uncheckpointed transactions
 Key: HDFS-5060
 URL: https://issues.apache.org/jira/browse/HDFS-5060
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In a properly-functioning HDFS system, checkpoints will be triggered regularly by 
either the secondary NN or the standby NN, by default every hour or every 1MM 
outstanding edit transactions, whichever comes first. However, in cases where 
this second node is down for an extended period of time, the number of 
outstanding transactions can grow so large as to cause a restart to take an 
inordinately long time.

This JIRA proposes to make the active NN monitor its number of outstanding 
transactions and perform a proactive local saveNamespace if it grows beyond a 
configurable threshold. I'm envisioning something like 10x the configured 
number of transactions which in a properly-functioning cluster would result in 
a checkpoint from the second NN. Though this would be disruptive to clients 
while it's taking place, likely for a few minutes, this seems better than the 
alternative of a subsequent multi-hour restart and should never actually occur 
in a properly-functioning cluster.
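A hedged sketch of the proposed trigger (the 10x factor comes from the paragraph above; 
the surrounding method and object names are hypothetical):

{code:java}
// Schematic periodic check on the active NN.
long uncheckpointed = editLog.getLastWrittenTxId()
    - fsImage.getMostRecentCheckpointTxId();
long normalTrigger = conf.getLong("dfs.namenode.checkpoint.txns", 1000000L);
if (uncheckpointed > 10 * normalTrigger) {
  // No checkpointer has run for a very long time: pay a few minutes of
  // saveNamespace now rather than a multi-hour restart later.
  saveNamespaceLocally();
}
{code}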



[jira] [Created] (HDFS-5027) On startup, DN should scan volumes in parallel

2013-07-23 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5027:


 Summary: On startup, DN should scan volumes in parallel
 Key: HDFS-5027
 URL: https://issues.apache.org/jira/browse/HDFS-5027
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


On startup the DN must scan all replicas on all configured volumes before the 
initial block report to the NN. This is currently done serially, but can be 
done in parallel to improve startup time of the DN.
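A minimal sketch of the parallel version, assuming a per-volume scan routine (the names 
are illustrative):

{code:java}
import java.io.File;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Schematic: one task per configured volume instead of a serial loop; the
// initial block report still waits until every volume has been scanned.
void scanVolumesInParallel(final List<File> volumes)
    throws InterruptedException {
  ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
  final CountDownLatch done = new CountDownLatch(volumes.size());
  for (final File vol : volumes) {
    pool.execute(new Runnable() {
      public void run() {
        try {
          addReplicasFromVolume(vol); // hypothetical per-volume scan
        } finally {
          done.countDown();
        }
      }
    });
  }
  done.await();
  pool.shutdown();
}
{code}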



Re: Question for edit log retain

2013-07-01 Thread Aaron T. Myers
Hi Azuryy,

On Wed, Jun 26, 2013 at 6:12 PM, Azuryy Yu  wrote:

> Hi Dear all,
>
> I have some confusion about edit log retention,
>
> NNStorageRetentionManager.java:
>
> 1)
> purgeCheckpointsOlderThan()
>   What's the meaning of "checkpoint" here?
>

Here, "checkpoint" is referring to the fsimage files on the NN.


>
> 2)purgeOldStorage()
>   I cannot understand the calculation of the minimum txid; I think I can
> understand it if I know what these keys indicate.
> DFS_NAMENODE_NUM_CHECKPOINTS_RETAINED_KEY
>

This is the number of old fsimage files to retain on the NN during purging.


> DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_KEY
>

This is the number of extra transactions to retain during purging. Here,
"extra" is referring to transactions beyond what is strictly required to
play back all FS changes since the last checkpoint (fsimage).


> DFS_NAMENODE_MAX_EXTRA_EDITS_SEGMENTS_RETAINED_KEY
>

This is the maximum number of extra edit _segments_ to retain during
purging. Segments refer to finalized edit log files, with a start and end
transaction ID. A single edit file is a "segment."
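Putting the three knobs together, a hedged configuration example (these are the 
hdfs-site.xml spellings of the constants above; the values shown are illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: keep the 2 most recent fsimages, at least 1M extra transactions
// of edits beyond them, spread across at most 10k finalized segment files.
Configuration conf = new Configuration();
conf.setInt("dfs.namenode.num.checkpoints.retained", 2);
conf.setLong("dfs.namenode.num.extra.edits.retained", 1000000L);
conf.setInt("dfs.namenode.max.extra.edits.segments.retained", 10000);
{code}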


>
> Can any developer give me a short, detailed explanation? Thanks much.
>


[jira] [Created] (HDFS-4906) HDFS Output streams should not accept writes after being closed

2013-06-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4906:


 Summary: HDFS Output streams should not accept writes after being 
closed
 Key: HDFS-4906
 URL: https://issues.apache.org/jira/browse/HDFS-4906
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently if one closes an OutputStream obtained from FileSystem#create and 
then calls write(...) on that closed stream, the write will appear to succeed 
without error though no data will be written to HDFS. A subsequent call to 
close will also silently appear to succeed. We should make it so that attempts 
to write to closed streams fail fast.
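A self-contained sketch reproducing the reported behavior (the path is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAfterClose {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/write-after-close"));
    out.writeBytes("first\n");
    out.close();
    // Today both of these silently "succeed" and the data goes nowhere;
    // the proposed fix is for them to fail fast with an IOException.
    out.writeBytes("lost\n");
    out.close();
  }
}
{code}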



[jira] [Created] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-16 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4830:


 Summary: Typo in config settings for 
AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml
 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


In hdfs-default.xml we have these two settings:

{noformat}
dfs.datanode.fsdataset.volume.choosing.balanced-space-threshold
dfs.datanode.fsdataset.volume.choosing.balanced-space-preference-percent
{noformat}

But in fact they should be these, from DFSConfigKeys.java:

{noformat}
dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent
{noformat}

This won't actually affect any functionality, since default values are used in 
the code anyway, but makes the documentation generated from hdfs-default.xml 
inaccurate.



[jira] [Resolved] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class

2013-05-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4352.
--

Resolution: Won't Fix

Seems some folks don't think this is the best idea.

> Encapsulate arguments to BlockReaderFactory in a class
> --
>
> Key: HDFS-4352
> URL: https://issues.apache.org/jira/browse/HDFS-4352
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: 01b.patch, 01.patch
>
>
> Encapsulate the arguments to BlockReaderFactory in a class to avoid having to 
> pass around 10+ arguments to a few different functions.



Re: [VOTE] Merge HDFS-2802 snapshot feature to trunk

2013-05-07 Thread Aaron T. Myers
I'm +1 as well on merging this branch to trunk. I've thoroughly
reviewed/discussed the design and have personally reviewed much of the code
as well. Good stuff.

I would have preferred to wrap up the discussion regarding which API should
include the snapshot create/delete operations before the merge, but I don't
think that should hold up the merge to trunk. No reason we can't discuss it
afterward, though I'd really like to come to a conclusion about that before
we merge it to branch-2. Suresh, I believe the last action on that thread
was you asking Nicholas to move the discussion to a JIRA, but I don't think
that JIRA ever got filed. Mind doing that?


--
Aaron T. Myers
Software Engineer, Cloudera


On Wed, May 1, 2013 at 11:54 AM, Suresh Srinivas wrote:

> This is a follow up to my earlier heads up about merging Snapshot feature
> to trunk - http://markmail.org/message/ixkyku2cebkewnzy. I am happy to
> announce that we have completed the development of the feature. It is ready
> to be merged into trunk.
>
> Development of snapshot feature is tracked in the jira
> https://issues.apache.org/jira/browse/HDFS-2802. This is an important
> feature for HDFS. Please see a brief presentation that describes the
> feature at a highlevel from the Snapshot discussion meetup we had a while
> back -
> https://issues.apache.org/jira/secure/attachment/12552861/Snapshots.pdf.
>
>
> Details of development and testing:
> Development has been done in a separate branch -
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2802. The
> updated design is posted at -
>
> https://issues.apache.org/jira/secure/attachment/12581376/Snapshots20120429.pdf
> .
> The feature development has involved close to 120 subtasks and close to 25K
> lines of code.
>
> A lot of unit tests have been added as a part of the feature. We also have
> been testing this in a cluster of 5 nodes with a long running test that
> mimics a real cluster usage with emphasis on use cases related to snapshot.
>  We are also testing non-snapshot code path running tests on a separate
> cluster without configuring snapshot. Please see the test plan
>
> https://issues.apache.org/jira/secure/attachment/12575442/snapshot-testplan.pdf
> for
> the details. Once the feature is merged into trunk, we will continue to
> test and fix any bugs that may be found on the trunk.
>
> This feature is a result of many people's collaboration. The bulk of the
> code and design work was done by Nicholas Sze, Jing Zhao, Hari Mankude,
> Brandon Li, Arpit Agarwal, Sanjay Radia and me. Thanks to Ramya Sunil for
> doing feature testing and finding many bugs. Aaron Myers, Konstantin
> Shvachko, Allen Wittenauer, Chris Nauroth, Todd Lipcon, Michael Stack, Eli
> Collins, Lars Hofhansl
>  and many others contributed to the feature definition, design and code
> reviews on the jiras and during meetup.
>
> This vote runs for a week and closes on 5/8/2013 at 12:00 pm. Here is my +1
> for the merge.
>
> Regards,
> Suresh
>


[jira] [Created] (HDFS-4747) Convert snapshot user guide to APT from XDOC

2013-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4747:


 Summary: Convert snapshot user guide to APT from XDOC
 Key: HDFS-4747
 URL: https://issues.apache.org/jira/browse/HDFS-4747
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: Snapshot (HDFS-2802)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


To be consistent with the rest of the HDFS docs, the snapshots user guide 
should use APT instead of XDOC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4739) NN can miscalculate the number of extra edit log segments to retain

2013-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4739:


 Summary: NN can miscalculate the number of extra edit log segments 
to retain
 Key: HDFS-4739
 URL: https://issues.apache.org/jira/browse/HDFS-4739
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The code in NNStorageRetentionManager#purgeOldStorage is intended to place a 
cap on the number of _extra_ edit log segments retained beyond what is strictly 
required to replay the FS history since the last fsimage. In fact this code 
currently places a limit on the _total_ number of extra edit log segments. If 
the number of required segments is greater than the configured cap, there will 
be no data loss, but an ugly error will be thrown and the NN will fail to start.

The fix is simple, and in the meantime a work-around is just to raise the value 
of dfs.namenode.max.extra.edits.segments.retained and start the NN.
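
An illustrative contrast of the two behaviors (names here are made up; the real 
logic lives in NNStorageRetentionManager#purgeOldStorage):

{code}
// Current (buggy) sketch: treats the cap as a limit on the *total*
// number of retained segments, so a large required set trips it.
if (retainedSegments.size() > maxExtraSegmentsToRetain) {
  throw new IllegalStateException("Too many edit log segments"); // NN fails to start
}

// Intended sketch: only segments beyond those required for replay
// count against the cap.
int extra = retainedSegments.size() - requiredSegments.size();
if (extra > maxExtraSegmentsToRetain) {
  purgeOldestExtraSegments(extra - maxExtraSegmentsToRetain);
}
{code}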

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Cannot communicate

2013-04-23 Thread Aaron T. Myers
Hi Kevin,

Since it seems like this is a CDH-specific question, I recommend you email
cdh-u...@cloudera.org where you should be able to get some help with this.


--
Aaron T. Myers
Software Engineer, Cloudera


On Mon, Apr 22, 2013 at 10:33 AM,  wrote:

>
> I am on a Ubuntu server. When I go to the link you provided there is a
> hyperlink for Ubuntu but it seems like it is the main site. I tried
> searching for hadoop native but didn't get any useful results. Is there
> some other package that I should install using apt-get?
>
> On Mon, Apr 22, 2013 at 12:02 PM, Vinod Kumar Vavilapalli wrote:
>
>  >
> It means what it says: that hadoop native library isn't available for some
> reason. See http://hadoop.apache.org/docs/stable/native_libraries.html
>
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Apr 22, 2013, at 9:58 AM, rkevinbur...@charter.net wrote:
>
>  I was able to add the appropriate Maven dependencies and it "works". I
>> have one last question on this thread. With the added dependencies I am
>> getting the warning:
>>
>> 13/04/22 11:53:18 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> What does this mean? Can it be avoided?
>>
>> Thanks again.
>>
>>
>> On Mon, Apr 22, 2013 at 11:17 AM, Kevin Burton wrote:
>>
>>  What dependency for the Maven project should I use?
>>>
>>> On Apr 22, 2013, at 10:02 AM, Ted Yu wrote:
>>>
>>>  The exception was due to incompatible RPC versions between Apache maven
>>>> artifacts and CDH4.
>>>>
>>>> I suggest you build the project with same hadoop version as in your
>>>> cluster.
>>>>
>>>> On Mon, Apr 22, 2013 at 7:50 AM, Kevin Burton wrote:
>>>>
>>>>
>>>>  I am relatively new to Hadoop and am working through a Manning
>>>>> publication
>>>>> "Hadoop in Action". One of the first program in the book (page 44)
>>>>> gives me
>>>>> a Java exception: org.apache.hadoop.ipc.**RemoteException: Server IPC
>>>>> version
>>>>> 7 cannot communicate with client version 3.
>>>>>
>>>>> My Hadoop distribution is CDH4. The Java Maven project takes its
>>>>> dependency from Apache. The exception comes from a line involving the
>>>>> "Configuration" class.
>>>>>
>>>>> Any idea on how to avoid this exception?
>>>>>
>>>>
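
A sketch of Ted's suggestion for a CDH4 cluster: depend on the CDH-built 
artifacts rather than the vanilla Apache ones (the version string 
2.0.0-cdh4.2.0 is just an example; match whatever your cluster actually runs):

{code}
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.2.0</version>
  </dependency>
</dependencies>
{code}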


Re: Convention question for using DFSConfigKey constants : are zeros magic?

2013-04-23 Thread Aaron T. Myers
On Fri, Apr 19, 2013 at 3:40 PM, Jay Vyas  wrote:

> The more generic question is whether or not there is enforcement of naming
> conventions or commenting for special values in numeric configuration
> parameters: I'm interpreting your answer as "No"...?
>

There's certainly no "enforcement" of it, no. We try to use consistent
prefixes, append units like "millis" or "seconds" to config settings which
represent time, etc. but this is just by convention.

--
Aaron T. Myers
Software Engineer, Cloudera


Re: Convention question for using DFSConfigKey constants : are zeros magic?

2013-04-19 Thread Aaron T. Myers
Hi Jay,

On Sat, Apr 20, 2013 at 1:10 AM, Jay Vyas  wrote:

> I recently looked into the HDFS source tree to determine idioms with
> respect to a hairy debate about the threshold between what is and is not a
> magic number, and found that the number zero is NOT considered magic - at
> least not in the HDFS source code.
>

It's certainly not magic in the Configuration class interpretation of it,
though I think if you surveyed the full source code you'd find that there
won't be much consistency with regard to ad hoc checks in the code for
certain values, like you've identified below.


>
> I found that that DFSConfigKeys.java defines DEFAULT values of zeros for
> some fields, and those defaults result in non-quantitative interpretation
> of the field.
>
> For example:
> dfs.image.transfer.bandwidthPerSec
>
> Is commented like so:
> public static final long DFS_IMAGE_TRANSFER_RATE_DEFAULT = 0;  //no
> throttling
>
> However in the implementation of these defaults, "magic" zeros are used
> without commenting:
> if (transferBandwidth > 0) {
>   throttler = new DataTransferThrottler(transferBandwidth);
> }
>
> 
>
> Seems like the 0 above would be better replaced with
> DFS_IMAGE_TRANSFER_RATE_DEFAULT since the "no throttling" behaviour is
> defined with the constant in the DFSConfigKeys file, and not defined in the
>
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java.
>

I don't agree with this. What if we later changed the default to something
greater than 0 in DFSConfigKeys? If the code were comparing against the
value DFS_IMAGE_TRANSFER_RATE_DEFAULT, the check in the code would then be
wrong. The only value for that config that should denote "no throttling" is
0, regardless of what the default is, so the explicit comparison against 0
makes sense to me.
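
For illustration, here is the same check with the sentinel documented in place 
(a sketch, not the actual GetImageServlet code):

{code}
// 0 (or less) is the sentinel for "no throttling", independent of
// whatever DFS_IMAGE_TRANSFER_RATE_DEFAULT happens to be set to.
if (transferBandwidth > 0) {
  throttler = new DataTransferThrottler(transferBandwidth);
} else {
  throttler = null;  // unthrottled image transfer
}
{code}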


>
>
> 
>
> Trying to get a feel for if there are conventions  to enforce in some code
> reviews for our hadoop dependent configuration code.  We'd like to follow
> hadoopy idioms if possible..
>

I'd say the main conventions you should concern yourself with for this
purpose are config setting naming, e.g. use consistent prefixes within your
own code, use lower case separated by dots and dashes, etc.


>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>


Re: Heads up - Snapshots feature merge into trunk

2013-04-18 Thread Aaron T. Myers
On Fri, Apr 19, 2013 at 6:53 AM, Tsz Wo Sze  wrote:

> HdfsAdmin is also for admin operations.  However, createSnapshot etc
> methods aren't.
>

I agree that they're not administrative operations in the sense that they
don't strictly require super user privilege, but they are "administrative"
in the sense that they will most often be used by those administering HDFS.
The HdfsAdmin class should not be construed to contain only operations
which require super user privilege, even though that happens to be the case
right now. It's intended as just a public API for HDFS-specific operations.

Regardless, my point is not necessarily that these operations should go
into the HdfsAdmin class, but rather that they shouldn't go into the
FileSystem class, since the snapshots API doesn't seem to me like it will
generalize to other FileSystem implementations.

--
Aaron T. Myers
Software Engineer, Cloudera


Re: Heads up - Snapshots feature merge into trunk

2013-04-18 Thread Aaron T. Myers
On Fri, Apr 19, 2013 at 4:48 AM, Tsz Wo Sze  wrote:

> Currently, allowSnapshot(..) and disallowSnapshot(..) are already in
> HdfsAdmin.
>

Ah, my bad. Not sure how I missed those. Good to see. Though, now that I
look at them, those methods should really be taking Paths as arguments, not
Strings. This is obviously quite minor, though.


>   The other operations createSnapshot(..), renameSnapshot(..) and
> deleteSnapshot(..) are actually user operations and they are declared in
> FileSystem.  Users can take snapshots for their own directories once admin
> has allowed snapshots for those directories.  Snapshot is not a
> HDFS-specific operation.  Many other file systems do support it.  No?
>

Certainly other "file systems" support it, e.g. WAFL, ZFS, etc, but do
other "FileSystem" (the Hadoop class) implementations, e.g.
LocalFileSystem, S3FileSystem, etc? Will they ever? If they do, will they
support sub-tree snapshots like HDFS does? Snapshots in general seem like
something whose implementation, interface, etc. are highly file
system-specific, and thus I don't think it makes a ton of sense to put that
API in what is intended to be a broad, stable interface. If we were to move
these operations into the HdfsAdmin interface, there's nothing to stop
users from using that interface instead of FileSystem. After all, that was
the point of adding the HdfsAdmin class in the first place - to have a
public API for performing HDFS-specific operations.

--
Aaron T. Myers
Software Engineer, Cloudera


Re: Heads up - Snapshots feature merge into trunk

2013-04-17 Thread Aaron T. Myers
I'm very excited to see that this project is nearing completion. I've been
following the development pretty closely and am very much looking forward
to getting this merged to trunk.

One thing that I do think we should address before the merge is moving the
programmatic APIs for working with snapshots. I've brought this up before,
and was told that it would be done in a separate JIRA, but I don't think
that JIRA was ever filed.

As it stands right now, the API for using snapshots is the following:

1. The API to create/delete/rename snapshots are in FileSystem.
2. The API to mark directories as snapshottable or not only exists in
DistributedFileSystem and DFSAdmin, neither of which are intended to be
public APIs.

In my opinion (and I think this was shared by others at the last snapshots
design meetup?) we should move #1 out of the FileSystem class since these
are primarily administrative APIs, and it is unlikely that any other
FileSystem implementation besides HDFS will ever implement these commands.
Also, #2 should really be in some public (not necessarily stable, but
public) class for use by tools which are used to administer HDFS. In my
opinion the most natural place for both of these APIs is in the HdfsAdmin
class, which is a public/evolving interface explicitly for these sorts of
operations.
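
For concreteness, the consolidated API being proposed would look roughly like 
this signature sketch (the interface name and exact signatures are 
hypothetical, not a committed API):

{code}
public interface SnapshotAdminOps {
  void allowSnapshot(Path path) throws IOException;     // #2, today in DFSAdmin
  void disallowSnapshot(Path path) throws IOException;  // #2, today in DFSAdmin
  Path createSnapshot(Path path, String name) throws IOException;  // #1, today in FileSystem
  void renameSnapshot(Path path, String oldName, String newName) throws IOException;
  void deleteSnapshot(Path path, String name) throws IOException;
}
{code}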

What are others' thoughts on this subject?

Best,
Aaron

--
Aaron T. Myers
Software Engineer, Cloudera


On Sat, Apr 13, 2013 at 10:05 AM, Suresh Srinivas wrote:

> Support for snapshots feature is being worked on in the jira
> https://issues.apache.org/jira/browse/HDFS-2802. This is an important and
> a
> large feature in HDFS. Please see a brief presentation that describes the
> feature at a highlevel from the Snapshot discussion meetup we had a while
> back -
> https://issues.apache.org/jira/secure/attachment/12552861/Snapshots.pdf.
>
> I am exicted to announce that the feature development will soon be
> completed. Please see the jira for the design and the details of the
> subtasks. This is a heads up about the merge vote mail that will soon be
> sent.
>
> Details of development and testing:
> Development has been done in a separate branch -
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2802. The
> design is posted at -
>
> https://issues.apache.org/jira/secure/attachment/12551474/Snapshots20121030.pdf
> .
> The feature development has involved close to 100 subtasks and close to 20K
> lines of code.
>
> A lot of unit tests have been added as a part of the feature. We also have
> been testing this in a cluster of 5 nodes with a long running test that
> mimics a real cluster usage with emphasis on use cases related to
> snapshots.  Please see the test plan
>
> https://issues.apache.org/jira/secure/attachment/12575442/snapshot-testplan.pdf for
> the details.
>
> Next steps, before calling for merge vote, we need to get the following
> done:
> - Add user documentation that describes the feature, and how to use it
> - Complete some of the pending tasks
> - Continue testing the feature and fix any bugs that might come up
> - Update the design document
>
> Thanks to everyone who has participated in design and development of this
> feature. Please review the work and help in testing the feature.
>
> Regards,
> Suresh
>


Re: VOTE: HDFS-347 merge

2013-04-12 Thread Aaron T. Myers
Since the merge vote passed, I have merged the HDFS-347 branch to trunk.
Leaving the JIRA open for now until we also do the merge to branch-2.

Colin, thanks a ton for the monster contribution. This is a long time in
coming.


--
Aaron T. Myers
Software Engineer, Cloudera


On Thu, Apr 11, 2013 at 11:05 AM, Colin McCabe wrote:

> The merge vote is now closed.  With three +1s, it passes.
>
> thanks,
> Colin
>
>
> On Wed, Apr 10, 2013 at 10:00 PM, Aaron T. Myers  wrote:
>
> > I'm +1 as well. I've reviewed much of the code as well and have
> personally
> > seen it running in production at several different sites. I agree with
> Todd
> > that it's a substantial improvement in operability.
> >
> > Best,
> > Aaron
> >
> > On Apr 8, 2013, at 1:19 PM, Todd Lipcon  wrote:
> >
> > > +1 for the branch merge. I've reviewed all of the code in the branch,
> and
> > > we have people now running this code in production scenarios. It is as
> > > functional as the old version and way easier to set up/configure.
> > >
> > > -Todd
> > >
> > > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe 
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I think it's time to merge the HDFS-347 branch back to trunk.  It's
> been
> > >> under
> > >> review and testing for several months, and provides both a performance
> > >> advantage, and the ability to use short-circuit local reads without
> > >> compromising system security.
> > >>
> > >> Previously, we tried to merge this and the objection was brought up
> > that we
> > >> should keep the old, insecure short-circuit local reads around so that
> > >> platforms for which secure SCR had not yet been implemented could use
> it
> > >> (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
> > >> Suresh has also volunteered to maintain the insecure SCR code until
> > secure
> > >> SCR can be implemented for Windows.
> > >>
> > >> Please cast your vote by EOD Monday 4/8.
> > >>
> > >> best,
> > >> Colin
> > >>
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> >
>


Re: VOTE: HDFS-347 merge

2013-04-10 Thread Aaron T. Myers
I'm +1 as well. I've reviewed much of the code as well and have personally seen 
it running in production at several different sites. I agree with Todd that 
it's a substantial improvement in operability. 

Best,
Aaron

On Apr 8, 2013, at 1:19 PM, Todd Lipcon  wrote:

> +1 for the branch merge. I've reviewed all of the code in the branch, and
> we have people now running this code in production scenarios. It is as
> functional as the old version and way easier to set up/configure.
> 
> -Todd
> 
> On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe  wrote:
> 
>> Hi all,
>> 
>> I think it's time to merge the HDFS-347 branch back to trunk.  It's been
>> under
>> review and testing for several months, and provides both a performance
>> advantage, and the ability to use short-circuit local reads without
>> compromising system security.
>> 
>> Previously, we tried to merge this and the objection was brought up that we
>> should keep the old, insecure short-circuit local reads around so that
>> platforms for which secure SCR had not yet been implemented could use it
>> (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
>> Suresh has also volunteered to maintain the insecure SCR code until secure
>> SCR can be implemented for Windows.
>> 
>> Please cast your vote by EOD Monday 4/8.
>> 
>> best,
>> Colin
>> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


[jira] [Created] (HDFS-4658) Standby NN will log that it has received a block report "after becoming active"

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4658:


 Summary: Standby NN will log that it has received a block report 
"after becoming active"
 Key: HDFS-4658
 URL: https://issues.apache.org/jira/browse/HDFS-4658
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Trivial


Even when in the standby state the following line will sometimes be logged:

{noformat}
INFO blockmanagement.BlockManager: BLOCK* processReport: Received first block 
report from 172.21.3.106:50010 after becoming active. Its block contents are no 
longer considered stale
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4657) If incremental BR is received before first full BR NN will log a line for every block on a DN

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4657:


 Summary: If incremental BR is received before first full BR NN 
will log a line for every block on a DN
 Key: HDFS-4657
 URL: https://issues.apache.org/jira/browse/HDFS-4657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


This can impact restart times pretty substantially if the DNs have a lot of 
blocks, and since the FSNS write lock is held while processing the block report 
clients will not make any progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4656) DN heartbeat loop can be briefly tight

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4656:


 Summary: DN heartbeat loop can be briefly tight
 Key: HDFS-4656
 URL: https://issues.apache.org/jira/browse/HDFS-4656
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The DN hearbeat loop looks roughly like this:

{code}
if (now - timeOfLastHeartbeat > configuredHeartbeatInterval) {
  // do heartbeat
}
timeToWait = configuredHeartbeatInterval - (now - timeOfLastHeartbeat)
sleep(timeToWait)
{code}

The trouble is that since we sleep for exactly the heartbeat interval, and then 
check to see if we have waited _more_ than that heartbeat interval, we will 
very often have waited exactly the heartbeat interval (in millis), and not more 
than it. In this case we will skip actually performing the heartbeat and will 
calculate timeToWait as being 0ms. The DN heartbeat loop will then loop 
tightly for 1ms. The solution is just to change the "{{>}}" in the code above 
to "{{>=}}".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4655) DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4655:


 Summary: DNA_FINALIZE is logged as being an unknown command by the 
DN when received from the standby NN
 Key: HDFS-4655
 URL: https://issues.apache.org/jira/browse/HDFS-4655
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


This is harmless since the alternative is just to log the command as being 
ignored, but this bug results in a somewhat concerning error message appearing 
in the logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4614) FSNamesystem#getContentSummary should use getPermissionChecker helper method

2013-03-19 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4614:


 Summary: FSNamesystem#getContentSummary should use 
getPermissionChecker helper method
 Key: HDFS-4614
 URL: https://issues.apache.org/jira/browse/HDFS-4614
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Trivial


HDFS-4222 added this helper method and called it in most places, but missed one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4592) Default values for access time precision are out of sync between hdfs-default.xml and the code

2013-03-11 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4592:


 Summary: Default values for access time precision are out of sync 
between hdfs-default.xml and the code
 Key: HDFS-4592
 URL: https://issues.apache.org/jira/browse/HDFS-4592
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In {{hdfs-default.xml}} we have:
{code}

<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <description>The access time for HDFS file is precise upto this value.
    The default value is 1 hour. Setting a value of 0 disables
    access times for HDFS.
  </description>
</property>

{code}

But in {{FSNamesystem}} we have:
{code}
this.accessTimePrecision = conf.getLong(DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 
0);
{code}

We properly define {{DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT}} in 
DFSConfigKeys.java, but it's not actually referenced anywhere in the code.
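
The fix is presumably a one-liner (a sketch, assuming the constant's value 
matches the 1-hour default documented above):

{code}
this.accessTimePrecision = conf.getLong(DFS_NAMENODE_ACCESSTIME_PRECISION_KEY,
    DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT); // 3600000 ms, i.e. 1 hour
{code}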

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4591) HA clients can fail to fail over while Standby NN is performing long checkpoint

2013-03-11 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4591:


 Summary: HA clients can fail to fail over while Standby NN is 
performing long checkpoint
 Key: HDFS-4591
 URL: https://issues.apache.org/jira/browse/HDFS-4591
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Clients know to fail over to talk to the Active NN when they perform an RPC to 
the Standby NN and it throws a StandbyException. However, most places in the 
code that check if the NN is in the standby state do so inside the FSNS fsLock. 
Since this lock is held for the duration of the saveNamespace during a 
checkpoint, StandbyExceptions will not be thrown during this time.
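
A sketch of the problematic pattern (illustrative, not the exact FSNamesystem 
code):

{code}
readLock();  // blocks for the entire saveNamespace during a checkpoint
try {
  // The standby check runs only after the lock is acquired, so the
  // StandbyException is delayed and clients stall instead of failing over.
  checkOperation(OperationCategory.READ);
  // ... perform the read operation ...
} finally {
  readUnlock();
}
{code}

One possible fix would be to perform the standby-state check before acquiring 
the lock.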

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: are the HDFS javadocs published on the website?

2013-02-25 Thread Aaron T. Myers
That sounds like a good plan to me.


--
Aaron T. Myers
Software Engineer, Cloudera


On Sat, Feb 23, 2013 at 6:11 PM, Andrew Wang wrote:

> Taking silence here to mean we aren't that concerned with the javadocs.
>
> Are we okay with Doug's proposed fix for the broken links?
>
> Thanks,
> Andrew
>
>
> On Thu, Feb 14, 2013 at 4:06 PM, Doug Cutting  wrote:
>
> > All of Hadoop's javadocs were recently lost from our website when it
> > was converted to svnpubsub.  These were historically not stored in
> > subversion but manually added to the website by release managers.
> > When the site was converted to svnpubsub no one had first copied the
> > docs tree into subversion so it was lost.  (It could perhaps be
> > recovered from tape archives, but that would be a pain.)
> >
> > Yesterday, on seeing this, I reconstructed what I could.  I extracted
> > documentation from the release tarballs of recent releases and pushed
> > it into subversion.  Those release tarballs did not seem to include
> > HDFS javadocs.
> >
> > You've found two links to HDFS javadocs in what I restored, and those
> > links, as you note, are broken.  If someone has those javadocs or
> > wants to build them then they can be restored by committing them to
> > subversion under:
> >
> >
> >
> https://svn.apache.org/repos/asf/hadoop/common/site/main/publish/docs/r1.1.1/
> >
> >
> https://svn.apache.org/repos/asf/hadoop/common/site/main/publish/docs/r1.0.4/
> >
> > I've not seen (broken) links to HDFS documentation in the other more
> > recent releases whose documentation I restored.
> >
> > An alternative might be to put a redirect in to the HDFS user guide to
> > fix those two broken links.  If folks prefer that approach I'd be
> > happy to implement it.
> >
> > Doug
> >
> > On Thu, Feb 14, 2013 at 3:48 PM, Andrew Wang 
> > wrote:
> > > Hi all,
> > >
> > > I think something changed recently regarding the online HDFS javadocs.
> > I'm
> > > fairly sure they used to be available online, since it's indexed by
> > google:
> > >
> > >
> >
> https://www.google.com/?q=inurl:distributedfilesystem++site%3Ahadoop.apache.org
> > >
> > > However, all of those results 404 now.
> > >
> > > Going to the current API doc page (
> > > http://hadoop.apache.org/docs/current/api/), the "Hadoop Distributed
> > > FileSystem (HDFS)<
> >
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/hdfs/package-summary.html
> > >"
> > > link also 404's:
> > >
> > >
> >
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/hdfs/package-summary.html
> > >
> > > Is this an intended change? I at least found it really handy to have
> this
> > > stuff indexed and available online, even if they aren't user-facing
> APIs.
> > >
> > > Best,
> > > Andrew
> >
>


Re: VOTE: HDFS-347 merge

2013-02-22 Thread Aaron T. Myers
On Fri, Feb 22, 2013 at 6:32 PM, Tsz Wo Sze  wrote:

> Another
>  substantive concern is that HDFS-347 is not as well tested as
> HDFS-2246.  So, we should keep HDFS-2246 around for some time and remove
> it later.  Is this the usual practice?
>

I'm proposing we do just that - keep HDFS-2246 around in branch-2 to let
HDFS-347 soak a bit on trunk and then remove HDFS-2246 from branch-2 once
we're confident in HDFS-347 and trunk adds Windows support. As Colin
pointed out, this VOTE has always been about only merging this branch to
trunk.


--
Aaron T. Myers
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Aaron T. Myers
On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas  wrote:

> Given that HDFS-347 is a strictly better approach, once committed,
> there will be ample motivation to add support for other OSes and
> remove HDFS-2246 entirely. Nobody is confused about this. There's
> ample precedent for retaining obscure, clumsy features as a temporary
> stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks,
> configurable combiner semantics). What's the virtue of insisting on
> removing this? Unless there was a lot of follow-on work, HDFS-2246
> doesn't look like a lot of code...
>

Though it's not a ton of code, I think that having to support a more
complex fallback path (i.e. try the HDFS-347 method, then fall back to
trying the HDFS-2246 method, then fall back to doing normal TCP reads to
the local DN) will make the code quite a bit hairier for little added
benefit.

How about this proposal for a compromise:

Given that the only substantive concerns with HDFS-347 seem to be about
Windows support for local reads, for now we only merge this branch to
trunk. Support for doing HDFS-2246 style local reads will be removed from
trunk, but retained in branch-2 for now. Only once someone adds support for
doing HDFS-347 style local reads which work on Windows will we consider
merging HDFS-347 to branch-2. This should ensure that there's no feature
regression on branch-2, but also means that we will not need to maintain
the HDFS-2246 code path alongside the HDFS-347 code path indefinitely.
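
For reference, a sketch of the three-way fallback being argued against (the 
method names are made up for illustration):

{code}
// Client read path if both short-circuit mechanisms were kept:
BlockReader reader = tryDomainSocketShortCircuit(block);  // HDFS-347 style
if (reader == null) {
  reader = tryLegacyBlockPathShortCircuit(block);         // HDFS-2246 style
}
if (reader == null) {
  reader = newRemoteBlockReader(block);                   // plain TCP to the DN
}
{code}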

--
Aaron T. Myers
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-4476) HDFS-347: style cleanups

2013-02-15 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4476.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch. Thanks a lot for the 
contribution, Colin.

> HDFS-347: style cleanups
> 
>
> Key: HDFS-4476
> URL: https://issues.apache.org/jira/browse/HDFS-4476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4476.001.patch, HDFS-4476.002.patch
>
>
> Clean up some code style issues in HDFS-347.
> DomainSocket.java
>   do not use AtomicInteger for status, add a new class
>   rename fdRef(), fdUnref(boolean), jfds, jbuf, SND_BUF_SIZE, etc.
>   do not override finalize().
>   remove some dead code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature

2013-02-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4453.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Colin.

> Make a simple doc to describe the usage and design of the shortcircuit read 
> feature
> ---
>
> Key: HDFS-4453
> URL: https://issues.apache.org/jira/browse/HDFS-4453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Brandon Li
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch, 
> HDFS-4453.003.patch, HDFS-4453.004.patch
>
>
> It would be nice to have a document to describe the configuration and design 
> of this feature. Also its relationship with previous short circuit read 
> implementation(HDFS-2246), for example, can they co-exist, or this one is 
> planed to replaces HDFS-2246, or it can fall back on HDFS-2246 when unix 
> domain socket is not supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4485) HDFS-347: DN should chmod socket path a+w

2013-02-08 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4485.
--

Resolution: Fixed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Colin.

> HDFS-347: DN should chmod socket path a+w
> -
>
> Key: HDFS-4485
> URL: https://issues.apache.org/jira/browse/HDFS-4485
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-4485.001.patch, HDFS-4485.003.patch
>
>
> In cluster-testing HDFS-347, we found that in clusters where the MR job 
> doesn't run as the same user as HDFS, clients wouldn't use short circuit read 
> because of a 'permission denied' error connecting to the socket. It turns out 
> that, in order to connect to a socket, clients need write permissions on the 
> socket file.
> The DN should set these permissions automatically after it creates the socket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-08 Thread Aaron T. Myers
+1 (binding)

I downloaded the src tar ball, built it with the native bits enabled,
started up a little cluster, and ran some sample jobs. Things worked as
expected. I also verified the signatures on the source artifact.

I did bump into one little issue, but I don't think it should be considered
a blocker. When I first tried to start up the RM, it failed to start with
this error:

13/02/08 16:00:31 FATAL resourcemanager.ResourceManager: Error starting
ResourceManager
java.lang.IllegalStateException: Queue configuration missing child queue
names for root
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:328)
 at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:255)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:220)
 at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:226)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)

And then this on shutdown:

13/02/08 16:00:31 INFO service.CompositeService: Error stopping
ResourceManager
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stop(ResourceManager.java:590)
 at
org.apache.hadoop.yarn.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:122)
at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

Presumably this is because I don't have the CapacityScheduler queues
configured at all, and the default scheduler is now the CapacityScheduler.
To work around this for my testing, I switched to the FairScheduler and the
RM came up just fine.
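
For reference, switching schedulers is just one property in yarn-site.xml 
(sketch):

{code}
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
{code}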


--
Aaron T. Myers
Software Engineer, Cloudera


On Wed, Feb 6, 2013 at 7:59 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I would
> like to release.
>
> This release contains several major enhancements such as QJM for HDFS HA,
> multi-resource scheduling for YARN, YARN ResourceManager restart etc.
> Also YARN has achieved significant stability at scale (more details from
> Y! folks here: http://s.apache.org/VYO).
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> The RC tag in svn is here:
> http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


[jira] [Resolved] (HDFS-4473) don't create domain socket unless we need it

2013-02-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4473.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Andy.

> don't create domain socket unless we need it
> 
>
> Key: HDFS-4473
> URL: https://issues.apache.org/jira/browse/HDFS-4473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4473.001.patch
>
>
> If {{dfs.domain.socket.path}} is set, but we don't have anything enabled 
> which would need it (like {{dfs.client.read.shortcircuit}}), don't create the 
> socket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

