Re: [DISCUSS] hadoop branch-3.3+ going to java11 only

2023-03-28 Thread Chris Nauroth
In theory, I like the idea of setting aside Java 8. Unfortunately, I don't
know that upgrading within the 3.3 line adheres to our binary compatibility
policy [1]. I don't see specific discussion of the Java version there, but
it states that you should be able to drop in minor upgrades and have
existing apps keep working. Users might find it surprising if they try to
upgrade a cluster that has JDK 8.

There is also the question of impact on downstream projects [2]. We'd have
to check plans with our consumers.

What about the idea of shooting for a 3.4 release on JDK 11 (or even 17)?
The downside is that we'd probably need to set boundaries on end of
life/limited support for 3.2 and 3.3 to keep the workload manageable.

[1]
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Java_Binary_compatibility_for_end-user_applications_i.e._Apache_Hadoop_ABI
[2] https://github.com/apache/spark/blob/v3.3.2/pom.xml#L109

Chris Nauroth


On Tue, Mar 28, 2023 at 11:10 AM Ayush Saxena  wrote:

> >
> >  it's already hard to migrate from JDK8 why not retarget JDK17.
> >
>
> +1, makes sense to me, sounds like a win-win situation to me, though there
> would be some additional issues to chase now :)
>
> -Ayush
>
>
> On Tue, 28 Mar 2023 at 23:29, Wei-Chiu Chuang  wrote:
>
> > My random thoughts. Probably bad takes:
> >
> > There are projects experimenting with JDK17 now.
> > JDK11 active support will end in 6 months. If it's already hard to
> migrate
> > from JDK8 why not retarget JDK17.
> >
> > On Tue, Mar 28, 2023 at 10:30 AM Ayush Saxena 
> wrote:
> >
> >> I know the Jersey upgrade is a blocker. Some folks were chasing that last
> >> year around 3.3.4 time. I don’t know where it stands now; I didn’t dig into
> >> what the problem was, but I remember there was some initial PR which
> >> did it for HDFS at least, so I never looked beyond that…
> >>
> >> I too had jdk-11 in my mind, but only for trunk. 3.4.x can stay as a
> >> java-11-only branch maybe, but that is something to decide later, once we
> >> get the code sorted…
> >>
> >> -Ayush
> >>
> >> > On 28-Mar-2023, at 9:16 PM, Steve Loughran
> 
> >> wrote:
> >> >
> >> > well, how about we flip the switch and get on with it.
> >> >
> >> > slf4j seems happy on java11,
> >> >
> >> > side issue, anyone seen test failures on zulu1.8; somehow my test run
> is
> >> > failing and i'm trying to work out whether it's a mismatch in command
> >> > line/ide jvm versions, or the 3.3.5 JARs have been built with an
> openjdk
> >> > version which requires IntBuffer implements an overridden method
> >> IntBuffer
> >> > rewind().
> >> >
> >> > java.lang.NoSuchMethodError:
> >> java.nio.IntBuffer.rewind()Ljava/nio/IntBuffer;
> >> >
> >> > at
> >> org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:341)
> >> > at
> >> >
> >>
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:308)
> >> > at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:257)
> >> > at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:202)
> >> > at java.io.DataInputStream.read(DataInputStream.java:149)
> >> >
> >> >> On Tue, 28 Mar 2023 at 15:52, Viraj Jasani 
> wrote:
> >> >> IIRC some of the ongoing major dependency upgrades (log4j 1 to 2,
> >> jersey 1
> >> >> to 2 and junit 4 to 5) are blockers for java 11 compile + test
> >> stability.
> >> >> On Tue, Mar 28, 2023 at 4:55 AM Steve Loughran
> >>  >> >> wrote:
> >> >>> Now that hadoop 3.3.5 is out, i want to propose something new
> >> >>> we switch branch-3.3 and trunk to being java11 only
> >> >>> 1. java 11 has been out for years
> >> >>> 2. oracle java 8 is no longer available under "premier support"; you
> >> >>> can't really get upgrades
> >> >>>
> https://www.oracle.com/java/technologies/java-se-support-roadmap.html
> >> >>> 3. openJDK 8 releases != oracle ones, and things you compile with
> them
> >> >>> don't always link to oracle java 8 (some classes in java.nio have
> >> >> added
> >> >>> more overrides)
> >> >>> 4. more and more libraries we want to upgrade to/bundle are java 11
> >> >> only
> >> >>> 5. moving to java 11 would cut our yetus build workload in half, and
> >> >>> line up for adding java 17 builds instead.
> >> >>> I know there are some outstanding issues still in
> >> >>> https://issues.apache.org/jira/browse/HADOOP-16795 -but are they
> >> >> blockers?
> >> >>> Could we just move to java11 and enhance at our leisure, once java8
> >> is no
> >> >>> longer a concern?
> >>
> >> -
> >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>
> >>
>
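For reference, the NoSuchMethodError Steve hit above is the covariant-override
hazard from item 3 of his list: JDK 9 redeclared Buffer methods such as
rewind() on IntBuffer with covariant return types, so code compiled against a
JDK 9+ class library (even with -source/-target 8, unless --release 8 is used)
links against IntBuffer.rewind()Ljava/nio/IntBuffer;, which does not exist on
a Java 8 runtime. A minimal sketch of the hazard and the usual cast-to-Buffer
workaround:

    import java.nio.Buffer;
    import java.nio.IntBuffer;

    public class RewindLinkageSketch {
      public static void main(String[] args) {
        IntBuffer buf = IntBuffer.allocate(8);
        // Compiled on JDK 9+, this call is emitted against
        // IntBuffer.rewind()Ljava/nio/IntBuffer; and throws
        // NoSuchMethodError when run on Java 8.
        buf.rewind();
        // Portable form: force linkage against Buffer.rewind(),
        // which exists on every Java version.
        ((Buffer) buf).rewind();
      }
    }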


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-20 Thread Chris Nauroth
+1

Thank you for the release candidate, Steve!

* Verified all checksums.
* Verified all signatures. (Commands for both are sketched after this list.)
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Tests passed.
* mvn --fail-never clean test -Pnative -Dparallel-tests
-Drequire.snappy -Drequire.zstd -Drequire.openssl
-Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
updates that are mentioned in the release notes.
* mvn -o dependency:tree
* Confirmed that hadoop-openstack is now just a stub placeholder artifact
with no code.
* For ARM verification:
* Ran "file " on all native binaries in the ARM tarball to confirm
they actually came out with ARM as the architecture.
* Output of hadoop checknative -a on ARM looks good.
* Ran a MapReduce job with the native bzip2 codec for compression, and
it worked fine.
* Ran a MapReduce job with YARN configured to use
LinuxContainerExecutor and verified launching the containers through
container-executor worked.
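
For anyone reproducing the checksum and signature checks above, they boil down
to commands like these (filenames illustrative):

    # import the release KEYS file, then verify each artifact
    gpg --import KEYS
    gpg --verify hadoop-3.3.5.tar.gz.asc hadoop-3.3.5.tar.gz
    # the .sha512 files use BSD-style tags, which coreutils check mode accepts
    sha512sum -c hadoop-3.3.5.tar.gz.sha512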

Chris Nauroth


On Mon, Mar 20, 2023 at 3:45 AM Ayush Saxena  wrote:

> +1(Binding)
>
> * Built from source (x86 & ARM)
> * Successful Native Build (x86 & ARM)
> * Verified Checksums (x86 & ARM)
> * Verified Signature (x86 & ARM)
> * Checked the output of hadoop version (x86 & ARM)
> * Verified the output of hadoop checknative (x86 & ARM)
> * Ran some basic HDFS shell commands.
> * Ran some basic Yarn shell commands.
> * Played a bit with HDFS Erasure Coding.
> * Ran TeraGen & TeraSort
> * Browsed through NN, DN, RM & NM UI
> * Skimmed over the contents of the website.
> * Skimmed over the contents of the maven repo.
> * Selectively ran some HDFS & CloudStore tests
>
> Thanx Steve for driving the release. Good Luck!!!
>
> -Ayush
>
> > On 20-Mar-2023, at 12:54 PM, Xiaoqiao He  wrote:
> >
> > +1
> >
> > * Verified signature and checksum of the source tarball.
> > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package
> > -DskipTests -Pnative -Pdist -Dtar`.
> > * Setup pseudo cluster with HDFS and YARN.
> > * Run simple FsShell - mkdir/put/get/mv/rm (include EC) and check the
> > result.
> > * Run example mr applications and check the result - Pi & wordcount.
> > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager etc.
> >
> > Thanks Steve for your work.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> >> On Mon, Mar 20, 2023 at 12:04 PM Masatake Iwasaki <
> iwasak...@oss.nttdata.com>
> >> wrote:
> >>
> >> +1
> >>
> >> + verified the signature and checksum of the source tarball.
> >>
> >> + built from the source tarball on Rocky Linux 8 (x86_64) and OpenJDK 8
> >> with native profile enabled.
> >>   + launched pseudo distributed cluster including kms and httpfs with
> >> Kerberos and SSL enabled.
> >>   + created encryption zone, put and read files via httpfs.
> >>   + ran example MR wordcount over encryption zone.
> >>   + checked the binary of container-executor.
> >>
> >> + built rpm packages by Bigtop (with trivial modifications) on Rocky
> Linux
> >> 8 (aarch64).
> >>   + ran smoke-tests of hdfs, yarn and mapreduce.
> >> + built site documentation and skimmed the contents.
> >>   +  Javadocs are contained.
> >>
> >> Thanks,
> >> Masatake Iwasaki
> >>
> >>> On 2023/03/16 4:47, Steve Loughran wrote:
> >>> Apache Hadoop 3.3.5
> >>>
> >>> Mukund and I have put together a release candidate (RC3) for Hadoop
> >> 3.3.5.
> >>>
> >>> What we would like is for anyone who can to verify the tarballs,
> >> especially
> >>> anyone who can try the arm64 binaries as we want to include them too.
> >>>
> >>> The RC is available at:
> >>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
> >>>
> >>> The git tag is release-3.3.5-RC3, commit 706d88266ab
> >>>
> >>> The maven artifacts are staged at
> >>>
> https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> >>>
> >>> You can find my public key at:
> >>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >>>
> >>> Change log
> >>>
> >>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
> >>>
> >>> Release notes
> >>>

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-18 Thread Chris Nauroth
Yes, I'm in progress on verification, so you can expect to get a vote from
me. Thank you, Steve!

Chris Nauroth


On Sat, Mar 18, 2023 at 9:19 AM Ashutosh Gupta 
wrote:

> Hi Steve
>
> I will also do it by today/tomorrow.
>
> Thanks,
> Ashutosh
>
> On Sat, 18 Mar, 2023, 4:07 pm Steve Loughran,  >
> wrote:
>
> > Thank you for this!
> >
> > Can anyone else with time do a review too? i really want to get this one
> > done, now the HDFS issues are all resolved.
> >
> > I do not want this release to fall by the wayside through lack of votes
> > alone. In fact, I would be very unhappy
> >
> >
> >
> > On Sat, 18 Mar 2023 at 06:47, Viraj Jasani  wrote:
> >
> > > +1 (non-binding)
> > >
> > > * Signature/Checksum: ok
> > > * Rat check (1.8.0_341): ok
> > >  - mvn clean apache-rat:check
> > > * Built from source (1.8.0_341): ok
> > >  - mvn clean install  -DskipTests
> > > * Built tar from source (1.8.0_341): ok
> > >  - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
> > >
> > > Containerized deployments:
> > > * Deployed and started Hdfs - NN, DN, JN with Hbase 2.5 and Zookeeper
> 3.7
> > > * Deployed and started JHS, RM, NM
> > > * Hbase, hdfs CRUD looks good
> > > * Sample RowCount MapReduce job looks good
> > >
> > > * S3A tests with scale profile looks good
> > >
> > >
> > > On Wed, Mar 15, 2023 at 12:48 PM Steve Loughran
> > > 
> > > wrote:
> > >
> > > > Apache Hadoop 3.3.5
> > > >
> > > > Mukund and I have put together a release candidate (RC3) for Hadoop
> > > 3.3.5.
> > > >
> > > > What we would like is for anyone who can to verify the tarballs,
> > > especially
> > > > anyone who can try the arm64 binaries as we want to include them too.
> > > >
> > > > The RC is available at:
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
> > > >
> > > > The git tag is release-3.3.5-RC3, commit 706d88266ab
> > > >
> > > > The maven artifacts are staged at
> > > >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> > > >
> > > > You can find my public key at:
> > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > >
> > > > Change log
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
> > > >
> > > > Release notes
> > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/RELEASENOTES.md
> > > >
> > > > This is off branch-3.3 and is the first big release since 3.3.2.
> > > >
> > > > Key changes include
> > > >
> > > > * Big update of dependencies to try and keep those reports of
> > > >   transitive CVEs under control -both genuine and false positives.
> > > > * HDFS RBF enhancements
> > > > * Critical fix to ABFS input stream prefetching for correct reading.
> > > > * Vectored IO API for all FSDataInputStream implementations, with
> > > >   high-performance versions for file:// and s3a:// filesystems.
> > > >   file:// through java native io
> > > >   s3a:// parallel GET requests.
> > > > * This release includes Arm64 binaries. Please can anyone with
> > > >   compatible systems validate these.
> > > > * and compared to the previous RC, all the major changes are
> > > >   HDFS issues.
> > > >
> > > > Note, because the arm64 binaries are built separately on a different
> > > > platform and JVM, their jar files may not match those of the x86
> > > > release -and therefore the maven artifacts. I don't think this is
> > > > an issue (the ASF actually releases source tarballs, the binaries are
> > > > there for help only, though with the maven repo that's a bit
> blurred).
> > > >
> > > > The only way to be consistent would actually be to untar the x86.tar.gz,
> > > > overwrite its binaries with the arm stuff, retar, sign and push out
> > > > for the vote. Even automating that would be risky.
> > > >
> > > > Please try the release and vote. The vote will run for 5 days.
> > > >
> > > > -Steve
> > > >
> > >
> >
>


[jira] [Resolved] (HDFS-16891) Avoid the overhead of copy-on-write exception list while loading inodes sub sections in parallel

2023-01-18 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16891.
--
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Avoid the overhead of copy-on-write exception list while loading inodes sub 
> sections in parallel
> 
>
> Key: HDFS-16891
> URL: https://issues.apache.org/jira/browse/HDFS-16891
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> If we enable parallel loading and persisting of inodes from/to the fs image, 
> we get the benefit of improved performance. However, while loading the 
> sub-sections INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a 
> copy-on-write list to maintain the list of exceptions. Since our use case is 
> not to iterate over this list while executor threads are adding new elements 
> to it, copy-on-write is a bit of an overhead here.
> It would be better to synchronize adding new elements to the list than to have 
> the list copy all of its elements every time a new element is added.
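
A minimal sketch of the trade-off described above (names hypothetical, not the
committed patch): CopyOnWriteArrayList copies its entire backing array on every
add, which is wasted work when threads only append and nothing iterates
concurrently.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    class ExceptionCollectionSketch {
      // before: every add() allocates and copies a new backing array
      static final List<IOException> cowExceptions =
          new CopyOnWriteArrayList<>();

      // after: adds are synchronized; no per-add copying
      static final List<IOException> syncExceptions =
          Collections.synchronizedList(new ArrayList<>());
    }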



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16887) Log start and end of phase/step in startup progress

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16887.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

> Log start and end of phase/step in startup progress
> ---
>
> Key: HDFS-16887
> URL: https://issues.apache.org/jira/browse/HDFS-16887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> As part of Namenode startup progress, we have multiple phases, and steps 
> within each phase, that are instantiated. While the startup progress view can 
> be instantiated with the current view of phase/step, having at least DEBUG 
> logs for startup progress would be helpful to identify when a particular step 
> for LOADING_FSIMAGE/SAVING_CHECKPOINT/LOADING_EDITS started and ended.
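
A sketch of how an operator might surface those messages, assuming the new
logging lands under the startupprogress package's logger (log4j.properties
syntax; the logger name is an assumption, not confirmed from the patch):

    log4j.logger.org.apache.hadoop.hdfs.server.namenode.startupprogress=DEBUG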



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.5

2023-01-04 Thread Chris Nauroth
Is it a problem limited to MiniDFSCluster, or is it a broader problem of
RPC client resource cleanup? The patch is changing connection close
cleanup, so I assumed the latter. If so, then it could potentially impact
applications integrating with the RPC clients.

If the problem is limited to MiniDFSCluster and restarts within a single
JVM, then I agree the impact is smaller. Then, we'd want to consider what
downstream projects have tests that do restarts on a MiniDFSCluster.
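
The restart pattern in question looks roughly like this in downstream tests (a
sketch; MiniDFSCluster is the test-only, in-JVM mini cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class RestartSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new HdfsConfiguration();
        MiniDFSCluster cluster =
            new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
        try {
          cluster.waitActive();
          // The restart happens inside the same JVM, so RPC client state
          // leaked across it can break the restarted NameNode's clients.
          cluster.restartNameNode();
          cluster.waitActive();
        } finally {
          cluster.shutdown();
        }
      }
    }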

Chris Nauroth


On Wed, Jan 4, 2023 at 4:22 PM Ayush Saxena  wrote:

> Hmm I'm looking at HADOOP-11867 related stuff but couldn't find it
>> mentioned anywhere in change log or release notes. Are they actually
>> up-to-date?
>
>
> I don't think there is any issue with the ReleaseNotes generation as such,
> but with the Resolution type of this ticket: it isn't marked as Fixed but as
> Done. The other ticket which is marked Done is also not part of the release
> notes. [1]
>
> if I'm understanding the potential impact of HDFS-16853
>> correctly, then it's serious enough to fix before a release. (I could
>> change my vote if someone wants to make a case that it's not that
>> serious.)
>>
>
> Chris, I just had a very quick look at HDFS-16853, and I am not sure if this
> can happen outside a MiniDfsCluster setup. Just guessing from the
> description in the ticket: it looked like when we restarted the
> Namenode in the MiniDfsCluster, that would be in the same single
> JVM, and that is why a previously blocked thread caused issues with the
> restart. That is what I understood; I haven't checked the code though.
>
> Second, in the same context, being curious: if this lands up being a
> MiniDfsCluster-only issue, do we still consider this a release blocker? Not
> saying it won't be serious, MiniDfsCluster is very widely used by
> downstream projects and all, so I just wanted to know.
>
> Regarding Hive & Bouncy Castle: the PR seems to have a valid binding
> veto, and I am not sure it will get done any time soon, so if the use case
> is something required, I would suggest handling it in Hadoop itself. It
> seems to be specific to Hive 3.x; I tried compiling the Hive master branch
> with 3.3.5 and it passed. Other than that, Hive officially supports only
> Hadoop-3.3.1, and that too only in the last 4.x release [2].
>
>
> [1]
> https://issues.apache.org/jira/browse/HADOOP-11867?jql=project%20%3D%20HADOOP%20AND%20resolution%20%3D%20Done%20AND%20fixVersion%20%3D%203.3.5%20ORDER%20BY%20resolution%20DESC
> [2] https://issues.apache.org/jira/browse/HIVE-24484
>
> -Ayush
>
> On Tue, 3 Jan 2023 at 23:51, Chris Nauroth  wrote:
>
>> -1, because if I'm understanding the potential impact of HDFS-16853
>> correctly, then it's serious enough to fix before a release. (I could
>> change my vote if someone wants to make a case that it's not that
>> serious.)
>>
>> Otherwise, this RC was looking good:
>>
>> * Verified all checksums.
>> * Verified all signatures.
>> * Built from source, including native code on Linux.
>> * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
>> -Drequire.zstd -DskipTests
>> * Tests passed.
>> * mvn --fail-never clean test -Pnative -Dparallel-tests
>> -Drequire.snappy -Drequire.zstd -Drequire.openssl
>> -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
>> * Checked dependency tree to make sure we have all of the expected library
>> updates that are mentioned in the release notes.
>> * mvn -o dependency:tree
>> * Farewell, S3Guard.
>> * Confirmed that hadoop-openstack is now just a stub placeholder artifact
>> with no code.
>> * For ARM verification:
>> * Ran "file " on all native binaries in the ARM tarball to confirm
>> they actually came out with ARM as the architecture.
>> * Output of hadoop checknative -a on ARM looks good.
>> * Ran a MapReduce job with the native bzip2 codec for compression, and
>> it worked fine.
>> * Ran a MapReduce job with YARN configured to use
>> LinuxContainerExecutor and verified launching the containers through
>> container-executor worked.
>>
>> My local setup didn't have the test failures mentioned by Viraj, though
>> there was some flakiness with a few HDFS snapshot tests timing out.
>>
>> Regarding Hive and Bouncy Castle, there is an existing issue and pull
>> request tracking an upgrade attempt. It's looking like some amount of code
>> changes are required:
>>
>> https://issues.apache.org/jira/browse/HIVE-26648
>> https://github.com/apache/hive/pull/3744

Re: [VOTE] Release Apache Hadoop 3.3.5

2023-01-03 Thread Chris Nauroth
-1, because if I'm understanding the potential impact of HDFS-16853
correctly, then it's serious enough to fix before a release. (I could
change my vote if someone wants to make a case that it's not that serious.)

Otherwise, this RC was looking good:

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Tests passed.
* mvn --fail-never clean test -Pnative -Dparallel-tests
-Drequire.snappy -Drequire.zstd -Drequire.openssl
-Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
updates that are mentioned in the release notes.
* mvn -o dependency:tree
* Farewell, S3Guard.
* Confirmed that hadoop-openstack is now just a stub placeholder artifact
with no code.
* For ARM verification:
* Ran "file " on all native binaries in the ARM tarball to confirm
they actually came out with ARM as the architecture.
* Output of hadoop checknative -a on ARM looks good.
* Ran a MapReduce job with the native bzip2 codec for compression, and
it worked fine.
* Ran a MapReduce job with YARN configured to use
LinuxContainerExecutor and verified launching the containers through
container-executor worked.

My local setup didn't have the test failures mentioned by Viraj, though
there was some flakiness with a few HDFS snapshot tests timing out.

Regarding Hive and Bouncy Castle, there is an existing issue and pull
request tracking an upgrade attempt. It's looking like some amount of code
changes are required:

https://issues.apache.org/jira/browse/HIVE-26648
https://github.com/apache/hive/pull/3744

Chris Nauroth


On Tue, Jan 3, 2023 at 8:57 AM Chao Sun  wrote:

> Hmm I'm looking at HADOOP-11867 related stuff but couldn't find it
> mentioned anywhere in change log or release notes. Are they actually
> up-to-date?
>
> On Mon, Jan 2, 2023 at 7:48 AM Masatake Iwasaki
>  wrote:
> >
> > >- building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to
> dependency change.
> >
> > For HBase, classes under com/sun/jersey/json/* and com/sun/xml/* are not
> expected in hbase-shaded-with-hadoop-check-invariants.
> > Updating hbase-shaded/pom.xml is expected to be the fix as done in
> HBASE-27292.
> >
> https://github.com/apache/hbase/commit/00612106b5fa78a0dd198cbcaab610bd8b1be277
> >
> >[INFO] --- exec-maven-plugin:1.6.0:exec
> (check-jar-contents-for-stuff-with-hadoop) @
> hbase-shaded-with-hadoop-check-invariants ---
> >[ERROR] Found artifact with unexpected contents:
> '/home/rocky/srcs/bigtop/build/hbase/rpm/BUILD/hbase-2.4.13/hbase-shaded/hbase-shaded-client/target/hbase-shaded-client-2.4.13.jar'
> >Please check the following and either correct the build or update
> >the allowed list with reasoning.
> >
> >com/
> >com/sun/
> >com/sun/jersey/
> >com/sun/jersey/json/
> >...
> >
> >
> > For Hive, classes belonging to org.bouncycastle:bcprov-jdk15on:1.68 seem
> to be problematic.
> > Excluding them on hive-jdbc  might be the fix.
> >
> >[ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on
> project hive-jdbc: Error creating shaded jar: Problem shading JAR
> /home/rocky/.m2/repository/org/bouncycastle/bcprov-jdk15on/1.68/bcprov-jdk15on-1.68.jar
> entry
> META-INF/versions/15/org/bouncycastle/jcajce/provider/asymmetric/edec/SignatureSpi$EdDSA.class:
> java.lang.IllegalArgumentException: Unsupported class file major version 59
> -> [Help 1]
> >...
> >
> >
> > On 2023/01/02 22:02, Masatake Iwasaki wrote:
> > > Thanks for your great effort for the new release, Steve and Mukund.
> > >
> > > +1 while it would be nice if we can address missed Javadocs.
> > >
> > > + verified the signature and checksum.
> > > + built from source tarball on Rocky Linux 8 and OpenJDK 8 with native
> profile enabled.
> > >+ launched pseudo distributed cluster including kms and httpfs with
> Kerberos and SSL enabled.
> > >+ created encryption zone, put and read files via httpfs.
> > >+ ran example MR wordcount over encryption zone.
> > > + built rpm packages by Bigtop and ran smoke-tests on Rocky Linux 8
> (both x86_64 and aarch64).
> > >- building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to
> dependency change.
> > >  # while building HBase 2.4.13 and Hive 3.1.3 against Hadoop 3.3.4
> worked.
> > > + skimmed the site contents.
> > >-

Re: [VOTE] Release Apache Hadoop 3.3.5

2022-12-27 Thread Chris Nauroth
I'm not quite ready to vote yet, pending some additional testing.

However, I wanted to give a quick update that ARM support is looking good
from my perspective. I focused on verifying the native bits that would need
to be different for ARM vs. x64. Here is what I did:
* Ran "file " on all native binaries in the ARM tarball to confirm they
actually came out with ARM as the architecture.
* Output of hadoop checknative -a on ARM looks good.
* Ran a MapReduce job with the native bzip2 codec for compression, and it
worked fine.
* Ran a MapReduce job with YARN configured to use LinuxContainerExecutor
and verified launching the containers through container-executor worked.

Chris Nauroth


On Wed, Dec 21, 2022 at 11:29 AM Steve Loughran 
wrote:

> Mukund and I have put together a release candidate (RC0) for Hadoop 3.3.5.
>
> Given the time of year it's a bit unrealistic to run a 5 day vote and
> expect people to be able to test it thoroughly enough to make this the one
> we can ship.
>
> What we would like is for anyone who can to verify the tarballs, and test
> the binaries, especially anyone who can try the arm64 binaries. We've got
> the building of those done and now the build file will incorporate them
> into the release -but neither of us have actually tested it yet. Maybe I
> should try it on my pi400 over xmas.
>
> The maven artifacts are up on the apache staging repo -they are the ones
> from x86 build. Building and testing downstream apps will be incredibly
> helpful.
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/
>
> The git tag is release-3.3.5-RC0, commit 3262495904d
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1365/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/RELEASENOTES.md
>
> This is off branch-3.3 and is the first big release since 3.3.2.
>
> Key changes include
>
> * Big update of dependencies to try and keep those reports of
>   transitive CVEs under control -both genuine and false positives.
> * HDFS RBF enhancements
> * Critical fix to ABFS input stream prefetching for correct reading.
> * Vectored IO API for all FSDataInputStream implementations, with
>   high-performance versions for file:// and s3a:// filesystems.
>   file:// through java native io
>   s3a:// parallel GET requests.
> * This release includes Arm64 binaries. Please can anyone with
>   compatible systems validate these.
>
>
> Please try the release and vote on it, even though i don't know what is a
> good timeline here...i'm actually going on holiday in early jan. Mukund is
> around and so can drive the process while I'm offline.
>
> Assuming we do have another iteration, the RC1 will not be before mid jan
> for that reason
>
> Steve (and mukund)
>


[jira] [Resolved] (HDFS-8510) Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8510.
-
Resolution: Won't Fix

This is an old improvement proposal that I'm no longer planning on 
implementing. I'm going to close the issue. If anyone else would find it 
useful, please feel free to reopen and reassign. I'd be happy to help with code 
review.

> Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.
> --
>
> Key: HDFS-8510
> URL: https://issues.apache.org/jira/browse/HDFS-8510
> Project: Hadoop HDFS
>  Issue Type: Improvement
>      Components: tools
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> During a rolling upgrade, an administrator runs {{hdfs dfsadmin 
> -getDatanodeInfo}} to check if a DataNode has stopped.  Currently, this 
> operation is subject to the RPC connection retries defined in 
> {{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}. 
>  This issue proposes adding separate configuration properties to control the 
> retries for this operation.
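
For reference, the two existing cluster-wide knobs named above look like this
in core-site.xml (the values shown are believed to be the shipped defaults):

    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>10</value>
    </property>
    <property>
      <name>ipc.client.connect.retry.interval</name>
      <value>1000</value>
    </property>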



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-4289) FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica byte count on Windows

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-4289.
-
Resolution: Won't Fix

I'm no longer actively working on this. I no longer have easy access to a 
Windows environment to make Windows-specific changes, or even to confirm that 
this test failure still happens. It's a very old issue with no recent activity, 
so I'm going to assume it's no longer relevant and close it out. If it's still 
an ongoing issue that a Windows developer wants to pick up, please feel free to 
reopen and reassign.

> FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica 
> byte count on Windows
> ---
>
> Key: HDFS-4289
> URL: https://issues.apache.org/jira/browse/HDFS-4289
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>    Affects Versions: trunk-win
>        Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> {{FsDatasetImpl#updateReplicaUnderRecovery}} throws errors validating replica 
> byte count on Windows.  This can be seen by running 
> {{TestBalancerWithNodeGroup#testBalancerWithRackLocality}}, which fails on 
> Windows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-03 Thread Chris Nauroth
+1 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Tests passed.
* mvn --fail-never clean test -Pnative -Dparallel-tests
-Drequire.snappy -Drequire.zstd -Drequire.openssl
-Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
updates that are mentioned in the release notes.
* mvn -o dependency:tree

I saw a LibHDFS test failure, but I know it's something flaky that's
already tracked in a JIRA issue. The release looks good. Steve, thank you
for driving this.

Chris Nauroth


On Wed, Aug 3, 2022 at 11:27 AM Steve Loughran 
wrote:

> my vote for this is +1, binding.
>
> obviously I'm biased, but i do not want to have to issue any more interim
> releases before the feature release off branch-3.3, so I am trying to be
> ruthless.
>
> my client validator ant project has more targets to help with releasing,
> and now builds a lot more of my local projects:
> https://github.com/steveloughran/validate-hadoop-client-artifacts
> all good as far as my test coverage goes, with these projects validating
> the staged dependencies.
>
> now, who else can review
>
> On Fri, 29 Jul 2022 at 19:47, Steve Loughran  wrote:
>
> >
> >
> > I have put together a release candidate (RC1) for Hadoop 3.3.4
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
> >
> > The git tag is release-3.3.4-RC1, commit a585a73c3e0
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1358/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
> >
> > Release notes
> >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
> >
> > There's a very small number of changes, primarily critical code/packaging
> > issues and security fixes.
> >
> > See the release notes for details.
> >
> > Please try the release and vote. The vote will run for 5 days.
> >
> > steve
> >
>


Re: [VOTE] Release Apache Hadoop 3.2.4 - RC0

2022-07-21 Thread Chris Nauroth
I'm changing my vote to +1 (binding).

Masatake and Ashutosh, thank you for investigating.

I reran tests without the parallel options, and that mostly addressed the
failures. Maybe the tests in question are just not sufficiently isolated to
support parallel execution. That looks to be the case for TestFsck, where
the failure was caused by missing audit log entries. This test works by
toggling global logging state, so I can see why multi-threaded execution
might confuse the test.

Chris Nauroth


On Thu, Jul 21, 2022 at 12:01 AM Ashutosh Gupta 
wrote:

> +1(non-binding)
>
> * Builds from source look good.
> * Checksums and signatures are correct.
> * Running basic HDFS and MapReduce commands looks good.
>
> > * TestAMRMProxy - Not able to reproduce in local
> > * TestFsck - The only failure I can see is
>  TestFsck.testFsckListCorruptSnapshotFiles, which passed after applying
> HDFS-15038
> > * TestSLSStreamAMSynth - Not able to reproduce in local
> > * TestServiceAM - Not able to reproduce in local
>
> Thanks Masatake for driving this release.
>
> On Thu, Jul 21, 2022 at 5:51 AM Masatake Iwasaki <
> iwasak...@oss.nttdata.com>
> wrote:
>
> > Hi developers,
> >
> > I'm still waiting for your vote.
> > I consider the intermittent test failures mentioned by Chris to be
> > non-blockers.
> > Please file a JIRA and let me know if you find a blocker issue.
> >
> > I will appreciate your help for the release process.
> >
> > Regards,
> > Masatake Iwasaki
> >
> > On 2022/07/20 14:50, Masatake Iwasaki wrote:
> > >> TestServiceAM
> > >
> > > I can see the reported failure of TestServiceAM in some "Apache Hadoop
> > qbt Report: branch-3.2+JDK8 on Linux/x86_64".
> > > 3.3.0 and above might be fixed by YARN-8867, which added a guard using
> > GenericTestUtils#waitFor to stabilize
> > testContainersReleasedWhenPreLaunchFails.
> > > YARN-8867 did not modify other code under hadoop-yarn-services.
> > > If it is the case, TestServiceAM can be tagged as flaky in branch-3.2.
> > >
> > >
> > > On 2022/07/20 14:21, Masatake Iwasaki wrote:
> > >> Thanks for testing the RC0, Chris.
> > >>
> > >>> The following are new test failures for me on 3.2.4:
> > >>> * TestAMRMProxy
> > >>> * TestFsck
> > >>> * TestSLSStreamAMSynth
> > >>> * TestServiceAM
> > >>
> > >> I could not reproduce the test failures on my local.
> > >>
> > >> For TestFsck, if the failed test case is
> > testFsckListCorruptSnapshotFiles,
> > >> cherry-picking HDFS-15038 (fixing only test code) could be the fix.
> > >>
> > >> The failure of TestSLSStreamAMSynth looks frequently reported by
> > >> "Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64".
> > >> It could be tagged as known flaky test.
> > >>
> > >> On 2022/07/20 9:15, Chris Nauroth wrote:
> > >>> -0 (binding)
> > >>>
> > >>> * Verified all checksums.
> > >>> * Verified all signatures.
> > >>> * Built from source, including native code on Linux.
> > >>>  * mvn clean package -Pnative -Psrc -Drequire.openssl
> > -Drequire.snappy
> > >>> -Drequire.zstd -DskipTests
> > >>> * Tests mostly passed, but see below.
> > >>>  * mvn --fail-never clean test -Pnative -Dparallel-tests
> > >>> -Drequire.snappy -Drequire.zstd -Drequire.openssl
> > >>> -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
> > >>>
> > >>> The following are new test failures for me on 3.2.4:
> > >>> * TestAMRMProxy
> > >>> * TestFsck
> > >>> * TestSLSStreamAMSynth
> > >>> * TestServiceAM
> > >>>
> > >>> The following tests also failed, but they also fail for me on 3.2.3,
> so
> > >>> they aren't likely to be related to this release candidate:
> > >>> * TestCapacitySchedulerNodeLabelUpdate
> > >>> * TestFrameworkUploader
> > >>> * TestSLSGenericSynth
> > >>> * TestSLSRunner
> > >>> * test_libhdfs_threaded_hdfspp_test_shim_static
> > >>>
> > >>> I'm not voting a full -1, because I haven't done any root cause
> > analysis on
> > >>> these new test failures. I don't know if it's a quirk to my
> > environment,
> > >>> though I'm using the start-build-env.sh Docker container, so any build
> > >>> dependencies should be consistent.

Re: [VOTE] Release Apache Hadoop 3.2.4 - RC0

2022-07-19 Thread Chris Nauroth
-0 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Tests mostly passed, but see below.
* mvn --fail-never clean test -Pnative -Dparallel-tests
-Drequire.snappy -Drequire.zstd -Drequire.openssl
-Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8

The following are new test failures for me on 3.2.4:
* TestAMRMProxy
* TestFsck
* TestSLSStreamAMSynth
* TestServiceAM

The following tests also failed, but they also fail for me on 3.2.3, so
they aren't likely to be related to this release candidate:
* TestCapacitySchedulerNodeLabelUpdate
* TestFrameworkUploader
* TestSLSGenericSynth
* TestSLSRunner
* test_libhdfs_threaded_hdfspp_test_shim_static

I'm not voting a full -1, because I haven't done any root cause analysis on
these new test failures. I don't know if it's a quirk to my environment,
though I'm using the start-build-env.sh Docker container, so any build
dependencies should be consistent. I'd be comfortable moving ahead if
others are seeing these tests pass.

Chris Nauroth


On Thu, Jul 14, 2022 at 7:57 AM Masatake Iwasaki 
wrote:

> +1 from myself.
>
> * skimmed the contents of site documentation.
>
> * built the source tarball on Rocky Linux 8 (x86_64) by OpenJDK 8 with
> `-Pnative`.
>
> * launched pseudo distributed cluster including kms and httpfs with
> Kerberos and SSL enabled.
>
>* created encryption zone, put and read files via httpfs.
>* ran example MR wordcount over encryption zone.
>
> * launched 3-node docker cluster with NN-HA and RM-HA enabled and ran some
> example MR jobs.
>
> * built HBase 2.4.11, Hive 3.1.2 and Spark 3.1.2 against Hadoop 3.2.4 RC0
>on CentOS 7 (x86_64) by using Bigtop branch-3.1 and ran smoke-tests.
>https://github.com/apache/bigtop/pull/942
>
>* Hive needs an updated exclusion rule to address HADOOP-18088 (the
> migration to reload4j).
>
> * built Spark 3.3.0 against Hadoop 3.2.4 RC0 using the staging repository::
>
> <repository>
>   <id>staged</id>
>   <name>staged-releases</name>
>   <url>
>     https://repository.apache.org/content/repositories/orgapachehadoop-1354
>   </url>
>   <releases>
>     <enabled>true</enabled>
>   </releases>
>   <snapshots>
>     <enabled>true</enabled>
>   </snapshots>
> </repository>
>
> Thanks,
> Masatake Iwasaki
>
> On 2022/07/13 1:14, Masatake Iwasaki wrote:
> > Hi all,
> >
> > Here's Hadoop 3.2.4 release candidate #0:
> >
> > The RC is available at:
> >https://home.apache.org/~iwasakims/hadoop-3.2.4-RC0/
> >
> > The RC tag is at:
> >https://github.com/apache/hadoop/releases/tag/release-3.2.4-RC0
> >
> > The Maven artifacts are staged at:
> >
> https://repository.apache.org/content/repositories/orgapachehadoop-1354
> >
> > You can find my public key at:
> >https://downloads.apache.org/hadoop/common/KEYS
> >
> > Please evaluate the RC and vote.
> > The vote will be open for (at least) 5 days.
> >
> > Thanks,
> > Masatake Iwasaki
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: HDFS-16577 - Request for input - Let administrator override connection details when registering datanodes

2022-06-24 Thread Chris Nauroth
Hello Lars,

I can't say I've personally run HDFS on Kubernetes with Kerberos enabled.
However, some of the issues you raise sound like they have some overlap
with the HDFS multi-homing features:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html

Have you seen this? Does anything look helpful there?
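
For example, that doc covers knobs along these lines (hdfs-site.xml; values
illustrative):

    <!-- listen on all interfaces rather than only the hostname's address -->
    <property>
      <name>dfs.namenode.rpc-bind-host</name>
      <value>0.0.0.0</value>
    </property>
    <!-- have clients connect to DataNodes by hostname instead of the
         NameNode-reported IP, which can help behind NAT -->
    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>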

Chris Nauroth


On Fri, Jun 24, 2022 at 4:55 AM Lars Francke  wrote:

> Hi everyone,
>
> we're trying to get HDFS running in Kubernetes using Kerberos.
> This has some challenges as you might expect.
> We have created an issue for that including a spike:
> https://issues.apache.org/jira/browse/HDFS-16577
>
> Currently (as of 3.2.2, but reading through the release notes this doesn't
> seem to have changed since then) DataNodes use the same properties for
> deciding which port to bind each service to, as for deciding which ports
> are included in the `DatanodeRegistration` sent to the NameNode. Further,
> NameNodes overwrite the DataNode's IP address with the incoming address
> during registration.
>
> Both of these prevent external users from connecting to DataNodes that are
> hosted behind some sort of NAT (such as Kubernetes).
>
> We'd go ahead with a proper implementation/PR but we thought about asking
> for comments/feedback first. Maybe someone else has already done some work
> here that we might have missed etc.
>
> Thank you!
>
> Cheers,
> Lars
>


[jira] [Resolved] (HDFS-16623) IllegalArgumentException in LifelineSender

2022-06-10 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16623.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2. [~xuzq_zander] , 
thank you for the contribution.

> IllegalArgumentException in LifelineSender
> --
>
> Key: HDFS-16623
> URL: https://issues.apache.org/jira/browse/HDFS-16623
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In our production environment, an IllegalArgumentException occurred in the 
> LifelineSender at one DataNode which was undergoing GC at the time. 
> The buggy code is at line 1060 of BPServiceActor.java: the sleep 
> time can be negative.
> {code:java}
> while (shouldRun()) {
>   try {
>     if (lifelineNamenode == null) {
>       lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr);
>     }
>     sendLifelineIfDue();
>     Thread.sleep(scheduler.getLifelineWaitTime());
>   } catch (InterruptedException e) {
>     Thread.currentThread().interrupt();
>   } catch (IOException e) {
>     LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, e);
>   }
> }
> {code}
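
One minimal way to harden that loop against a negative wait (a sketch, not
necessarily the committed fix):

    long wait = scheduler.getLifelineWaitTime();
    if (wait > 0) {
      Thread.sleep(wait);  // Thread.sleep rejects negative arguments
    }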



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.10.2 - RC0

2022-05-29 Thread Chris Nauroth
+1 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Almost all unit tests passed.
* mvn clean test -Pnative -Dparallel-tests -Drequire.snappy
-Drequire.zstd -Drequire.openssl -Dsurefire.rerunFailingTestsCount=3
-DtestsThreadCount=8
* TestBookKeeperHACheckpoints consistently has a few failures.
* TestCapacitySchedulerNodeLabelUpdate is flaky, intermittently timing
out.

These test failures don't look significant enough to hold up a release, so
I'm still voting +1.

Chris Nauroth


On Sun, May 29, 2022 at 2:35 AM Masatake Iwasaki <
iwasak...@oss.nttdata.co.jp> wrote:

> Thanks for the help, Ayush.
>
> I committed HADOOP-16663/HADOOP-16664 and cherry-picked HADOOP-16985 to
> branch-2.10 (and branch-3.2).
> If I need to cut RC1, I will try cherry-picking them to branch-2.10.2
>
> Masatake Iwasaki
>
>
> On 2022/05/28 5:23, Ayush Saxena wrote:
> > The checksum stuff was addressed in HADOOP-16985, so that filename stuff
> is
> > sorted only post 3.3.x
> > BTW it is a known issue:
> >
> https://issues.apache.org/jira/browse/HADOOP-16494?focusedCommentId=16927236&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16927236
> >
> > Must not be a blocker for us
> >
> > The RAT check is failing with a dependency issue. That also should work post
> > 3.3.x because there is no hadoop-maven-plugins dependency in the
> hadoop-yarn-api
> > module post 3.3.x; HADOOP-16560 removed it.
> > Ref:
> >
> https://github.com/apache/hadoop/pull/1496/files#diff-f5d219eaf211871f9527ae48da59586e7e9958ea7649de74a1393e599caa6dd6L121-R122
> >
> > So that is why the RAT check passes for 3.3.x+ without needing this
> > module. Committing HADOOP-16663 should solve this, though. (I haven't
> tried
> > it; that's just from looking at the problem.)
> >
> > Good-to-have patches, but they don't look like blockers to me; kind of
> > build-related stuff only, nothing bad with our core Hadoop code.
> >
> > -Ayush
> >
> > On Sat, 28 May 2022 at 01:04, Viraj Jasani  wrote:
> >
> >> +0 (non-binding),
> >>
> >> * Signature/Checksum looks good, though I am not sure where
> >> "target/artifacts" is coming from for the tars, here is the diff (this
> was
> >> the case for 2.10.1 as well but checksum was correct):
> >>
> >> 1c1
> >> < SHA512 (hadoop-2.10.2-site.tar.gz) =
> >>
> >>
> 3055a830003f5012660d92da68a317e15da5b73301c2c73cf618e724c67b7d830551b16928e0c28c10b66f04567e4b6f0b564647015bacc4677e232c0011537f
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2-site.tar.gz) =
> >>
> >>
> 3055a830003f5012660d92da68a317e15da5b73301c2c73cf618e724c67b7d830551b16928e0c28c10b66f04567e4b6f0b564647015bacc4677e232c0011537f
> >> 1c1
> >> < SHA512 (hadoop-2.10.2-src.tar.gz) =
> >>
> >>
> 483b6a4efd44234153e21ffb63a9f551530a1627f983a8837c655ce1b8ef13486d7178a7917ed3f35525c338e7df9b23404f4a1b0db186c49880448988b88600
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2-src.tar.gz) =
> >>
> >>
> 483b6a4efd44234153e21ffb63a9f551530a1627f983a8837c655ce1b8ef13486d7178a7917ed3f35525c338e7df9b23404f4a1b0db186c49880448988b88600
> >> 1c1
> >> < SHA512 (hadoop-2.10.2.tar.gz) =
> >>
> >>
> 13e95907073d815e3f86cdcc24193bb5eec0374239c79151923561e863326988c7f32a05fb7a1e5bc962728deb417f546364c2149541d6234221b00459154576
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2.tar.gz) =
> >>
> >>
> 13e95907073d815e3f86cdcc24193bb5eec0374239c79151923561e863326988c7f32a05fb7a1e5bc962728deb417f546364c2149541d6234221b00459154576
> >>
> >> However, checksums are correct.
> >>
> >> * Builds from source look good
> >>   - mvn clean install  -DskipTests
> >>   - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
> >>
> >> * Rat check, if run before building from source locally, fails with
> error:
> >>
> >> [ERROR] Plugin org.apache.hadoop:hadoop-maven-plugins:2.10.2 or one of
> its
> >> dependencies could not be resolved: Could not find artifact
> >> org.apache.hadoop:hadoop-maven-plugins:jar:2.10.2 in central (
> >> https://repo.maven.apache.org/maven2) -> [Help 1]
> >> [ERROR]
> >>
> >> However, once we build locally, rat check passes (because
> >> hadoop-maven-plugins 2.1

Re: [VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-16 Thread Chris Nauroth
+1 (binding)

- Verified all checksums.
- Verified all signatures.
- Built from source, including native code on Linux.
- Ran several examples successfully.

Chris Nauroth


On Mon, May 16, 2022 at 10:06 AM Chao Sun  wrote:

> +1
>
> - Compiled from source
> - Verified checksums & signatures
> - Launched a pseudo HDFS cluster and ran some simple commands
> - Ran full Spark tests with the RC
>
> Thanks Steve!
>
> Chao
>
> On Mon, May 16, 2022 at 2:19 AM Ayush Saxena  wrote:
> >
> > +1,
> > * Built from source.
> > * Successful native build on Ubuntu 18.04
> > * Verified Checksums.
> >
> (CHANGELOG.md,RELEASENOTES.md,hadoop-3.3.3-rat.txt,hadoop-3.3.3-site.tar.gz,hadoop-3.3.3-src.tar.gz,hadoop-3.3.3.tar.gz)
> > * Verified Signature.
> > * Successful RAT check
> > * Ran basic HDFS shell commands.
> > * Ran basic YARN shell commands.
> > * Verified version in hadoop version command and UI
> > * Ran some MR example Jobs.
> > * Browsed UI(Namenode/Datanode/ResourceManager/NodeManager/HistoryServer)
> > * Browsed the contents of Maven Artifacts.
> > * Browsed the contents of the website.
> >
> > Thanx Steve for driving the release, Good Luck!!!
> >
> > -Ayush
> >
> > On Mon, 16 May 2022 at 08:20, Xiaoqiao He  wrote:
> >
> > > +1(binding)
> > >
> > > * Verified signature and checksum of the source tarball.
> > > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package
> > > -DskipTests -Pnative -Pdist -Dtar`.
> > > * Setup pseudo cluster with HDFS and YARN.
> > > * Run simple FsShell - mkdir/put/get/mv/rm and check the result.
> > > * Run example mr applications and check the result - Pi & wordcount.
> > > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager
> etc.
> > >
> > > Thanks Steve for your work.
> > >
> > > - He Xiaoqiao
> > >
> > > On Mon, May 16, 2022 at 4:25 AM Viraj Jasani 
> wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > * Signature: ok
> > > > * Checksum : ok
> > > > * Rat check (1.8.0_301): ok
> > > >  - mvn clean apache-rat:check
> > > > * Built from source (1.8.0_301): ok
> > > >  - mvn clean install  -DskipTests
> > > > * Built tar from source (1.8.0_301): ok
> > > >  - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
> > > >
> > > > HDFS, MapReduce and HBase (2.5) CRUD functional testing on
> > > > pseudo-distributed mode looks good.
> > > >
> > > >
> > > > On Wed, May 11, 2022 at 10:26 AM Steve Loughran
> > > 
> > > > wrote:
> > > >
> > > > > I have put together a release candidate (RC1) for Hadoop 3.3.3
> > > > >
> > > > > The RC is available at:
> > > > > https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/
> > > > >
> > > > > The git tag is release-3.3.3-RC1, commit d37586cbda3
> > > > >
> > > > > The maven artifacts are staged at
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehadoop-1349/
> > > > >
> > > > > You can find my public key at:
> > > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > > >
> > > > > Change log
> > > > >
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/CHANGELOG.md
> > > > >
> > > > > Release notes
> > > > >
> > >
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/RELEASENOTES.md
> > > > >
> > > > > There's a very small number of changes, primarily critical
> > > code/packaging
> > > > > issues and security fixes.
> > > > >
> > > > > * The critical fixes which shipped in the 3.2.3 release.
> > > > > * CVEs in our code and dependencies
> > > > > * Shaded client packaging issues.
> > > > > * A switch from log4j to reload4j
> > > > >
> > > > > reload4j is an active fork of the log4j 1.2.17 library with the
> classes
> > > > > which contain CVEs removed. Even though hadoop never used those
> > > classes,
> > > > > they regularly raised alerts on security scans and concern from
> users.
> > > > > Switching to the forked project allows us to ship a secure logging
> > > > > framework. It will complicate the builds of downstream
> > > > > maven/ivy/gradle projects which exclude our log4j artifacts, as
> they
> > > > > need to cut the new dependency instead/as well.
> > > > >
> > > > > See the release notes for details.
> > > > >
> > > > > This is the second release attempt. It is the same git commit as
> > > before,
> > > > > but
> > > > > fully recompiled with another republish to maven staging, which
> has bee
> > > > > verified by building spark, as well as a minimal test project.
> > > > >
> > > > > Please try the release and vote. The vote will run for 5 days.
> > > > >
> > > > > -Steve
> > > > >
> > >
> > > -
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> > >
> > >
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


Re: ElasticByteBufferPool is ever growing thus can cause memory leak.

2022-01-12 Thread Chris Nauroth
Thanks for discussing this, Mukund.

Another difference between the 2 classes is that ElasticByteBufferPool
supports a caller's preference for either on-heap or off-heap buffers and
internally calls either allocate or allocateDirect. Do you intend to
preserve that functionality too in a single merged class?
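
For context, that preference is the boolean on getBuffer; a usage sketch
(org.apache.hadoop.io.ElasticByteBufferPool):

    import java.nio.ByteBuffer;
    import org.apache.hadoop.io.ByteBufferPool;
    import org.apache.hadoop.io.ElasticByteBufferPool;

    public class PoolSketch {
      public static void main(String[] args) {
        ByteBufferPool pool = new ElasticByteBufferPool();
        ByteBuffer direct = pool.getBuffer(true, 64 * 1024);   // allocateDirect
        ByteBuffer onHeap = pool.getBuffer(false, 64 * 1024);  // allocate
        // Returned buffers are cached with strong references, which is the
        // unbounded-growth concern raised in this thread.
        pool.putBuffer(direct);
        pool.putBuffer(onHeap);
      }
    }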

Chris Nauroth


On Wed, Jan 12, 2022 at 1:25 AM Mukund Madhav Thakur
 wrote:

> Hello Everyone,
> I was just browsing through the code while doing my Vectored IO stuff. It
> seems like ElasticByteBufferPool is an ever growing pool and memory is not
> getting released as there is no WeakReference being maintained in the pool.
> This can cause memory leaks in the production environment.  This is widely
> used in places like StripedReconstructor,
> DFSStripedInputStream, BlockBlobAppendStream etc.
>
> I would suggest we use the DirectBufferPool class for direct buffer pooling,
> as it already keeps WeakReferences to the buffers. We will have
> to make it implement the ByteBufferPool interface and implement the
> corresponding methods, though. Happy to make the API changes once finalized.
>
> Thanks,
> Mukund
>


Re: [DISCUSS] Hadoop 3.3.2 release?

2021-09-08 Thread Chris Nauroth
+1

Chao, thank you very much for volunteering on the release.

Chris Nauroth


On Tue, Sep 7, 2021 at 10:00 PM Igor Dvorzhak 
wrote:

> +1
>
> On Tue, Sep 7, 2021 at 10:06 AM Chao Sun  wrote:
>
>> Hi all,
>>
>> It has been almost 3 months since the 3.3.1 release and branch-3.3 has
>> accumulated quite a few commits (118 atm). In particular, Spark community
>> recently found an issue which prevents one from using the shaded Hadoop
>> client together with certain compression codecs such as lz4 and snappy
>> codec. The details are recorded in HADOOP-17891 and SPARK-36669.
>>
>> Therefore, I'm wondering if anyone is also interested in a 3.3.2 release.
>> If there is no objection, I'd like to volunteer myself for the work as
>> well.
>>
>> Best Regards,
>> Chao
>>
>


[jira] [Created] (HDFS-11995) HDFS Architecture documentation incorrectly describes writing to a local temporary file.

2017-06-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11995:


 Summary: HDFS Architecture documentation incorrectly describes 
writing to a local temporary file.
 Key: HDFS-11995
 URL: https://issues.apache.org/jira/browse/HDFS-11995
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0-alpha3
Reporter: Chris Nauroth


The HDFS Architecture documentation has a section titled "Staging" that 
describes clients writing to a local temporary file first before interacting 
with the NameNode to allocate file metadata.  This information is incorrect.  
(Perhaps it was correct a long time ago, but it is no longer accurate with 
respect to the current implementation.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.

2016-10-26 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11063:


 Summary: Set NameNode RPC server handler thread name with more 
descriptive information about the RPC call.
 Key: HDFS-11063
 URL: https://issues.apache.org/jira/browse/HDFS-11063
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chris Nauroth


We often run {{jstack}} on a NameNode process as a troubleshooting step if it 
is suffering high load or appears to be hanging.  By reading the stack trace, 
we can identify if a caller is blocked inside an expensive operation.  This 
would be even more helpful if we updated the RPC server handler thread name 
with more descriptive information about the RPC call.  This could include the 
calling user, the called RPC method, and the most significant argument to that 
method (most likely the path).
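
A sketch of the idea (illustrative only, not a final patch):

{code}
// Hypothetical helper: rename the handler thread for the duration of a call
// so that jstack output shows who is being served, then restore the name.
public static void runWithDescriptiveName(
    String user, String method, String path, Runnable call) {
  Thread t = Thread.currentThread();
  String original = t.getName();
  try {
    t.setName(original + " (user=" + user + ", op=" + method
        + ", path=" + path + ")");
    call.run();
  } finally {
    t.setName(original);  // keep the pool's thread names stable
  }
}
{code}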



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.

2016-10-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11034:


 Summary: Provide a command line tool to clear decommissioned 
DataNode information from the NameNode without restarting.
 Key: HDFS-11034
 URL: https://issues.apache.org/jira/browse/HDFS-11034
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chris Nauroth


Information about decommissioned DataNodes remains tracked in the NameNode for 
the entire NameNode process lifetime.  Currently, the only way to clear this 
information is to restart the NameNode.  This issue proposes to add a way to 
clear this information online, without requiring a process restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-6277) WebHdfsFileSystem#toUrl does not perform character escaping for rename

2016-09-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-6277.
-
Resolution: Won't Fix

This bug is present in the 1.x line, but not 2.x or 3.x.  I'm resolving this as 
Won't Fix, because 1.x is no longer under active maintenance.

> WebHdfsFileSystem#toUrl does not perform character escaping for rename 
> ---
>
> Key: HDFS-6277
> URL: https://issues.apache.org/jira/browse/HDFS-6277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Ramya Sunil
>Assignee: Chris Nauroth
>
> Found this issue while testing HDFS-6141. WebHdfsFileSystem#toUrl  does not 
> perform character escaping for rename and causes the operation to fail. 
> This bug does not exist on 2.x
> For e.g: 
> $ hadoop dfs -rmr 'webhdfs://:/tmp/test dirname with spaces'
> Problem with Trash.Unexpected HTTP response: code=400 != 200, op=RENAME, 
> message=Bad Request. Consider using -skipTrash option
> rmr: Failed to move to trash: webhdfs://:/tmp/test dirname 
> with spaces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Permission bit 12 in getFileInfo response

2016-09-19 Thread Chris Nauroth
Hello John,

That is the ACL bit.  The NameNode toggles on the ACL bit in getFileInfo 
responses for inodes that have ACL entries attached to them.  On the client 
side, this results in calls to FsPermission#getAclBit returning true.

The purpose of the ACL bit is to help client applications identify files and 
directories that have ACL entries attached.  One specific example where this is 
useful is in the output of the file system shell "ls" command.  (See 
org.apache.hadoop.fs.shell.Ls#processPath.)  If the ACL bit is turned on, then 
this is how the shell decides to append a '+' character after the basic 
permissions, so the end user knows that ACL entries are present.  If the ACL 
bit didn’t exist, then applications like this would have to be implemented with 
a more costly FileSystem#getAclStatus call, in addition to the existing 
getFileInfo RPC.

Test cases defined in FSAclBaseTest check for the presence of the ACL bit where 
expected.
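
If it helps, here is a minimal client-side illustration (the path is arbitrary):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class AclBitCheck {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus st = fs.getFileStatus(new Path("/tmp"));
      // Your perm value 4584 decimal == 10750 octal: bit 12 (the ACL bit,
      // octal 10000) plus ordinary mode bits 0750.
      if (st.getPermission().getAclBit()) {
        // Only now pay for the extra RPC to fetch the actual entries.
        System.out.println(fs.getAclStatus(st.getPath()).getEntries());
      }
    }
  }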

--Chris Nauroth

On 9/19/16, 10:55 AM, "John Zhuge"  wrote:

Hi Gurus,

Does anyone know the meaning of bit 12 in the permission field of the
"getFileInfo" response? To my understanding, bit 9 is the sticky bit, along
with the lower 9 bits for user/group/other.

In the following trace, the "perm" field is "4584", i.e., "10750" in octal:

16/09/15 15:54:56 TRACE ipc.ProtobufRpcEngine: 1: Response <-
NAMENODE:8020: getFileInfo {fs { fileType: IS_DIR path: "" length: 0
permission { perm: 4584 } owner: "USER" group: "supergroup"
modification_time: 1473884314570 access_time: 0 block_replication: 0
blocksize: 0 fileId: 8798130 childrenNum: 1 storagePolicy: 0 }}

Thanks,
John Zhuge
Software Engineer, Cloudera




Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread Chris Nauroth
Andrew, thanks for adding your perspective on this.

What is a realistic strategy for us to evolve the HDFS audit log in a 
backward-compatible way?  If the API is essentially any form of ad-hoc 
scripting, then for any proposed audit log format change, I can find a reason 
to veto it on grounds of backward incompatibility.

- I can’t add a new field on the end, because that would break an awk script 
that uses $NF expecting to find a specific field.
- I can’t prepend a new field, because that would break a "cut -f1" expecting 
to find the timestamp.
- HDFS can’t add any new features, because someone might have written a script 
that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
- Hadoop is not allowed to add full IPv6 support, because someone might have 
written a script that looks at the "ip=" field and parses it by IPv4 syntax.

On the CLI, a potential solution for evolving the output is to preserve the old 
format by default and only enable the new format if the user explicitly passes 
a new argument.  What should we do for the audit log?  Configuration flags in 
hdfs-site.xml?  (That of course adds its own brand of complexity.)

I’m particularly interested to hear potential solutions from people like Andrew 
and Allen who have been most vocal about the need for a stable format.  Without 
a solution, this unfortunately devolves into the format being frozen within a 
major release line.

We could benefit from getting a patch on the compatibility doc that addresses 
the HDFS audit log specifically. 

--Chris Nauroth

On 8/18/16, 8:47 AM, "Andrew Purtell"  wrote:

An incompatible API change is developer unfriendly. An incompatible 
behavioral change is operator unfriendly. Historically, one dimension of 
incompatibility has had a lot more mindshare than the other. It's great that 
this might be changing for the better. 

Where I work, when we move from one Hadoop 2.x minor to another, we always 
spend time updating our deployment plans, alerting, log scraping, and related 
things due to changes. Some are debatable as to whether they qualify for the 
'incompatible' designation. I think the audit logging change that triggered 
this discussion is a good example of one that does. If you want to audit HDFS 
actions, those log emissions are your API. (Inotify doesn't offer access control 
events.) One has to code regular expressions for parsing them and reverse 
engineer under what circumstances an audit line is emitted so you can make 
assumptions about what transpired. Change either and you might break someone's 
automation for meeting industry or legal compliance obligations. Not a trivial 
matter. If you don't operate Hadoop in production, you might not realize the 
implications of such a change. Glad to see Hadoop has the community diversity to 
recognize it in some cases. 

> On Aug 18, 2016, at 6:57 AM, Junping Du  wrote:
> 
> I think Allen's previous comments are very misleading. 
> In my understanding, only incompatible API changes (RPC, CLIs, WebService, etc.) 
shouldn't land on branch-2, but other incompatible behaviors (logs, audit-log, 
daemon restart, etc.) can be handled more flexibly. Otherwise, how could 
52 issues ( https://s.apache.org/xJk5) marked with incompatible-changes have 
landed on branch-2 after the 2.2.0 release? Most of them are already released. 
> 
> Thanks,
> 
> Junping
> 
> From: Vinod Kumar Vavilapalli 
> Sent: Wednesday, August 17, 2016 9:29 PM
> To: Allen Wittenauer
> Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
> 
> I always look at CHANGES.txt entries for incompatible-changes and this 
JIRA obviously wasn’t there.
> 
> Anyways, this shouldn’t be in any of branch-2.* as committers there 
clearly mentioned that this is an incompatible change.
> 
> I am reverting the patch from branch-2* .
> 
> Thanks
> +Vinod
> 
>> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer 
 wrote:
>> 
>> 
>> 
>> -1
>> 
>> HDFS-9395 is an incompatible change:
>> 
>> a) Why is it not marked as such in the changes file?
>> b) Why is an incompatible change in a micro release, much less a minor one?
>> c) Where is the release note for this change?
>> 
>> 
>>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli 
 wrote:
>>> 
>>> Hi all,
>>> 
>>> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
>>> 
>>> As discussed before, this is the 

Re: [DISCUSS] Retire BKJM from trunk?

2016-07-27 Thread Chris Nauroth
I recommend including the BookKeeper community in this discussion.  I’ve added 
their user@ and dev@ lists to this thread.

I do not see BKJM being used in practice.  Removing it from trunk would be 
attractive in terms of less code for Hadoop to maintain and build, but if we 
find existing users that want to keep it, I wouldn’t object.

--Chris Nauroth

On 7/26/16, 11:14 PM, "Vinayakumar B"  wrote:

Hi All,

    BKJM was active and was made stable back when NameNode HA was 
implemented and QJM did not exist yet.
    Now QJM is available, is much more stable, and has been adopted by many 
production environments.
    I wonder whether it would be a good time to retire BKJM from trunk?

    Are there any existing users of BKJM?

-Vinay



-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache MSDN Offer is Back

2016-07-20 Thread Chris Nauroth
That definitely was possible under the old deal.  You could go through the MSDN 
site and download an iso for various versions of Windows and run it under 
VirtualBox.  The MSDN site also would furnish a license key that you could use 
to activate the machine.

I haven't yet gone through this new process to see if anything has changed in 
the benefits.

--Chris Nauroth

From: Ravi Prakash <ravihad...@gmail.com>
Date: Wednesday, July 20, 2016 at 12:04 PM
To: Chris Nauroth <cnaur...@hortonworks.com>
Cc: common-...@hadoop.apache.org, hdfs-dev@hadoop.apache.org, 
yarn-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org
Subject: Re: Apache MSDN Offer is Back

Thanks Chris!

I did avail of the offer a few months ago, and wasn't able to figure out if a 
Windows license was also available. I want to run Windows inside a virtual 
machine on my Linux laptop, for the rare cases where a patch may 
affect that. Any clue if that is possible?

Thanks
Ravi

On Tue, Jul 19, 2016 at 4:09 PM, Chris Nauroth 
<cnaur...@hortonworks.com> wrote:
A few months ago, we learned that the offer for ASF committers to get an MSDN 
license had gone away.  I'm happy to report that as of a few weeks ago, that 
offer is back in place.  For more details, committers can check out 
https://svn.apache.org/repos/private/committers and read 
donated-licenses/msdn.txt.

--Chris Nauroth



Apache MSDN Offer is Back

2016-07-19 Thread Chris Nauroth
A few months ago, we learned that the offer for ASF committers to get an MSDN 
license had gone away.  I'm happy to report that as of a few weeks ago, that 
offer is back in place.  For more details, committers can check out 
https://svn.apache.org/repos/private/committers and read 
donated-licenses/msdn.txt.

--Chris Nauroth


[jira] [Resolved] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.

2016-06-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10546.
--
Resolution: Duplicate

I just realized this duplicates HDFS-10353, which fixed the problem in trunk.  
We just need to cherry-pick that patch down to branch-2 and branch-2.8.  I'll 
cover it over there.

> hadoop-hdfs-native-client fails distro build when trying to copy libhdfs 
> binaries.
> --
>
> Key: HDFS-10546
> URL: https://issues.apache.org/jira/browse/HDFS-10546
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Blocker
>
> During the distro build, hadoop-hdfs-native-client copies the built libhdfs 
> binary artifacts for inclusion in the distro.  It references an incorrect 
> path though.  The copy fails and the build aborts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.

2016-06-17 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10546:


 Summary: hadoop-hdfs-native-client fails distro build when trying 
to copy libhdfs binaries.
 Key: HDFS-10546
 URL: https://issues.apache.org/jira/browse/HDFS-10546
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Blocker


During the distro build, hadoop-hdfs-native-client copies the built libhdfs 
binary artifacts for inclusion in the distro.  It references an incorrect path 
though.  The copy fails and the build aborts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-06-10 Thread Chris Nauroth
Interestingly, that FindBugs warning in hadoop-azure-datalake was not
flagged during pre-commit before I committed HADOOP-12666.  I'm going to
propose that we address it in scope of HADOOP-12875.

--Chris Nauroth




On 6/10/16, 10:30 AM, "Apache Jenkins Server" 
wrote:

>For more details, see
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/
>
>No changes
>
>
>
>
>-1 overall
>
>
>The following subsystems voted -1:
>findbugs unit
>
>
>The following subsystems voted -1 but
>were configured to be filtered/ignored:
>cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace
>
>
>The following subsystems are considered long running:
>(runtime bigger than 1h  0m  0s)
>unit
>
>
>Specific tests:
>
>FindBugs :
>
>   module:hadoop-tools/hadoop-azure-datalake
>   int value cast to float and then passed to Math.round in
>org.apache.hadoop.hdfs.web.PrivateAzureDataLakeFileSystem$BatchByteArrayIn
>putStream.getSplitSize(int) At PrivateAzureDataLakeFileSystem.java:and
>then passed to Math.round in
>org.apache.hadoop.hdfs.web.PrivateAzureDataLakeFileSystem$BatchByteArrayIn
>putStream.getSplitSize(int) At PrivateAzureDataLakeFileSystem.java:[line
>925] 
>
>Failed junit tests :
>
>   hadoop.hdfs.server.namenode.TestEditLog
>   hadoop.yarn.server.resourcemanager.TestClientRMTokens
>   hadoop.yarn.server.resourcemanager.TestAMAuthorization
>   hadoop.yarn.server.TestContainerManagerSecurity
>   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
>   hadoop.yarn.client.cli.TestLogsCLI
>   hadoop.yarn.client.api.impl.TestAMRMProxy
>   hadoop.yarn.client.api.impl.TestDistributedScheduling
>   hadoop.yarn.client.TestGetGroups
>   hadoop.mapreduce.tools.TestCLI
>   hadoop.mapred.TestMRCJCFileOutputCommitter
>
>Timed out junit tests :
>
>   org.apache.hadoop.yarn.client.cli.TestYarnCLI
>   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
>   org.apache.hadoop.yarn.client.api.impl.TestYarnClient
>   org.apache.hadoop.yarn.client.api.impl.TestNMClient
>  
>
>   cc:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-compile-cc-root.txt  [4.0K]
>
>   javac:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-compile-javac-root.txt  [164K]
>
>   checkstyle:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-checkstyle-root.txt  [16M]
>
>   pylint:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-patch-pylint.txt  [16K]
>
>   shellcheck:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-patch-shellcheck.txt  [20K]
>
>   shelldocs:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-patch-shelldocs.txt  [16K]
>
>   whitespace:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/whitespace-eol.txt  [12M]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/whitespace-tabs.txt  [1.3M]
>
>   findbugs:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/branch-findbugs-hadoop-tools_hadoop-azure-datalake-warnings.html
>[8.0K]
>
>   javadoc:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/diff-javadoc-javadoc-root.txt  [2.3M]
>
>   unit:
>
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt  [144K]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-
>yarn-server-resourcemanager.txt  [60K]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-
>yarn-server-tests.txt  [268K]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
>[908K]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-ma
>preduce-client-core.txt  [56K]
>   
>https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/58/artifact
>/out/patch-unit-hadoop-mapreduce-proj

Re: INotify and atime

2016-06-08 Thread Chris Nauroth
Hello Rahul,

I don't believe there is any fundamental difference in the handling of the
atime update for HDFS inotify.  I'd expect you to receive an event with
EventType.METADATA and MetadataUpdateEvent.TIMES, and I wouldn't expect
that you'd need to save a new fsimage to trigger the notification.

There is a test in the Hadoop codebase that exercises inotify event
delivery on atime update: TestDFSInotifyEventInputStream#testBasic.  You
might try looking at the source of that test and comparing it to what your
code is doing to see if there is any difference.
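
For reference, a bare-bones watcher along those lines looks like this (the
NameNode URI is illustrative):

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
  import org.apache.hadoop.hdfs.client.HdfsAdmin;
  import org.apache.hadoop.hdfs.inotify.Event;
  import org.apache.hadoop.hdfs.inotify.EventBatch;
  import org.apache.hadoop.hdfs.inotify.MetadataUpdateEvent;

  public class AtimeWatcher {
    public static void main(String[] args) throws Exception {
      HdfsAdmin admin =
          new HdfsAdmin(new URI("hdfs://localhost:8020"), new Configuration());
      DFSInotifyEventInputStream stream = admin.getInotifyEventStream();
      while (true) {
        EventBatch batch = stream.take();  // blocks until the next edits arrive
        for (Event e : batch.getEvents()) {
          if (e.getEventType() == Event.EventType.METADATA) {
            MetadataUpdateEvent m = (MetadataUpdateEvent) e;
            if (m.getMetadataType() == MetadataUpdateEvent.MetadataType.TIMES) {
              System.out.println("times updated: " + m.getPath());
            }
          }
        }
      }
    }
  }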

--Chris Nauroth




On 6/8/16, 2:17 PM, "rahul gidwani"  wrote:

>I was testing Inotify locally and noticed that unless I save the FSImage I
>can't get a MetadataUpdateEvent to fire.
>
>The test is pretty basic:
>
>I set up a minicluster with
>conf.setLong("dfs.access.time.precision", 1);
>
>create a file (verify I get an EventType.CREATE)
>
>I read that file and verify on the filesystem that the atime has changed
>for that file.
>I poll for 1 minute for an event to fire; nothing happens.
>I save the namespace (fsimage); the event fires and I get an
>EventType.METADATA.
>
>Is this logic correct?  All the other events I have tested do not require
>saving the FSImage to fire; is the atime event handled differently
>somehow?
>
>thank you
>rahul


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10502) Enabled memory locking and now HDFS won't start up

2016-06-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10502.
--
Resolution: Invalid

Hello [~machey].  I recommend taking these questions to the 
u...@hadoop.apache.org mailing list.  We use JIRA for tracking confirmed bugs 
and feature requests.  We use u...@hadoop.apache.org for usage advice and 
troubleshooting.

Regarding whether or not this is a recommended approach, I think it depends on 
a few other factors.  Is the intent to use these cached files from Hadoop 
workloads, such as MapReduce jobs or Hive queries?  If not, then I wonder if 
your use case might be better served by something more directly focused on 
general caching use cases, such as Redis or memcached.  If your use case does 
involve Hadoop integration, then certainly Centralized Cache Management is 
worth exploring.

Regarding the timeouts, I can tell from the exception that this is the 
heartbeat RPC sent from the DataNode to the NameNode.  I recommend 
investigating connectivity between the DataNode and the NameNode and examining 
the logs from both sides to try to determine if something is going wrong in the 
handling of the heartbeat message.  On one hand, a heartbeat timeout is not an 
error condition that is specific to Centralized Cache Management.  It could 
happen whether or not you're using that feature.  On the other hand, the 
heartbeat message does contain some optional information about the state of 
cache capacity and current usage at the DataNode.  That information would 
trigger special handling logic at the NameNode side, so I suppose there is a 
chance that something in that logic is hanging up the heartbeat handling.  
Investigating the logs might reveal more.

u...@hadoop.apache.org would be a good forum for further discussion of both of 
these topics.

> Enabled memory locking and now HDFS won't start up
> --
>
> Key: HDFS-10502
> URL: https://issues.apache.org/jira/browse/HDFS-10502
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.2
> Environment: RHEL 6.8
>Reporter: Chris Machemer
>
> My goal is to speed up reads.  I have about 500k small files (2k to 15k) and 
> I'm trying to use HDFS as a cache for serialized instances of java objects.
> I've written the code to construct and serialize all the objects out to HDFS, 
> and am now hoping to improve read performance, because accessing the objects 
> from disk-based storage is proving to be too slow for my application's SLA's.
> So my first question is, is using memory locking and hdfs cacheadmin pools 
> and directives the right way to go, to cache my objects into memory, or 
> should I create RAM disks, and do memory-based storage instead?
> If hdfs cacheadmin is the way to go (it's the path I'm going down so far), 
> then I need to figure out if what's happening is a bug or if I've configured 
> something wrong, because when I start up HDFS with a gig of memory locked 
> (both in limits.d for ulimit -l and also in hdfs-site.xml) and the server 
> starts up, and presumably tries to cache things into memory, I get hours and 
> hours of timeouts in the logs like this:
> 2016-06-08 07:42:50,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.net.SocketTimeoutException: Call From stgb-fe1.litle.com/10.1.9.66 to 
> localhost:8020 failed on socket timeout exception: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]; For more details 
> see:  http://wiki.apache.org/hadoop/SocketTimeout
>   at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
>   at 
> org.apache.h

Re: Another thought on client-side support of HDFS federation

2016-05-27 Thread Chris Nauroth
Hello Tianyi HE,

I noticed that a similar design for a federation proxying model has just
been proposed on Apache JIRA HDFS-10467.  You might want to join the
conversation there.

https://issues.apache.org/jira/browse/HDFS-10467


--Chris Nauroth




On 5/2/16, 10:32 AM, "Colin McCabe"  wrote:

>Hi Tianyi HE,
>
>Thanks for sharing this!  This reminds me of the httpfs daemon.  This
>daemon basically sits in front of an HDFS cluster and accepts requests,
>which it serves by forwarding them to the underlying HDFS instance.
>There is some documentation about it here:
>https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html
>
>Since httpfs uses an org.apache.hadoop.fs.FileSystem instance, it seems
>like you could plug in the apache.hadoop.fs.viewfs.ViewFileSystem class
>and be up and running with federation.  I haven't tried this, but I
>would expect that it would work, unless there are bugs in ViewFS itself.
>
>The big advantage of httpfs is that it provides a webhdfs-style REST
>interface.  As you said, this kind of interface makes it simple to use
>any language with REST bindings, without worrying about using a thick
>client.
>
>The big disadvantage of httpfs is that you must move both metadata and
>data operations through the httpfs daemon.  This could become a
>performance bottleneck.  It seems like you are concerned about this
>bottleneck.
>
>We also have webhdfs.  Unlike httpfs, webhdfs doesn't require all the
>data to move through its daemon.  With webhdfs, the client talks to
>DataNodes directly.
>
>I wonder if extending httpfs or webhdfs would be a better approach than
>starting from scratch.  There is a maintenance burden for adding new
>services and daemons.  This was our motivation for removing hftp, for
>example.  It's certainly something to think about.
>
>best,
>Colin
>
>
>On Thu, Apr 28, 2016, at 17:55, 何天一 wrote:
>> Hey guys,
>> 
>> My associates have investigated HDFS federation recently, which turns
>> out to be quite a good solution for improving scalability on the
>> NameNode/DataNode side.
>> 
>> However, we encountered some problems on the client side:
>> A) For historical reasons, we use clients in multiple languages to access
>> HDFS (e.g. python-snakebite, or perhaps libhdfs++). So we either
>> implement multiple versions of ViewFS or we give up the consistent view
>> (which can be confusing to users).
>> B) We have Hadoop client configuration deployed on client nodes, which
>> we do not have control over. Also, releasing new configuration can be a
>> really heavy operation because it needs to be pushed to several thousand
>> nodes while maintaining consistency (say a node is down throughout
>> the operation and then comes back online; it could still hold a stale
>> version of the configuration).
>> 
>> So we set out to explore another solution to these problems, and came
>> up with a proxy model.
>> That is, build an RPC proxy in front of the NameNodes.
>> All clients talk to the proxy when they need to consult a NameNode, and the
>> proxy decides which NameNode the request should go to according to the
>> mount table. This solved our problem: all clients are seamlessly upgraded
>> with federation support.
>> We open sourced the proxy recently: https://github.com/bytedance/nnproxy
>> (BTW, all kinds of feedback are welcome)
>> 
>> But there are still a few issues. For example, several modifications
>> need to be made inside Hadoop IPC to support RPC forwarding. We released
>> the corresponding patches with the nnproxy project (
>> https://github.com/bytedance/nnproxy/tree/master/hadoop-patches). It
>> would be better to have these merged into Apache trunk. Does anyone think
>> it's worthwhile?
>> 
>> 
>> -- 
>> Cheers,
>> Tianyi HE
>> (+86) 185 0042 4096
>
>-
>To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Cp command is not atomic

2016-05-25 Thread Chris Nauroth
Hello Kun,

You are correct that "hdfs dfs -cp" is not atomic, but the details of that
are a bit different from what you described.  For the example you gave,
the sequence of events would be:

1. Open a.xml.
2. Create file b.xml._COPYING_.
3. Copy the bytes from a.xml to b.xml._COPYING_.
4. Rename b.xml._COPYING_ to b.xml.

b.xml._COPYING_ is a temporary file.  All the bytes are written to this
location first.  Only if the full copy is successful, it proceeds to step
4 to rename it to its final destination at b.xml.  The rename is atomic,
so overall, this has the effect that b.xml will never have
partially-written data.  Either the whole copy succeeds or the copy fails
and b.xml doesn't exist.

However, even though the rename is atomic, we can't claim the overall
operation is atomic.  For example, if the process dies between step 2 and
step 3, then the command leaves a lingering side effect in the form of the
b.xml._COPYING_ file.

Perhaps it's sufficient for your use case that the final rename step is
atomic.
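
In code, the same pattern looks roughly like this (the helper below is my own
sketch, not the shell's actual implementation):

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  public class CopyThenRename {
    public static void copy(FileSystem fs, Path src, Path dst) throws IOException {
      Path tmp = new Path(dst.getParent(), dst.getName() + "._COPYING_");
      try (FSDataInputStream in = fs.open(src);          // step 1
           FSDataOutputStream out = fs.create(tmp)) {    // step 2
        IOUtils.copyBytes(in, out, 4096);                // step 3
      }
      if (!fs.rename(tmp, dst)) {                        // step 4: atomic rename
        throw new IOException("rename failed: " + tmp + " -> " + dst);
      }
    }
  }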

--Chris Nauroth




On 5/25/16, 8:21 AM, "Kun Ren"  wrote:

>Hi Genius,
>
>If I understand correctly, the shell command "cp" for HDFS is not
>atomic; is that correct?
>
>For example:
>
>./bin/hdfs dfs -cp input/a.xml input/b.xml
>
>This command actually does 3 things: 1. read input/a.xml; 2. create a new
>file input/b.xml; 3. write the content of a.xml to b.xml.
>
>When I looked at the code, the client side actually performs the 3 steps
>with no lock between them; does that mean the cp command
>is not guaranteed to be atomic?
>
>
>Thanks a lot for your reply.


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: HDFS Federation-- cross namenodes operations

2016-05-25 Thread Chris Nauroth
Hello Kun,

I replied to this same question on the user@ list.  For usage questions
like this, the user@ list is the correct forum.  It's not necessary to
cross-post to the dev@ lists.

--Chris Nauroth




On 5/25/16, 8:58 AM, "Kun Ren"  wrote:

>Hi Genius,
>
>Does HDFS Federation support the cross namenodes operations?
>
>For example:
>
>./bin/hdfs dfs -cp input1/a.xml input2/b.xml
>
>Suppose that input1 belongs to namenode 1 and input2 belongs to namenode 2;
>does Federation support this operation? If not, why?
>
>Thanks.


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: ASF OS X Build Infrastructure

2016-05-21 Thread Chris Nauroth
Hi Ravi,

Something certainly seems off about that bootstrapping problem you encountered. 
 :-)  When I've done this, the artifact I downloaded was an .iso file, which I 
could then use to install a VirtualBox VM.

I'm now tuned into the discussion Sean referenced about the ASF MSDN program.  
I'll send another update when I have something more specific to share.

--Chris Nauroth

From: Ravi Prakash <ravihad...@gmail.com>
Date: Friday, May 20, 2016 at 4:56 PM
To: Sean Busbey <bus...@cloudera.com>
Cc: Chris Nauroth <cnaur...@hortonworks.com>, Steve Loughran 
<ste...@hortonworks.com>, common-...@hadoop.apache.org, 
mapreduce-...@hadoop.apache.org, hdfs-dev@hadoop.apache.org, 
yarn-...@hadoop.apache.org
Subject: Re: ASF OS X Build Infrastructure

FWIW, I was able to get a response from the form last month. I was issued a new 
MSDN subscriber ID with which I could have downloaded Microsoft Visual Studio 
(and some other products, I think). I was interested in downloading an image of 
Windows to run in a VM, but the downloader is... wait for it... an exe file 
:-) Haven't gotten around to begging someone with a Windows OS to run that 
image downloader.

On Fri, May 20, 2016 at 10:39 AM, Sean Busbey 
<bus...@cloudera.com> wrote:
Some talk about the MSDN-for-committers program recently passed by on a private
list. It's still active, it just changed homes within Microsoft. The
info should still be in the committer repo. If something is amiss
please let me know and I'll pipe up to the folks already plugged in to
confirming it's active.

On Fri, May 20, 2016 at 12:13 PM, Chris Nauroth
<cnaur...@hortonworks.com> wrote:
> It's very disappointing to see that vanish.  I'm following up to see if I
> can learn more about what happened or if I can do anything to help
> reinstate it.
>
> --Chris Nauroth
>
>
>
>
> On 5/20/16, 6:11 AM, "Steve Loughran" 
> <ste...@hortonworks.com> wrote:
>
>>
>>> On 20 May 2016, at 10:40, Lars Francke 
>>> <lars.fran...@gmail.com> wrote:
>>>
>>>>
>>>> Regarding lack of personal access to anything but Linux, I'll take
>>>>this as
>>>> an opportunity to remind everyone that ASF committers (not just
>>>>limited to
>>>> Hadoop committers) are entitled to a free MSDN license, which can get
>>>>you
>>>> a Windows VM for validating Windows issues and any patches that touch
>>>> cross-platform concerns, like the native code.  Contributors who are
>>>>not
>>>> committers still might struggle to get access to Windows, but all of us
>>>> reviewing and committing patches do have access.
>>>>
>>>
>>> Actually, from all I can tell this MSDN offer has been discontinued for
>>> now. All the information has been removed from the committers repo. Do
>>>you
>>> have any more up to date information on this?
>>>
>>
>>
>>That's interesting.
>>
>>I did an SVN update and it went away... looks like something happened on
>>April 26
>>
>>No idea, though the svn log has a bit of detail
>>
>>-
>>To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>>For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>>
>>
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>



--
busbey

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org




Re: ASF OS X Build Infrastructure

2016-05-20 Thread Chris Nauroth
It's very disappointing to see that vanish.  I'm following up to see if I
can learn more about what happened or if I can do anything to help
reinstate it.

--Chris Nauroth




On 5/20/16, 6:11 AM, "Steve Loughran"  wrote:

>
>> On 20 May 2016, at 10:40, Lars Francke  wrote:
>> 
>>> 
>>> Regarding lack of personal access to anything but Linux, I'll take
>>>this as
>>> an opportunity to remind everyone that ASF committers (not just
>>>limited to
>>> Hadoop committers) are entitled to a free MSDN license, which can get
>>>you
>>> a Windows VM for validating Windows issues and any patches that touch
>>> cross-platform concerns, like the native code.  Contributors who are
>>>not
>>> committers still might struggle to get access to Windows, but all of us
>>> reviewing and committing patches do have access.
>>> 
>> 
>> Actually, from all I can tell this MSDN offer has been discontinued for
>> now. All the information has been removed from the committers repo. Do
>>you
>> have any more up to date information on this?
>> 
>
>
>That's interesting.
>
>I did an SVN update and it went away... looks like something happened on
>April 26
>
>No idea, though the svn log has a bit of detail
>
>-
>To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.

2016-05-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10438:


 Summary: When NameNode HA is configured to use the lifeline RPC 
server, it should log the address of that server.
 Key: HDFS-10438
 URL: https://issues.apache.org/jira/browse/HDFS-10438
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: KWON BYUNGCHANG
Assignee: Chris Nauroth
Priority: Minor


As reported by [~magnum]:

I have configured below
{code}
dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040
dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041
{code}

The servicerpc port is 8040, and the lifeline port is 8041.
However, the ZKFC daemon logs using the servicerpc port.
This may cause confusion.

thank you.

{code}
2016-05-19 19:18:40,566 WARN  ha.HealthMonitor 
(HealthMonitor.java:doHealthChecks(207)) - Service health check failed for 
NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources 
available
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.

2016-05-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10437:


 Summary: ReconfigurationProtocol not covered by HDFSPolicyProvider.
 Key: HDFS-10437
 URL: https://issues.apache.org/jira/browse/HDFS-10437
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Chris Nauroth


The {{HDFSPolicyProvider}} class contains an entry for defining the security 
policy of each HDFS RPC protocol interface.  {{ReconfigurationProtocol}} is not 
listed currently.  This may indicate that reconfiguration functionality is not 
working correctly in secured clusters.
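
A sketch of the kind of entry that would need to be added to the {{Service[]}} 
array in {{HDFSPolicyProvider}} (the ACL property name below is hypothetical; a 
final name would follow the existing {{security.*.protocol.acl}} pattern):

{code}
// Hypothetical addition:
new Service("security.reconfiguration.protocol.acl",  // illustrative key name
    ReconfigurationProtocol.class)
{code}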



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: ASF OS X Build Infrastructure

2016-05-19 Thread Chris Nauroth
Allen, thank you for doing this.

Regarding lack of personal access to anything but Linux, I'll take this as
an opportunity to remind everyone that ASF committers (not just limited to
Hadoop committers) are entitled to a free MSDN license, which can get you
a Windows VM for validating Windows issues and any patches that touch
cross-platform concerns, like the native code.  Contributors who are not
committers still might struggle to get access to Windows, but all of us
reviewing and committing patches do have access.

It has long been on my TODO list to set up similar Jenkins jobs for
Windows, but it keeps slipping.  I'll try once again to bump up priority.

--Chris Nauroth




On 5/19/16, 9:41 AM, "Allen Wittenauer"  wrote:

>   
>   Some of you may not know that the ASF actually does have an OS X machine
>(a Mac mini, so it's not a speed demon) in the build infrastructure.
>While messing around with getting all? of the trunk jobs reconfigured to
>do Java 8 and separate maven repos, I noticed that this box tends to sit
>idle most of the day. Why not take advantage of it?  Therefore, I also set up
>two jobs for us to use to help alleviate the "I don't have access to
>anything but Linux" excuse when writing code that may not work in a
>portable manner.
>
>Jobs #1:
>
>   https://builds.apache.org/view/H-L/view/Hadoop/job/Precommit-HADOOP-OSX
>
>   This basically runs Apache Yetus precommit with quite a few of the
>unnecessary tests disabled.  For example, there's no point in running
>checkstyle.  Note that this job takes the *full* JIRA issue id as input.
>So 'HADOOP-9902' not '9902'.  This allows for one Jenkins job to be used
>for all the Hadoop sub-projects (HADOOP, HDFS, MR, YARN).  "But my code
>is on github and I don't want to upload a patch!"  I haven't tested it,
>but it should also take a URL, so just add a .diff to the end of your
>github compare URL and put that in the issue box.  It hypothetically
>should work.
>
>Job #2:
>
>   I'm still hammering on this one because the email notifications aren't
>working to my satisfaction plus we have some extremely Linux-specific
>code in YARN... but 
>
>   
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-trunk-osx-java8
>/
>
>   ... is a "build the world" job similar to what is currently running under
>the individual sub projects.  (This actually makes it one of the few
>"build everything" jobs we have running. Most of the other jobs only
>build that particular sub project.).  It does not run the full unit test
>suite and it also does not build all of the native code.  This gives us a
>place to start on our journey of making Hadoop actually, truly run
>everywhere.  (Interesting side note: It's been *extremely* consistent in
>what fails vs. the Linux build hosts.)
>
>   At some point, likely after YETUS-390 is complete, I'll switch this job
>over to be run by Apache Yetus in qbt mode so that it's actually easier
>to track failures across all dirs.  A huge advantage over raw maven
>commands.
>
>   Happy testing everyone.
>
>   NOTE: if you don't have access to launch jobs on builds.apache.org,
>you'll need to send a request to private@.  The Apache Hadoop PMC has the
>keys to give access to folks.
>
>
>
>-
>To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Merge feature branch HADOOP-12930

2016-05-16 Thread Chris Nauroth
Understood about the tests.

--Chris Nauroth




On 5/15/16, 7:30 AM, "Allen Wittenauer"  wrote:

>
>> On May 14, 2016, at 3:11 PM, Chris Nauroth 
>>wrote:
>> 
>> +1 (binding)
>> 
>> -Tried a dry-run merge of HADOOP-12930 to trunk.
>> -Successfully built distro on Windows.
>> -Ran "hdfs namenode", "hdfs datanode", and various interactive hdfs
>> commands through Cygwin.
>> -Reviewed documentation.
>> 
>> Allen, thank you for the contribution.  Would you please attach a full
>> patch to HADOOP-12930 to check pre-commit results?
>
>
>   Nope.  The whole reason this was done as a branch with multiple patches
>was to prevent Jenkins from getting overwhelmed since it would trigger
>full unit tests on pretty much the entire code base….
>
>> While testing this, I discovered a bug in the distro build for Windows.
>> Could someone please code review my patch on HADOOP-13149?
>
>   Done!
>
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 5/9/16, 1:26 PM, "Allen Wittenauer"  wrote:
>> 
>>> 
>>> Hey gang!
>>> 
>>> I'd like to call a vote to run for 7 days (ending May 16 at 13:30 PT)
>>>to
>>> merge the HADOOP-12930 feature branch into trunk. This branch was
>>> developed exclusively by me as per the discussion two months ago as a
>>>way
>>> to make what would be a rather large patch hopefully easier to review.
>>> The vast majority of the branch is code movement in the same file,
>>> additional license headers, maven assembly hooks for distribution, and
>>> variable renames. Not a whole lot of new code, but a big diff file
>>> none-the-less.
>>> 
>>> This branch modifies the 'hadoop', 'hdfs', 'mapred', and 'yarn'
>>>commands
>>> to allow for subcommands to be added or modified at runtime.  This
>>>allows
>>> for individual users or entire sites to tweak the execution environment
>>> to suit their local needs.  For example, it has been a practice for
>>>some
>>> locations to change the distcp jar out for a custom one.  Using this
>>> functionality, it is possible that the 'hadoop distcp' command could
>>>run
>>> the local version without overwriting the bundled jar and for existing
>>> documentation (read: results from Internet searches) to work as written
>>> without modification. This has the potential to be a huge win,
>>>especially
>>> for:
>>> 
>>> * advanced end users looking to supplement the Apache Hadoop
>>>experience
>>> * operations teams that may be able to leverage existing
>>>documentation
>>> without having to maintain local "exception" docs
>>> * development groups wanting an easy way to trial experimental
>>>features
>>> 
>>> Additionally, this branch includes the following, related changes:
>>> 
>>> * Adds the first unit tests for the 'hadoop' command
>>> * Adds the infrastructure for hdfs script testing and the first 
>>> unit
>>> test for the 'hdfs' command
>>> * Modifies the hadoop-tools components to be dynamic rather 
>>> than hard
>>> coded
>>> * Renames the shell profiles for hdfs, mapred, and yarn to be
>>> consistent with other bundled profiles, including the ones introduced
>>>in
>>> this branch
>>> 
>>> Documentation, including a 'hello world'-style example, is in the
>>> UnixShellGuide markdown file.  (Of course!)
>>> 
>>>  I am at ApacheCon this week if anyone wants to discuss in-depth.
>>> 
>>> Thanks!
>>> 
>>> P.S.,
>>> 
>>> There are still two open sub-tasks.  These are blocked by other issues
>>> so that we may add unit testing to the shell code in those respective
>>> areas.  I'll convert to full issues after HADOOP-12930 is closed.
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>>> 
>>> 
>> 
>
>


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Merge feature branch HADOOP-12930

2016-05-14 Thread Chris Nauroth
+1 (binding)

-Tried a dry-run merge of HADOOP-12930 to trunk.
-Successfully built distro on Windows.
-Ran "hdfs namenode", "hdfs datanode", and various interactive hdfs
commands through Cygwin.
-Reviewed documentation.

Allen, thank you for the contribution.  Would you please attach a full
patch to HADOOP-12930 to check pre-commit results?

While testing this, I discovered a bug in the distro build for Windows.
Could someone please code review my patch on HADOOP-13149?

--Chris Nauroth




On 5/9/16, 1:26 PM, "Allen Wittenauer"  wrote:

>
>   Hey gang!
>
>I'd like to call a vote to run for 7 days (ending May 16 at 13:30 PT) to
>merge the HADOOP-12930 feature branch into trunk. This branch was
>developed exclusively by me as per the discussion two months ago as a way
>to make what would be a rather large patch hopefully easier to review.
>The vast majority of the branch is code movement in the same file,
>additional license headers, maven assembly hooks for distribution, and
>variable renames. Not a whole lot of new code, but a big diff file
>none-the-less.
>
>   This branch modifies the 'hadoop', 'hdfs', 'mapred', and 'yarn' commands
>to allow for subcommands to be added or modified at runtime.  This allows
>for individual users or entire sites to tweak the execution environment
>to suit their local needs.  For example, it has been a practice for some
>locations to change the distcp jar out for a custom one.  Using this
>functionality, it is possible that the 'hadoop distcp' command could run
>the local version without overwriting the bundled jar and for existing
>documentation (read: results from Internet searches) to work as written
>without modification. This has the potential to be a huge win, especially
>for:
>   
>   * advanced end users looking to supplement the Apache Hadoop 
> experience
>   * operations teams that may be able to leverage existing 
> documentation
>without having to maintain local "exception" docs
>   * development groups wanting an easy way to trial experimental 
> features
>
>   Additionally, this branch includes the following, related changes:
>
>   * Adds the first unit tests for the 'hadoop' command
>   * Adds the infrastructure for hdfs script testing and the first 
> unit
>test for the 'hdfs' command
>   * Modifies the hadoop-tools components to be dynamic rather 
> than hard
>coded
>   * Renames the shell profiles for hdfs, mapred, and yarn to be
>consistent with other bundled profiles, including the ones introduced in
>this branch
>
>   Documentation, including a 'hello world'-style example, is in the
>UnixShellGuide markdown file.  (Of course!)
>
>I am at ApacheCon this week if anyone wants to discuss in-depth.
>
>   Thanks!
>
>P.S.,
>
>   There are still two open sub-tasks.  These are blocked by other issues
>so that we may add unit testing to the shell code in those respective
>areas.  I'll convert to full issues after HADOOP-12930 is closed.
>
>
>-
>To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10373) HDFS ZKFC HealthMonitor Throw a Exception Cause AutoFailOver

2016-05-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10373.
--
Resolution: Invalid

Hello [~piaoyu zhang].  This doesn't look like a bug.  If ZKFC cannot contact 
its peer NameNode for a successful health check RPC, then an HA failover is the 
expected behavior.  This looks like an operational problem in this environment 
that needs further investigation.  "Connection reset by peer" means the remote 
end (the NameNode) closed out the socket before sending the expected response 
data.  I recommend looking at the NameNode logs to see if anything unusual 
happened during the timeframe of the HA failover.  If you need further 
assistance, then consider sending an email to u...@hadoop.apache.org.  I hope 
this helps.

> HDFS ZKFC HealthMonitor Throw a Exception Cause AutoFailOver
> 
>
> Key: HDFS-10373
> URL: https://issues.apache.org/jira/browse/HDFS-10373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.2.0
> Environment: CentOS6.5 Hadoop-2.2.0  
>Reporter: zhangyubiao
> Attachments: screenshot-1.png, 屏幕快照_2016-05-06_上午10.17.22.png
>
>
> HDFS ZKFC HealthMonitor Throw a Exception 
> 2016-05-05 02:00:59,475 WARN org.apache.hadoop.ha.HealthMonitor: 
> Transport-level exception trying to monitor health of NameNode at 
> XXX-XXX-XXX-hadoop.jd.local/172.22.17
> 1.XX:8021: Failed on local exception: java.io.IOException: Connection reset 
> by peer; Host Details : local host is: 
> "XXX-XXX-XXX-hadoop.jd.local/172.22.171.XX"; destinat
> ion host is: XXX-XXX-XXX-hadoop.jd.local":8021;
> Cause HA AutoFailOver



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10359) Allow trigger block report from all datanodes

2016-05-04 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10359.
--
Resolution: Won't Fix

I think there is momentum in this conversation towards a "Won't Fix" 
resolution, so I'm resolving the issue now.  [~Tao Jie], thank you for the 
discussion.  Even though this didn't lead to an enhancement, we appreciate the 
participation.

> Allow trigger block report from all datanodes
> -
>
> Key: HDFS-10359
> URL: https://issues.apache.org/jira/browse/HDFS-10359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Tao Jie
>
> Since we have HDFS-7278 allows trigger block report from one certain 
> datanode. It would be helpful to add a option to this command to trigger 
> block report from all datanodes.
> Command maybe like this:
> *hdfs dfsadmin -triggerBlockReport \[-incremental\] 
> *



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10356) Ozone: Container server needs enhancements to control of bind address for greater flexibility and testability.

2016-05-02 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10356:


 Summary: Ozone: Container server needs enhancements to control of 
bind address for greater flexibility and testability.
 Key: HDFS-10356
 URL: https://issues.apache.org/jira/browse/HDFS-10356
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth


The container server, as implemented in class 
{{org.apache.hadoop.ozone.container.common.transport.server.XceiverServer}}, 
currently does not offer the same degree of flexibility as our other RPC 
servers for controlling the network interface and port used in the bind call.  
There is no "bind-host" property, so it is not possible to control the exact 
network interface selected.  If the requested port is different from the actual 
bound port (i.e. setting port to 0 in test cases), then there is no exposure of 
that actual bound port to clients.
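
A sketch of the usual pattern the other RPC servers follow (the property names 
below are hypothetical, not committed keys):

{code}
// Honor an optional bind-host override and expose the port actually bound,
// which matters when tests request ephemeral port 0.
String host = conf.getTrimmed("dfs.container.ipc.bind-host", "0.0.0.0");
int port = conf.getInt("dfs.container.ipc.port", 50011);
ServerSocket socket = new ServerSocket();
socket.bind(new InetSocketAddress(host, port));
InetSocketAddress actual = (InetSocketAddress) socket.getLocalSocketAddress();
// Surface actual.getPort() through a getter so clients and tests can
// discover the real bound port.
{code}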



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10351) Ozone: Optimize key writes to chunks by providing a bulk write implementation in ChunkOutputStream.

2016-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10351:


 Summary: Ozone: Optimize key writes to chunks by providing a bulk 
write implementation in ChunkOutputStream.
 Key: HDFS-10351
 URL: https://issues.apache.org/jira/browse/HDFS-10351
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-10268 introduced the {{ChunkOutputStream}} class as part of end-to-end 
integration of Ozone receiving key content and writing it to chunks in a 
container.  That patch provided an implementation of the mandatory single-byte 
write method.  We can improve performance by adding an implementation of the 
bulk write method too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10349) StorageContainerManager fails to compile after merge of HDFS-10312 maxDataLength enforcement.

2016-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10349:


 Summary: StorageContainerManager fails to compile after merge of 
HDFS-10312 maxDataLength enforcement.
 Key: HDFS-10349
 URL: https://issues.apache.org/jira/browse/HDFS-10349
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-10312 introduced enforcement of a configurable maximum data length while 
deserializing large block reports.  This change broke compilation of 
{{StorageContainerManager}} on the HDFS-7240 feature branch, due to a 
constructor signature change in {{DatanodeProtocolServerSideTranslatorPB}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: handlerCount

2016-04-28 Thread Chris Nauroth
Hello,

In general, configuration property default values are defined in two
places: 1) hdfs-default.xml, which supplies the default property values
when a deployment doesn't specifically set them, and 2) DFSConfigKeys, a
class that defines constant default values that the code uses if for some
reason no default value is found during the configuration lookup.

https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml#L602-L606

https://github.com/apache/hadoop/blob/rel/release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java#L473-L474
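
A minimal sketch of how the two layers interact (the key and fallback shown
mirror dfs.namenode.handler.count in the 2.7.2 links above; verify against
your own version):

{code}
import org.apache.hadoop.conf.Configuration;

public class HandlerCountLookup {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // loads *-default.xml, *-site.xml
    int handlerCount = conf.getInt(
        "dfs.namenode.handler.count", // key constant lives in DFSConfigKeys
        10);                          // code-level fallback, mirrors the constant
    System.out.println("handler count = " + handlerCount);
  }
}
{code}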


--Chris Nauroth




On 4/28/16, 2:51 PM, "Kun Ren"  wrote:

>Hi Genius,
>
>I have a quick question:
>
>I remembered I saw that the default value for HandlerCount is 10 (the number
>of handler threads), but I cannot find where it is defined in the source
>code. Could you please point me to where I can find it in the 2.7.2
>codebase? Thanks a lot.



[jira] [Resolved] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10322.
--
Resolution: Duplicate

[~chenfolin], thank you for investigating this further.  I'm just updating 
status on this issue to indicate it's a duplicate of a prior issue.

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken pipe error 
> happens, the DataNode will produce more and more waiting threads.
>  It is similar to bug HADOOP-11802, but I do not think they are the same 
> problem, because the DomainSocket thread is in a running state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Reopened] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-10322:
--

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>





[jira] [Created] (HDFS-10312) Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction.

2016-04-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10312:


 Summary: Large block reports may fail to decode at NameNode due to 
64 MB protobuf maximum length restriction.
 Key: HDFS-10312
 URL: https://issues.apache.org/jira/browse/HDFS-10312
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


Our RPC server caps the maximum size of incoming messages at 64 MB by default.  
For exceptional circumstances, this can be increased using 
{{ipc.maximum.data.length}}.  However, for block reports, there is still an 
internal maximum length restriction of 64 MB enforced by protobuf.  (Sample 
stack trace to follow in comments.)  This issue proposes to apply the same 
override to our block list decoding, so that large block reports can proceed.
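
For reference, a sketch of what lifting protobuf's internal limit looks like
when decoding (this uses protobuf-java's CodedInputStream API; the wrapper
class and method names are illustrative):

{code}
import com.google.protobuf.CodedInputStream;
import java.io.InputStream;

public final class LargeMessageDecoding {
  /**
   * Returns a decoder whose size limit matches the configured
   * ipc.maximum.data.length instead of protobuf's 64 MB default.
   */
  public static CodedInputStream decoderFor(InputStream in, int maxDataLength) {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    cis.setSizeLimit(maxDataLength); // default is 64 MB; raise it explicitly
    return cis;
  }
}
{code}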





[jira] [Created] (HDFS-10268) Ozone: end-to-end integration for create/get volumes, buckets and keys.

2016-04-06 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10268:


 Summary: Ozone: end-to-end integration for create/get volumes, 
buckets and keys.
 Key: HDFS-10268
 URL: https://issues.apache.org/jira/browse/HDFS-10268
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The HDFS-7240 feature branch now has the building blocks required to enable 
end-to-end functionality and testing for create/get volumes, buckets and keys.  
The scope of this patch is to complete the necessary integration in 
{{DistributedStorageHandler}} and related classes.





[jira] [Resolved] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10257.
--
Resolution: Not A Problem

Great, thanks for confirming [~stevebovy].  I'll go ahead and close this issue.

> Quick Thread Local Storage set-up has a small flaw
> --
>
> Key: HDFS-10257
> URL: https://issues.apache.org/jira/browse/HDFS-10257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.4
> Environment: Linux 
>Reporter: Stephen Bovy
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In jni_helper.c, in the getJNIEnv function, the
> "THREAD_LOCAL_STORAGE_SET_QUICK(env);" macro is in the wrong location;
> it should precede the "threadLocalStorageSet(env)" call, as follows:
> THREAD_LOCAL_STORAGE_SET_QUICK(env);
> if (threadLocalStorageSet(env)) {
>   return NULL;
> }
> And in "thread_local_storage.h", the "THREAD_LOCAL_STORAGE_SET_QUICK"
> macro should be as follows:
> #ifdef HAVE_BETTER_TLS
>   #define THREAD_LOCAL_STORAGE_GET_QUICK() \
> static __thread JNIEnv *quickTlsEnv = NULL; \
> { \
>   if (quickTlsEnv) { \
> return quickTlsEnv; \
>   } \
> }
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
> { \
>   quickTlsEnv = (env); \
>   return env; \
> }
> #else
>   #define THREAD_LOCAL_STORAGE_GET_QUICK()
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
> #endif





[jira] [Resolved] (HDFS-350) DFSClient more robust if the namenode is busy doing GC

2016-03-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-350.

Resolution: Not A Problem

I'm resolving this issue.  In current versions, the client is more robust to 
this kind of failure.  The RPC layer implements retry policies.  Retried 
operations are handled gracefully using either an inherently idempotent 
implementation of the RPC or the retry cache for at-most-once execution.  In 
the event of an extremely long GC, the client would either retry and succeed 
after completion of the GC, or in more extreme cases it would trigger an HA 
failover and the client would successfully issue its call to the new active 
NameNode.
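
As a loose sketch of the client-side retry idea (a hypothetical rpc() callback
and a simple backoff; Hadoop's real RetryPolicy machinery is considerably
richer):

{code}
public class RetryDemo {
  interface Rpc { String call() throws Exception; }

  static String callWithRetry(Rpc rpc, int maxAttempts, long backoffMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return rpc.call(); // safe to retry only if the call is idempotent
      } catch (Exception e) {
        last = e;
        Thread.sleep(backoffMs * attempt); // back off before the next attempt
      }
    }
    throw last; // all attempts failed; surface the last error
  }
}
{code}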

> DFSClient more robust if the namenode is busy doing GC
> --
>
> Key: HDFS-350
> URL: https://issues.apache.org/jira/browse/HDFS-350
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> In the current code, if the client (writer) encounters an RPC error while 
> fetching a new block id from the namenode, it does not retry. It throws an 
> exception to the application. This becomes especially bad if the namenode is 
> in the middle of a GC and does not respond in time. The reason the client 
> throws an exception is because it does not know whether the namenode 
> successfully allocated a block for this file.
> One possible enhancement would be to make the client retry the addBlock RPC 
> if needed. The client can send the block list that it currently has. The 
> namenode can match the block list sent by the client with what it has in its 
> own metadata and then send back a new blockid (or a previously allocated 
> blockid that the client had not yet received because the earlier RPC 
> timed out). This will make the client more robust!
> This works even when we support Appends because the namenode will *always* 
> verify that the client has the lease for the file in question.





[jira] [Created] (HDFS-9920) Stop tracking CHANGES.txt in the HDFS-7240 feature branch.

2016-03-08 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9920:
---

 Summary: Stop tracking CHANGES.txt in the HDFS-7240 feature branch.
 Key: HDFS-9920
 URL: https://issues.apache.org/jira/browse/HDFS-9920
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth


Now that we have stopped tracking CHANGES.txt in our main branches, I'd like to 
do the same for the HDFS-7240 feature branch.





[jira] [Created] (HDFS-9907) Exclude Ozone protobuf-generated classes from Findbugs analysis.

2016-03-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9907:
---

 Summary: Exclude Ozone protobuf-generated classes from Findbugs 
analysis.
 Key: HDFS-9907
 URL: https://issues.apache.org/jira/browse/HDFS-9907
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


Pre-commit runs on the HDFS-7240 feature branch are currently flagging Ozone 
protobuf-generated classes with warnings.  These warnings aren't relevant, 
because we don't directly control the code generated by protoc.  We can exclude 
these classes in the Findbugs configuration, just like we do for other existing 
protobuf-generated classes.





[jira] [Resolved] (HDFS-9520) PeerCache evicts too frequently, causing connection re-establishments

2016-02-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9520.
-
Resolution: Won't Fix

I'm resolving this as Won't Fix as per prior discussion.  (Please feel free to 
reopen if there are further thoughts on configuration tuning.)

> PeerCache evicts too frequently, causing connection re-establishments
> --
>
> Key: HDFS-9520
> URL: https://issues.apache.org/jira/browse/HDFS-9520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HDFS-9520.png
>
>
> Env: 20 node setup
> dfs.client.socketcache.capacity = 16
> Issue:
> ==
> Monitored PeerCache and it was evicting lots of connections during close. Set 
> "dfs.client.socketcache.capacity=20" and tested again. Evictions still 
> happened. Screenshot of profiler is attached in the JIRA.
> Workaround:
> ===
> Temp fix was to set "dfs.client.socketcache.capacity=1000" to prevent 
> eviction. 
> Adding more debug logs revealed that multimap.size() was 40 instead of 20. 
> LinkedListMultimap#size() returns the total number of values instead of the 
> number of keys, causing lots of evictions.
> {code}
>if (capacity == multimap.size()) {
>   evictOldest();
> }
> {code}
> Should this be (capacity == multimap.keySet().size()), or is it expected that 
> the "dfs.client.socketcache.capacity" be set to a very high value?
> \cc [~gopalv], [~sseth]
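
A small Guava demo of the size() behavior behind the premature evictions
(illustrative key and value types; not the PeerCache code):

{code}
import com.google.common.collect.LinkedListMultimap;

public class MultimapSizeDemo {
  public static void main(String[] args) {
    LinkedListMultimap<String, Integer> cache = LinkedListMultimap.create();
    cache.put("datanode-1", 1001);
    cache.put("datanode-1", 1002); // second cached socket for the same peer
    cache.put("datanode-2", 2001);
    System.out.println(cache.size());          // 3: counts values
    System.out.println(cache.keySet().size()); // 2: counts distinct peers
  }
}
{code}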





[jira] [Resolved] (HDFS-8943) Read APIs in ByteRangeInputStream do not read all the bytes specified when chunked transfer-encoding is used in the server

2016-02-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8943.
-
Resolution: Won't Fix

[~cmccabe], thank you for the reminder.  This is resolved as Won't Fix.

> Read APIs in ByteRangeInputStream do not read all the bytes specified when 
> chunked transfer-encoding is used in the server
> 
>
> Key: HDFS-8943
> URL: https://issues.apache.org/jira/browse/HDFS-8943
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1
>Reporter: Shradha Revankar
>Assignee: Shradha Revankar
> Attachments: HDFS-8943.000.patch
>
>
> With the default WebHDFS server implementation, the read APIs in 
> ByteRangeInputStream work as expected, reading the correct number of bytes, 
> for these APIs:
> {{public int read(byte b[], int off, int len)}}
> {{public int read(long position, byte[] buffer, int offset, int length)}}
> But when a custom WebHDFS server implementation is plugged in which uses 
> chunked Transfer-Encoding, these APIs read only the first chunk. A simple fix 
> would be to loop and read until the specified number of bytes is read, 
> similar to {{readFully()}}.





[jira] [Resolved] (HDFS-9798) TestHdfsNativeCodeLoader fails

2016-02-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9798.
-
Resolution: Duplicate

Hi [~ajisakaa].  A recent Yetus change is preventing pre-commit from building 
libhadoop.so before running the HDFS tests.  We're tracking the fix in 
YETUS-281, and there is a patch in progress.

> TestHdfsNativeCodeLoader fails
> --
>
> Key: HDFS-9798
> URL: https://issues.apache.org/jira/browse/HDFS-9798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Akira AJISAKA
>
> TestHdfsNativeCodeLoader fails intermittently in Jenkins.
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14473/testReport/org.apache.hadoop.fs/TestHdfsNativeCodeLoader/testNativeCodeLoaded/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14475/testReport/org.apache.hadoop.fs/TestHdfsNativeCodeLoader/testNativeCodeLoaded/
> Error message
> {noformat}
> TestNativeCodeLoader: libhadoop.so testing was required, but libhadoop.so was 
> not loaded.  LD_LIBRARY_PATH = 
> ${env.LD_LIBRARY_PATH}:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/../../hadoop-common-project/hadoop-common/target/native/target/usr/local/lib
> {noformat}
> Stacktrace
> {noformat}
> java.lang.AssertionError: TestNativeCodeLoader: libhadoop.so testing was 
> required, but libhadoop.so was not loaded.  LD_LIBRARY_PATH = 
> ${env.LD_LIBRARY_PATH}:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/../../hadoop-common-project/hadoop-common/target/native/target/usr/local/lib
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.fs.TestHdfsNativeCodeLoader.testNativeCodeLoaded(TestHdfsNativeCodeLoader.java:46)
> {noformat}





Re: [Release thread] 2.8.0 release activities

2016-02-04 Thread Chris Nauroth
FYI, I've just needed to raise HDFS-9761 to blocker status for the 2.8.0
release.

--Chris Nauroth




On 2/3/16, 6:19 PM, "Karthik Kambatla"  wrote:

>Thanks Vinod. Not labeling 2.8.0 stable sounds perfectly reasonable to me.
>Let us not call it alpha or beta though, it is quite confusing. :)
>
>On Wed, Feb 3, 2016 at 8:17 PM, Gangumalla, Uma 
>wrote:
>
>> Thanks Vinod. +1 for 2.8 release start.
>>
>> Regards,
>> Uma
>>
>> On 2/3/16, 3:53 PM, "Vinod Kumar Vavilapalli" 
>>wrote:
>>
>> >Seems like all the features listed in the Roadmap wiki are in. I'm going
>> >to try cutting an RC this weekend for a first/non-stable release off of
>> >branch-2.8.
>> >
>> >Let me know if anyone has any objections/concerns.
>> >
>> >Thanks
>> >+Vinod
>> >
>> >> On Nov 25, 2015, at 5:59 PM, Vinod Kumar Vavilapalli
>> >> wrote:
>> >>
>> >> Branch-2.8 is created.
>> >>
>> >> As mentioned before, the goal on branch-2.8 is to put improvements /
>> >>fixes to existing features with a goal of converging on an alpha
>>release
>> >>soon.
>> >>
>> >> Thanks
>> >> +Vinod
>> >>
>> >>
>> >>> On Nov 25, 2015, at 5:30 PM, Vinod Kumar Vavilapalli
>> >>> wrote:
>> >>>
>> >>> Forking threads now in order to track all things related to the
>> >>>release.
>> >>>
>> >>> Creating the branch now.
>> >>>
>> >>> Thanks
>> >>> +Vinod
>> >>>
>> >>>
>> >>>> On Nov 25, 2015, at 11:37 AM, Vinod Kumar Vavilapalli
>> >>>> wrote:
>> >>>>
>> >>>> I think we've converged at a high level w.r.t 2.8. And as I just sent
>> >>>>out an email, I updated the Roadmap wiki reflecting the same:
>> >>>>https://wiki.apache.org/hadoop/Roadmap
>> >>>><https://wiki.apache.org/hadoop/Roadmap>
>> >>>>
>> >>>> I plan to create a 2.8 branch EOD today.
>> >>>>
>> >>>> The goal for all of us should be to restrict improvements & fixes
>>to
>> >>>>only (a) the feature-set documented under 2.8 in the RoadMap wiki
>>and
>> >>>>(b) other minor features that are already in 2.8.
>> >>>>
>> >>>> Thanks
>> >>>> +Vinod
>> >>>>
>> >>>>
>> >>>>> On Nov 11, 2015, at 12:13 PM, Vinod Kumar Vavilapalli
>> >>>>>mailto:vino...@hortonworks.com>> wrote:
>> >>>>>
>> >>>>> - Cut a branch about two weeks from now
>> >>>>> - Do an RC mid next month (leaving ~4weeks since branch-cut)
>> >>>>> - As with 2.7.x series, the first release will still be called as
>> >>>>>early / alpha release in the interest of
>> >>>>>   - gaining downstream adoption
>> >>>>>   - wider testing,
>> >>>>>   - yet reserving our right to fix any inadvertent
>>incompatibilities
>> >>>>>introduced.
>> >>>>
>> >>>
>> >>
>> >
>>
>>



[jira] [Created] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.

2016-01-27 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9711:
---

 Summary: Integrate CSRF prevention filter in WebHDFS.
 Key: HDFS-9711
 URL: https://issues.apache.org/jira/browse/HDFS-9711
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode, webhdfs
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard 
against cross-site request forgery attacks.  This issue tracks integration of 
that filter in WebHDFS.
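
A bare-bones sketch of the header-based CSRF defense this family of filters
relies on (browsers will not attach a custom header to cross-site requests, so
requiring one on state-changing methods blocks forged requests). The class and
header name below are illustrative, not the actual Hadoop filter:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CsrfHeaderFilter implements Filter {
  private static final String HEADER = "X-XSRF-HEADER";

  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest http = (HttpServletRequest) req;
    String method = http.getMethod();
    // Read-only methods pass through; mutating ones must carry the header.
    boolean readOnly = "GET".equals(method) || "HEAD".equals(method)
        || "OPTIONS".equals(method);
    if (!readOnly && http.getHeader(HEADER) == null) {
      ((HttpServletResponse) res).sendError(
          HttpServletResponse.SC_BAD_REQUEST, "Missing required CSRF header");
      return;
    }
    chain.doFilter(req, res);
  }
}
{code}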





Re: Hadoop encryption module as Apache Chimera incubator project

2016-01-21 Thread Chris Nauroth
> My question is, could we consider to adopt the approach for libhadoop.so
>library?


This is something that I have already proposed in HADOOP-11127.  There is
no consensus on proceeding with it from the contributors in that
discussion.  There are some big challenges around how it would impact the
release process.  I also have not had availability to prototype an
implementation to make a stronger case for feasibility.  Kai, if this is
something that you're interested in, then I encourage you to join the
discussion in HADOOP-11127 or even pick up prototyping work if you'd like.
 Since we have that existing JIRA, let's keep this mail thread focused
just on Chimera.  Thank you!

Uma and everyone, thank you for the proposal.  +1 to proceed.

--Chris Nauroth




On 1/20/16, 11:16 PM, "Zheng, Kai"  wrote:

>Thanks Uma. 
>
>I have a question, by the way; it's not about the Chimera project, but
>about the mentioned advantage 1 and the libhadoop.so installation
>problem. I copied the relevant text below for convenience.
>
>>>1. As Chimera embeds the native library in its jar (similar to Snappy
>>>Java), it solves the current issue in Hadoop that an HDFS client has to
>>>depend on libhadoop.so if the client needs to read an encryption zone
>>>in HDFS. This means an HDFS client may have to depend on a Hadoop
>>>installation on the local machine. For example, HBase depends on the
>>>HDFS client jar rather than a Hadoop installation and then has no
>>>access to libhadoop.so, so HBase cannot use an encryption zone without
>>>causing errors.
>
>I believe Haifeng had mentioned the problem in a call when discussing the
>erasure coding work, but only now do I understand what the problem is and
>how Chimera or Snappy Java solved it. It looks like there can be some
>thin clients that don't rely on a Hadoop installation, so no libhadoop.so
>is available on the client host. The approach mentioned here is to bundle
>the library file (*.so) into a jar and dynamically extract the file when
>loading it. When no library file is contained in the jar, it falls back
>to the normal case, loading it from an installation. It's smart and nice!
>My question is, could we consider adopting this approach for the
>libhadoop.so library? It might be worth discussing because we're bundling
>more and more things into the library (recently we just put Intel ISA-L
>support into it), and such things may be desired for such clients. It may
>also be helpful for development, because sometimes when running unit
>tests that involve native code, errors occur complaining that there is no
>place to find libhadoop.so. Thanks.
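
The jar-embedded loading trick described above looks roughly like this sketch
(library and resource names are illustrative; the fallback path is the normal
System.loadLibrary lookup against an installed copy):

{code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class EmbeddedNativeLoader {
  public static void load() {
    try (InputStream in = EmbeddedNativeLoader.class
        .getResourceAsStream("/native/libexample.so")) {
      if (in == null) {
        // No bundled copy: fall back to the installed library on the host.
        System.loadLibrary("example");
        return;
      }
      // Extract the bundled .so to a temp file and load it from there.
      Path tmp = Files.createTempFile("libexample", ".so");
      tmp.toFile().deleteOnExit();
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
      System.load(tmp.toAbsolutePath().toString());
    } catch (Exception e) {
      throw new UnsatisfiedLinkError("could not load native library: " + e);
    }
  }
}
{code}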
>
>Regards,
>Kai
>
>-Original Message-
>From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
>Sent: Thursday, January 21, 2016 11:20 AM
>To: hdfs-dev@hadoop.apache.org
>Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
>Hi All,
>Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng on clarifying release
>stuff.
>
>Please find my responses below.
>
>Andrew wrote:
>If it becomes part of Apache Commons, could we make Chimera a separate
>JAR? We have real difficulties bumping dependency versions right now, so
>ideally we don't need to bump our existing Commons dependencies to use
>Chimera.
>[UMA] Yes, We plan to make separate Jar.
>
>Andrew wrote:
>With this refactoring, do we have confidence that we can get our desired
>changes merged and released in a timely fashion? e.g. if we find another
>bug like HADOOP-11343, we'll first need to get the fix into Chimera, have
>a new Chimera release, then bump Hadoop's Chimera dependency. This also
>relates to the previous point, it's easier to do this dependency bump if
>Chimera is a separate JAR.
>[UMA] Yes, and the main target users for this project are Hadoop and Spark
>right now. 
>So, Hadoop requirements would be the priority tasks for it.
>
>
>ATM wrote:
>Uma, would you be up for approaching the Apache Commons folks saying that
>you'd like to contribute Chimera? I'd recommend saying that Hadoop and
>Spark are both on board to depend on this.
>[UMA] Yes, will do that.
>
>
>Kai wrote:
>Just a question. Does becoming a separate jar/module in Apache Commons mean
>Chimera or the module can be released separately and in a timely manner,
>without being coupled to other modules' releases in the project? Thanks.
>
>[Haifeng] From the Apache Commons project website
>(https://commons.apache.org/), we see there is already a long list of
>components in its Apache Commons Proper list. Each component has its own
>release version and date. The target is to join and become one of those
>components.
>
>Larry wrote:
>If what we are looking for is some level of autonomy then it would need

[jira] [Reopened] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-19 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-6255:
-

I have a theory about what is happening.  fuse_dfs is not specifically made 
aware of the HDFS ACLs.  It only has visibility into the basic permissions.  In 
the case of an ACL entry that widens access (i.e., one granting access to a 
specific named user or group), if FUSE itself enforces access based solely on 
the permission bits, it might block access at the FUSE layer before even 
delegating to the NameNode.  This would be a limitation in granting access via 
ACLs through fuse_dfs, but it would not be a security hole.  (The problem can 
only make access more restrictive, not more relaxed.)

I tried to confirm this in the FUSE code, but I wasn't successful, and I don't 
have time to look deeper right now.  I'm seeing some comments from various 
sources that FUSE is unaware of POSIX ACLs, but can be made aware of xattrs.  
This might mean there is a possibility of making it work with some code changes 
in fuse_dfs.

I'm not entirely sure this is feasible yet, but I'm going to reopen the issue 
and mark it as a new feature request.

> fuse_dfs will not adhere to ACL permissions in some cases
> -
>
> Key: HDFS-6255
> URL: https://issues.apache.org/jira/browse/HDFS-6255
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.4.0
>    Reporter: Stephen Chu
>Assignee: Chris Nauroth
>
> As hdfs user, I created a directory /tmp/acl_dir/ and set permissions to 700. 
> Then I set a new acl group:jenkins:rwx on /tmp/acl_dir.
> {code}
> jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -getfacl /tmp/acl_dir
> # file: /tmp/acl_dir
> # owner: hdfs
> # group: supergroup
> user::rwx
> group::---
> group:jenkins:rwx
> mask::rwx
> other::---
> {code}
> Through the FsShell, the jenkins user can list /tmp/acl_dir as well as create 
> a file and directory inside.
> {code}
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -touchz /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -mkdir /tmp/acl_dir/testdir1
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -ls /tmp/acl_dir/
> Found 2 items
> drwxr-xr-x   - jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testdir1
> -rw-r--r--   1 jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ 
> {code}
> However, as the same jenkins user, when I try to cd into /tmp/acl_dir using a 
> fuse_dfs mount, I get permission denied. Same permission denied when I try to 
> create or list files.
> {code}
> [jenkins@hdfs-vanilla-1 tmp]$ ls -l
> total 16
> drwxrwx--- 4 hdfsnobody 4096 Apr 17 19:11 acl_dir
> drwx-- 2 hdfsnobody 4096 Apr 17 18:30 acl_dir_2
> drwxr-xr-x 3 mapred  nobody 4096 Mar 11 03:53 mapred
> drwxr-xr-x 4 jenkins nobody 4096 Apr 17 07:25 testcli
> -rwx-- 1 hdfsnobody0 Apr  7 17:18 tf1
> [jenkins@hdfs-vanilla-1 tmp]$ cd acl_dir
> bash: cd: acl_dir: Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ touch acl_dir/testfile2
> touch: cannot touch `acl_dir/testfile2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ mkdir acl_dir/testdir2
> mkdir: cannot create directory `acl_dir/testdir2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ 
> {code}
> The fuse_dfs debug output doesn't show any error for the above operations:
> {code}
> unique: 18, opcode: OPENDIR (27), nodeid: 2, insize: 48
>unique: 18, success, outsize: 32
> unique: 19, opcode: READDIR (28), nodeid: 2, insize: 80
> readdir[0] from 0
>unique: 19, success, outsize: 312
> unique: 20, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 20, success, outsize: 120
> unique: 21, opcode: READDIR (28), nodeid: 2, insize: 80
>unique: 21, success, outsize: 16
> unique: 22, opcode: RELEASEDIR (29), nodeid: 2, insize: 64
>unique: 22, success, outsize: 16
> unique: 23, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 23, success, outsize: 120
> unique: 24, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 24, success, outsize: 120
> unique: 25, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 25, success, outsize: 120
> unique: 26, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 26, success, outsize: 120
> unique: 27, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 27, success, outsize: 120
> unique: 28, opcode: GETATT

[jira] [Reopened] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-9569:
-

I have reverted this patch from trunk, branch-2, branch-2.8 and branch-2.7.  
This patch introduced a test failure in 
{{TestDFSUpgradeFromImage#testUpgradeFromRel2ReservedImage}}.  The test expects 
to see an {{IllegalArgumentException}}, and then retry the upgrade with the 
option to rename reserved paths.  After this patch, the error handling masked 
the {{IllegalArgumentException}}, so the test no longer worked as expected.

> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>   FSImageFile imageFile, StartupOption startupOption) throws IOException {
>   LOG.debug("Planning to load image :\n" + imageFile);
>   ..
> long txId = loader.getLoadedImageTxId();
> LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from NN 
> logs what fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If exception happens when loading fsimage, be sure to report the fsimage 
> name being loaded in the error message.





[jira] [Created] (HDFS-9572) Prevent DataNode log spam if a client connects on the data transfer port but sends no data.

2015-12-17 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9572:
---

 Summary: Prevent DataNode log spam if a client connects on the 
data transfer port but sends no data.
 Key: HDFS-9572
 URL: https://issues.apache.org/jira/browse/HDFS-9572
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


Monitoring tools may choose to check liveness of the DataNode's data transfer 
port by connecting to it.  The monitoring tool will close the connection 
immediately after establishment without sending any data.  When this happens, 
the DataNode encounters an unexpected EOF and logs a full stack trace.  This 
creates unneeded noise in the logs.





[jira] [Created] (HDFS-9552) Document types of permission checks performed for HDFS operations.

2015-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9552:
---

 Summary: Document types of permission checks performed for HDFS 
operations.
 Key: HDFS-9552
 URL: https://issues.apache.org/jira/browse/HDFS-9552
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The HDFS permissions guide discusses our use of a POSIX-like model with read, 
write and execute permissions associated with users, groups and the catch-all 
other class.  However, there is no documentation that describes exactly what 
permission checks are performed by user-facing HDFS operations.  This is a 
frequent source of questions, so it would be good to document this.





[jira] [Created] (HDFS-9534) Add CLI command to clear storage policy from a path.

2015-12-09 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9534:
---

 Summary: Add CLI command to clear storage policy from a path.
 Key: HDFS-9534
 URL: https://issues.apache.org/jira/browse/HDFS-9534
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Chris Nauroth


The {{hdfs storagepolicies}} command has sub-commands for {{-setStoragePolicy}} 
and {{-getStoragePolicy}} on a path.  However, there is no 
{{-removeStoragePolicy}} to remove a previously set storage policy on a path.





Re: Outdated "HDFS Architecture" page

2015-12-03 Thread Chris Nauroth
Hello Dmitry,

Thank you for reporting this.  You're absolutely right.  That page is
out-of-date with respect to the current architecture.

I filed JIRA HDFS-9505 to track improving this documentation.  If you have
any desire to contribute a documentation patch for this, then please feel
free to assign the JIRA to yourself.

--Chris Nauroth




On 12/3/15, 12:15 PM, "Dmitry Simonov"  wrote:

>Hello!
>
>Excuse me if I'm writing to the wrong mailing list.
>
>The information at
>http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Hdfs
>Design.html
>seems outdated:
>
>1. "There is a plan to support appending-writes to files in the future."
>
>Append was implemented in HADOOP-1700, wasn't it?
>
>2. "HDFS does not yet implement user quotas or access permissions"
>
>It contradicts to
>https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Hdf
>sQuotaAdminGuide.html
>and
>https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Hdf
>sPermissionsGuide.html
>
>3. "Currently, automatic restart and failover of the NameNode software to
>another machine is not supported." - but there is High Availability
>feature.
>
>-- 
>Best Regards,
>Dmitry Simonov



[jira] [Created] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9505:
---

 Summary: HDFS Architecture documentation needs to be refreshed.
 Key: HDFS-9505
 URL: https://issues.apache.org/jira/browse/HDFS-9505
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth
Priority: Minor


The HDFS Architecture document is out of date with respect to the current 
design of the system.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

There are multiple false statements and omissions of recent features.





[jira] [Resolved] (HDFS-9495) Data node opens random port for HTTPServer, not configurable

2015-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9495.
-
Resolution: Duplicate

Hello, [~neha.bathra].  This issue is tracked in HDFS-9049, so I'm resolving 
this one as a duplicate.

> Data node opens random port for HTTPServer, not configurable
> 
>
> Key: HDFS-9495
> URL: https://issues.apache.org/jira/browse/HDFS-9495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: neha
>
> The DataNode opens a random port for its HTTP server, which is not currently 
> configurable. It would be better to make it configurable.





[jira] [Resolved] (HDFS-9471) Webhdfs not working with shell command when kerberos security+https is enabled.

2015-11-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9471.
-
Resolution: Not A Problem

[~surendrasingh], that's a good point about the documentation.  I filed 
HDFS-9483 to track a documentation improvement.  If you're interested in 
providing the documentation, please feel free to pick up that one.  I'm going 
to resolve this one.

> Webhdfs not working with shell command when kerberos security+https is 
> enabled.
> ---
>
> Key: HDFS-9471
> URL: https://issues.apache.org/jira/browse/HDFS-9471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9471.01.patch
>
>
> *Client exception*
> {code}
> secure@host85:/opt/hdfsdata/HA/install/hadoop/namenode/bin> ./hdfs dfs -ls 
> webhdfs://x.x.x.x:50070/test
> 15/11/25 18:46:55 ERROR web.WebHdfsFileSystem: Unable to get HomeDirectory 
> from original File System
> java.net.SocketException: Unexpected end of file from server
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:792)
> {code}
> *Exception in namenode log*
> {code}
> 2015-11-26 11:03:18,231 WARN org.mortbay.log: EXCEPTION
> javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
> at 
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:710)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
> at 
> org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> This is because the URL scheme is hard-coded in 
> {{WebHdfsFileSystem.getTransportScheme()}}.
> {code}
>  /**
>* return the underlying transport protocol (http / https).
>*/
>   protected String getTransportScheme() {
> return "http";
>   }
> {code}





[jira] [Created] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2015-11-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9483:
---

 Summary: Documentation does not cover use of "swebhdfs" as URL 
scheme for SSL-secured WebHDFS.
 Key: HDFS-9483
 URL: https://issues.apache.org/jira/browse/HDFS-9483
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth


If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in a 
URL to access it.  The current documentation does not state this anywhere.
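
For example, a client can address SSL-secured WebHDFS like this (host and port
are placeholders; 50470 is the customary NameNode HTTPS port):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SwebhdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "swebhdfs" selects the SSL transport; "webhdfs" would use plain HTTP.
    FileSystem fs = FileSystem.get(
        URI.create("swebhdfs://namenode.example.com:50470/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}
{code}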





Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
+1.  Thanks, Vinod.

--Chris Nauroth




On 11/25/15, 1:45 PM, "Vinod Kumar Vavilapalli"  wrote:

>Okay, tx for this clarification Chris! I dug more into this and now
>realized the actual scope of this. Given the the limited nature of this
>feature (non-Namenode etc) and the WIP nature of the larger umbrella
>HADOOP-11744, we will ship the feature but I’ll stop calling this out as
>a notable feature.
>
>Thanks
>+Vinod
>
>
>> On Nov 25, 2015, at 12:04 PM, Chris Nauroth 
>>wrote:
>> 
>> Hi Vinod,
>> 
>> The HDFS-8155 work is complete in branch-2 already, so feel free to
>> include it in the roadmap.
>> 
>> For those watching the thread that aren't familiar with HDFS-8155, I
>>want
>> to call out that it was a client-side change only.  The WebHDFS client
>>is
>> capable of obtaining OAuth2 tokens and passing them along in its HTTP
>> requests.  The NameNode and DataNode server side currently do not have
>>any
>> support for OAuth2, so overall, this feature is only useful in some very
>> unique deployment architectures right now.  This is all discussed
>> explicitly in documentation committed with HDFS-8155, but I wanted to
>> prevent any mistaken assumptions for people only reading this thread.
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" 
>> wrote:
>> 
>>> This is the current state from the feedback I gathered.
>>> - Support priorities across applications within the same queue
>>>YARN-1963
>>>   - Can push as an alpha / beta feature per Sunil
>>> - YARN-1197 Support changing resources of an allocated container:
>>>   - Can push as an alpha/beta feature per Wangda
>>> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
>>> most of it anyways.
>>>   - Can push as an alpha feature.
>>> - YARN Timeline Service v1.5 - YARN-4233
>>>   - Should include per Li Lu
>>> - YARN Timeline Service Next generation: YARN-2928
>>>   - Per analysis from Sangjin, drop this from 2.8.
>>> 
>>> One open feature status
>>> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>>> 
>>> Updated the Roadmap wiki with the same.
>>> 
>>> Thanks
>>> +Vinod
>>> 
>>>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
>>>> 
>>>> I reviewed the current state of the YARN-2928 changes regarding its
>>>> impact
>>>> if the timeline service v.2 is disabled. It does appear that there
>>>>are a
>>>> lot of things that still do get created and enabled unconditionally
>>>> regardless of configuration. While this is understandable when we were
>>>> working to implement the feature, this clearly needs to be cleaned up
>>>>so
>>>> that when disabled the timeline service v.2 doesn't impact other
>>>>things.
>>>> 
>>>> I filed a JIRA for that work:
>>>> https://issues.apache.org/jira/browse/YARN-4356
>>>> 
>>>> We need to complete it before we can merge.
>>>> 
>>>> Somewhat related is the status of the configuration and what it means
>>>>in
>>>> various contexts (client/app-side vs. server-side, v.1 vs. v.2,
>>>>etc.). I
>>>> know there is an ongoing discussion regarding YARN-4183. We'll need to
>>>> reflect the outcome of that discussion.
>>>> 
>>>> My overall impression of whether this can be done for 2.8 is that it
>>>> looks
>>>> rather challenging given the suggested timeframe. We also need to
>>>> complete
>>>> several major tasks before it is ready.
>>>> 
>>>> Sangjin
>>>> 
>>>> 
>>>> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:
>>>> 
>>>>> 
>>>>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>>>>> vino...@hortonworks.com> wrote:
>>>>> 
>> >>>>>>   - YARN Timeline Service Next generation: YARN-2928: Lots of
>> >>>>>> momentum,
>> >>>>>> but clearly a work in progress. Two options here
>> >>>>>>   - If it is safe to ship it into 2.8 in a disabled manner, we
>> >>>>>>can
>> >>>>>> get the early code into trunk and all the way into 2.8.
>> >>>>>>   - If it is not safe, it organically rolls over into 2.9
>>>>>> 
>>>>> 
>>>>> I'll review the changes on YARN-2928 to see what impact it has (if
>>>>> any) if
>>>>> the timeline service v.2 is disabled.
>>>>> 
>>>>> Another condition for it to make 2.8 is whether the branch will be
>>>>>in a
>>>>> shape in a couple of weeks such that it adds value for folks that
>>>>>want
>>>>> to
>>>>> test it. Hopefully it will become clearer soon.
>>>>> 
>>>>> Sangjin
>>>>> 
>>> 
>> 
>> 
>
>



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
Hi Vinod,

The HDFS-8155 work is complete in branch-2 already, so feel free to
include it in the roadmap.

For those watching the thread that aren't familiar with HDFS-8155, I want
to call out that it was a client-side change only.  The WebHDFS client is
capable of obtaining OAuth2 tokens and passing them along in its HTTP
requests.  The NameNode and DataNode server side currently do not have any
support for OAuth2, so overall, this feature is only useful in some very
unique deployment architectures right now.  This is all discussed
explicitly in documentation committed with HDFS-8155, but I wanted to
prevent any mistaken assumptions for people only reading this thread.

--Chris Nauroth




On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" 
wrote:

>This is the current state from the feedback I gathered.
> - Support priorities across applications within the same queue YARN-1963
>- Can push as an alpha / beta feature per Sunil
> - YARN-1197 Support changing resources of an allocated container:
>- Can push as an alpha/beta feature per Wangda
> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
>most of it anyways.
>- Can push as an alpha feature.
> - YARN Timeline Service v1.5 - YARN-4233
>- Should include per Li Lu
> - YARN Timeline Service Next generation: YARN-2928
>- Per analysis from Sangjin, drop this from 2.8.
>
>One open feature status
> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>
>Updated the Roadmap wiki with the same.
>
>Thanks
>+Vinod
>
>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
>> 
>> I reviewed the current state of the YARN-2928 changes regarding its
>>impact
>> if the timeline service v.2 is disabled. It does appear that there are a
>> lot of things that still do get created and enabled unconditionally
>> regardless of configuration. While this is understandable when we were
>> working to implement the feature, this clearly needs to be cleaned up so
>> that when disabled the timeline service v.2 doesn't impact other things.
>> 
>> I filed a JIRA for that work:
>> https://issues.apache.org/jira/browse/YARN-4356
>> 
>> We need to complete it before we can merge.
>> 
>> Somewhat related is the status of the configuration and what it means in
>> various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
>> know there is an ongoing discussion regarding YARN-4183. We'll need to
>> reflect the outcome of that discussion.
>> 
>> My overall impression of whether this can be done for 2.8 is that it
>>looks
>> rather challenging given the suggested timeframe. We also need to
>>complete
>> several major tasks before it is ready.
>> 
>> Sangjin
>> 
>> 
>> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:
>> 
>>> 
>>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>>> vino...@hortonworks.com> wrote:
>>> 
>>>>- YARN Timeline Service Next generation: YARN-2928: Lots of
>>>>momentum,
>>>> but clearly a work in progress. Two options here
>>>>- If it is safe to ship it into 2.8 in a disabled manner, we can
>>>> get the early code into trunk and all the way into 2.8.
>>>>- If it is not safe, it organically rolls over into 2.9
>>>> 
>>> 
>>> I'll review the changes on YARN-2928 to see what impact it has (if
>>>any) if
>>> the timeline service v.2 is disabled.
>>> 
>>> Another condition for it to make 2.8 is whether the branch will be in a
>>> shape in a couple of weeks such that it adds value for folks that want
>>>to
>>> test it. Hopefully it will become clearer soon.
>>> 
>>> Sangjin
>>> 
>



[jira] [Resolved] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9370.
-
Resolution: Duplicate

> TestDataNodeUGIProvider fails intermittently due to non-deterministic cache 
> expiry.
> ---
>
> Key: HDFS-9370
> URL: https://issues.apache.org/jira/browse/HDFS-9370
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>    Reporter: Chris Nauroth
>    Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9370.001.patch, HDFS-9370.002.patch
>
>
> {{TestDataNodeUGIProvider}} has hard-coded sleep times waiting for background 
> expiration of entries in a Guava cache.  I have seen this test suite fail 
> intermittently, because expiration is not guaranteed to happen strictly on 
> the boundary of the period defined by the cache's expiration time.
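
The usual fix for this kind of flakiness is to drive the cache with a fake
Ticker so the test advances time deterministically instead of sleeping (a
generic Guava sketch, not the actual test code):

{code}
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class FakeTickerDemo {
  public static void main(String[] args) {
    AtomicLong nanos = new AtomicLong();
    Ticker fakeTicker = new Ticker() {
      @Override public long read() { return nanos.get(); }
    };
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.SECONDS)
        .ticker(fakeTicker)
        .build();
    cache.put("ugi", "value");
    nanos.addAndGet(TimeUnit.SECONDS.toNanos(11)); // "wait" 11s instantly
    System.out.println(cache.getIfPresent("ugi")); // null: entry has expired
  }
}
{code}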





[jira] [Reopened] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-9370:
-

> TestDataNodeUGIProvider fails intermittently due to non-deterministic cache 
> expiry.
> ---
>
> Key: HDFS-9370
> URL: https://issues.apache.org/jira/browse/HDFS-9370
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>    Reporter: Chris Nauroth
>    Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9370.001.patch, HDFS-9370.002.patch
>





[jira] [Created] (HDFS-9459) hadoop-hdfs-native-client fails test build on Windows after transition to ctest.

2015-11-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9459:
---

 Summary: hadoop-hdfs-native-client fails test build on Windows 
after transition to ctest.
 Key: HDFS-9459
 URL: https://issues.apache.org/jira/browse/HDFS-9459
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build, test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Blocker


HDFS-9369 transitioned to usage of {{ctest}} for running the HDFS native tests. 
 This broke the {{mvn test}} build on Windows.





[jira] [Created] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-11-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9458:
---

 Summary: TestBackupNode always binds to port 50070, which can 
cause bind failures.
 Key: HDFS-9458
 URL: https://issues.apache.org/jira/browse/HDFS-9458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth


{{TestBackupNode}} does not override port settings to use a dynamically 
selected port for the NameNode HTTP server.  It uses the default of 50070 
defined in hdfs-default.xml.  This should be changed to select a dynamic port 
to avoid bind errors.





[jira] [Created] (HDFS-9450) Fix failing HDFS tests on HDFS-7240 Ozone branch.

2015-11-23 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9450:
---

 Summary: Fix failing HDFS tests on HDFS-7240 Ozone branch.
 Key: HDFS-9450
 URL: https://issues.apache.org/jira/browse/HDFS-9450
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Chris Nauroth


Several test failures have been introduced on the HDFS-7240 Ozone feature 
branch.  This issue tracks fixing those tests.





[jira] [Created] (HDFS-9443) Disabling HDFS client socket cache causes logging message printed to console for CLI commands.

2015-11-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9443:
---

 Summary: Disabling HDFS client socket cache causes logging message 
printed to console for CLI commands.
 Key: HDFS-9443
 URL: https://issues.apache.org/jira/browse/HDFS-9443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


The HDFS client's socket cache can be disabled by setting 
{{dfs.client.socketcache.capacity}} to {{0}}.  When this is done, the 
{{PeerCache}} class logs an info-level message stating that the cache is 
disabled.  This message is getting printed to the console for CLI commands, 
which disrupts CLI output.  This issue proposes to downgrade to debug-level 
logging for this message.
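
A sketch of the proposed change (the message text is paraphrased from memory, not copied from {{PeerCache}}):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    class PeerCacheLoggingSketch {
      private static final Log LOG = LogFactory.getLog(PeerCacheLoggingSketch.class);

      void reportCacheDisabled() {
        // Before: LOG.info(...), which reaches the console for CLI commands.
        // After: debug level keeps CLI output clean.
        if (LOG.isDebugEnabled()) {
          LOG.debug("SocketCache disabled.");
        }
      }
    }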





[jira] [Resolved] (HDFS-190) DataNode should be marked as final to prevent subclassing

2015-11-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-190.

Resolution: Won't Fix

We're now in a situation where the current codebase uses subclassing of 
{{DataNode}} for some tests.  There has been no activity on this issue for many 
years, so it looks unlikely that it would be implemented.  I'm closing it as 
won't fix.

> DataNode should be marked as final to prevent subclassing
> -
>
> Key: HDFS-190
> URL: https://issues.apache.org/jira/browse/HDFS-190
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Priority: Minor
>
> Reviewing the DataNode core, it starts a thread in its constructor that calls 
> back into the run() method. This is generally considered dangerous: if 
> DataNode were ever subclassed, run() could start executing on the subclass 
> before the subclass's own constructor had finished.
> 1. Consider splitting the constructor from the start() operation.
> 2. If this cannot be changed, mark DataNode as final so nobody can subclass 
> it.  Though if the latter were done, it would be convenient to have a method 
> to let external management components poll for the health of the node, and to 
> pick up reasons for the node shutting down.
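
A minimal illustration of the hazard and the split suggested in (1); this is generic Java, not the DataNode code:

    public class ConstructorThreadSketch {
      // Anti-pattern: the constructor leaks "this" to a running thread, so
      // run() can execute against a half-initialized object (worse with a
      // subclass whose constructor has not yet run).
      static class Unsafe implements Runnable {
        Unsafe() { new Thread(this).start(); }
        @Override public void run() { /* may observe partial state */ }
      }

      // Remedy: construct first, then start explicitly.
      static class Safe implements Runnable {
        private Thread worker;
        synchronized void start() {
          if (worker == null) {
            worker = new Thread(this);
            worker.start();  // "this" is fully constructed by now
          }
        }
        @Override public void run() { /* safe */ }
      }
    }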





[jira] [Created] (HDFS-9409) DataNode shutdown does not guarantee full shutdown of all threads due to race condition.

2015-11-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9409:
---

 Summary: DataNode shutdown does not guarantee full shutdown of all 
threads due to race condition.
 Key: HDFS-9409
 URL: https://issues.apache.org/jira/browse/HDFS-9409
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Chris Nauroth


{{DataNode#shutdown}} is documented to return "only after shutdown is 
complete".  Even after completion of this method, it's possible that threads 
started by the DataNode are still running.  Race conditions in the shutdown 
sequence may cause it to skip stopping and joining the {{BPServiceActor}} 
threads.

This is likely not a big problem in normal operations, because these are daemon 
threads that won't block overall process exit.  It is more of a problem for 
tests, because it makes it impossible to write reliable assertions that these 
threads exited cleanly.  For large test suites, it can also cause an 
accumulation of unneeded threads, which might harm test performance.
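
A sketch of the pattern that makes the documented guarantee hold, assuming shutdown has a handle on every worker thread; illustrative, not the DataNode code:

    import java.util.List;

    class ShutdownJoinSketch {
      // shutdown() must not return until every worker has been interrupted
      // and joined; skipping the join is exactly the race described above.
      static void shutdown(List<Thread> workers) {
        for (Thread t : workers) {
          t.interrupt();
        }
        for (Thread t : workers) {
          try {
            t.join();  // guarantees the thread has fully exited
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
          }
        }
      }
    }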





[jira] [Resolved] (HDFS-9404) Findbugs issue reported in BlockRecoveryWorker$RecoveryTaskContiguous.recover()

2015-11-09 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9404.
-
Resolution: Duplicate

Hi [~yzhangal].  This is tracked in HDFS-9401.  Thanks!

> Findbugs issue reported in 
> BlockRecoveryWorker$RecoveryTaskContiguous.recover()
> ---
>
> Key: HDFS-9404
> URL: https://issues.apache.org/jira/browse/HDFS-9404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Yongjun Zhang
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/13431/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Reported:
> Code: EC
> Warning: Call to 
> org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration.equals(org.apache.hadoop.hdfs.protocol.DatanodeInfo)
> in 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover()
> Details:
> EC_UNRELATED_TYPES: Call to equals() comparing different types
> This method calls equals(Object) on two references of different class types, 
> and analysis suggests they will refer to objects of different classes at 
> runtime. Further, examination of the equals methods that would be invoked 
> suggests that either this call will always return false, or else the equals 
> method is not symmetric (which is a property required by the contract for 
> equals in class Object).
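
In miniature, the pattern FindBugs is flagging (stand-in classes, not the HDFS types); the usual fix is to compare a shared key rather than the objects themselves:

    class RegistrationStandIn {
      final String uuid;
      RegistrationStandIn(String uuid) { this.uuid = uuid; }
    }

    class InfoStandIn {
      final String uuid;
      InfoStandIn(String uuid) { this.uuid = uuid; }
    }

    class UnrelatedEqualsSketch {
      public static void main(String[] args) {
        RegistrationStandIn reg = new RegistrationStandIn("dn-1");
        InfoStandIn info = new InfoStandIn("dn-1");
        System.out.println(reg.equals(info));            // always false
        System.out.println(reg.uuid.equals(info.uuid));  // the intended check
      }
    }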





[jira] [Created] (HDFS-9400) TestRollingUpgradeRollback fails on branch-2.

2015-11-07 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9400:
---

 Summary: TestRollingUpgradeRollback fails on branch-2.
 Key: HDFS-9400
 URL: https://issues.apache.org/jira/browse/HDFS-9400
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chris Nauroth
Priority: Blocker


During a Jenkins pre-commit run on branch-2 for the HDFS-9394 patch, we noticed 
a pre-existing failure in {{TestRollingUpgradeRollback}}.  I have confirmed 
that this test is failing in branch-2 only.  It passes in trunk, and it passes 
in branch-2.7.





Re: ReadOnly WebHDFS

2015-11-06 Thread Chris Nauroth
Hello Laxman,

I'm curious how this is a new problem after migration from HttpFs to
WebHDFS.  With the HttpFs deployment architecture, were you somehow
proxying only the read-only operations?

I would think this kind of thing could be achieved by writing a custom
authentication filter, deploying that to the HDFS classpath, and then
pointing to it by setting dfs.web.authentication.filter in hdfs-site.xml
to the full name of that custom authentication filter class.  The logic of
the custom authentication filter would check for only read-only operations
and reject the others.  This is a solution that wouldn't require changes
in WebHDFS itself.
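
As a rough sketch of that idea, shown as a plain servlet Filter for brevity (a real deployment along the lines above would hook in through the authentication filter mechanism, and the GET-only rule assumes WebHDFS's convention that read-only operations such as OPEN, LISTSTATUS, and GETFILESTATUS are issued as HTTP GETs):

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class ReadOnlyWebHdfsFilter implements Filter {
      @Override public void init(FilterConfig conf) {}
      @Override public void destroy() {}

      @Override
      public void doFilter(ServletRequest req, ServletResponse resp,
          FilterChain chain) throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        // PUT/POST/DELETE carry mutations; reject everything but GET.
        if (!"GET".equals(http.getMethod())) {
          ((HttpServletResponse) resp).sendError(
              HttpServletResponse.SC_FORBIDDEN, "WebHDFS is read-only here");
          return;
        }
        chain.doFilter(req, resp);
      }
    }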

This is not a requirement I've heard from anyone else.  I'm generally
reluctant to add features without a widespread need.  Still, if you want
to file an HDFS JIRA for further discussion of your proposal, there is no
harm in that.  It might end up as a "Won't Fix", or perhaps others in the
community see it differently from me, and we'd want to proceed.

Thanks for sharing the work you've done!

--Chris Nauroth




On 11/6/15, 3:02 AM, "Laxman Ch"  wrote:

>Hi,
>
>We run a cluster with security set to simple.
>Also, we had provided some users HTTP access to HDFS via HttpFS
>gateways.
>However, this is not scaling, and we are suffering from an HttpFS gateway
>choking problem. So, we wanted to enable WebHDFS directly on Hadoop. But
>this brings in the problem of security: any user can simply delete
>anything. And we can't immediately enable Kerberos security in
>production.
>
>How about introducing a configuration option to make WebHDFS read-only?
>We patched this in our clusters cleanly, and it's working.
>
>Please reply with your comments on whether it's a good idea to push this to
>Hadoop.
>If yes, I will create a JIRA and submit a patch.
>-- 
>Thanks,
>Laxman



[jira] [Created] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9394:
---

 Summary: branch-2 hadoop-hdfs-client fails during FileSystem 
ServiceLoader initialization, because HftpFileSystem is missing.
 Key: HDFS-9394
 URL: https://issues.apache.org/jira/browse/HDFS-9394
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Chris Nauroth
Priority: Critical


On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor 
that lists {{HftpFileSystem}} and {{HsftpFileSystem}}.  These classes do not 
reside in hadoop-hdfs-client.  Instead, they reside in hadoop-hdfs.  If the 
application has hadoop-hdfs-client.jar on the classpath, but not 
hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}.
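
The mechanism, for anyone unfamiliar: {{ServiceLoader}} reads a descriptor file named after the interface, and any listed class missing from the classpath surfaces as a {{ServiceConfigurationError}} during iteration. An illustrative excerpt (not the exact descriptor):

    # META-INF/services/org.apache.hadoop.fs.FileSystem
    org.apache.hadoop.hdfs.DistributedFileSystem
    org.apache.hadoop.hdfs.web.WebHdfsFileSystem
    org.apache.hadoop.hdfs.web.HftpFileSystem    # resides in hadoop-hdfs only
    org.apache.hadoop.hdfs.web.HsftpFileSystem   # resides in hadoop-hdfs only

    # With only hadoop-hdfs-client.jar on the classpath, iterating
    # ServiceLoader.load(FileSystem.class) fails with something like:
    #   java.util.ServiceConfigurationError: ... HftpFileSystem not found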





[jira] [Created] (HDFS-9384) TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

2015-11-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9384:
---

 Summary: TestWebHdfsContentLength intermittently hangs and fails 
due to TCP conversation mismatch between client and server.
 Key: HDFS-9384
 URL: https://issues.apache.org/jira/browse/HDFS-9384
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
background thread to simulate some WebHDFS server responses.  In some 
environments (notably Windows), I have observed that the test can hang and fail 
intermittently.  The root cause is that the server fails to fully consume the 
client's input.  This causes a mismatch in the TCP conversation state, and 
ultimately the client side hangs, then aborts after the 60-second socket 
timeout.
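
A sketch of the server-side fix pattern, under the assumption that the fake server receives the request body as an {{InputStream}}: drain it fully before writing any response.

    import java.io.IOException;
    import java.io.InputStream;

    class DrainRequestSketch {
      // If the server responds while unread request bytes remain, client and
      // server disagree about the TCP conversation state, and the client can
      // hang until its 60-second socket timeout.
      static void drainFully(InputStream requestBody) throws IOException {
        byte[] scratch = new byte[4096];
        while (requestBody.read(scratch) != -1) {
          // discard; the point is only to consume the stream completely
        }
      }
    }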





[jira] [Created] (HDFS-9378) hadoop-hdfs-client tests do not write logs.

2015-11-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9378:
---

 Summary: hadoop-hdfs-client tests do not write logs.
 Key: HDFS-9378
 URL: https://issues.apache.org/jira/browse/HDFS-9378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The tests that have been split into the hadoop-hdfs-client module are not 
writing any log output, because there is no src/test/resources/log4j.properties 
file in the module.  This makes it more difficult to troubleshoot test failures.
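
A minimal {{src/test/resources/log4j.properties}} in the style used by other Hadoop test modules would be enough; the levels and pattern below are illustrative:

    log4j.rootLogger=info,stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{2} (%F:%M(%L)) - %m%n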





[jira] [Created] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9370:
---

 Summary: TestDataNodeUGIProvider fails intermittently due to 
non-deterministic cache expiry.
 Key: HDFS-9370
 URL: https://issues.apache.org/jira/browse/HDFS-9370
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestDataNodeUGIProvider}} has hard-coded sleep times waiting for background 
expiration of entries in a Guava cache.  I have seen this test suite fail 
intermittently, because expiration is not guaranteed to happen strictly on the 
boundary of the period defined by the cache's expiration time.





[jira] [Created] (HDFS-9362) TestAuditLogger#testAuditLoggerWithCallContext assumes Unix line endings, fails on Windows.

2015-11-02 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9362:
---

 Summary: TestAuditLogger#testAuditLoggerWithCallContext assumes 
Unix line endings, fails on Windows.
 Key: HDFS-9362
 URL: https://issues.apache.org/jira/browse/HDFS-9362
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestAuditLogger#testAuditLoggerWithCallContext}} was added recently to 
exercise the new audit logging with caller context functionality.  The tests 
assume Unix line endings by hard-coding "\n" in asserts.  These tests fail on 
Windows.
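
The portable form of such asserts derives the terminator from the platform; a sketch, where the payload string is made up for illustration:

    import org.junit.Assert;

    class LineEndingSketch {
      static void assertLogLine(String logOutput) {
        // Hard-coding "\n" fails on Windows, where lines end in "\r\n".
        String sep = System.getProperty("line.separator");
        Assert.assertEquals("callerContext=testContext" + sep, logOutput);
      }
    }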





[jira] [Created] (HDFS-9311) Support optional offload of NameNode HA service health checks to a separate RPC server.

2015-10-26 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9311:
---

 Summary: Support optional offload of NameNode HA service health 
checks to a separate RPC server.
 Key: HDFS-9311
 URL: https://issues.apache.org/jira/browse/HDFS-9311
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


When a NameNode is overwhelmed with load, it can lead to resource exhaustion of 
the RPC handler pools (both client-facing and service-facing).  Eventually, 
this blocks the health check RPC issued from ZKFC, which triggers a failover.  
Depending on fencing configuration, the former active NameNode may be killed.  
In an overloaded situation, the new active NameNode is likely to suffer the 
same fate, because client load patterns don't change after the failover.  This 
can degenerate into flapping between the 2 NameNodes without real recovery.  If 
a NameNode had been killed by fencing, then it would have to transition through 
safe mode, further delaying time to recovery.

This issue proposes a separate, optional RPC server at the NameNode for 
isolating the HA health checks.  These health checks are lightweight operations 
that do not suffer from contention issues on the namesystem lock or other 
shared resources.  Isolating the RPC handlers is sufficient to avoid this 
situation.
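
In configuration terms, this would look something like the sketch below. The key name follows the lifeline work that eventually grew out of this line of thinking; at this proposal stage, treat it as an assumption.

    <!-- hdfs-site.xml: give HA health-check RPC its own server so ZKFC
         probes never queue behind saturated client/service handlers. -->
    <property>
      <name>dfs.namenode.lifeline.rpc-address</name>
      <value>nn1.example.com:8050</value>
    </property>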





[jira] [Created] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

2015-10-13 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9239:
---

 Summary: DataNode Lifeline Protocol: an alternative protocol for 
reporting DataNode liveness
 Key: HDFS-9239
 URL: https://issues.apache.org/jira/browse/HDFS-9239
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: DataNode-Lifeline-Protocol.pdf

This issue proposes introduction of a new feature: the DataNode Lifeline 
Protocol.  This is an RPC protocol that is responsible for reporting liveness 
and basic health information about a DataNode to a NameNode.  Compared to the 
existing heartbeat messages, it is lightweight and not prone to resource 
contention problems that can harm accurate tracking of DataNode liveness 
currently.  The attached design document contains more details.





[jira] [Resolved] (HDFS-9138) TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to unpack old image tarball that contains hard links

2015-09-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9138.
-
Resolution: Not A Problem

Yes, my earlier HDFS-8554 patch fixed this already.  I was only seeing a test 
failure on an older branch.  I'm resolving this issue.

> TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to 
> unpack old image tarball that contains hard links
> -
>
> Key: HDFS-9138
> URL: https://issues.apache.org/jira/browse/HDFS-9138
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9138.001.patch
>
>
> {{TestDatanodeStartupFixesLegacyStorageIDs#testUpgradeFrom22via26FixesStorageIDs}}
>  uses a checked-in DataNode data directory that contains hard links.  The 
> hard links cannot be handled correctly by the commons-compress library used 
> in the Windows implementation of {{FileUtil#unTar}}.  The result is that the 
> unpacked block files have 0 length, the block files reported to the NameNode 
> are invalid, and therefore the mini-cluster never gets enough good blocks 
> reported to leave safe mode.





[jira] [Created] (HDFS-9138) TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to unpack old image tarball that contains hard links

2015-09-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9138:
---

 Summary: TestDatanodeStartupFixesLegacyStorageIDs fails on Windows 
due to failure to unpack old image tarball that contains hard links
 Key: HDFS-9138
 URL: https://issues.apache.org/jira/browse/HDFS-9138
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestDatanodeStartupFixesLegacyStorageIDs#testUpgradeFrom22via26FixesStorageIDs}}
 uses a checked-in DataNode data directory that contains hard links.  The hard 
links cannot be handled correctly by the commons-compress library used in the 
Windows implementation of {{FileUtil#unTar}}.  The result is that the unpacked 
block files have 0 length, the block files reported to the NameNode are 
invalid, and therefore the mini-cluster never gets enough good blocks reported 
to leave safe mode.
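
For context, commons-compress does expose hard links explicitly; a sketch of detecting them (not the actual fix):

    import java.io.IOException;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

    class HardLinkSketch {
      // An unpacker that treats a hard-link entry like a regular file reads
      // zero bytes of data for it, matching the 0-length blocks seen above.
      static void listHardLinks(TarArchiveInputStream tarIn) throws IOException {
        TarArchiveEntry entry;
        while ((entry = tarIn.getNextTarEntry()) != null) {
          if (entry.isLink()) {
            System.out.println(entry.getName() + " -> " + entry.getLinkName());
          }
        }
      }
    }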





[jira] [Resolved] (HDFS-9136) TestDFSUpgrade leaks file descriptors.

2015-09-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9136.
-
Resolution: Not A Problem

I was mistaken.  This actually got fixed as part of the test changes done in 
HDFS-8846.  I'm no longer seeing a problem after that patch.  I'll resolve this.

> TestDFSUpgrade leaks file descriptors.
> --
>
> Key: HDFS-9136
> URL: https://issues.apache.org/jira/browse/HDFS-9136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>
> HDFS-8480 introduced code in {{TestDFSUpgrade#testPreserveEditLogs}} that 
> opens edit log files and reads from them, but these files are never closed.





[jira] [Created] (HDFS-9136) TestDFSUpgrade leaks file descriptors.

2015-09-23 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9136:
---

 Summary: TestDFSUpgrade leaks file descriptors.
 Key: HDFS-9136
 URL: https://issues.apache.org/jira/browse/HDFS-9136
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


HDFS-8480 introduced code in {{TestDFSUpgrade#testPreserveEditLogs}} that opens 
edit log files and reads from them, but these files are never closed.
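
The general shape of the fix, sketched with try-with-resources (not the committed patch):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    class CloseEditLogSketch {
      // try-with-resources closes the descriptor even if an assertion or
      // exception fires partway through the read.
      static int readFirstInt(File editLog) throws IOException {
        try (DataInputStream in =
            new DataInputStream(new FileInputStream(editLog))) {
          return in.readInt();
        }
      }
    }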





[jira] [Created] (HDFS-9128) TestWebHdfsFileContextMainOperations and TestSWebHdfsFileContextMainOperations fail due to invalid HDFS path on Windows.

2015-09-23 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9128:
---

 Summary: TestWebHdfsFileContextMainOperations and 
TestSWebHdfsFileContextMainOperations fail due to invalid HDFS path on Windows.
 Key: HDFS-9128
 URL: https://issues.apache.org/jira/browse/HDFS-9128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


These tests do not override the default behavior of using the local file system 
test working directory to construct test paths.  These paths will contain the 
':' character on Windows due to the drive spec.  HDFS rejects the ':' character 
as invalid.
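
Concretely (illustrative values only):

    class WindowsPathSketch {
      public static void main(String[] args) {
        // On Windows, the local working directory looks like
        // "C:\hdfs\project"; fed into an HDFS path, the ':' from the drive
        // spec is rejected as invalid.
        String local = System.getProperty("user.dir");
        // The fix pattern: override the test root with a plain HDFS path.
        String testRoot = "/test/webhdfs";
        System.out.println(local + " -> " + testRoot);
      }
    }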





Re: Even after HDFS-2856 JSVC References are require..?

2015-09-14 Thread Chris Nauroth
I have no specific numbers to offer on this.  Like you, I expect the
impact is non-zero.  I also expect the impact would depend on the setting
of dfs.data.transfer.protection.  For "authentication", it might just be a
small up-front cost for the SASL authentication handshake.  For
"integrity" or "privacy", it becomes more a function of the payload (the
block data transferred).  For "privacy", I expect a similar hit to
enabling data transfer encryption using dfs.encrypt.data.transfer, which
is already known to be significant.
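
For readers following along, those three levels are values of a single knob in hdfs-site.xml:

    <!-- SASL quality of protection for DataTransferProtocol. Cost rises
         from authentication (handshake only) through integrity to privacy
         (full protection of block data). -->
    <property>
      <name>dfs.data.transfer.protection</name>
      <value>authentication</value>
    </property>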

I think it's a fair point to require a performance comparison before
proceeding with any proposal of deprecating the jsvc deployment option.

--Chris Nauroth




On 9/14/15, 4:23 PM, "Colin P. McCabe"  wrote:

>Has anyone measured the overhead of running SASL on
>DataTransferProtocol?  I would expect it to be non-zero compared with
>simply running on a low port.  The CPU overhead especially could
>regress performance on a typical Hadoop cluster.
>
>best,
>Colin
>
>On Thu, Sep 10, 2015 at 9:55 AM, Chris Nauroth 
>wrote:
>> Yes, I have a paragraph in the docs describing how someone would go
>>about
>> migrating a jsvc-based deployment to a SASL-based deployment.
>>
>> 
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode
>>
>>
>> It's a non-trivial operation that starts by making sure everyone is on
>>2.6
>> first.  This includes client deployments, which are notoriously more
>> difficult to control than server deployments.
>>
>> --Chris Nauroth
>>
>>
>>
>>
>> On 9/10/15, 1:21 AM, "Steve Loughran"  wrote:
>>
>>>SASL authenticates the DN on Hadoop 2.6+, but it requires the clients to
>>>be using the 2.6+ JARs; you can't use it on the 2.2-2.5 artifacts.
>>>
>>>> On 9 Sep 2015, at 18:45, Allen Wittenauer  wrote:
>>>>
>>>>
>>>> FWIW, I still use and prefer jsvc, esp with the sudo trick in place.
>>>>
>>>> On Sep 9, 2015, at 9:35 AM, Chris Nauroth 
>>>>wrote:
>>>>
>>>>> AFAIK, the majority of existing deployments still use jsvc to run a
>>>>> secured DataNode.  It would be a backwards-incompatible change to
>>>>>remove
>>>>> support for this deployment model.  For that reason, I would be -1
>>>>>for
>>>>> removing jsvc support, at least in the 2.x line.
>>>>>
>>>>>
>>>>> It's something that could be considered for 3.x if we think the
>>>>>clean-up
>>>>> benefit outweighs the incompatibility cost.  Before we do that, I'd
>>>>>prefer
>>>>> to hear if end users are having success with the SASL deployment
>>>>>model.
>>>>> Brahma, are you asking because you run clusters with the SASL
>>>>>approach?
>>>>> If so, has it been working well?
>>>>>
>>>>> --Chris Nauroth
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 9/9/15, 9:25 AM, "Haohui Mai"  wrote:
>>>>>
>>>>>> JSVC is no longer required. It causes a lot of headaches in
>>>>>> deployments. It's definitely a good target for clean ups.
>>>>>>
>>>>>> ~Haohui
>>>>>>
>>>>>> On Wed, Sep 9, 2015 at 5:24 AM, Brahma Reddy Battula
>>>>>>  wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> AFAIK, jsvc was added to secure the block tokens (..?).
>>>>>>>
>>>>>>> Since block tokens are secure now (SASL is used to secure the
>>>>>>> DataTransferProtocol, which transfers file block content between HDFS
>>>>>>> clients and DataNodes), can we remove jsvc now (script files)..?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks & Regards
>>>>>>>
>>>>>>> Brahma Reddy Battula
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>



Re: kindly confirm the behavior if configs are disabled after setting

2015-09-11 Thread Chris Nauroth
Hello Archana,

If ACLs are disabled after ACLs have already been created in the namespace, 
then the behavior is:

1. All subsequent ACL-related API calls fail intentionally.  It is impossible 
to create, update, or delete any ACLs.
2. The NameNode is still capable of loading existing ACL metadata stored in a 
pre-existing fsimage or edit log that was generated while ACLs were enabled.  
If ACLs are disabled, it would not cause the NameNode to abort while loading 
ACL-related metadata.
3. If ACLs are present on a file or directory from a prior configuration with 
ACLs enabled, then those ACLs are still enforced when users attempt to access 
the files.

This behavior is by design.  If you're interested in more of the background 
discussion that led to these design decisions, please see the comments in 
issues HDFS-5899 and HDFS-5925.
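
An illustrative session, assuming the standard dfs.namenode.acls.enabled key (the exact exception text may differ by version):

    # hdfs-site.xml: dfs.namenode.acls.enabled = false
    $ hdfs dfs -setfacl -m user:alice:rwx /data
    setfacl: The ACL operation has been rejected.  Support for ACLs has been
    disabled by setting dfs.namenode.acls.enabled to false.
    # Existing ACLs stored in the fsimage are still loaded and still
    # enforced, per points 2 and 3 above.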

--Chris Nauroth

From: Archana talamani <archana.talam...@huawei.com>
Reply-To: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Date: Friday, September 11, 2015 at 1:31 AM
To: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Subject: kindly confirm the behavior if configs are disabled after setting

Hi All,

I have a query about ACLs/storage policies, or in general any configs that 
provide an enable/disable feature.

For example:
An ACL is set on a file/directory, after which the permission check uses the 
ACLs.
What should the behavior be if ACLs are disabled after being set?
Should access checks fall back to normal permissions, or continue using the 
ACLs?

Kindly reply with the expected behavior.


Thanks,
Archana Talamani
HUAWEI TECHNOLOGIES INDIA PVT LTD.



