Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread Aaron Fabbri
It is a good question. I'm not familiar with Apache Commons VFS (which I
assume you are talking about, versus the BSD/Unix VFS layer). There will no
doubt be semantic differences between the Hadoop FS interface and VFS. It
would be an interesting exercise to implement a connector that bridges the
gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
looked at this or have experience with Apache VFS?

On Fri, Feb 28, 2020 at 6:42 AM David Mollitor  wrote:

> Hello,
>
> I'm curious to know what the history of Hadoop File API is in relationship
> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
> there two projects working on this same effort and what are the pros/cons
> of each?
>
> Thanks.
>


Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-19 Thread Aaron Fabbri
+1 (binding)

Thanks to the Ozone folks for their efforts at maintaining good separation
from HDFS and common. I took a lot of heat for the unpopular opinion that
they should be separate, so I am glad the process has worked out well for
both codebases. It looks like my concerns were addressed, and I appreciate
it. It is cool to see the evolution here.

Aaron


On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran 
wrote:

> in that case,
>
> +1 from me (binding)
>
> On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton  wrote:
>
> >  > one thing to consider here as you are giving up your ability to make
> >  > changes in hadoop-* modules, including hadoop-common, and their
> >  > dependencies, in sync with your own code. That goes for filesystem
> > contract
> >  > tests.
> >  >
> >  > are you happy with that?
> >
> >
> > Yes. I think we can live with it.
> >
> > Fortunately the Hadoop parts which are used by Ozone (security + rpc)
> > are stable enough; we didn't need bigger changes until now (small
> > patches are already included in 3.1/3.2).
> >
> > I think it's better to use released Hadoop bits in Ozone anyway, and
> > worst (best?) case we can try to do more frequent patch releases from
> > Hadoop (if required).
> >
> >
> > m.
> >
> >
> >
>


Re: Hadoop Storage online sync in an hour

2019-09-04 Thread Aaron Fabbri
Hi Wei-Chiu,

Can you share the calendar link again for this meeting?

Thanks,
Aaron

On Wed, Sep 4, 2019 at 9:31 AM Matt Foley  wrote:

> Sorry I won’t be able to come today; a work meeting interferes.
> —Matt
>
> On Sep 4, 2019, at 9:10 AM, Wei-Chiu Chuang  wrote:
>
> It's a short week so I didn't set up a predefined topic to discuss.
>
> What should we be discussing? How about Erasure Coding? I'm starting to see
> tricky EC bug reports coming in lately, so looks like folks are using it in
> production. Should we be thinking about the next step for EC in addition to
> bug fixes?
>
> Feel free to contribute other topics. We could also continue discussing
> NameNode Fine-Grained Locking from the last time.
>
> Weichiu
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: Hadoop storage community online sync

2019-08-21 Thread Aaron Fabbri
Thank you Wei-Chiu for organizing this and sending out notes!

On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang  wrote:

> We had a great turnout today, thanks to Konstantin for leading the
> discussion of the NameNode Fine-Grained Locking proposal.
>
> There were at least 16 participants on the call.
>
> Today's summary can be found here:
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit#
>
> 8/19/2019
>
> We are moving the sync to 10AM US PDT!
>
> NameNode Fine-Grained Locking via InMemory Namespace Partitioning
>
> Attendees:
>
> Konstantin, Chen, Weichiu, Xiaoyu, Anu, Matt, pljeliazkov, Chao Sun, Clay,
> Bharat Viswanadham, Matt, Craig Condit, Matthew Sharp, skumpf, Artem
> Ervits, Mohammad J Khan, Nanda, Alex Moundalexis.
>
> Konstantin led the discussion of HDFS-14703.
>
> There are three important parts:
>
> (1) Partition the namespace into multiple GSets; different parts of the
> namespace can be processed in parallel.
>
> (2) INode Key
>
> (3) Latch lock
>
> How to support snapshot —> should be able to get partitioned similarly.
>
> Balance partition strategies: several possible ways. Dynamic partition
> strategy, static partitioning strategy —> no need for a higher-level
> navigation lock.
>
> Dynamic strategy: starting with 1, and grow.
>
> And: why does the design doc use static partitioning? Determining the size
> of partitions is hard. What about starting with 1024 partitions?
>
> Hotspot problem
>
> A related task, HDFS-14617
>  (Improve fsimage load
> time by writing sub-sections to the fsimage index) writes multiple inode
> sections and inode directory sections, and load sections in parallel. It
> sounds like we can combine it with the fine-grained locking and partition
> inode/inode directory sections by the namespace partitions.
>
> Anu: snapshot complicates design. Renames. Copy on write?
>
> Anu: suggests implementing this feature without snapshot support to simplify
> design and implementation.
>
> Konstantin: will develop in a feature branch. Feel free to pick up jiras or
> share thoughts.
>
> FoldedTreeSet implemented in HDFS-9260
>  is relevant. Need to fix
> or revert before developing the namespace partitioning feature.
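The partitioning idea in these notes (hash an inode key to one of N partitions, each guarded by its own lock, so operations on different partitions proceed in parallel) can be illustrated with a toy example. This is a hypothetical Python sketch for illustration only, not the actual HDFS-14703 design; the class and method names are invented.

```python
import threading

class PartitionedNamespace:
    """Toy model of fine-grained namespace locking: the inode map is
    split into N partitions, each guarded by its own lock, so updates
    to different partitions can proceed in parallel."""

    def __init__(self, num_partitions=4):
        self.partitions = [dict() for _ in range(num_partitions)]
        self.locks = [threading.Lock() for _ in range(num_partitions)]

    def _index(self, path):
        # Partition by a stable hash of the inode key (here: the path).
        return hash(path) % len(self.partitions)

    def put(self, path, inode):
        i = self._index(path)
        with self.locks[i]:  # only this one partition is locked
            self.partitions[i][path] = inode

    def get(self, path):
        i = self._index(path)
        with self.locks[i]:
            return self.partitions[i].get(path)

ns = PartitionedNamespace()
ns.put("/a/b", {"len": 1})
ns.put("/c/d", {"len": 2})
print(ns.get("/a/b"))  # {'len': 1}
```

The real design adds the pieces the notes mention on top of this, such as a latch lock for navigating to the right partition; the sketch only shows why independent partitions remove the single global lock as a bottleneck.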
>
> On Mon, Aug 19, 2019 at 2:55 PM Wei-Chiu Chuang 
> wrote:
>
> > For this week,
> > we will have Konstantin and the LinkedIn folks discuss a recent project
> > that's been baking for quite a while. This is an exciting project, as it
> > has the potential to improve NameNode's throughput by 40%.
> >
> > HDFS-14703  NameNode
> > Fine-Grained Locking
> >
> > Access instruction, and the past sync notes are available here:
> >
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
> >
> > Reminder: we have a bi-weekly Hadoop storage online sync every other
> > Wednesday.
> > If there are no objections, I'd like to move the time to 10AM US Pacific
> > time (GMT-8)
> >
>


Re: new committer: Gabor Bota

2019-07-10 Thread Aaron Fabbri
Congrats Gabor! Have really enjoyed working with you, looking forward to
more good stuff

On Thu, Jul 4, 2019 at 4:31 AM Gabor Bota 
wrote:

> Thank you!
>
> On Thu, Jul 4, 2019 at 8:50 AM Szilard Nemeth  >
> wrote:
>
> > Congrats, Gabor!
> >
> > On Tue, Jul 2, 2019, 01:36 Sean Mackrory  wrote:
> >
> > > The Project Management Committee (PMC) for Apache Hadoop
> > > has invited Gabor Bota to become a committer and we are pleased
> > > to announce that he has accepted.
> > >
> > > Gabor has been working on the S3A file-system, especially on
> > > the robustness and completeness of S3Guard to help deal with
> > > inconsistency in object storage. I'm excited to see his work
> > > with the community continue!
> > >
> > > Being a committer enables easier contribution to the
> > > project since there is no need to go via the patch
> > > submission process. This should enable better productivity.
> > >
> >
>


Re: [ANNOUNCE] Aaron Fabbri as Hadoop PMC

2019-06-18 Thread Aaron Fabbri
Thank you everybody. Appreciate it.

On Mon, Jun 17, 2019 at 8:43 PM Sree V 
wrote:

> Congratulations, Aaron.
>
>
>
> Thank you./Sree
>
>
>
> On Monday, June 17, 2019, 7:31:47 PM PDT, Dinesh Chitlangia <
> dchitlan...@cloudera.com.INVALID> wrote:
>
>  Congratulations Aaron!
>
> -Dinesh
>
>
> On Mon, Jun 17, 2019 at 9:29 PM Wanqiang Ji  wrote:
>
> > Congratulations!
> >
> > On Tue, Jun 18, 2019 at 8:29 AM Da Zhou  wrote:
> >
> > > Congratulations!
> > >
> > > Regards,
> > > Da
> > >
> > > On Mon, Jun 17, 2019 at 5:14 PM Ajay Kumar  > > .invalid>
> > > wrote:
> > >
> > > > Congrats Aaron!!
> > > >
> > > > On Mon, Jun 17, 2019 at 4:00 PM Daniel Templeton
> > > >  wrote:
> > > >
> > > > > I am very pleased to announce that Aaron Fabbri has now been added
> to
> > > > > the Hadoop PMC.  Welcome aboard, Aaron, and Congratulations!
> > > > >
> > > > > Daniel
> > > > >
> > > > >
> -
> > > > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Unprotect HDFS-13891 (HDFS RBF Branch)

2019-05-15 Thread Aaron Fabbri
+1 to unprotect feature branch (in general) for rebasing against trunk.

On Tue, May 14, 2019 at 7:53 PM Dinesh Chitlangia
 wrote:

> +1(non-binding) for branch.
>
> -Dinesh
>
>
>
>
> On Tue, May 14, 2019 at 10:04 PM Brahma Reddy Battula 
> wrote:
>
> > Yes Arpit, it's not for trunk.
> >
> >
> > On Wed, May 15, 2019 at 2:11 AM, Arpit Agarwal 
> > wrote:
> >
> > > The request is specific to HDFS-13891, correct?
> > >
> > > We should not allow force push on trunk.
> > >
> > >
> > > > On May 14, 2019, at 8:07 AM, Anu Engineer  > .INVALID>
> > > wrote:
> > > >
> > > > Is it possible to unprotect the branches and not the trunk?
> > > > Generally, a force push to trunk indicates a mistake, and we have had
> > > > that in the past. This is just a suggestion; even if this request is
> > > > not met, I am still +1.
> > > >
> > > > Thanks
> > > > Anu
> > > >
> > > >
> > > >
> > > > On Tue, May 14, 2019 at 4:58 AM Takanobu Asanuma <
> > tasan...@yahoo-corp.jp
> > > >
> > > > wrote:
> > > >
> > > >> +1.
> > > >>
> > > >> Thanks!
> > > >> - Takanobu
> > > >>
> > > >> 
> > > >> From: Akira Ajisaka 
> > > >> Sent: Tuesday, May 14, 2019 4:26:30 PM
> > > >> To: Giovanni Matteo Fumarola
> > > >> Cc: Iñigo Goiri; Brahma Reddy Battula; Hadoop Common; Hdfs-dev
> > > >> Subject: Re: [VOTE] Unprotect HDFS-13891 (HDFS RBF Branch)
> > > >>
> > > >> +1 to unprotect the branch.
> > > >>
> > > >> Thanks,
> > > >> Akira
> > > >>
> > > >> On Tue, May 14, 2019 at 3:11 PM Giovanni Matteo Fumarola
> > > >>  wrote:
> > > >>>
> > > >>> +1 to unprotect the branches for rebases.
> > > >>>
> > > >>> On Mon, May 13, 2019 at 11:01 PM Iñigo Goiri 
> > > wrote:
> > > >>>
> > >  Syncing the branch to trunk should be a fairly standard task.
> > >  Is there a way to do this without rebasing and forcing the push?
> > >  As far as I know this has been the standard for other branches and I
> > >  don't know of any alternative.
> > >  We should clarify the process, as having to get PMC consensus to
> > >  rebase a branch seems a little overkill to me.
> > > 
> > >  +1 from my side to unprotect the branch to do the rebase.
> > > 
> > >  On Mon, May 13, 2019, 22:46 Brahma Reddy Battula <
> bra...@apache.org
> > >
> > >  wrote:
> > > 
> > > > Hi Folks,
> > > >
> > > > INFRA-18181 made all the Hadoop branches protected.
> > > > Unfortunately, the HDFS-13891 branch needs to be rebased as we
> > > > contribute core patches to trunk, and we are currently stuck because
> > > > force pushes are not allowed. Hence I raised INFRA-18361.
> > > >
> > > > Can we have a quick vote for INFRA sign-off to proceed, as this is
> > > > blocking all branch commits?
> > > >
> > > > --
> > > >
> > > >
> > > >
> > > > --Brahma Reddy Battula
> > > >
> > > 
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> > > --
> >
> >
> >
> > --Brahma Reddy Battula
> >
>


[jira] [Resolved] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-10 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-16251.
---
   Resolution: Fixed
Fix Version/s: 3.2.1

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
> Fix For: 3.2.1
>
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java";
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (HADOOP-16291) HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()

2019-05-02 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-16291:
-

 Summary: HDFS Permissions Guide appears incorrect about 
getFileStatus()/getFileInfo()
 Key: HADOOP-16291
 URL: https://issues.apache.org/jira/browse/HADOOP-16291
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Aaron Fabbri


Fix some errors in the HDFS Permissions doc.

Noticed this when reviewing HADOOP-16251. The FS Permissions 
[documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html]
 seems to mark a lot of permissions as Not Applicable (N/A) when that is not 
the case. In particular getFileInfo (getFileStatus) checks READ permission bit 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204],
 as it should.







Re: Not getting JIRA email on cc: or tag

2019-05-01 Thread Aaron Fabbri
I got it sorted, thanks. FWIW your JIRA email address setting doesn't appear
until you click the pencil (edit) icon under the Details tab of your JIRA
profile page.

Sorry for the wide distribution here.

-AF


On Wed, May 1, 2019 at 3:30 AM Steve Loughran  wrote:

> 1. you checked your JIRA notification settings?
> 2. is it going to the right email address?
>
> On Wed, May 1, 2019 at 5:30 AM Aaron Fabbri  wrote:
>
>> Hello,
>>
>> I haven't been getting emailed when someone tags or cc:s me in a JIRA. Is
>> there a way to change this?
>>
>> Thanks!
>> Aaron
>>
>


Not getting JIRA email on cc: or tag

2019-04-30 Thread Aaron Fabbri
Hello,

I haven't been getting emailed when someone tags or cc:s me in a JIRA. Is
there a way to change this?

Thanks!
Aaron


Re: [jira] [Created] (HADOOP-16219) Hadoop branch-2 to set java language version to 1.8

2019-04-03 Thread Aaron Fabbri
+1 assuming we get the technical issues sorted. I'll give your patch a go
today.

As Steve mentioned on the JIRA, unsupported JVM is called out as an
exception in the compatibility guidelines:

"The JVM requirements will not change across point releases within the same
minor release except if the JVM version under question becomes unsupported"

On Thu, Mar 28, 2019 at 3:39 PM Steve Loughran 
wrote:

> I've just created a patch to update branch-2 to Java 8. Not sure what's
> going to break, but hopefully all problems can be addressed with some
> selective changes. We know that Hadoop branch-2 builds and runs on Java 8, it being
> the java version we are already using for our branch-2 build and test runs.
>
> It's time to move on; maintenance is getting harder and harder, as well as
> more inconvenient. And while I know of clusters being deployed, they are,
> AFAIK, branch-2 on java 8 JVMs. This is just changing the language to
> reflect the reality.
>
> Before anyone asks, yes, our compatibility guidelines cover this; we had
> the same issue in the java 6 -> 7 move, remember?
>
> -- Forwarded message -
> From: Steve Loughran (JIRA) 
> Date: Thu, Mar 28, 2019 at 7:33 PM
> Subject: [jira] [Created] (HADOOP-16219) Hadoop branch-2 to set java
> language version to 1.8
> To: 
>
>
> Steve Loughran created HADOOP-16219:
> ---
>
>  Summary: Hadoop branch-2 to set java language version to 1.8
>  Key: HADOOP-16219
>  URL: https://issues.apache.org/jira/browse/HADOOP-16219
>  Project: Hadoop Common
>   Issue Type: Improvement
>   Components: build
> Affects Versions: 2.10.0
> Reporter: Steve Loughran
>
>
> Java 7 is long EOL; having branch-2 require it is simply making the release
> process a pain (we aren't building, testing, or releasing on java 7 JVMs
> any more, are we?).
>
> Staying on java 7 complicates backporting; JAR updates for CVEs (hello
> Guava!) are becoming impossible.
>
> Proposed: increment javac.version = 1.8
>


Re: Including Original Author in git commits.

2019-02-14 Thread Aaron Fabbri
+1. I think formatted patches and PRs will be an improvement. I've used
the git --committer thing a couple of times here without issue.

Another unrelated improvement with GitHub is handling of large changes. I
really think large patches should be split up into logical subcommits, and
PRs support this naturally. (We could also use something like quilt and
change the JIRA process to understand patch sets, but I may be the only one
excited about that idea.)

On Thu, Feb 14, 2019 at 5:24 AM Vinayakumar B 
wrote:

> So.. if we started doing that already.. we can encourage contributors to
> attach formatted patch.. or create PRs.
>
> And update wiki to follow exact steps to contribute and commit.
>
> -Vinay
>
>
> On Thu, 14 Feb 2019, 4:54 pm Steve Loughran  wrote:
>
> > I've been trying to do that recently, though as it forces me to go to the
> > command line rather than using Atlassian Sourcetree, I've been getting
> > other things wrong. To those people who have been dealing with commits
> I've
> > managed to mess up: apologies.
> >
> > 1. Once someone is down as an author you don't need to add their email
> > address; the first time you will need to get their email address
> > 2. Akira, Aaron and I also use the -S option to GPG sign the commits. We
> > should all be doing that, as it is the way to show who really committed
> the
> > patch. Add --show-signature to the end of any git log to command to see
> > those.
> > 3. note that if you cherry-pick a patch into a different branch, you have
> > to use -S in the git cherry-pick command to resign it.
> >
> > we should all have our GPG keys in the KEYS file, and co-sign the others
> in
> > there, so that we have that mutual trust.
> >
> > -Steve
> >
> > ps: one flaw in the GPG process: if you ever revoke the key then all
> > existing commits are considered untrusted
> >
> >
> http://steveloughran.blogspot.com/2017/10/roca-breaks-my-commit-process.html
> >
> >
> >
> >
> > On Thu, Feb 14, 2019 at 9:12 AM Akira Ajisaka 
> wrote:
> >
> > > Hi Vinay,
> > >
> > > I'm already doing this if I can get the original author name and the
> > > email address in some way.
> > > If the patch is created by git format-patch command, smart-apply-patch
> > > --committer option can do this automatically.
> > >
> >
> > Never knew that
> >
>
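The workflow discussed in this thread (committing a contributor's patch while preserving them as the author, optionally GPG-signing with `-S`) boils down to `git commit --author`. The sketch below drives it from Python in a throwaway repository. This is an illustration only: the JIRA key and names are placeholders, and the signing flags from the thread are noted in comments rather than executed, since they require a configured GPG key.

```python
import os
import subprocess
import tempfile

def run(repo, *args):
    """Run a git command in the given repo and return its stdout."""
    return subprocess.run(["git", "-C", repo] + list(args),
                          check=True, capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
run(repo, "init", "-q")
run(repo, "config", "user.name", "Committer Name")
run(repo, "config", "user.email", "committer@example.com")

with open(os.path.join(repo, "fix.txt"), "w") as f:
    f.write("the contributed change\n")
run(repo, "add", "fix.txt")

# Preserve the original contributor as the author; the committer field
# still records who actually pushed it. Per the thread, add "-S" here to
# GPG-sign the commit (and again when cherry-picking to another branch,
# since cherry-picks must be re-signed).
run(repo, "commit", "-q", "-m", "HADOOP-XXXXX. Example fix.",
    "--author=Jane Contributor <jane@example.com>")

print(run(repo, "log", "-1", "--format=author=%an committer=%cn"))
# author=Jane Contributor committer=Committer Name
```

With signing enabled, `git log --show-signature` then displays who committed each patch, as described above.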


Re: Mentorship opportunity for aspiring Hadoop developer

2019-01-17 Thread Aaron Fabbri
(list moved to bcc:)

Thank you for all the responses to my email. There is more interest than I
expected!

I've created a small survey to collect details from people interested in
being mentored.  Please fill it out if you are interested.
https://goo.gl/forms/oRfOUe1nYhrZNX3u2

I also want to set expectations: based on the large response, I won't be
able to work with everyone. I'll do my best to keep you updated via direct
email.

I am also looking into Google Summer of Code, which may interest you as
well.

Thank you,
Aaron


Re: Mentorship opportunity for aspiring Hadoop developer

2019-01-16 Thread Aaron Fabbri
Good call. I didn't think of GSOC. I'll check it out.

I received a lot of email responses. Tomorrow I'll send out an update with
a small survey so I can get more organized in helping the folks that have
reached out so far.

On Wed, Jan 16, 2019 at 4:21 AM Steve Loughran 
wrote:

> the google summer of code program is setting for the season, ASF supports
> this if you want to volunteer to help out there -it'd be great for the
> project and whoever joins up
>
> > On 14 Jan 2019, at 18:39, Aaron Fabbri  wrote:
> >
> > Hi,
> >
> > I'd like to offer to mentor a developer who is interested in getting into
> > Apache Hadoop development.
> >
> > If you or someone you know is interested, please email me (unicast). This
> > would probably be a three month thing where we meet every week or two.
> >
> > I'm also curious if Apache and/or Hadoop communities have any program or
> > resources on mentoring. I'm interested in growing the dev community so
> > please forward any good links or experiences.
> >
> > Cheers,
> > Aaron
>
>


Mentorship opportunity for aspiring Hadoop developer

2019-01-14 Thread Aaron Fabbri
Hi,

I'd like to offer to mentor a developer who is interested in getting into
Apache Hadoop development.

If you or someone you know is interested, please email me (unicast). This
would probably be a three month thing where we meet every week or two.

I'm also curious if Apache and/or Hadoop communities have any program or
resources on mentoring. I'm interested in growing the dev community so
please forward any good links or experiences.

Cheers,
Aaron


Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-10 Thread Aaron Fabbri
Thanks Sunil and everyone who has worked on this release.

+1 from me.

- Verified checksums for tar file.
- Built from tar.gz.
- Ran through S3A and S3Guard integration tests (in AWS us-west 2).

This includes a yarn minicluster test but is mostly focused on s3a/s3guard.

Cheers,
Aaron


On Thu, Jan 10, 2019 at 2:32 PM Kuhu Shukla 
wrote:

> +1 (non-binding)
>
> - built from source on Mac
> - deployed on a pseudo distributed one node cluster
> - ran example jobs like sleep and wordcount.
>
> Thank you for all the work on this release.
> Regards,
> Kuhu
>
> On Thu, Jan 10, 2019 at 10:32 AM Craig.Condit 
> wrote:
>
> > +1 (non-binding)
> >
> > - built from source on CentOS 7.5
> > - deployed single node cluster
> > - ran several yarn jobs
> > - ran multiple docker jobs, including spark-on-docker
> >
> > On 1/8/19, 5:42 AM, "Sunil G"  wrote:
> >
> > Hi folks,
> >
> >
> > Thanks to all of you who helped in this release [1] and for helping
> to
> > vote
> > for RC0. I have created second release candidate (RC1) for Apache
> > Hadoop
> > 3.2.0.
> >
> >
> > Artifacts for this RC are available here:
> >
> > http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/
> >
> >
> > RC tag in git is release-3.2.0-RC1.
> >
> >
> >
> > The maven artifacts are available via repository.apache.org at
> >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1178/
> >
> >
> > This vote will run 7 days (5 weekdays), ending on 14th Jan at 11:59
> pm
> > PST.
> >
> >
> >
> > 3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. The feature
> > additions below are the highlights of this release.
> >
> > 1. Node Attributes Support in YARN
> >
> > 2. Hadoop Submarine project for running Deep Learning workloads on
> YARN
> >
> > 3. Support service upgrade via YARN Service API and CLI
> >
> > 4. HDFS Storage Policy Satisfier
> >
> > 5. Support Windows Azure Storage - Blob file system in Hadoop
> >
> > 6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
> >
> > 7. Improvements in Router-based HDFS federation
> >
> >
> >
> > Thanks to Wangda, Vinod, Marton for helping me in preparing the
> > release.
> >
> > I have done some testing with my pseudo cluster. My +1 to start.
> >
> >
> >
> > Regards,
> >
> > Sunil
> >
> >
> >
> > [1]
> >
> >
> >
> https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
> >
> > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in
> > (3.2.0)
> > AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status =
> Resolved
> > ORDER BY fixVersion ASC
> >
> >
> >
>


Re: Hadoop 3.2 Release Plan proposal

2018-10-02 Thread Aaron Fabbri
- (Virajit) HDFS-12615: Router-based HDFS federation. Improvement
> >>>>> works.
> >>>>> - (Steve) S3Guard Phase III, S3a phase V, Support Windows Azure
> >>>>> Storage. In progress.
> >>>>>
> >>>>> 3. Tentative features:
> >>>>>
> >>>>> - (Haibo Chen) YARN-1011: Resource overcommitment. Looks challenging
> >>>>> to be done before Aug 2018.
> >>>>> - (Eric) YARN-7129: Application Catalog for YARN applications.
> >>>>> Challenging as more discussions are on-going.
> >>>>>
> >>>>> *Summary of 3.2.0 issues status:*
> >>>>>
> >>>>> 39 Blocker and Critical issues [1] are open, I am checking with
> owners
> >>>>> to get status on each of them to get in by Code Freeze date.
> >>>>>
> >>>>> [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND priority in
> >>>>> (Blocker, Critical) AND resolution = Unresolved AND "Target
> Version/s" =
> >>>>> 3.2.0 ORDER BY priority DESC
> >>>>>
> >>>>> Thanks,
> >>>>> Sunil
> >>>>>
> >>>>> On Fri, Jul 20, 2018 at 8:03 AM Sunil G  wrote:
> >>>>>
> >>>>>> Thanks Subru for the thoughts.
> >>>>>> One of the main reasons for a major release is to push out critical
> >>>>>> features with a faster cadence to the users. If we are pulling more
> >>>>>> and more different types of features into a minor release, that
> >>>>>> branch will become more destabilized, and it may be tough to say
> >>>>>> that 3.1.2 is more stable than 3.1.1, for example. We always tend to
> >>>>>> improve and stabilize features in subsequent minor releases.
> >>>>>> For a few companies, it makes sense to push out these new features
> >>>>>> faster to reach the users. As for the backporting issues, I agree
> >>>>>> that it's a pain, and we can work around that with some git scripts.
> >>>>>> If we can make such scripts available to committers, backporting
> >>>>>> will be seamless across branches and we can achieve the faster
> >>>>>> release cadence as well.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>> - Sunil
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jul 20, 2018 at 3:37 AM Subru Krishnan 
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Sunil for volunteering to lead the release effort. I am
> >>>>>>> generally supportive of a release but -1 on a 3.2 (I'd prefer a
> >>>>>>> 3.1.x), as I feel we already have too many branches to maintain. I
> >>>>>>> already see many commits in different branches with no apparent
> >>>>>>> rationale; e.g., 3.1 has commits which are absent in 3.0, etc.
> >>>>>>>
> >>>>>>> Additionally, AFAIK 3.x has not been deployed in any major
> >>>>>>> production setting, so the cost of adding features should be minimal.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> -Subru
> >>>>>>>
> >>>>>>> On Thu, Jul 19, 2018 at 12:31 AM, Sunil G 
> wrote:
> >>>>>>>
> >>>>>>> > Thanks Steve, Aaron, Wangda for sharing thoughts.
> >>>>>>> >
> >>>>>>> > Yes, important changes and features are much needed, hence we
> >>>>>>> > will be keeping the door open for them where possible. Also,
> >>>>>>> > considering a few more offline requests from other folks, I think
> >>>>>>> > extending the timeframe by a couple of weeks makes sense
> >>>>>>> > (including a second RC buffer), and this should ideally help us
> >>>>>>> > to ship this by September itself.
> >>>

[jira] [Created] (HADOOP-15780) S3Guard: document how to deal with non-S3Guard processes writing data to S3Guarded buckets

2018-09-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15780:
-

 Summary: S3Guard: document how to deal with non-S3Guard processes 
writing data to S3Guarded buckets
 Key: HADOOP-15780
 URL: https://issues.apache.org/jira/browse/HADOOP-15780
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


Our general policy for S3Guard is this: all modifiers of a bucket that is 
configured for use with S3Guard must use S3Guard. Otherwise, the MetadataStore 
will not be properly updated as the S3 bucket changes, and problems will arise.

There are limited circumstances in which it may be safe to have an external 
(non-S3Guard) process writing data. There are also scenarios where it 
definitely breaks things.

I think we should start by documenting the cases where this works and where it 
does not. Once we've enumerated those, we can suggest enhancements as needed to 
make this sort of configuration easier to use.

To get the ball rolling, some things that do not work:
- Deleting a path *p* with S3Guard, then writing a new file at path *p* without 
S3guard (will still have delete marker in S3Guard, making the file appear to be 
deleted but still visible in S3 due to false "eventual consistency") (as 
[~ste...@apache.org] and I have discussed)
- When fs.s3a.metadatastore.authoritative is true, adding files to directories 
without S3Guard, then listing with S3Guard may exclude externally-written files 
from listings.

(Note, there are also S3A interop issues with other non-S3A clients even 
without S3Guard, due to the unique way S3A interprets empty directory markers).







[jira] [Created] (HADOOP-15779) S3guard: add inconsistency detection metrics

2018-09-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15779:
-

 Summary: S3guard: add inconsistency detection metrics
 Key: HADOOP-15779
 URL: https://issues.apache.org/jira/browse/HADOOP-15779
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


S3Guard uses a trailing log of metadata changes made to an S3 bucket to add 
consistency to the eventually-consistent AWS S3 service. We should add some 
metrics that are incremented when we detect inconsistency (eventual 
consistency) in S3.

I'm thinking of at least two counters: (1) getFileStatus() (HEAD) inconsistency 
detected, and (2) listing inconsistency detected. We may want to further 
separate into categories (present / not present etc.)

This is related to Auth. Mode and TTL work that is ongoing, so let me outline 
how I think this should all evolve:

This should happen after HADOOP-15621 (TTL for dynamo MetadataStore), since 
that will change *when* we query both S3 and the MetadataStore (e.g. Dynamo) 
for metadata. There I suggest that:
 # Prune time is different than TTL. Prune time: "how long until inconsistency 
is no longer a problem" . TTL time "how long a MetadataStore entry is 
considered authoritative before refresh"
 # Prune expired: delete entries (when hadoop CLI prune command is run). TTL 
Expired: entries become non-authoritative.
 #  Prune implemented in each MetadataStore, but TTL filtering happens in S3A.

Once we have this, S3A will be consulting both S3 and MetadataStore depending 
on configuration and/or age of the entry in the MetadataStore. Today 
HEAD/getFileStatus() is always short-circuit (skips S3 if MetadataStore returns 
results). I think S3A should consult both when TTL is stale, and invoke a 
callback on inconsistency that increments the new metrics. For listing, we 
already are comparing both sources of truth (except when S3A auth mode is on 
and a directory is marked authoritative in MS), so it would be pretty simple to 
invoke a callback on inconsistency and bump some metrics.
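A minimal sketch of the prune-vs-TTL distinction proposed above: prune removes old entries outright, while TTL expiry only demotes an entry from authoritative (short-circuit, skip S3) to one that must also be checked against S3. This is a hypothetical Python illustration with invented names, not the DynamoDB MetadataStore implementation.

```python
import time

class Entry:
    def __init__(self, status, updated):
        self.status = status      # cached file status
        self.updated = updated    # last refresh time (epoch seconds)

class ToyStore:
    TTL = 60          # seconds until an entry stops being authoritative
    PRUNE_AGE = 3600  # seconds until a prune command would drop it

    def __init__(self):
        self.entries = {}

    def is_authoritative(self, path, now):
        # TTL expiry: the entry survives but can no longer short-circuit
        # S3; S3A must consult S3 as well and can detect inconsistency.
        e = self.entries.get(path)
        return e is not None and (now - e.updated) < self.TTL

    def prune(self, now):
        # Prune: actually delete entries older than PRUNE_AGE.
        self.entries = {p: e for p, e in self.entries.items()
                        if (now - e.updated) < self.PRUNE_AGE}

store = ToyStore()
t0 = time.time()
store.entries["/data"] = Entry("FILE", t0)

print(store.is_authoritative("/data", t0 + 10))   # True: fresh, skip S3
print(store.is_authoritative("/data", t0 + 120))  # False: stale, check S3 too
store.prune(t0 + 7200)
print("/data" in store.entries)                   # False: pruned away
```

The inconsistency counters proposed in this issue would be incremented at the point where a stale (non-authoritative) entry is compared against S3 and found to disagree.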

Comments / suggestions / questions welcomed.







Re: [VOTE] Merge HADOOP-15407 to trunk

2018-09-19 Thread Aaron Fabbri
+1 (Binding). Haven't thoroughly reviewed all the new code but successfully
ran all the integration and unit tests today. Based on the progress this
looks ready to merge.  Good work folks.

On Wed, Sep 19, 2018 at 5:19 PM John Zhuge  wrote:

> +1 (Binding)  Nice effort all around.
>
> On Tue, Sep 18, 2018 at 2:45 AM Steve Loughran 
> wrote:
>
> > +1 (Binding)
> >
> > Ive been testing this; the current branch is rebased to trunk and all the
> > new tests are happy.
> >
> >
> > the connector is as good as any of the connectors get before they are
> > ready for people to play with: there are always surprises in the wild,
> > usually network and configuration; those we have to wait and see what
> > happens.
> >
> >
> >
> >
> >
> >
> > > On 18 Sep 2018, at 04:10, Sean Mackrory  wrote:
> > >
> > > All,
> > >
> > > I would like to propose that HADOOP-15407 be merged to trunk. As
> > described
> > > in that JIRA, this is a complete reimplementation of the current
> > > hadoop-azure storage driver (WASB) with some significant advantages.
> The
> > > impact outside of that module is very limited, however, and it appears
> > that
> > > on-going improvements will continue to be so. The tests have been
> stable
> > > for some time, and I believe we've reached the point of being ready for
> > > broader feedback and to continue incremental improvements in trunk.
> > > HADOOP-15407 was rebased on trunk today and I had a successful test
> run.
> > >
> > > I'd like to call out the contributions of Thomas Marquardt, Da Zhou,
> and
> > Steve
> > > Loughran who have all contributed significantly to getting this branch
> to
> > > its current state. Numerous other developers are named in the commit
> log
> > > and the JIRA.
> > >
> > > I'll start us off:
> > >
> > > +1 (binding)
> >
> >
>
> --
> John
>


[jira] [Created] (HADOOP-15754) s3guard: testDynamoTableTagging should clear existing config

2018-09-12 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15754:
-

 Summary: s3guard: testDynamoTableTagging should clear existing 
config
 Key: HADOOP-15754
 URL: https://issues.apache.org/jira/browse/HADOOP-15754
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


I recently committed HADOOP-14734 which adds support for tagging Dynamo DB 
tables for S3Guard when they are created.

 

Later, when testing another patch, I hit a test failure because I still had a 
tag option set in my test configuration (auth-keys.xml) that was adding my own 
table tag.
{noformat}
[ERROR] 
testDynamoTableTagging(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)
  Time elapsed: 13.384 s  <<< FAILURE!

java.lang.AssertionError: expected:<2> but was:<3>

        at org.junit.Assert.fail(Assert.java:88)

        at org.junit.Assert.failNotEquals(Assert.java:743)

        at org.junit.Assert.assertEquals(Assert.java:118)

        at org.junit.Assert.assertEquals(Assert.java:555)

        at org.junit.Assert.assertEquals(Assert.java:542)

        at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB.testDynamoTableTagging(ITestS3GuardToolDynamoDB.java:129)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)

        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)

        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)

        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)

        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){noformat}
I think the solution is just to clear any tag.* options set in the 
configuration at the beginning of the test.
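The proposed fix amounts to stripping any prefixed tag options before the test
runs. A hedged sketch, modeled with a plain Map rather than a Hadoop
Configuration object; the property prefix matches what HADOOP-14734 added, but
treat all names here as illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: clear inherited table tags so the test creates its table with
// exactly the tags it sets itself.
public class ClearTagsSketch {

    public static final String TAG_PREFIX = "fs.s3a.s3guard.ddb.table.tag.";

    // Drop any table tags leaked in from the developer's auth-keys.xml.
    public static void clearTableTags(Map<String, String> conf) {
        conf.keySet().removeIf(key -> key.startsWith(TAG_PREFIX));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TAG_PREFIX + "owner", "fabbri");        // leaked from local config
        conf.put("fs.s3a.connection.maximum", "100");    // unrelated, must survive
        clearTableTags(conf);
        System.out.println(conf.keySet());               // [fs.s3a.connection.maximum]
    }
}
```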






[jira] [Created] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-07-19 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15621:
-

 Summary: s3guard: implement time-based (TTL) expiry for DynamoDB 
Metadata Store
 Key: HADOOP-15621
 URL: https://issues.apache.org/jira/browse/HADOOP-15621
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.0.0-beta1
Reporter: Aaron Fabbri
Assignee: Gabor Bota
 Fix For: 3.2.0


LocalMetadataStore is primarily a reference implementation for testing.  It may 
be useful in narrow circumstances where the workload can tolerate short-term 
lack of inter-node consistency:  Being in-memory, one JVM/node's 
LocalMetadataStore will not see another node's changes to the underlying 
filesystem.

To put a bound on the time during which this inconsistency may occur, we should 
implement time-based (a.k.a. Time To Live / TTL) expiration for 
LocalMetadataStore.
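An in-memory TTL of this kind can be sketched as a map whose entries carry a
stored-at timestamp and lazily expire on read. This is a hedged illustration
only, not the actual LocalMetadataStore code; the class and field names are
hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of time-based expiry for an in-memory metadata cache: a lookup
// older than the TTL behaves as a miss, bounding how stale a node's view
// of the filesystem metadata can get.
public class TtlCacheSketch {

    static final class CachedEntry {
        final String metadata;   // stand-in for a PathMetadata-like value
        final long storedAtMs;   // when this entry was cached
        CachedEntry(String metadata, long storedAtMs) {
            this.metadata = metadata;
            this.storedAtMs = storedAtMs;
        }
    }

    private final Map<String, CachedEntry> entries = new HashMap<>();
    private final long ttlMs;

    public TtlCacheSketch(long ttlMs) { this.ttlMs = ttlMs; }

    public void put(String path, String metadata, long nowMs) {
        entries.put(path, new CachedEntry(metadata, nowMs));
    }

    /** Returns null for missing *or* expired entries. */
    public String get(String path, long nowMs) {
        CachedEntry e = entries.get(path);
        if (e == null) {
            return null;
        }
        if (nowMs - e.storedAtMs > ttlMs) {
            entries.remove(path);  // lazily expire on read
            return null;
        }
        return e.metadata;
    }

    public static void main(String[] args) {
        TtlCacheSketch cache = new TtlCacheSketch(1000);  // 1 second TTL
        cache.put("/a/b", "file-status", 0);
        System.out.println(cache.get("/a/b", 500));   // file-status
        System.out.println(cache.get("/a/b", 1501));  // null (expired)
    }
}
```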







Re: Hadoop 3.2 Release Plan proposal

2018-07-18 Thread Aaron Fabbri
On Tue, Jul 17, 2018 at 7:21 PM Steve Loughran 
wrote:

>
>
> On 16 Jul 2018, at 23:45, Sunil G <sun...@apache.org> wrote:
>
> I would also would like to take this opportunity to come up with a detailed
> plan.
>
> - Feature freeze date : all features should be merged by August 10, 2018.
>
>
>
> 

>
> Please let me know if I missed any features targeted to 3.2 per this
>
>
> Well there these big todo lists for S3 & S3Guard.
>
> https://issues.apache.org/jira/browse/HADOOP-15226
> https://issues.apache.org/jira/browse/HADOOP-15220
>
>
> There's a bigger bit of work coming on for Azure Datalake Gen 2
> https://issues.apache.org/jira/browse/HADOOP-15407
>
> I don't think this is quite ready yet, I've been doing work on it, but if
> we have a 3 week deadline, I'm going to expect some timely reviews on
> https://issues.apache.org/jira/browse/HADOOP-15546
>
> I've uprated that to a blocker feature; will review the S3 & S3Guard JIRAs
> to see which of those are blocking. Then there are some pressing "guava,
> java 9 prep"
>
>
 I can help with this part if you like.



>
>
>
> timeline. I would like to volunteer myself as release manager of 3.2.0
> release.
>
>
> well volunteered!
>
>
>
Yes, thank you for stepping up.


>
> I think this raises a good q: what timetable should we have for the 3.2 &
> 3.3 releases; if we do want a faster cadence, then having the outline time
> from the 3.2 to the 3.3 release means that there's less concern about
> things not making the 3.2 deadline
>
> -Steve
>
>
Good idea to mitigate the short deadline.

-AF


[jira] [Created] (HADOOP-15525) s3a: clarify / improve support for mixed ACL buckets

2018-06-08 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15525:
-

 Summary: s3a: clarify / improve support for mixed ACL buckets
 Key: HADOOP-15525
 URL: https://issues.apache.org/jira/browse/HADOOP-15525
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0
Reporter: Aaron Fabbri


Scenario: customer wants to only give a Hadoop cluster access to a subtree of 
an S3 bucket.

For example, assume Hadoop uses some IAM identity "hadoop", which they wish to 
grant full permission to everything under the following path:

s3a://bucket/a/b/c/hadoop-dir

They don't want the hadoop user to be able to read/list/delete anything outside 
of the hadoop-dir "subdir".

Problems: 

To implement the "directory structure on flat key space" emulation logic we use 
to present a Hadoop FS on top of a blob store, we need to create / delete / 
list ancestors of {{hadoop-dir}}. 

I'd like us to either (1) document a workaround (example IAM ACLs) that gets 
this basic functionality, and/or (2) make improvements to make this less 
painful.
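For option (1), a subtree-scoped IAM policy along these lines is a common
starting point. This is an assumption/sketch, not something from this thread:
the bucket name and paths are placeholders, and because of the ancestor
create/delete/list problem described above, a real deployment will likely also
need list/HEAD access on the prefixes a/, a/b/, and a/b/c/.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListOnlyUnderHadoopDir",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucket",
      "Condition": {
        "StringLike": { "s3:prefix": ["a/b/c/hadoop-dir", "a/b/c/hadoop-dir/*"] }
      }
    },
    {
      "Sid": "ObjectOpsUnderHadoopDir",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::bucket/a/b/c/hadoop-dir/*"
    }
  ]
}
```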

We've discussed some of these issues before but I didn't see a dedicated JIRA.






[jira] [Resolved] (HADOOP-11697) Use larger value for fs.s3a.connection.timeout.

2018-06-08 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-11697.
---
   Resolution: Duplicate
Fix Version/s: 3.0.0

> Use larger value for fs.s3a.connection.timeout.
> ---
>
> Key: HADOOP-11697
> URL: https://issues.apache.org/jira/browse/HADOOP-11697
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: s3
> Fix For: 3.0.0
>
> Attachments: HADOOP-11697.001.patch, HDFS-7908.000.patch
>
>
> The default value of {{fs.s3a.connection.timeout}} is {{5}} milliseconds. 
> It causes many {{SocketTimeoutException}} when uploading large files using 
> {{hadoop fs -put}}. 
> Also, the units for {{fs.s3a.connection.timeout}} and 
> {{fs.s3a.connection.establish.timeout}} are milliseconds. For s3 
> connections, I think it is not necessary to have sub-second timeout values. 
> Thus I suggest changing the time unit to seconds, to ease the sysadmin's job.






[jira] [Created] (HADOOP-15444) ITestS3GuardToolDynamo should only run with -Ddynamo

2018-05-02 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15444:
-

 Summary: ITestS3GuardToolDynamo should only run with -Ddynamo 
 Key: HADOOP-15444
 URL: https://issues.apache.org/jira/browse/HADOOP-15444
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Aaron Fabbri


If you run S3A integration tests with just {{-Ds3guard}} and not {{-Ddynamo}}, 
then I do not think that ITestS3GuardToolDynamo should run.






[jira] [Created] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-04-26 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15420:
-

 Summary: s3guard ITestS3GuardToolLocal failures in diff tests
 Key: HADOOP-15420
 URL: https://issues.apache.org/jira/browse/HADOOP-15420
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Aaron Fabbri


Noticed this when testing the patch for HADOOP-13756.

 
{code:java}
[ERROR] Failures:

[ERROR]   
ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
 Pruned children count 
[PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale;
 isDirectory=false; length=100; replication=1; blocksize=512; 
modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; 
permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
isDeleted=false}, 
PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh;
 isDirectory=false; length=100; replication=1; blocksize=512; 
modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; 
permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
isDeleted=false}] expected:<1> but was:<2>{code}
 

Looking through the code, I'm noticing a couple of issues.

 

1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should really 
be running for all MetadataStore implementations.  Seems like it should live in 
{{AbstractS3GuardToolTestBase}}.

2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When 
{{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, but 
the fs is initialized with a MetadataStore present, so won't the fs put the 
file in the MetadataStore?

There are other tests which explicitly go around the MetadataStore by using 
{{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do 
something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding any 
issues with parallel test runs.






Re: Hadoop-trunk-Commit failing due to libprotoc version

2018-04-23 Thread Aaron Fabbri
This appears to be happening again, e.g.
https://builds.apache.org/job/Hadoop-trunk-Commit/14050/



On Mon, Mar 26, 2018 at 3:31 PM, Xiaoyu Yao  wrote:

> Could this caused by the docker base image changes? As shown in the error
> message, we are still expecting protobuf v2.5.0 but the one in the docker
> image is changed to libprotoc 2.6.1.
>
> In hadoop/dev-support/docker/Dockerfile, we have the following lines to
> install libprotoc without specifying version like below.
>
> RUN apt-get -q update && apt-get -q install -y  libprotoc-dev
>
> I think this can be fixed by specifying the version like:
>
> RUN apt-get -q update && apt-get -q install -y  libprotoc-dev=2.5.0
>
> On 3/26/18, 2:37 PM, "Sean Mackrory"  wrote:
>
> Most of the commit jobs in the last few hours have failed. I would
> suspect
> a change in the machines or images used to run the job. Who has access
> to
> confirm such a change and perhaps correct it?
>
> [ERROR] Failed to execute goal
> org.apache.hadoop:hadoop-maven-plugins:3.2.0-SNAPSHOT:protoc
> (compile-protoc) on project hadoop-common:
> org.apache.maven.plugin.MojoExecutionException: protoc version is
> 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
>
>
>
>


[jira] [Created] (HADOOP-15400) Improve S3Guard documentation on Authoritative Mode implementation

2018-04-19 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15400:
-

 Summary: Improve S3Guard documentation on Authoritative Mode 
implementation
 Key: HADOOP-15400
 URL: https://issues.apache.org/jira/browse/HADOOP-15400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.0.1
Reporter: Aaron Fabbri


Part of the design of S3Guard is support for skipping the call to S3 
listObjects and serving directory listings out of the MetadataStore under 
certain circumstances.  This feature is called "authoritative" mode.  I've 
talked to many people about this feature and it seems to be universally 
confusing.

I suggest we improve / add a section to the s3guard.md site docs elaborating on 
what Authoritative Mode is.

It is *not* treating the MetadataStore (e.g. dynamodb) as the source of truth 
in general.

It *is* the ability to short-circuit S3 list objects and serve listings from 
the MetadataStore in some circumstances: 

For S3A to skip S3's list objects on some *path*, and serve it directly from 
the MetadataStore, the following things must all be true:
 # The MetadataStore implementation persists the bit 
{{DirListingMetadata.isAuthoritative}} set when calling 
{{MetadataStore#put(DirListingMetadata)}}
 # The S3A client is configured to allow metadatastore to be authoritative 
source of a directory listing (fs.s3a.metadatastore.authoritative=true).
 # The MetadataStore has a full listing for *path* stored in it.  This only 
happens if the FS client (s3a) explicitly has stored a full directory listing 
with {{DirListingMetadata.isAuthoritative=true}} before the said listing 
request happens.

Note that #1 only currently happens in LocalMetadataStore. Adding support to 
DynamoDBMetadataStore is covered in HADOOP-14154.
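The three conditions above can be expressed as a single predicate. A hedged
sketch only: the method and parameter names are illustrative, and the real S3A
logic lives across several classes rather than one method like this.

```java
// Sketch: S3A may skip S3 listObjects for a path only when all three
// conditions described above hold at once.
public class AuthListingSketch {

    public static boolean canSkipS3List(
            boolean storePersistsAuthBit,        // #1 MetadataStore persists the bit
            boolean authModeConfigured,          // #2 fs.s3a.metadatastore.authoritative=true
            boolean fullListingStoredForPath) {  // #3 full listing stored for *path*
        return storePersistsAuthBit && authModeConfigured && fullListingStoredForPath;
    }

    public static void main(String[] args) {
        // Any single false forces a real S3 listObjects call.
        System.out.println(canSkipS3List(true, true, true));   // true
        System.out.println(canSkipS3List(true, false, true));  // false
    }
}
```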

Also, the multiple uses of the word "authoritative" are confusing. Two meanings 
are used:
 1. In the FS client configuration fs.s3a.metadatastore.authoritative
 - Behavior of S3A code (not MetadataStore)
 - "S3A is allowed to skip S3.list() when it has a full listing from 
MetadataStore"

 2. In the MetadataStore
 - When storing a dir listing, it can set a bit isAuthoritative
 - 1 : "full contents of directory"
 - 0 : "may not be full listing"

Note that a MetadataStore *MAY* persist this bit. (not *MUST*).

We should probably rename {{DirListingMetadata.isAuthoritative}} to 
{{.fullListing}} or at least put a comment where it is used to clarify its 
meaning.






[jira] [Resolved] (HADOOP-15207) Hadoop performance Issues

2018-02-01 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-15207.
---
Resolution: Invalid

Can someone please ban this user? Similar SEO spam has been posted on other 
sites, from what I can see. We should remove the link and/or delete the Jira too.

> Hadoop performance Issues
> -
>
> Key: HADOOP-15207
> URL: https://issues.apache.org/jira/browse/HADOOP-15207
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: HADOOP-13345
>Reporter: nicole wells
>Priority: Minor
>
> I am doing a hadoop project where I am working with 100MB, 500MB, 1GB files. 
> A multinode hadoop cluster with 4 nodes is implemented for the purpose. The 
> time taken for running the mapreduce program in multinode cluster is much 
> larger than the time taken in running single node cluster setup. Also, it is 
> shocking to observe that the basic Java program(without [Hadoop 
> BigData|https://mindmajix.com/hadoop-training]) finishes the operation faster 
> than both the single and multi node clusters. Here is the code for the mapper 
> class:
>  
> {code:java}
> public class myMapperClass extends MapReduceBase implements 
> Mapper<LongWritable, Text, Text, IntWritable>
> {
>  private final static IntWritable one = new IntWritable(1);
>  private final static IntWritable two = new IntWritable(2);
>  private final static IntWritable three = new IntWritable(3);
>  private final static IntWritable four = new IntWritable(4);
>  private final static IntWritable five = new IntWritable(5);
>  private final static IntWritable six = new IntWritable(6);
>  private final static IntWritable seven = new IntWritable(7);
>  private final static IntWritable eight = new IntWritable(8);
>  private final static IntWritable nine= new IntWritable(9);
>   private Text srcIP,srcIPN;
>   private Text dstIP,dstIPN;
>   private Text srcPort,srcPortN;
>   private Text dstPort,dstPortN;
>   private Text counter1,counter2,counter3,counter4,counter5 ;
>   //private Text total_records;
>   int ddos_line = 0;
>   //map method that performs the tokenizer job and framing the initial key 
> value pairs
>   @Override
> public void map(LongWritable key, Text value, OutputCollector<Text, 
> IntWritable> output, Reporter reporter) throws IOException
>   {
> String line1 = value.toString();
> ddos_line++;
> int pos1=0;
> int lineno=0;
> int[] count = {10, 10, 10, 10, 10};
> int[] lineIndex = {0, 0, 0, 0, 0};
> for(int i=0;i<9;i++)
> {
> pos1 = line1.indexOf("|",pos1+1);
> }
> srcIP =  new Text( line1.substring(0,line1.indexOf("|")) );
> String srcIPP = srcIP.toString();
> dstIP = new Text(line1.substring( 
> srcIPP.length()+1,line1.indexOf("|",srcIPP.length()+1)) ) ;
> srcPort = new Text( line1.substring(pos1+1,line1.indexOf("|",pos1+1)) 
> );
> pos1 = line1.indexOf("|",pos1+1);
> dstPort = new Text( line1.substring(pos1+1,line1.indexOf("|",pos1+1)) 
> );
> //BufferedReader br = new BufferedReader(new 
> FileReader("/home/yogi/Desktop/normal_small"));
> FileSystem fs = FileSystem.get(new Configuration());
> FileStatus[] status = fs.listStatus(new 
> Path("hdfs://master:54310/usr/local/hadoop/input/normal_small"));
> BufferedReader br = new BufferedReader(new 
> InputStreamReader(fs.open(status[0].getPath())));
> String line=br.readLine();
> lineno++;
> boolean bool = true;
> while (bool) {
> for(int i=0; i<5;i++)
> {
> if(bool==false)
> break;
> int pos=0;
> int temp;
> for(int j=0;j<9;j++)
> {
> pos = line.indexOf("|",pos+1);
> }
> srcIPN =  new Text( line.substring(0,line.indexOf("|")) );
> String srcIPP2 = srcIPN.toString();
> dstIPN = new Text(line.substring( 
> srcIPP2.length()+1,line.indexOf("|",srcIPP2.length()+1)) ) ;
> srcPortN = new Text( 
> line.substring(pos+1,line.indexOf("|",pos+1)) );
> pos = line.indexOf("|",pos+1);
> dstPortN = new Text( 
> line.substring(

Re: Review for HADOOP-14918 needed

2018-01-24 Thread Aaron Fabbri
Yep, I'll review it now.

On Wed, Jan 24, 2018 at 9:19 AM, Steve Loughran 
wrote:

> Can I get a review of HADOOP-14918
>
> All it does is rm the local dynamo test for S3Guard because the relevant
> JARs don't get updated by AWS any more, and it is out of sync with the
> latest AWS SDKs. If you (experimentally) upgrade the AWS SDK then you get
> missing method errors in places
>
> Doesn't add code, doesn't add maintenance, just takes away a local test
> option, which simplifies the test matrix. thanks
>
> -steve
>


[jira] [Created] (HADOOP-15040) AWS SDK NPE bug spams logs w/ Yarn Log Aggregation

2017-11-14 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15040:
-

 Summary: AWS SDK NPE bug spams logs w/ Yarn Log Aggregation
 Key: HADOOP-15040
 URL: https://issues.apache.org/jira/browse/HADOOP-15040
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-beta1
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


My colleagues working with Yarn log aggregation found that they were getting 
this message spammed in their logs when they used an s3a:// URI for logs 
(yarn.nodemanager.remote-app-log-dir):

{noformat}
getting attribute Region of com.amazonaws.management:type=AwsSdkMetrics threw 
an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at 

Caused by: java.lang.NullPointerException
at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729)
at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67)
at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{noformat}

This happens even though the aws sdk cloudwatch metrics reporting was disabled 
(default), which is a bug. 

I filed a [github issue|https://github.com/aws/aws-sdk-java/issues/1375] and 
it looks like a fix should be coming around SDK release 1.11.229 or so.






[jira] [Created] (HADOOP-14936) S3Guard: remove "experimental" from documentation

2017-10-06 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14936:
-

 Summary: S3Guard: remove "experimental" from documentation
 Key: HADOOP-14936
 URL: https://issues.apache.org/jira/browse/HADOOP-14936
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 3.0.0-beta1
Reporter: Aaron Fabbri
Priority: Minor


I think it is time to remove the "experimental feature" designation in the site 
docs for S3Guard.  Discuss.






[jira] [Resolved] (HADOOP-14746) Cut S3AOutputStream

2017-10-04 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-14746.
---
   Resolution: Duplicate
Fix Version/s: 3.0.0-beta1

Resolving this; it was covered by HADOOP-14738.

> Cut S3AOutputStream
> ---
>
> Key: HADOOP-14746
> URL: https://issues.apache.org/jira/browse/HADOOP-14746
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0-beta1
>
>
> We've been happy with the new S3A BlockOutputStream, with better scale, 
> performance, instrumentation & recovery. I propose cutting the older 
> {{S3AOutputStream}} code entirely.






Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-03 Thread Aaron Fabbri
+1

Built from source.  Ran S3A integration tests in us-west-2 with S3Guard
(both Local and Dynamo metadatastore).

Everything worked fine except I hit one integration test failure.  It is a
minor test issue IMO and I've filed HADOOP-14927.

Failed tests:

ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
Expected an exception, got 0
  ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
Expected an exception, got 0



On Tue, Oct 3, 2017 at 2:45 PM, Ajay Kumar 
wrote:

> +1 (non-binding)
>
> - built from source
> - deployed on single node cluster
> - Basic hdfs operations
> - Run wordcount on a text file
> Thanks,
> Ajay
>
>
> On 10/3/17, 1:04 PM, "Eric Badger"  wrote:
>
> +1 (non-binding)
>
> - Verified all checksums and signatures
> - Built native from source on macOS 10.12.6 and RHEL 7.1
> - Deployed a single node pseudo cluster
> - Ran pi and sleep jobs
> - Verified Docker was marked as experimental
>
> Thanks,
>
> Eric
>
> On Tue, Oct 3, 2017 at 1:41 PM, John Zhuge 
> wrote:
>
> > +1 (binding)
> >
> >- Verified checksums and signatures of all tarballs
> >- Built source with native, Java 1.8.0_131-b11 on Mac OS X 10.12.6
> >- Verified cloud connectors:
> >   - All S3A integration tests
> >   - All ADL live unit tests
> >- Deployed both binary and built source to a pseudo cluster,
> passed the
> >following sanity tests in insecure, SSL, and SSL+Kerberos mode:
> >   - HDFS basic and ACL
> >   - DistCp basic
> >   - MapReduce wordcount (only failed in SSL+Kerberos mode for
> binary
> >   tarball, probably unrelated)
> >   - KMS and HttpFS basic
> >   - Balancer start/stop
> >
> > Hit the following errors but they don't seem to be blocking:
> >
> > == Missing dependencies during build ==
> >
> > > ERROR: hadoop-aliyun has missing dependencies: json-lib-jdk15.jar
> > > ERROR: hadoop-azure has missing dependencies:
> jetty-util-ajax-9.3.19.
> > > v20170502.jar
> > > ERROR: hadoop-azure-datalake has missing dependencies:
> okhttp-2.4.0.jar
> > > ERROR: hadoop-azure-datalake has missing dependencies:
> okio-1.4.0.jar
> >
> >
> > Filed HADOOP-14923, HADOOP-14924, and HADOOP-14925.
> >
> > == Unit tests failed in Kerberos+SSL mode for KMS and HttpFs default
> HTTP
> > servlet /conf, /stacks, and /logLevel ==
> >
> > One example below:
> >
> > >Connecting to
> > > https://localhost:14000/logLevel?log=org.apache.
> hadoop.fs.http.server.
> > HttpFSServer
> > >Exception in thread "main"
> > > org.apache.hadoop.security.authentication.client.
> > AuthenticationException:
> > > Authentication failed, URL:
> > > https://localhost:14000/logLevel?log=org.apache.
> hadoop.fs.http.server.
> > HttpFSServer=jzhuge,
> > > status: 403, message: GSSException: Failure unspecified at GSS-API
> level
> > > (Mechanism level: Request is a replay (34))
> >
> >
> > The /logLevel failure will affect command "hadoop daemonlog".
> >
> >
> > On Tue, Oct 3, 2017 at 10:56 AM, Andrew Wang <
> andrew.w...@cloudera.com>
> > wrote:
> >
> > > Thanks for all the votes thus far! We've gotten the binding +1's
> to close
> > > the release, though are there contributors who could kick the
> tires on
> > > S3Guard and YARN TSv2 alpha2? These are the two new features
> merged since
> > > alpha4, so it'd be good to get some coverage.
> > >
> > >
> > >
> > > On Tue, Oct 3, 2017 at 9:45 AM, Brahma Reddy Battula <
> bra...@apache.org>
> > > wrote:
> > >
> > > >
> > > > Thanks Andrew.
> > > >
> > > > +1 (non binding)
> > > >
> > > > --Built from source
> > > > --installed 3 node HA cluster
> > > > --Verified shell commands and UI
> > > > --Ran wordcount/pic jobs
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, 29 Sep 2017 at 5:34 AM, Andrew Wang <
> andrew.w...@cloudera.com>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Let me start, as always, by thanking the many, many
> contributors who
> > > >> helped
> > > >> with this release! I've prepared an RC0 for 3.0.0-beta1:
> > > >>
> > > >> http://home.apache.org/~wang/3.0.0-beta1-RC0/
> > > >>
> > > >> This vote will run five days, ending on Nov 3rd at 5PM Pacific.
> > > >>
> > > >> beta1 contains 576 fixed JIRA issues comprising a number of bug
> fixes,
> > > >> improvements, and feature enhancements. Notable additions
> include the
> > > >> addition of YARN Timeline Service v2 alpha2, S3Guard,
> completion of
> > the
> > > >> shaded client, and HDFS erasure coding pluggable policy support.
> > > >>
> > 

[jira] [Created] (HADOOP-14927) ITestS3GuardTool failures in testDestroyNoBucket()

2017-10-03 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14927:
-

 Summary: ITestS3GuardTool failures in testDestroyNoBucket()
 Key: HADOOP-14927
 URL: https://issues.apache.org/jira/browse/HADOOP-14927
 Project: Hadoop Common
  Issue Type: Bug
  Components: s3
Affects Versions: 3.0.0-alpha3
Reporter: Aaron Fabbri
Priority: Minor


Hit this when testing for the Hadoop 3.0.0-beta1 RC0.

{noformat}
hadoop-3.0.0-beta1-src/hadoop-tools/hadoop-aws$ mvn clean verify 
-Dit.test="ITestS3GuardTool*" -Dtest=none -Ds3guard -Ddynamo
...
Failed tests: 
  ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 
Expected an exception, got 0
  ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 
Expected an exception, got 0
{noformat}







Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-25 Thread Aaron Fabbri
Thank you everyone for reviewing and voting on the S3Guard feature branch
merge.

It looks like the Vote was a success. We have six binding +1's (Steve
Loughran, Sean Mackrory, Mingliang Liu, Sanjay Radia, Kihwal Lee, and Lei
(Eddy) Xu) and zero -1's.

I will coordinate w/ Steve L to get this committed to trunk.  I think we
are going to bring it to branch-2 as well.

-AF


On Thu, Aug 24, 2017 at 1:54 PM, Mingliang Liu <lium...@gmail.com> wrote:

> Thanks Andrew. Arpit also told me about this but I forgot to bring it up
> here.
>
> Best,
>
> > On Aug 24, 2017, at 10:59 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> >
> > FYI that committer +1s are binding on merges, so Sean and Mingliang's +1s
> > can be upgraded to binding.
> >
> > On Thu, Aug 24, 2017 at 6:09 AM, Kihwal Lee <kih...@oath.com.invalid>
> wrote:
> >
> >> +1 (binding)
> >> Great work guys!
> >>
> >> On Thu, Aug 24, 2017 at 5:01 AM, Steve Loughran <ste...@hortonworks.com>
> >> wrote:
> >>
> >>>
> >>> On 23 Aug 2017, at 19:21, Aaron Fabbri <fab...@cloudera.com> wrote:
> >>>
> >>>
> >>> On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> >>> video being processed:  https://www.youtube.com/watch?v=oIe5Zl2YsLE
> >>>
> >>>
> >>> Awesome demo Steve, thanks for doing this.  Particularly glad to see
> >> folks
> >>> using and extending the failure injection client.
> >>>
> >>> The HADOOP-13786 iteration turns on throttle event generation. All the
> >> new
> >>> committer stuff is ready for it, but all the existing S3A FS ops react
> >> to a
> >>> throttle exception by failing, when they need to just back off a bit.
> >> This
> >>> complicates testing as I have to explicitly turn off fault injection
> for
> >>> setup & teardown
> >>>
> >>>
> >>> Demoing the CLI tool was great as well.
> >>>
> >>>
> >>> I'm going to have to do another iteration on that CLI tool post-merge,
> as
> >>> I had one big problem: working out if the bucket and all the binding
> >>> settings meant it was "guarded". I think we'll need to track what
> issues
> >>> like that crop up in the field and add the diagnostics/other options.
> >>>
> >>> +I think another one that'd be useful would be to enum all s3guard DDB
> >>> tables in a region/globally & list their allocated IOPs. I know the AWS
> >> UI
> >>> can list tables by region, but you need to look around every region to
> >> find
> >>> out if you've accidentally created one. If you enum all table & look
> for
> >> a
> >>> s3guard version marker, then you can identify tables.
> >>>
> >>> Wanted to mention two things:
> >>>
> >>> 1. Authoritative mode is not fully implemented yet with Dynamo (it
> needs
> >>> to persist an extra bit for directories).  I do have an auth-mode patch
> >>> (done for a hackathon) that I need to post which shows large
> performance
> >>> improvements over what S3Guard has today.  As you said, we don't
> consider
> >>> authoritative mode ready for production yet: we want to play with it
> more
> >>> and improve the prune algorithm first.  Authoritative mode can be
> thought
> >>> of as a nice bonus in the future: The main goal of S3Guard v1 is to fix
> >> the
> >>> get / list consistency issues you mentioned, which it does well.
> >>>
> >>>
> >>> we need to call that out in the release notes.
> >>>
> >>> 2. Also wanted to thank Lei (Eddy) Xu, he was very active during early
> >>> design and contributed some patches as well.
> >>>
> >>>
> >>> good point. Lei: you will get a special mention the next time I do the
> >> demo
> >>>
> >>>
> >>> Again, great demo, enjoyed it!
> >>>
> >>> -AF
> >>>
> >>>
> >>> its actually quite hard to show any benefits of s3guard on the command
> >>> line, so I've ended up showing some scala tests where I turn on the
> >>> (bundled) inconsistent AWS client to show how you then need to enable
> >>> s3guard to make the stack traces go away

Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-23 Thread Aaron Fabbri
On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

> video being processed:  https://www.youtube.com/watch?v=oIe5Zl2YsLE
>
>
Awesome demo Steve, thanks for doing this.  Particularly glad to see folks
using and extending the failure injection client.  Demoing the CLI tool was
great as well.

Wanted to mention two things:

1. Authoritative mode is not fully implemented yet with Dynamo (it needs to
persist an extra bit for directories).  I do have an auth-mode patch (done
for a hackathon) that I need to post which shows large performance
improvements over what S3Guard has today.  As you said, we don't consider
authoritative mode ready for production yet: we want to play with it more
and improve the prune algorithm first.  Authoritative mode can be thought
of as a nice bonus in the future: The main goal of S3Guard v1 is to fix the
get / list consistency issues you mentioned, which it does well.

2. Also wanted to thank Lei (Eddy) Xu, he was very active during early
design and contributed some patches as well.

Again, great demo, enjoyed it!

-AF



> its actually quite hard to show any benefits of s3guard on the command
> line, so I've ended up showing some scala tests where I turn on the
> (bundled) inconsistent AWS client to show how you then need to enable
> s3guard to make the stack traces go away
>
>
> On 22 Aug 2017, at 11:17, Steve Loughran <ste...@hortonworks.com> wrote:
>
> +1 (binding)
>
> I'm happy with it; it's a great piece of work by (in no particular order):
> Chris Nauroth, Aaron Fabbri, Sean Mackrory & Mingliang Liu, plus a few bits
> in the corners where I got to break things while they were all asleep. Also
> deserving a mention: Thomas Demoor & Ewan Higgs @ WDC for consultancy on
> the corners of S3, everyone who tested in (including our QA team), Sanjay
> Radia, & others.
>
> I've already done a couple of iterations of fixing checksyles & code
> reviews, so I think it is ready. I also have a branch-2 patch based on
> earlier work by Mingliang, for people who want that.
>
>
>
>
> On 17 Aug 2017, at 23:07, Aaron Fabbri <fab...@cloudera.com> wrote:
>
> Hello,
>
> I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
> HADOOP-13345 feature branch into trunk.
>
> This branch contains the new S3Guard feature which adds metadata
> consistency features to the S3A client.  Formatted site documentation can
> be found here:
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/
> hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>
> The current patch against trunk is posted here:
>
> https://issues.apache.org/jira/browse/HADOOP-13998
>
> The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:
>
> - The feature is off by default, and care has been taken to ensure it has
> no impact when disabled.
> - S3Guard can be enabled with the production database which is backed by
> DynamoDB, or with a local, in-memory implementation that facilitates
> integration testing without having to pay for a database.
> - getFileStatus() as well as directory listing consistency has been
> implemented and thoroughly tested, including delete tracking.
> - Convenient Maven profiles for testing with and without S3Guard.
> - New failure injection code and integration tests that exercise it.  We
> use timers and a wrapper around the Amazon SDK client object to force
> consistency delays to occur.  This allows us to assert that S3Guard works
> as advertised.  This will be extended with more types of failure injection
> to continue hardening the S3A client.
>
> Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> changes:
>
> - core-default.xml defaults and documentation for s3guard parameters.
> - A couple additional FS contract test cases around rename.
> - More goodies in LambdaTestUtils
> - A new CLI tool for inspecting and manipulating S3Guard features,
> including the backing MetadataStore database.
>
> This branch has seen extensive testing as well as use in production.  This
> branch makes significant improvements to S3A's test toolkit as well.
>
> Performance is typically on par with, and in some cases better than, the
> existing S3A code without S3Guard enabled.
>
> This feature was developed with contributions and feedback from many
> people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
> all of those who contributed feedback and work on the original design
> document.
>
> This is the first major Apache Hadoop project I've worked on from start to
> finish, and I've really enjoyed it.  Please shout if I've missed anything
> important here or in the VOTE process.

Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-21 Thread Aaron Fabbri
+ Eddy Xu (having list issues)



On Mon, Aug 21, 2017 at 8:10 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 18 Aug 2017, at 17:51, John Zhuge <john.zh...@gmail.com> wrote:
>
> That will be great. Please record it if possible.
>
> good idea. I'll do a video & demo, that way I can avoid fielding hard
> questions
>
>
Hah!

-AF



>
> On Fri, Aug 18, 2017 at 4:12 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> I can do a demo of this next week if people are interested
>
> > On 17 Aug 2017, at 23:07, Aaron Fabbri <fab...@cloudera.com> wrote:
> >
> > Hello,
> >
> > I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge
> the
> > HADOOP-13345 feature branch into trunk.
> >
> > This branch contains the new S3Guard feature which adds metadata
> > consistency features to the S3A client.  Formatted site documentation can
> > be found here:
> >
> > https://github.com/apache/hadoop/blob/HADOOP-13345/
> hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> >
> > The current patch against trunk is posted here:
> >
> > https://issues.apache.org/jira/browse/HADOOP-13998
> >
> > The branch modifies the s3a portion of the hadoop-tools/hadoop-aws
> module:
> >
> > - The feature is off by default, and care has been taken to ensure it has
> > no impact when disabled.
> > - S3Guard can be enabled with the production database which is backed by
> > DynamoDB, or with a local, in-memory implementation that facilitates
> > integration testing without having to pay for a database.
> > - getFileStatus() as well as directory listing consistency has been
> > implemented and thoroughly tested, including delete tracking.
> > - Convenient Maven profiles for testing with and without S3Guard.
> > - New failure injection code and integration tests that exercise it.  We
> > use timers and a wrapper around the Amazon SDK client object to force
> > consistency delays to occur.  This allows us to assert that S3Guard works
> > as advertised.  This will be extended with more types of failure
> injection
> > to continue hardening the S3A client.
> >
> > Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> > changes:
> >
> > - core-default.xml defaults and documentation for s3guard parameters.
> > - A couple additional FS contract test cases around rename.
> > - More goodies in LambdaTestUtils
> > - A new CLI tool for inspecting and manipulating S3Guard features,
> > including the backing MetadataStore database.
> >
> > This branch has seen extensive testing as well as use in production.
> This
> > branch makes significant improvements to S3A's test toolkit as well.
> >
> > Performance is typically on par with, and in some cases better than, the
> > existing S3A code without S3Guard enabled.
> >
> > This feature was developed with contributions and feedback from many
> > people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
> > all of those who contributed feedback and work on the original design
> > document.
> >
> > This is the first major Apache Hadoop project I've worked on from start
> to
> > finish, and I've really enjoyed it.  Please shout if I've missed anything
> > important here or in the VOTE process.
> >
> > Cheers,
> > Aaron Fabbri
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
>
>
> --
> John
>
>


[VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-17 Thread Aaron Fabbri
Hello,

I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
HADOOP-13345 feature branch into trunk.

This branch contains the new S3Guard feature which adds metadata
consistency features to the S3A client.  Formatted site documentation can
be found here:

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

The current patch against trunk is posted here:

https://issues.apache.org/jira/browse/HADOOP-13998

The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:

- The feature is off by default, and care has been taken to ensure it has
no impact when disabled.
- S3Guard can be enabled with the production database which is backed by
DynamoDB, or with a local, in-memory implementation that facilitates
integration testing without having to pay for a database.
- getFileStatus() as well as directory listing consistency has been
implemented and thoroughly tested, including delete tracking.
- Convenient Maven profiles for testing with and without S3Guard.
- New failure injection code and integration tests that exercise it.  We
use timers and a wrapper around the Amazon SDK client object to force
consistency delays to occur.  This allows us to assert that S3Guard works
as advertised.  This will be extended with more types of failure injection
to continue hardening the S3A client.

Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
changes:

- core-default.xml defaults and documentation for s3guard parameters.
- A couple additional FS contract test cases around rename.
- More goodies in LambdaTestUtils
- A new CLI tool for inspecting and manipulating S3Guard features,
including the backing MetadataStore database.

This branch has seen extensive testing as well as use in production.  This
branch makes significant improvements to S3A's test toolkit as well.

Performance is typically on par with, and in some cases better than, the
existing S3A code without S3Guard enabled.

This feature was developed with contributions and feedback from many
people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
all of those who contributed feedback and work on the original design
document.

This is the first major Apache Hadoop project I've worked on from start to
finish, and I've really enjoyed it.  Please shout if I've missed anything
important here or in the VOTE process.

Cheers,
Aaron Fabbri


Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-16 Thread Aaron Fabbri
On Wed, Aug 16, 2017 at 1:39 PM, Andrew Wang 
wrote:

> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.
>

I think this is ready to get in before beta1.  Most of upstream s3a dev has
been happening on this branch so it has a lot of improvements and testing.


> Could you comment on testing and API stability of this branch? I'm trusting
> the judgement of the contributors involved, since there isn't much time to
> fix things before beta1.
>
>
We've done a ton of testing on this branch:

- List consistency tests with failure injection. (HADOOP-13793) This
integration test forces a delay in visibility of certain files by wrapping
the AWS S3 client. It asserts listing is consistent. The test fails without
S3Guard, and succeeds with it.

- All existing S3 integration tests with and without S3Guard. The
filesystem contract tests have been invaluable here. (HADOOP-13589 makes
these very easy to run).

- MetadataStore contract tests that ensure that the API semantics of the
DynamoDB and in-memory reference implementations are correct.

- MetadataStore scale tests that can be used to force DynamoDB service
throttling and ensure we are robust to that.

- Unit tests for different parts of the S3Guard logic.
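The failure-injection idea in the first bullet reduces to a small model: a listing layer that hides newly written keys until a visibility timer expires, driven by a manual clock so tests need no real Thread.sleep(). This is a self-contained toy with invented names, not the actual InconsistentAmazonS3Client wrapper:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DelayedListingDemo {
    // key -> timestamp (ms) at which the key becomes visible in listings
    private final Map<String, Long> visibleAt = new HashMap<>();
    private final long delayMs;
    private long now = 0;  // manual clock, advanced explicitly by tests

    public DelayedListingDemo(long delayMs) { this.delayMs = delayMs; }

    public void put(String key) { visibleAt.put(key, now + delayMs); }
    public void tick(long ms)   { now += ms; }

    // A listing omits keys whose visibility timer has not yet expired,
    // simulating S3's (historical) eventual list-after-write consistency.
    public List<String> list() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : visibleAt.entrySet())
            if (e.getValue() <= now) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        DelayedListingDemo s3 = new DelayedListingDemo(500);
        s3.put("dir/file1");
        if (!s3.list().isEmpty()) throw new AssertionError("not yet visible");
        s3.tick(500);
        if (!s3.list().contains("dir/file1")) throw new AssertionError();
    }
}
```

A consistency test built on such a layer can assert that listings are wrong without the metadata store and correct with it, which is the shape of the HADOOP-13793 test described above.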

As you probably know, at Cloudera we are using this codebase in production,
and have run all of our downstream tests including Hive, Spark, Impala on
the new S3A client code, with and without S3Guard enabled.

In terms of API compatibility, the new features sit behind the FileSystem /
FileContext APIs, which have not changed.  Applications don't require any
changes.  Internal APIs for S3Guard, such as MetadataStore (currently
private / evolving), should be properly annotated already.  The S3Guard
work has been active for quite a while now, so the APIs are fairly stable
in practice.

Probably my biggest goal in writing the S3AFileSystem integration code
(HADOOP-13651) was to preserve existing logic and correctness when S3Guard
is not enabled.  One design choice which has worked well was to define a
"null" implementation of the MetadataStore (the API that filesystem clients
use to log metadata changes):

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java

This is used in S3A by default. This made it easier to reason about
correctness and minimized the size of the diff to the FS client as well.
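As a rough illustration of why the null-object choice keeps the non-S3Guard path unchanged, here is a toy model; the interfaces are invented for this sketch and much simpler than the real MetadataStore API linked above:

```java
import java.util.HashMap;
import java.util.Map;

interface MetadataStore {
    void put(String path, String status);   // record metadata for a path
    String get(String path);                // null means "no answer"
}

// Null implementation: every operation is a no-op, so a client wired to
// it behaves exactly as if no metadata store existed.
class NullMetadataStore implements MetadataStore {
    public void put(String path, String status) { /* no-op */ }
    public String get(String path) { return null; }
}

// Stand-in for a real backing store such as DynamoDB.
class MapMetadataStore implements MetadataStore {
    private final Map<String, String> table = new HashMap<>();
    public void put(String path, String status) { table.put(path, status); }
    public String get(String path) { return table.get(path); }
}

public class NullStoreDemo {
    // The FS client always consults *a* store; with the null store it
    // falls back to the (simulated) S3 lookup every time.
    public static String getFileStatus(MetadataStore store, String path) {
        String cached = store.get(path);
        return cached != null ? cached : "from-s3:" + path;
    }

    public static void main(String[] args) {
        MetadataStore nullStore = new NullMetadataStore();
        MetadataStore real = new MapMetadataStore();
        nullStore.put("/a", "FILE");
        real.put("/a", "FILE");
        if (!getFileStatus(nullStore, "/a").equals("from-s3:/a"))
            throw new AssertionError("null store must never answer");
        if (!getFileStatus(real, "/a").equals("FILE"))
            throw new AssertionError("real store should answer");
    }
}
```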

Other questions welcomed!

Cheers,
Aaron



Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran 
> wrote:
>
> >
> > FYI, We're getting ready for a patch to merge the current S3Guard branch,
> > HADOOP-13345, via a patch https://issues.apache.org/
> > jira/browse/HADOOP-13998
> >
> > After that's done, we do plan to have a second iteration, work on a
> > 0-rename committer (HADOOP-13786) with all the other tuning and
> > improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
> > and/or do things patch-by-patch .
> >
> > Anyway, now is a great time for people to download and play
> >
> > https://github.com/apache/hadoop/blob/HADOOP-13345/
> > hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> >
> > testing this
> >
> > https://github.com/apache/hadoop/blob/HADOOP-13345/
> > hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
> >
> > The Inconsistent AWS Client is also something everyone is free to use for
> > injecting inconsistencies (and soon faults) into their own apps by way of
> > 2-3 config options. Want to know how your code handles S3A being
> observably
> > inconsistent? We'll let you do that.
> >
> > -Steve
> >
> >
> >
>


Requesting write access for Confluence Wiki

2017-07-17 Thread Aaron Fabbri
Hi,

I'd like to create a troubleshooting page on the Hadoop Confluence wiki for
HADOOP-14467.

Can someone please grant me access?  Account under fab...@apache.org.


[jira] [Created] (HADOOP-14633) S3Guard: optimize create codepath

2017-07-07 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14633:
-

 Summary: S3Guard: optimize create codepath
 Key: HADOOP-14633
 URL: https://issues.apache.org/jira/browse/HADOOP-14633
 Project: Hadoop Common
  Issue Type: Sub-task
 Environment: 


Reporter: Aaron Fabbri
Assignee: Aaron Fabbri
Priority: Minor


Following up on HADOOP-14457, a couple of things to do that will improve create 
performance as I mentioned in the comment 
[here|https://issues.apache.org/jira/browse/HADOOP-14457?focusedCommentId=16078465=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16078465]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-07-06 Thread Aaron Fabbri
Thanks for the hard work on this!  +1 (non-binding)

- Built from source tarball on OS X w/ Java 1.8.0_45.
- Deployed mini/pseudo cluster.
- Ran grep and wordcount examples.
- Poked around ResourceManager and JobHistory UIs.
- Ran all s3a integration tests in US West 2.


On Thu, Jul 6, 2017 at 10:20 AM, Xiao Chen  wrote:

> Thanks Andrew!
> +1 (non-binding)
>
>- Verified md5's, checked tarball sizes are reasonable
>- Built source tarball and deployed a pseudo-distributed cluster with
>hdfs/kms
>- Tested basic hdfs/kms operations
>- Sanity checked webuis/logs
>
>
> -Xiao
>
> On Wed, Jul 5, 2017 at 10:33 PM, John Zhuge  wrote:
>
> > +1 (non-binding)
> >
> >
> >- Verified checksums and signatures of the tarballs
> >- Built source with native, Java 1.8.0_131 on Mac OS X 10.12.5
> >- Cloud connectors:
> >   - A few S3A integration tests
> >   - A few ADL live unit tests
> >- Deployed both binary and built source to a pseudo cluster, passed
> the
> >following sanity tests in insecure, SSL, and SSL+Kerberos mode:
> >   - HDFS basic and ACL
> >   - DistCp basic
> >   - WordCount (skipped in Kerberos mode)
> >   - KMS and HttpFS basic
> >
> > Thanks Andrew for the great effort!
> >
> > > On Wed, Jul 5, 2017 at 1:33 PM, Eric Payne wrote:
> >
> > > Thanks Andrew.
> > > I downloaded the source, built it, and installed it onto a pseudo
> > > distributed 4-node cluster.
> > >
> > > I ran mapred and streaming test cases, including sleep and wordcount.
> > > +1 (non-binding)
> > > -Eric
> > >
> > >   From: Andrew Wang 
> > >  To: "common-dev@hadoop.apache.org" ; "
> > > hdfs-...@hadoop.apache.org" ; "
> > > mapreduce-...@hadoop.apache.org" ; "
> > > yarn-...@hadoop.apache.org" 
> > >  Sent: Thursday, June 29, 2017 9:41 PM
> > >  Subject: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0
> > >
> > > Hi all,
> > >
> > > As always, thanks to the many, many contributors who helped with this
> > > release! I've prepared an RC0 for 3.0.0-alpha4:
> > >
> > > http://home.apache.org/~wang/3.0.0-alpha4-RC0/
> > >
> > > The standard 5-day vote would run until midnight on Tuesday, July 4th.
> > > Given that July 4th is a holiday in the US, I expect this vote might
> have
> > > to be extended, but I'd like to close the vote relatively soon after.
> > >
> > > I've done my traditional testing of a pseudo-distributed cluster with a
> > > single task pi job, which was successful.
> > >
> > > Normally my testing would end there, but I'm slightly more confident
> this
> > > time. At Cloudera, we've successfully packaged and deployed a snapshot
> > from
> > > a few days ago, and run basic smoke tests. Some bugs found from this
> > > include HDFS-11956, which fixes backwards compat with Hadoop 2 clients,
> > and
> > > the revert of HDFS-11696, which broke NN QJM HA setup.
> > >
> > > Vijay is working on a test run with a fuller test suite (the results of
> > > which we can hopefully post soon).
> > >
> > > My +1 to start,
> > >
> > > Best,
> > > Andrew
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > John
> >
>


[jira] [Created] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N

2017-06-19 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14548:
-

 Summary: S3Guard: issues running parallel tests w/ S3N 
 Key: HADOOP-14548
 URL: https://issues.apache.org/jira/browse/HADOOP-14548
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri


In general, running S3Guard and parallel tests with S3A and S3N contract tests 
enabled is asking for trouble:  S3Guard code assumes there are no other 
non-S3Guard clients modifying the bucket.

Goal of this JIRA is to:

- Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard 
-Ddynamo` with S3A and S3N contract tests configured.
- Identify any failures here that are worth looking into.
- Document (or enforce) that people should not do this, or should expect 
failures if they do.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14527) ITestS3GuardListConsistency is too slow

2017-06-14 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14527:
-

 Summary: ITestS3GuardListConsistency is too slow
 Key: HADOOP-14527
 URL: https://issues.apache.org/jira/browse/HADOOP-14527
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri
Priority: Minor


I'm really glad to see folks adopting the inconsistency injection stuff and 
adding test cases to ITestS3GuardListConsistency.  That test class has become 
very slow, however, mostly due to {{Thread.sleep()}} calls that wait for the 
inconsistency injection timers to expire.

I will take a stab at speeding up this test class.  As is, it takes about 8 
minutes to run.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14509) InconsistentAmazonS3Client adds extra paths to listStatus() after delete.

2017-06-08 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14509:
-

 Summary: InconsistentAmazonS3Client adds extra paths to 
listStatus() after delete.
 Key: HADOOP-14509
 URL: https://issues.apache.org/jira/browse/HADOOP-14509
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


I identified a potential issue in code that simulates list-after-delete 
inconsistency when code reviewing HADOOP-13760.  It appeared to work for the 
existing test cases but now that we are using the inconsistency injection code 
for general testing (e.g. HADOOP-14488) we need to make sure this stuff is 
correct.  

Deliverable is to make sure {{InconsistentAmazonS3Client#restoreListObjects()}} 
is correct.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14506) Add create() contract test that verifies ancestor dir creation

2017-06-07 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14506:
-

 Summary: Add create() contract test that verifies ancestor dir 
creation
 Key: HADOOP-14506
 URL: https://issues.apache.org/jira/browse/HADOOP-14506
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Priority: Minor


The semantics of {{FileSystem#create()}} appear to include implicit creation of 
any parent or ancestor directories in the Path being created, unlike POSIX 
filesystems, which expect the parent directory to already exist.

S3AFileSystem *does* implicitly create ancestor directories.  It does *not* 
currently enforce that any ancestors are directories (HADOOP-13221).

Deliverable for this JIRA is just a test case added to 
{{AbstractContractCreateTest}} which verifies that missing ancestors are 
created by {{create()}}.

Pulling this test dev work out into separate jira from HADOOP-14457.  Targeting 
S3Guard branch for now, but we could move this.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14505) simplify mkdirs() after S3Guard delete tracking change

2017-06-07 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14505:
-

 Summary: simplify mkdirs() after S3Guard delete tracking change
 Key: HADOOP-14505
 URL: https://issues.apache.org/jira/browse/HADOOP-14505
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Priority: Minor


I noticed after reviewing the S3Guard delete tracking changes for HADOOP-13760, 
that mkdirs() can probably be simplified, replacing the use of 
checkPathForDirectory() with a simple getFileStatus().

Creating a separate JIRA so these changes can be reviewed / tested in isolation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14504) ProvidedFileStatusIterator#next() may throw IndexOutOfBoundsException

2017-06-07 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14504:
-

 Summary: ProvidedFileStatusIterator#next() may throw 
IndexOutOfBoundsException
 Key: HADOOP-14504
 URL: https://issues.apache.org/jira/browse/HADOOP-14504
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


[~mackrorysd] noticed this as part of his work on HADOOP-14457.  We are 
splitting that JIRA into smaller patches so we will address this separately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable

2017-05-30 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14468:
-

 Summary: S3Guard: make short-circuit getFileStatus() configurable
 Key: HADOOP-14468
 URL: https://issues.apache.org/jira/browse/HADOOP-14468
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it first 
gets a result from the MetadataStore (e.g. DynamoDB).

I would like to add a new parameter 
{{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
the current behavior.  When false, S3AFileSystem will check both S3 and the 
MetadataStore.

I'm not sure yet if we want to have this behavior the same for all callers of 
getFileStatus(), or if we only want to check both S3 and MetadataStore for some 
internal callers such as open().
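A toy sketch of the proposed toggle; only the idea of the flag comes from the description above, while the classes and lookup logic here are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class AuthoritativeDemo {
    static final Map<String, String> metadataStore = new HashMap<>();
    static final Map<String, String> s3 = new HashMap<>();

    // authoritative=true: short-circuit on a MetadataStore hit (the
    // current behavior). authoritative=false: also consult S3, so a
    // stale MetadataStore entry cannot mask the real object state.
    public static String getFileStatus(String path, boolean authoritative) {
        String cached = metadataStore.get(path);
        if (cached != null && authoritative) return cached;
        String real = s3.get(path);
        return real != null ? real : cached;
    }

    public static void main(String[] args) {
        metadataStore.put("/f", "stale-entry");
        s3.put("/f", "fresh-entry");
        if (!getFileStatus("/f", true).equals("stale-entry"))
            throw new AssertionError("short-circuit should use the store");
        if (!getFileStatus("/f", false).equals("fresh-entry"))
            throw new AssertionError("non-authoritative should check S3");
    }
}
```

The open() question at the end is whether this flag gates every caller or only selected internal ones, which would just mean passing different values of the boolean at different call sites.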



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14467) S3Guard: Improve FNFE message when opening a stream

2017-05-30 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14467:
-

 Summary: S3Guard: Improve FNFE message when opening a stream
 Key: HADOOP-14467
 URL: https://issues.apache.org/jira/browse/HADOOP-14467
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


Following up on the [discussion on 
HADOOP-13345|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16030050=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16030050],
 because S3Guard can serve getFileStatus() from the MetadataStore without doing 
a HEAD on S3, a FileNotFound error on a file due to S3 GET inconsistency does 
not happen on open(), but on the first read of the stream.  We may add retries 
to the S3 client in the future, but for now we should have an exception message 
that indicates this may be due to inconsistency (assuming it isn't a more 
straightforward case like someone deleting the object out from under you).

This is expected to be a rare case, since the S3 service is now mostly 
consistent for GET.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14428) s3a: mkdir appears to be broken

2017-05-16 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14428:
-

 Summary: s3a: mkdir appears to be broken
 Key: HADOOP-14428
 URL: https://issues.apache.org/jira/browse/HADOOP-14428
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2, HADOOP-13345
Reporter: Aaron Fabbri
Priority: Blocker


Reproduction is:

hadoop fs -mkdir s3a://my-bucket/dir/
hadoop fs -ls s3a://my-bucket/dir/

ls: `s3a://my-bucket/dir/': No such file or directory

I believe this is a regression from HADOOP-14255.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13981) S3Guard CLI: Add documentation

2017-05-10 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-13981.
---
Resolution: Fixed

Documentation was added as part of S3Guard CLI JIRAs, resolving.

> S3Guard CLI: Add documentation
> --
>
> Key: HADOOP-13981
> URL: https://issues.apache.org/jira/browse/HADOOP-13981
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>    Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>
> I believe we still need documentation for the new S3Guard CLI commands.  
> Synopsis of all the commands and some examples would be great.






[jira] [Created] (HADOOP-14391) s3a: auto-detect region for bucket and use right endpoint

2017-05-05 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14391:
-

 Summary: s3a: auto-detect region for bucket and use right endpoint
 Key: HADOOP-14391
 URL: https://issues.apache.org/jira/browse/HADOOP-14391
 Project: Hadoop Common
  Issue Type: Improvement
  Components: s3
Affects Versions: 3.0.0-alpha2
Reporter: Aaron Fabbri


Specifying the S3A endpoint ({{fs.s3a.endpoint}}) is

- *required* for regions which only support v4 authentication
- A good practice for all regions.

The user experience of having to configure endpoints is not great.  Often it is 
neglected and leads to additional cost, reduced performance, or failures for v4 
auth regions.

I want to explore an option which, when enabled, auto-detects the region for an 
s3 bucket and uses the proper endpoint.  Not sure if this is possible or anyone 
has looked into it yet.






[jira] [Created] (HADOOP-14323) ITestS3GuardListConsistency failure w/ Local, authoritative metadata store

2017-04-19 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14323:
-

 Summary: ITestS3GuardListConsistency failure w/ Local, 
authoritative metadata store
 Key: HADOOP-14323
 URL: https://issues.apache.org/jira/browse/HADOOP-14323
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Priority: Minor


When doing some testing for HADOOP-14266 I noticed this test failure:

{noformat}
java.lang.NullPointerException: null
at 
org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testListStatusWriteBack(ITestS3GuardListConsistency.java:317)
{noformat}

I was running with LocalMetadataStore and 
{{fs.s3a.metadatastore.authoritative}} set to true.  I haven't been testing 
this mode recently so not sure if this case ever worked.  Lower priority but we 
should fix it.






[jira] [Created] (HADOOP-14283) S3A may hang due to bug in AWS SDK 1.11.86

2017-04-05 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14283:
-

 Summary: S3A may hang due to bug in AWS SDK 1.11.86
 Key: HADOOP-14283
 URL: https://issues.apache.org/jira/browse/HADOOP-14283
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri
Priority: Critical


We hit a hang bug when testing S3A with parallel renames.  

I narrowed this down to the newer AWS Java SDK.  It only happens under load, 
and appears to be a failure to wake up a waiting thread on timeout/error.

I've created a github issue here:
https://github.com/aws/aws-sdk-java/issues/1102

I can post a Hadoop scale test which reliably reproduces this after some 
cleanup.  I have posted an SDK-only test here which reproduces the issue 
without Hadoop:

https://github.com/ajfabbri/awstest

I have a support ticket open and am working with Amazon on this bug so I'll 
take this issue.






[jira] [Created] (HADOOP-14144) s3guard: CLI import does not yield an empty diff.

2017-03-02 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14144:
-

 Summary: s3guard: CLI import does not yield an empty diff.
 Key: HADOOP-14144
 URL: https://issues.apache.org/jira/browse/HADOOP-14144
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Priority: Minor


I expected the following steps to yield zero diff from `hadoop s3guard diff` 
command.

(1) hadoop s3guard init ... (create fresh table)
(2) hadoop s3guard import (fresh table, existing bucket with data in it)
(3) hadoop s3guard diff ..

Instead I still get a non-zero diff on step #3, and also noticed some entries 
are printed twice.

{noformat}
dude@computer:~/Code/hadoop$ hadoop s3guard diff -meta dynamodb://dude-dev 
-region us-west-2 s3a://dude-dev
S3  D   s3a://fabbri-dev/user/fabbri/test/parentdirdest
S3  D   s3a://fabbri-dev/user/fabbri/test/parentdirdest
{noformat}






[jira] [Created] (HADOOP-14096) s3guard: regression in dirListingUnion

2017-02-19 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14096:
-

 Summary: s3guard: regression in dirListingUnion
 Key: HADOOP-14096
 URL: https://issues.apache.org/jira/browse/HADOOP-14096
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Priority: Critical


Just noticed HADOOP-14020 introduced a bug in S3Guard#dirListingUnion.

The offending change is here:

{noformat}
-  if (dirMeta.get(s.getPath()) == null) {
-dirMeta.put(s);
-  }
+  changed = changed || dirMeta.put(s);
+}
+
{noformat}

hint: Logical OR is a short-circuit operator.
  Easy fix, but should probably come with a unit test for dirListingUnion().
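The short-circuit behavior hinted at above can be demonstrated with a minimal sketch (hypothetical names, not the Hadoop code): a stand-in put() that counts its invocations shows how the buggy accumulation drops every call after the first change.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch (not the Hadoop code): why accumulating a "changed" flag
// with short-circuit OR drops calls.
public class ShortCircuitDemo {

    // Stand-in for MetadataStore.put(): counts invocations, reports a change.
    static boolean put(AtomicInteger calls) {
        calls.incrementAndGet();
        return true;
    }

    // Buggy pattern from the offending diff: once 'changed' is true, the
    // right-hand side is never evaluated, so put() is skipped thereafter.
    static int buggyPutCount(int entries) {
        AtomicInteger calls = new AtomicInteger();
        boolean changed = false;
        for (int i = 0; i < entries; i++) {
            changed = changed || put(calls);
        }
        return calls.get();
    }

    // Fix: evaluate put() unconditionally, then fold its result into the flag.
    static int fixedPutCount(int entries) {
        AtomicInteger calls = new AtomicInteger();
        boolean changed = false;
        for (int i = 0; i < entries; i++) {
            boolean updated = put(calls);
            changed = changed || updated;
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(buggyPutCount(5)); // only 1 put reaches the store
        System.out.println(fixedPutCount(5)); // all 5 do
    }
}
```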






[jira] [Created] (HADOOP-14087) S3A typo in pom.xml test exclusions

2017-02-15 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14087:
-

 Summary: S3A typo in pom.xml test exclusions
 Key: HADOOP-14087
 URL: https://issues.apache.org/jira/browse/HADOOP-14087
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


When you overload DDB, you get error messages warning of throttling, [as 
documented by 
AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]

Reduce load on DDB by doing a table lookup before the create, then, in table 
create/delete operations and in get/put actions, recognise the error codes and 
retry using an appropriate retry policy (exponential backoff + ultimate 
failure) 







Re: review for HADOOP-14028

2017-02-06 Thread Aaron Fabbri
I will take a look (though my review will be non-binding).

On Mon, Feb 6, 2017 at 8:28 AM, Steve Loughran 
wrote:

> can I get a review/feedback on, S3A block output streams don't delete
> temporary files in multipart uploads"
> https://issues.apache.org/jira/browse/HADOOP-14028
>
> ...this is pretty serious, and I'd like to get it into 2.8.0, for which I
> need reviewers. Tested against AWS S3, with a test which uses some new
> instrumentation to even verify that blocks are being released.
>
> thanks
>
> -Steve
>


[jira] [Created] (HADOOP-14051) S3Guard: link docs from index, fix typos

2017-02-02 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14051:
-

 Summary: S3Guard: link docs from index, fix typos
 Key: HADOOP-14051
 URL: https://issues.apache.org/jira/browse/HADOOP-14051
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


JIRA for a quick round of s3guard.md documentation improvements.

- Link from index.md
- Make a pass and fix typos / outdated stuff / spelling etc.






Re: Proposal to merge S3Guard feature branch

2017-02-02 Thread Aaron Fabbri
On Thu, Feb 2, 2017 at 2:56 PM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> > On 2 Feb 2017, at 08:52, Aaron Fabbri <fab...@cloudera.com> wrote:
> >
> > Hello,
> >
> > I'd like to propose merging the HADOOP-13345 feature branch to trunk.
> >
> > I just wrote up a status summary on HADOOP-13998 "initial s3guard
> preview"
> > that goes into more detail.
> >
> > Cheers,
> > Aaron
>
> Even though I've been working on it, I'm not convinced it's ready
>
>
Ok.   Would love if we could track outstanding items in HADOOP-13998 so I
can have some indication of how this branch will terminate.  I worked
really hard this week to knock out the remaining items there in hopes of a
merge.


> 1. there's still "TODO s3guard" in bits of the code
> 2. there's not been that much in terms of active play —that is, beyond
> integration tests and benchmarks
> 3. the db format is still stabilising and once that's out, life gets more
> complex. Example: the version marker last week, HADOOP-13876 this week,
> which I still need to review.
>
> I just don't think it's stable enough.
>

Thanks for your response here.  I hope we can weigh the cost of maintaining
a separate S3AFileSystem version against the risk of earlier integration
with trunk.  I'm pretty biased against long-lived feature branches,
personally.

As I mentioned in the JIRA I plan to work on the way we handle empty
directories in S3A.  This could get painful if we continue to change
S3AFileSystem in trunk.  The coming metrics changes I want to do also may
be a source of merge conflicts.


>
> Once it' merged in
>
> -development time slows, cost increases: you need review and a +1 from a
> full committer, not a branch committer
>

Functionally this is what I'm doing today.. Will try to get another branch
committer to help you with the workload though.  Really appreciate the
reviews so far!


> -if any change causes a regression in the functionality of trunk, it's
> more of an issue. A regression before the merge is a detail, one on trunk,
> even if short lived, isn't welcome.
>
>
For sure.  I'd hope that the default setting (S3Guard disabled) should be
very solid by now though.  The documentation has scary "this is
experimental" warnings still if folks try to turn it on.

My work on failure injection and DynamoDB load testing should be some
indication I care about stability very much as well.

Thanks!
Aaron

I'm happy with someone to do their own preview of a 3.0.x + s3guard, say
> "play with this and see how much performance you get", but right now, I
> think it needs a few more weeks before getting the broader review which is
> going to be needed, and everyone working on it is confident that it's going
> to be stable


Proposal to merge S3Guard feature branch

2017-02-02 Thread Aaron Fabbri
Hello,

I'd like to propose merging the HADOOP-13345 feature branch to trunk.

I just wrote up a status summary on HADOOP-13998 "initial s3guard preview"
that goes into more detail.

Cheers,
Aaron


[jira] [Created] (HADOOP-14036) S3Guard: intermittent duplicate item keys failure

2017-01-27 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14036:
-

 Summary: S3Guard: intermittent duplicate item keys failure
 Key: HADOOP-14036
 URL: https://issues.apache.org/jira/browse/HADOOP-14036
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri


I see this occasionally when running integration tests with -Ds3guard -Ddynamo:

{noformat}
testRenameToDirWithSamePrefixAllowed(org.apache.hadoop.fs.s3a.ITestS3AFileSystemContract)
  Time elapsed: 2.756 sec  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceIOException: move: 
com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: Provided list 
of item keys contains duplicates (Service: AmazonDynamoDBv2; Status Code: 400; 
Error Code: ValidationException; Request ID: 
QSBVQV69279UGOB4AJ4NO9Q86VVV4KQNSO5AEMVJF66Q9ASUAAJG): Provided list of item 
keys contains duplicates (Service: AmazonDynamoDBv2; Status Code: 400; Error 
Code: ValidationException; Request ID: 
QSBVQV69279UGOB4AJ4NO9Q86VVV4KQNSO5AEMVJF66Q9ASUAAJG)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:408)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:869)
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:662)
at 
org.apache.hadoop.fs.FileSystemContractBaseTest.rename(FileSystemContractBaseTest.java:525)
at 
org.apache.hadoop.fs.FileSystemContractBaseTest.testRenameToDirWithSamePrefixAllowed(FileSystemContractBaseTest.java:669)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcces
{noformat}
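Since DynamoDB rejects batch requests whose item-key list contains duplicates, one likely remedy (an assumption, not the committed fix) is to de-duplicate the generated keys before building the batch write, as in this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch (an assumption, not the committed fix): a move() that produces the
// same path twice must de-duplicate keys before the DynamoDB batch write.
public class DedupKeys {

    // Drops repeated keys while preserving first-seen order.
    static List<String> dedupe(List<String> keys) {
        return new ArrayList<>(new LinkedHashSet<>(keys));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("/a", "/a/b", "/a");
        System.out.println(dedupe(keys)); // [/a, /a/b]
    }
}
```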






[jira] [Created] (HADOOP-14013) S3Guard: fix multi-bucket integration tests

2017-01-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-14013:
-

 Summary: S3Guard: fix multi-bucket integration tests
 Key: HADOOP-14013
 URL: https://issues.apache.org/jira/browse/HADOOP-14013
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


HADOOP-13449 adds support for DynamoDBMetadataStore.

The code currently supports two options for choosing DynamoDB table names:
1. Use name of each s3 bucket and auto-create a DynamoDB table for each.
2. Configure a table name in the {{fs.s3a.s3guard.ddb.table}} parameter.

One of the issues is with accessing read-only buckets.  If a user accesses a 
read-only bucket with credentials that do not have DynamoDB write permissions, 
they will get errors when trying to access the read-only bucket.  This 
manifests as test failures for {{ITestS3AAWSCredentialsProvider}}.

Goals for this JIRA:
- Fix {{ITestS3AAWSCredentialsProvider}} in a way that makes sense for the real 
use-case.
- Allow for a "one DynamoDB table per cluster" configuration with a way to 
choose which credentials are used for DynamoDB.
- Document limitations etc. in the s3guard.md site doc.








[jira] [Created] (HADOOP-13995) s3guard cli: make tests easier to run and address failure

2017-01-17 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13995:
-

 Summary: s3guard cli: make tests easier to run and address failure
 Key: HADOOP-13995
 URL: https://issues.apache.org/jira/browse/HADOOP-13995
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri
Assignee: Sean Mackrory


Following up on HADOOP-13650, we should:

- Make it clearer which config parameters need to be set for test to succeed, 
and provide good defaults.
- Address any remaining test failures.
- Change TestS3GuardTool to an ITest







[jira] [Created] (HADOOP-13981) S3Guard CLI: Add documentation

2017-01-11 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13981:
-

 Summary: S3Guard CLI: Add documentation
 Key: HADOOP-13981
 URL: https://issues.apache.org/jira/browse/HADOOP-13981
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


I believe we still need documentation for the new S3Guard CLI commands.  
Synopsis of all the commands and some examples would be great.






[jira] [Created] (HADOOP-13980) S3Guard CLI: Add fsck check command

2017-01-11 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13980:
-

 Summary: S3Guard CLI: Add fsck check command
 Key: HADOOP-13980
 URL: https://issues.apache.org/jira/browse/HADOOP-13980
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
compares S3 with MetadataStore, and returns a failure status if any invariants 
are violated.








[jira] [Created] (HADOOP-13914) s3guard: improve S3AFileStatus#isEmptyDirectory handling

2016-12-16 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13914:
-

 Summary: s3guard: improve S3AFileStatus#isEmptyDirectory handling
 Key: HADOOP-13914
 URL: https://issues.apache.org/jira/browse/HADOOP-13914
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri


As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag 
stored in S3AFileStatus is missing from DynamoDBMetadataStore.

The approach taken by LocalMetadataStore is not suitable for the DynamoDB 
implementation, and also sacrifices good code separation to minimize 
S3AFileSystem changes pre-merge to trunk.

I will attach a design doc that attempts to clearly explain the problem and 
preferred solution.  I suggest we do this work after merging the HADOOP-13345 
branch to trunk, but am open to suggestions.

I can also attach a patch of a integration test that exercises the missing case 
and demonstrates a failure with DynamoDBMetadataStore.









[jira] [Created] (HADOOP-13886) s3guard: ITestS3AFileOperationCost.testFakeDirectoryDeletion failure

2016-12-09 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13886:
-

 Summary: s3guard:  
ITestS3AFileOperationCost.testFakeDirectoryDeletion failure
 Key: HADOOP-13886
 URL: https://issues.apache.org/jira/browse/HADOOP-13886
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri


testFakeDirectoryDeletion(org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost)  
Time elapsed: 10.011 sec  <<< FAILURE!
java.lang.AssertionError: after rename(srcFilePath, destFilePath): 
directories_created expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.fs.s3a.S3ATestUtils$MetricDiff.assertDiffEquals(S3ATestUtils.java:431)
at 
org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:254)

More details to follow in comments.






[jira] [Created] (HADOOP-13877) S3Guard: fix TestDynamoDBMetadataStore when fs.s3a.s3guard.ddb.table is set

2016-12-08 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13877:
-

 Summary: S3Guard: fix TestDynamoDBMetadataStore when 
fs.s3a.s3guard.ddb.table is set
 Key: HADOOP-13877
 URL: https://issues.apache.org/jira/browse/HADOOP-13877
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


I see a couple of failures in the DynamoDB MetadataStore unit test when I set 
{{fs.s3a.s3guard.ddb.table}} in my test/resources/core-site.xml.

I have a fix already, so I'll take this JIRA.







[jira] [Created] (HADOOP-13876) S3Guard: better support for multi-bucket access including read-only

2016-12-08 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13876:
-

 Summary: S3Guard: better support for multi-bucket access including 
read-only
 Key: HADOOP-13876
 URL: https://issues.apache.org/jira/browse/HADOOP-13876
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Aaron Fabbri


HADOOP-13449 adds support for DynamoDBMetadataStore.

The code currently supports two options for choosing DynamoDB table names:
1. Use name of each s3 bucket and auto-create a DynamoDB table for each.
2. Configure a table name in the {{fs.s3a.s3guard.ddb.table}} parameter.

One of the issues is with accessing read-only buckets.  If a user accesses a 
read-only bucket with credentials that do not have DynamoDB write permissions, 
they will get errors when trying to access the read-only bucket.  This 
manifests as test failures for {{ITestS3AAWSCredentialsProvider}}.

Goals for this JIRA:
- Fix {{ITestS3AAWSCredentialsProvider}} in a way that makes sense for the real 
use-case.
- Allow for a "one DynamoDB table per cluster" configuration with a way to 
choose which credentials are used for DynamoDB.
- Document limitations etc. in the s3guard.md site doc.








[jira] [Created] (HADOOP-13793) s3guard: add inconsistency injection, integration tests

2016-11-03 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13793:
-

 Summary: s3guard: add inconsistency injection, integration tests
 Key: HADOOP-13793
 URL: https://issues.apache.org/jira/browse/HADOOP-13793
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Aaron Fabbri


Many of us share concerns that testing the consistency features of S3Guard will 
be difficult if we depend on the rare and unpredictable occurrence of actual 
inconsistency in S3 to exercise those code paths.

I think we should have a mechanism for injecting failure to force exercising of 
the consistency codepaths in S3Guard.

Requirements:
- Integration tests that cause S3A to see the types of inconsistency we address 
with S3Guard.
- These are deterministic integration tests.

Unit tests are possible as well, if we were to stub out the S3Client.  That may 
be less bang for the buck, though.
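One way to make the injection deterministic is a wrapper that hides newly written keys for a fixed number of reads. The sketch below (hypothetical names, an assumption about the mechanism rather than the eventual Hadoop code) simulates read-after-write inconsistency on demand:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (an assumption, not the eventual Hadoop injection client): hide
// newly written keys for a fixed number of reads, so S3Guard's consistency
// code paths are exercised deterministically rather than waiting for real
// S3 inconsistency to occur.
public class DelayedVisibilityStore {

    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> hideReads = new HashMap<>();
    private final int delayReads;

    public DelayedVisibilityStore(int delayReads) {
        this.delayReads = delayReads;
    }

    public void put(String key, String value) {
        data.put(key, value);
        hideReads.put(key, delayReads); // new key stays invisible for N reads
    }

    // Returns null for the first 'delayReads' lookups after a put,
    // simulating read-after-write inconsistency.
    public String get(String key) {
        Integer remaining = hideReads.get(key);
        if (remaining != null && remaining > 0) {
            hideReads.put(key, remaining - 1);
            return null;
        }
        return data.get(key);
    }

    public static void main(String[] args) {
        DelayedVisibilityStore store = new DelayedVisibilityStore(2);
        store.put("/dir/file", "contents");
        System.out.println(store.get("/dir/file")); // null (inconsistent read)
        System.out.println(store.get("/dir/file")); // null
        System.out.println(store.get("/dir/file")); // contents
    }
}
```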






[jira] [Created] (HADOOP-13761) S3Guard: implement retries

2016-10-25 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13761:
-

 Summary: S3Guard: implement retries 
 Key: HADOOP-13761
 URL: https://issues.apache.org/jira/browse/HADOOP-13761
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri


Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
retry logic.

In HADOOP-13651, I added TODO comments in most of the places retry loops are 
needed, including:

- open(path).  If MetadataStore reflects recent create/move of file path, but 
we fail to read it from S3, retry.
- delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
file exists, retry.
- rename(src,dest).  If source path is not visible in S3 yet, retry.
- listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
create a separate JIRA for this as it will likely require interface changes 
(i.e. prefix or subtree scan).

We may miss some cases initially and we should do failure injection testing to 
make sure we're covered.  Failure injection tests can be a separate JIRA to 
make this easier to review.

We also need basic configuration parameters around retry policy.  There should 
be a way to specify maximum retry duration, as some applications would prefer 
to receive an error eventually, than waiting indefinitely.  We should also be 
keeping statistics when inconsistency is detected and we enter a retry loop.
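The policy described above — exponential backoff with a maximum total duration, so callers eventually get an error rather than waiting indefinitely — can be sketched as follows (an assumption, not the eventual S3A retry code; names are hypothetical):

```java
// Sketch (an assumption, not the eventual S3A retry code): exponential
// backoff bounded by a maximum total retry duration.
public class BoundedRetry {

    interface Op<T> {
        T run() throws Exception;
    }

    static <T> T retry(Op<T> op, long maxMillis, long baseDelayMillis) throws Exception {
        long deadline = System.currentTimeMillis() + maxMillis;
        long delay = baseDelayMillis;
        while (true) {
            try {
                return op.run();
            } catch (Exception e) {
                if (System.currentTimeMillis() + delay > deadline) {
                    throw e; // budget exhausted: surface the failure
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        int result = retry(() -> {
            if (++attempts[0] < 3) {
                throw new RuntimeException("transient");
            }
            return 42;
        }, 1_000, 1);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```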






[jira] [Created] (HADOOP-13760) S3Guard: add delete tracking

2016-10-25 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13760:
-

 Summary: S3Guard: add delete tracking
 Key: HADOOP-13760
 URL: https://issues.apache.org/jira/browse/HADOOP-13760
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri


Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
delete tracking.

Current behavior on delete is to remove the metadata from the MetadataStore.  
To make deletes consistent, we need to add a {{isDeleted}} flag to 
{{PathMetadata}} and check it when returning results from functions like 
{{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added TODO 
comments in most of the places these new conditions are needed.  The work does 
not look too bad.
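The tombstone idea can be sketched minimally (an assumption about the shape of the change, not the committed patch; names are hypothetical): keep the entry and mark it deleted, so lookups can mask paths S3 may still report.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the isDeleted idea (an assumption, not the committed patch):
// keep a tombstone instead of removing the entry, so getFileStatus()/
// listStatus() can mask paths that S3 may still report due to eventual
// consistency.
public class TombstoneStore {

    private final Map<String, Boolean> isDeleted = new HashMap<>(); // path -> tombstoned?

    public void create(String path) {
        isDeleted.put(path, false);
    }

    public void delete(String path) {
        isDeleted.put(path, true); // tombstone rather than remove
    }

    // A tombstoned path is reported as absent even if the backing store
    // still lists it.
    public boolean exists(String path) {
        Boolean deleted = isDeleted.get(path);
        return deleted != null && !deleted;
    }

    public boolean isTombstoned(String path) {
        return Boolean.TRUE.equals(isDeleted.get(path));
    }

    public static void main(String[] args) {
        TombstoneStore store = new TombstoneStore();
        store.create("/a/b");
        System.out.println(store.exists("/a/b")); // true
        store.delete("/a/b");
        System.out.println(store.exists("/a/b")); // false
        System.out.println(store.isTombstoned("/a/b")); // true
    }
}
```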






[jira] [Created] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore

2016-09-23 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13651:
-

 Summary: S3Guard: S3AFileSystem Integration with MetadataStore
 Key: HADOOP-13651
 URL: https://issues.apache.org/jira/browse/HADOOP-13651
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri


Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata 
consistency and caching.

Implementation should have minimal overhead when no MetadataStore is configured.






[jira] [Created] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2016-09-23 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13649:
-

 Summary: s3guard: implement time-based (TTL) expiry for 
LocalMetadataStore
 Key: HADOOP-13649
 URL: https://issues.apache.org/jira/browse/HADOOP-13649
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


LocalMetadataStore is primarily a reference implementation for testing.  It may 
be useful in narrow circumstances where the workload can tolerate short-term 
lack of inter-node consistency:  Being in-memory, one JVM/node's 
LocalMetadataStore will not see another node's changes to the underlying 
filesystem.

To put a bound on the time during which this inconsistency may occur, we should 
implement time-based (a.k.a. Time To Live / TTL)  expiration for 
LocalMetadataStore
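A minimal TTL cache along these lines might look like the following sketch (an assumption, not the committed implementation; the injectable clock is purely for deterministic testing):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch (an assumption, not the committed implementation): per-entry TTL
// with an injectable clock, bounding how long stale metadata can be served.
public class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock; // injectable for deterministic tests
    private final Map<K, Entry<V>> map = new HashMap<>();

    public TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Expired entries are dropped lazily on read.
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() > e.expiresAt) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0};
        TtlCache<String, String> cache = new TtlCache<>(100, () -> now[0]);
        cache.put("/path", "meta");
        System.out.println(cache.get("/path")); // meta
        now[0] = 101;
        System.out.println(cache.get("/path")); // null
    }
}
```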







[jira] [Created] (HADOOP-13631) S3Guard: implement move() for LocalMetadataStore, add unit tests

2016-09-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13631:
-

 Summary: S3Guard: implement move() for LocalMetadataStore, add 
unit tests
 Key: HADOOP-13631
 URL: https://issues.apache.org/jira/browse/HADOOP-13631
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


Building on HADOOP-13573 and HADOOP-13452, implement move() in 
LocalMetadataStore and associated MetadataStore unit tests.

(Making this a separate JIRA to break up work into decent-sized and reviewable 
chunks.)






Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-09-01 Thread Aaron Fabbri
+1, non-binding.

I built everything on OS X and ran the s3a contract tests successfully:

mvn test -Dtest=org.apache.hadoop.fs.contract.s3a.\*

...

Results :


Tests run: 78, Failures: 0, Errors: 0, Skipped: 1


[INFO]


[INFO] BUILD SUCCESS

[INFO]


On Thu, Sep 1, 2016 at 3:39 PM, Andrew Wang 
wrote:

> Good point Allen, I forgot about `hadoop version`. Since it's populated by
> a version-info.properties file, people can always cat that file.
>
> On Thu, Sep 1, 2016 at 3:21 PM, Allen Wittenauer  >
> wrote:
>
> >
> > > On Sep 1, 2016, at 3:18 PM, Allen Wittenauer  >
> > wrote:
> > >
> > >
> > >> On Sep 1, 2016, at 2:57 PM, Andrew Wang 
> > wrote:
> > >>
> > >> Steve requested a git hash for this release. This led us into a brief
> > >> discussion of our use of git tags, wherein we realized that although
> > >> release tags are immutable (start with "rel/"), RC tags are not. This
> is
> > >> based on the HowToRelease instructions.
> > >
> > >   We should probably embed the git hash in one of the files that
> > gets gpg signed.  That's an easy change to create-release.
> >
> >
> > (Well, one more easily accessible than 'hadoop version')
>


[jira] [Created] (HADOOP-13573) S3Guard: create basic contract tests for MetadataStore implementations

2016-09-01 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13573:
-

 Summary: S3Guard: create basic contract tests for MetadataStore 
implementations
 Key: HADOOP-13573
 URL: https://issues.apache.org/jira/browse/HADOOP-13573
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.9.0
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


We should have some contract-style unit tests for the MetadataStore interface 
to validate that the different implementations provide correct semantics.






[jira] [Created] (HADOOP-13559) Remove close() within try-with-resources

2016-08-29 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13559:
-

 Summary: Remove close() within try-with-resources
 Key: HADOOP-13559
 URL: https://issues.apache.org/jira/browse/HADOOP-13559
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.8.0, 2.9.0
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri
Priority: Minor


My colleague noticed that HADOOP-12994 introduced to places where close() was 
still called manually within a try-with-resources block.

I'll attach a patch to remove the manual close() calls. 

These extra calls to close() are probably safe, as InputStream is a Closeable, 
not an AutoCloseable (the latter does not specify close() as idempotent).







[jira] [Created] (HADOOP-13476) CredentialProviderFactory fails at class loading from libhdfs (JNI)

2016-08-09 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13476:
-

 Summary: CredentialProviderFactory fails at class loading from 
libhdfs (JNI)
 Key: HADOOP-13476
 URL: https://issues.apache.org/jira/browse/HADOOP-13476
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.8.0, 2.9.0, 3.0.0-alpha2
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


This bug was discovered when trying to run Impala (libhdfs.so) with s3a and 
Java KeyStore credentials.  Because JNI threads have a different classloader 
(bootstrap), we fail to load JavaKeyStoreProvider.

{quote}
15:11:42.658087 26310 jni-util.cc:166] java.util.ServiceConfigurationError: 
org.apache.hadoop.security.alias.CredentialProviderFactory: Provider
org.apache.hadoop.security.alias.JavaKeyStoreProvider$Factory not found
at java.util.ServiceLoader.fail(ServiceLoader.java:231)
at java.util.ServiceLoader.access$300(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:365)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at 
org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:57)
at 
org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:1950)
at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:1930)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getAWSAccessKeys(S3AFileSystem.java:366)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getAWSCredentialsProvider(S3AFileSystem.java:415)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:176)
{quote}
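One common mitigation for this failure class (illustrative only; not necessarily the fix adopted in Hadoop) is to hand ServiceLoader an explicit classloader instead of relying on the thread's context classloader, which is the bootstrap loader on JNI-attached threads:

```java
import java.util.ServiceLoader;

public class ServiceLoaderSketch {
    // Hypothetical service interface for illustration.
    public interface Provider {
        String name();
    }

    // On a JNI-attached thread, Thread.currentThread().getContextClassLoader()
    // may be the bootstrap loader, so a plain ServiceLoader.load(Provider.class)
    // can fail to find providers. Passing the loader that loaded a known
    // application class avoids that.
    static ServiceLoader<Provider> loadProviders() {
        ClassLoader cl = ServiceLoaderSketch.class.getClassLoader();
        return ServiceLoader.load(Provider.class, cl);
    }

    public static void main(String[] args) {
        int count = 0;
        for (Provider p : loadProviders()) {
            count++;
        }
        System.out.println("providers found: " + count);
    }
}
```

No providers are registered in this sketch, so it finds none; the point is only where the classloader comes from.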






Re: AWS S3AInputStream questions

2016-08-05 Thread Aaron Fabbri
On Tue, Aug 2, 2016 at 12:17 AM, Mr rty ff  wrote:
>
> Hi, I have a few questions about the implementation of the input stream in S3.
> 1) public synchronized long getPos() throws IOException
>{ return (nextReadPos < 0) ? 0 : nextReadPos; }
> Why does it return nextReadPos, not pos?

My understanding is:

seek() is a lazy implementation.  S3AInputStream keeps track of two
seek positions:

1. current position in underlying stream (pos)
2. next position to read (nextReadPos).

If the seek() implementation were eager, not lazy, we could do the seeking when
seek() is called.  In that case, I think we would only need to keep
track of #1 (pos).

Instead we keep track of where the next read() will start, and
lazily do the seek logic when it is actually needed.

getPos() is supposed to return the position of the next read(),
so nextReadPos is the correct value to return.

> In the member definition for pos:
> /** This is the public position; the one set in {@link #seek(long)}
>  * and returned in {@link #getPos()}. */

This is probably the source of your confusion.  Looks like this comment should
be changed.  I believe pos is the position of the underlying stream,
not the next read pos. They probably became different when
lazy seek was implemented.

> private long pos;

> 2) In seekInStream, the last lines are:
> // close the stream; if read the object will be opened at the new pos
> closeStream("seekInStream()", this.requestedStreamLen);
> pos = targetPos;
> Why do you need this line? Shouldn't pos be updated
> with the actual skipped value, as you did:
> if (skipped > 0) {
>   pos += skipped;
> }

The skipped variable is not in scope at that point.

It is used to keep track of how far the underlying stream actually skipped.

The point of this logic is to balance performance between
(a) always reopening the stream at the newly-seeked position
(b) just reading forward and discarding unneeded bytes

I believe (a) was found to be inefficient in some cases.

This code implements both approaches, depending on how far
forward the seek() is.  The code you are talking about here is
the (a) case where we reopen the stream on next read().

In this case, we just store the desired position (pos) which
will be used in the next call to read() to open the
stream at the offset 'pos' (see call to lazySeek()).
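The two positions and the lazy reconciliation described above can be sketched like this (simplified; the field names mirror S3AInputStream, but the threshold and the skip handling are illustrative only):

```java
// Simplified lazy-seek sketch, not the real S3AInputStream.
public class LazySeekSketch {
    static final long FORWARD_SEEK_LIMIT = 64 * 1024; // illustrative threshold

    long pos;          // position of the underlying (open) stream
    long nextReadPos;  // position the next read() should start from
    boolean streamOpen = true;

    // Lazy seek: record the target; no I/O happens here.
    public void seek(long targetPos) {
        nextReadPos = targetPos;
    }

    // getPos() reports where the next read() will start, hence nextReadPos.
    public long getPos() {
        return (nextReadPos < 0) ? 0 : nextReadPos;
    }

    // Called at the start of read(): reconcile pos with nextReadPos.
    void seekInStream() {
        long diff = nextReadPos - pos;
        if (diff == 0) {
            return;                // already in position
        }
        if (diff > 0 && diff < FORWARD_SEEK_LIMIT) {
            pos += diff;           // (b) stand-in for reading forward / stream.skip(diff)
        } else {
            streamOpen = false;    // (a) close; reopen at pos on the next read()
            pos = nextReadPos;
        }
    }
}
```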

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13351) TestDFSClientSocketSize buffer size tests are flaky

2016-07-07 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13351:
-

 Summary: TestDFSClientSocketSize buffer size tests are flaky
 Key: HADOOP-13351
 URL: https://issues.apache.org/jira/browse/HADOOP-13351
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.8.0, 3.0.0-alpha1
Reporter: Aaron Fabbri


{{TestDFSClientSocketSize}} has two tests that assert that a value that was set 
via {{java.net.Socket#setSendBufferSize}} is equal to the value subsequently 
returned by {{java.net.Socket#getSendBufferSize}}.

These tests are flaky; they occasionally fail when we run them.

This is expected behavior, actually, because {{Socket#setSendBufferSize()}} [is 
only a 
hint|https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setSendBufferSize(int)] 
(similar to how the underlying libc {{setsockopt(SO_SNDBUF)}} works).






[jira] [Created] (HADOOP-13230) s3a's use of fake empty directory blobs does not interoperate with other s3 tools

2016-06-01 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13230:
-

 Summary: s3a's use of fake empty directory blobs does not 
interoperate with other s3 tools
 Key: HADOOP-13230
 URL: https://issues.apache.org/jira/browse/HADOOP-13230
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.9.0
Reporter: Aaron Fabbri


Users of s3a may not realize that, in some cases, it does not interoperate well 
with other s3 tools, such as the AWS CLI.  (See HIVE-13778, IMPALA-3558).

Specifically, if a user:

- Creates an empty directory with hadoop fs -mkdir s3a://bucket/path
- Copies data into that directory via another tool, i.e. aws cli.
- Tries to access the data in that directory with any Hadoop software.

Then the last step fails because the fake empty-directory blob that s3a wrote in 
the first step causes s3a (listStatus() etc.) to continue to treat that 
directory as empty, even though the second step was supposed to populate the 
directory with data.

I wanted to document this fact for users. We may mark this as won't-fix, "by 
design". It may also be interesting to brainstorm solutions and/or a config 
option to change the behavior, if folks care.






[jira] [Created] (HADOOP-12267) s3a failure due to integer overflow bug in AWS SDK

2015-07-23 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-12267:
-

 Summary: s3a failure due to integer overflow bug in AWS SDK
 Key: HADOOP-12267
 URL: https://issues.apache.org/jira/browse/HADOOP-12267
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


Under high load writing to Amazon S3 storage, a client can be throttled enough 
to encounter 24 retries in a row.
The Amazon HTTP client code (in the aws-java-sdk jar) has a bug in its 
exponential-backoff retry code that causes integer overflow and a call to 
Thread.sleep() with a negative value, which causes the client to bail out with 
an exception (see below).


Bug has been fixed in aws-java-sdk:

https://github.com/aws/aws-sdk-java/pull/388

We need to pick this up for hadoop-tools/hadoop-aws.
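The failure mode can be reproduced with plain int arithmetic (a sketch of this class of bug, not the SDK's exact code):

```java
public class BackoffOverflow {
    // Naive exponential backoff in int math: the product wraps around
    // once it exceeds Integer.MAX_VALUE (e.g. at retries = 23 here),
    // producing a negative delay that makes Thread.sleep() throw.
    static int naiveDelayMs(int retries) {
        return 300 * (1 << retries);
    }

    // Safer variant: compute in long and cap both the exponent and the delay.
    static long cappedDelayMs(int retries) {
        long delay = 300L * (1L << Math.min(retries, 20));
        return Math.min(delay, 20_000L);
    }

    public static void main(String[] args) {
        System.out.println(naiveDelayMs(23));   // negative: int overflow
        System.out.println(cappedDelayMs(23));  // bounded, non-negative
    }
}
```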

Error: java.io.IOException: File copy failed: hdfs://path-redacted -- 
s3a://path-redacted
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) 
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)  
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:415) 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: 
java.io.IOException: Couldn't run retriable-command: Copying 
hdfs://path-redacted to s3a://path-redacted
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
 
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
 
... 10 more 
Caused by: com.amazonaws.AmazonClientException: Unable to complete transfer: 
timeout value is negative
at 
com.amazonaws.services.s3.transfer.internal.AbstractTransfer.unwrapExecutionException(AbstractTransfer.java:300)
at 
com.amazonaws.services.s3.transfer.internal.AbstractTransfer.rethrowExecutionException(AbstractTransfer.java:284)
at 
com.amazonaws.services.s3.transfer.internal.CopyImpl.waitForCopyResult(CopyImpl.java:67)
 
at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFile(S3AFileSystem.java:943) 
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:357) 
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:220)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:137)
 
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) 
... 11 more 
Caused by: java.lang.IllegalArgumentException: timeout value is negative
at java.lang.Thread.sleep(Native Method) 
at 
com.amazonaws.http.AmazonHttpClient.pauseBeforeNextRetry(AmazonHttpClient.java:864)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:353) 
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) 
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
at 
com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
at 
com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
 
at 
com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
 
at 
com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
 
at 
com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
  
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 


