[ https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213302#comment-16213302 ]
ASF GitHub Bot commented on HADOOP-14971:
-----------------------------------------

GitHub user steveloughran opened a pull request:

    https://github.com/apache/hadoop/pull/282

    HADOOP-14971 Merge S3A committers into trunk

    HADOOP-13786 & MAPREDUCE-6823 code as a PR for better review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3guard/HADOOP-13786-committer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #282

----
commit 70e2a84547936cdfa65c58a2482c498eabbce889
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-06T17:18:15Z

    HADOOP-13786: apply the HADOOP-13796-on-branch-2 patch to trunk, whitespace fix

commit 738b0c045603182b38d1ce08d97f60393043f565
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-06T18:28:37Z

    HADOOP-13786 fixing docs to avoid doxia bug on level 4 entries

commit 6d99b815eb33ccc0d0514e6330eb255f77d29372
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-12T19:57:02Z

    HADOOP-13786 HADOOP-14303 error handling: l-exp wrappers around core metadata ops

commit d9f72547212f7bc5c47b4aab949581e5e0d448ee
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-12T19:57:31Z

    HADOOP-13786 TestStagingCommitter -> java 8 closures

commit 09249b354c9c0043d2690cd4d3fbb124e09eb2d8
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-13T15:57:33Z

    HADOOP-13786 HADOOP-14531 lambda wrapper around all production s3 calls

    * all invocations of s3 calls are wrapped where appropriate, either with once() (which does the translation), retry() or retryUntranslated()
    * javadocs state retry policy; this is propagated to give callers an idea of what already retries
    * commit tests -> java 8 lambdas too
    * test json serdeser in hadoop common
    * checkstyle

commit 5814f22aeab22d5a3bacb27bb456d665530c3d94
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-15T10:30:57Z

    HADOOP-13786 HADOOP-14531

    * new @Retries annotation for the s3a classes to use to make their retry policy more visible in the source. This is a source-only annotation unused anywhere, but it does make the policy visible. You can't call a non-retrying method and be retrying yourself unless you add your own retry logic
    * fault injecting AWSS3 client is better at knowing when it is good to fail (i.e. not so aggressively on listing operations)
    * callback interface for before/during retries unified
    * and logging cut back so only the first failure gets logged on a retry loop. Maybe that could be tuned to remember the previous failure & log if it's a different class
    * all integration tests excluding rename() ones are now working when tested with a high (25-50%) throttle rate.
    * DDB logs of capacity limit failures

commit 2bd385361bda4ddd1590dfed7c3377bee1ffa739
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-15T10:31:19Z

    HADOOP-13786 turn off false alarm in findbugs

commit e8039d3d7734b607c0d0e093ea6d573672490753
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-19T10:38:32Z

    HADOOP-13786 MAPREDUCE-6823 FileOutputFormat uses the committer factory, with tests

commit 1e61b94490fbf3f75330ceea3b5d3b863f5efbe6
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-19T13:47:44Z

    HADOOP-13786

    * s/DefaultPutTracker/r/PutTracker. Yes, it is the default one, but it's misleading as a type.
    * move to l-expressions in block output stream callables & the committers. Exception: Tasks.runParallel() whose closure is complex enough that the IDE was warning about its size. Maybe best to refactor as a method invoked as this::exec
    * Adding new statistic, {{committer_bytes_uploaded}}, set when a stream is closed to # of bytes PUT.
    * S3A FS implements {{StreamCapabilities}}, dynamically declares if it is magic by returning true on {{hasCapability("fs.s3a.magic.enabled")}} when it is.
    * S3ABlockOutputStream implements {{StreamCapabilities}}; dynamically declares if its output has delayed visibility. Also: that it doesn't do hsync/hflush, obviously.
    * {{CommitOperations}}: Experimented with replacing {{MaybeIOE}} with the Java 8 Optional<> type. Doesn't work as {{maybeThrow}} can't be implemented as {{Optional<IOException>.map((e) -> {throw e;})}}; Java's checked exceptions make maps fairly useless for the Hadoop IOE-throwing APIs. Outcome: {{MaybeIOE}} unchanged.
    * Minor cleanup of production & test code
    * starting to write end user documentation. Needs more clarity on directory vs partitioned output on staging committer, including examples

commit 798e0a3e2ed9ad0185ca003151489ff18acdacfb
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-21T10:30:38Z

    HADOOP-13786 MAPREDUCE-6823 adding public getOutputPath to PathOutputCommitter API, as some callers currently scan the JobConf settings to find this value

commit 91611c32e19ab3fb59ebc1c99b8d3855c50de56b
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-21T10:38:18Z

    HADOOP-13786 altering s3a committer code to track MAPREDUCE-6823,

commit 91bc628638f65dab3b5f8bdad3e89bcc0c874af0
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-22T10:46:42Z

    HADOOP-13786 HADOOP-14531: 443 response goes to NoResponseException, treat as retryable for non-idempotent calls only

commit d0d36abc95b4108f1c2e7fb3825a4353b47351ec
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-22T18:49:00Z

    HADOOP-13786 downgrade startup log about magic from info to debug. s3guard bucket-info should show its status though. Also, move another anon class to a l-exp

commit 77f9fb212d1d83868b85d5689f3cd7ecd7165eec
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-26T14:55:37Z

    HADOOP-13786 HADOOP-14531 DDB throttling events are logged as a quantile/rate metric (Hz) rather than just total count.
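
Editorial aside on the {{MaybeIOE}} note in commit 1e61b944 above: the sketch below illustrates why a lambda passed to Optional.map() cannot rethrow an IOException, and what an explicit holder looks like instead. The MaybeIOE class here is a hypothetical stand-in written for this illustration (its fields and method bodies are assumptions, not the class from the branch); only java.util.Optional and java.io.IOException are real APIs.

```java
import java.io.IOException;
import java.util.Optional;

// Illustrative sketch only; class and method names are assumptions.
class MaybeIOESketch {

  /** Hypothetical stand-in for the MaybeIOE holder named in the commit message. */
  static final class MaybeIOE {
    private final IOException exception; // null means "no failure recorded"

    MaybeIOE(IOException exception) {
      this.exception = exception;
    }

    /** Rethrows the held exception, if any; legal because the method declares IOException. */
    void maybeThrow() throws IOException {
      if (exception != null) {
        throw exception;
      }
    }
  }

  /** Optional-based equivalent: the unwrap has to happen outside any lambda. */
  static void maybeThrow(Optional<IOException> failure) throws IOException {
    // The following line does not compile, because Function.apply()
    // declares no checked exceptions:
    //   failure.map(e -> { throw e; });
    if (failure.isPresent()) {
      throw failure.get();
    }
  }

  /** True if rethrowing through the holder surfaces the IOException. */
  static boolean holderThrows(IOException e) {
    try {
      new MaybeIOE(e).maybeThrow();
      return false;
    } catch (IOException expected) {
      return true;
    }
  }

  /** True if rethrowing through the Optional helper surfaces the IOException. */
  static boolean optionalThrows(IOException e) {
    try {
      maybeThrow(Optional.ofNullable(e));
      return false;
    } catch (IOException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("holder, failure held:   " + holderThrows(new IOException("put failed")));
    System.out.println("holder, nothing held:   " + holderThrows(null));
    System.out.println("optional, failure held: " + optionalThrows(new IOException("put failed")));
    System.out.println("optional, empty:        " + optionalThrows(null));
  }
}
```

With Optional the caller still needs an isPresent()/get() pair (or a helper like the one above), so the type adds little over a small dedicated holder, which is consistent with the outcome recorded in the commit message.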

commit c98b1421ca131406a2059f2a6659d86377eaf971
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-26T19:10:00Z

    HADOOP-13786 javadocs of Retries

commit 78f85138a521a800d99ec2a257ad5cf1c8e6e445
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-27T16:08:25Z

    HADOOP-13786 HADOOP-14531 rework retry logic, including Ewan's feedback. New names, less logging. Also, exceptions are translated before the event handler is called, even if the operation is untranslated. This means the event handler doesn't need to worry about whether the incoming event is raw or translated

commit 51d4d519efde7412ea12df930c69e54c3a5432e0
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-27T18:27:55Z

    HADOOP-13786 checkstyle and bucket-info gains a "-magic" command to verify that magic support is turned on

commit 48566c512b6faa7cadc4b7f5b8709ca01a9a9c03
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-28T18:25:14Z

    HADOOP-13786 MAPREDUCE-6823 more test on the commit factories

commit 272e32a0e42d3c798e1def10197997f8ebb5b342
Author: Steve Loughran <ste...@apache.org>
Date:   2017-09-28T18:26:23Z

    HADOOP-13786 more on commit algorithms themselves, turning docs and commit/abort code to match

commit 34aee058cdd8d691fd61592e353e3b717b145a94
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-03T15:00:54Z

    HADOOP-13786 paste in code from how the MR AM creates a committer, to verify that it works without spinning up the whole cluster

    Change-Id: I6841877fde593d6dffa1ba6065a2dc7564ab3329
    (cherry picked from commit 3634f5a20c76c0b28c4f9b4f7e39af4db5fc8c68)

commit d1b072c4faea798106378416479f5916b7d3d325
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-03T17:03:59Z

    HADOOP-13786 MAPREDUCE-6823 improving commentary on committer factory; clean up tests

    Change-Id: Ie468a243b23e389122b1e1c7281f76671d567167

commit d775a149b45377b31e612212df8485f9aa564f2a
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-03T18:25:35Z

    HADOOP-13786 setting up for testing of partitioning merge strategies. I understand what it is trying to do now

    Change-Id: Ia1e4834e5793a9a768e4f373b7dafb39e195af4e

commit f2e0701b81c180e93464d5734e20a3e65509aedb
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-04T19:38:45Z

    HADOOP-13786 partitioned committer work (+some java 8 bits)

    * move lambda map/flatmap/apply ops on located file status iterator into S3AUtils from TestUtils, use in staging committer & commit operations;
    * document what partitioned committer does, with notes (needs verification)
    * testing of Paths.addUUID() and fix failures

    Change-Id: I7329a45668f272162d836a2bbbf2cf3e71c56e56

commit 40204f1169a515f10f9a0d0c9283b27efb8c2653
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-04T19:40:28Z

    HADOOP-13786 revert back to java-7 logic in CommitOperations: cute but overcomplex here.

    Change-Id: I6f5a176e360cc6071a0f35cbb324f50fb335b233

commit fa2860c7505d6ae1ac8360b3998aa0034ecce448
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-09T17:14:57Z

    HADOOP-13786 MAPREDUCE-6823 remove createCommitter(JobContext) as the only place a FileOutputCommitter is created off a job context is in the code bridging from the v1 to v2 APIs of FileOutputFormat. The new factory model doesn't support v1 MR, so it's not needed. This simplifies testing and allows for code cutbacks in the s3a implementations & downstream.

    Change-Id: Ifb51c1465a359f7f2cdafb16fe6e21dd143cadbf

commit 8f696e74d0d1d4eb0c3737c21705bc61f06087e8
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-09T17:19:13Z

    HADOOP-13786 S3A committers don't need to support a JobContext in the constructors or factories: remove, clean up tests.
    Where tests do need to create a Committer with nothing but a JobConf, use the same code which MR itself does for this, now statically exported from AbstractCommitITest

    Change-Id: I79ab5acd9e4c15f4c1b9b520cf18258a97b7dbdc

commit 4cfa70bb1479fb7e938597b5ff0f278ee22fd9f3
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-09T18:46:05Z

    HADOOP-13786: Success marker: Should we delete this when a job starts?

    Yes: its presence marks the completion of a job
    No: if it contains metadata, that data may be valid until the new data is present

    Change-Id: I359cb943745f6b7b58667f7462bfcb7c0b0313e7

commit ac091be5eb2e975fd89b250f6900fadf4e84351e
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-10T18:23:59Z

    HADOOP-13786 MAPREDUCE-6823 There's now a "BindingPathOutputCommitter" which can be instantiated and which relays its invocations to the factory. This is useful for working with code which takes a committer classname to know what to instantiate; it allows you to delegate to the factory for dynamic binding on a per-destination basis.

    Change-Id: I0472c60df98a54e5272b221c650c2a09e3d46fa1

commit c74a599bc89db43b9df7b76478e54d6d5666cb11
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-10T18:25:32Z

    HADOOP-13786 MAPREDUCE-6823 static method to combine creating factory & committer in one go; turns out to be a useful operation downstream, so merits simplification. Tests too.

    Change-Id: Ie5173141132ba41bad5af9f97fd67056428e7f2b

commit 84fab155a549c53ee46760ec390c87b8a54b13f4
Author: Steve Loughran <ste...@apache.org>
Date:   2017-10-12T20:28:37Z

    HADOOP-13786

    * WriteOperationHelper no longer takes a key in its constructor, caller must supply on the relevant ops
    * _SUCCESS file includes a name field which is validated on load; goal is to identify other formats/versions and reject.
    * big code review of tests, including renaming, cleanup, IDE-suggested cleanup
    * tests also verify that the hasCapabilities() field returns true for the magic option on a magic write, false for a non-magic one, even on a magic FS.

    Change-Id: Ia2de777e98c73819d44c2b755fb57be4be5e4a34

----

> Merge S3A committers into trunk
> -------------------------------
>
>                 Key: HADOOP-14971
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14971
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a
> github PR for review there & to keep it out of the mailboxes of the watchers on
> the main JIRA

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org