[DISCUSS] A unified and open Hadoop community sync up schedule?

2019-06-07 Thread Wangda Tan
Hi Hadoop-devs,

Previously we had a regular YARN community sync-up (1 hr, biweekly, but not
open to the public). Recently, because of changes in our schedules, fewer
folks have shown up in the sync-up over the last several months.

I saw the K8s community does a pretty good job of running their SIG meetings:
there are regular meetings for different topics, with notes, agendas, etc. For example:
https://docs.google.com/document/d/13mwye7nvrmV11q9_Eg77z-1w3X7Q1GTbslpml4J7F3A/edit


For the Hadoop community, there are fewer such regular meetings open to the
public, apart from the Ozone project and offline meetups or Birds-of-a-Feather
sessions at Hadoop/DataWorks Summit. Recently a few folks joined DataWorks
Summit at Washington DC and Barcelona, and lots (50+) of folks joined the
Ozone/Hadoop/YARN BoFs, asked (good) questions and discussed roadmaps. I think
it is important to open such conversations to the public and let more
folks/companies join.

I discussed this with a small group of community members and wrote a short
proposal about the form, time and topics of the community sync-ups. Thanks to
everybody who contributed to the proposal! Please feel free to add your
thoughts to the Proposal Google doc.

Especially for the following parts:
- If you are interested in running any of the community sync-ups, please add
your name to the table inside the proposal. We need more volunteers to help
run the sync-ups in different timezones.
- Please add suggestions on the time, frequency and themes, and feel free to
share your thoughts on whether we should run sync-ups for other topics not
covered by the proposal.

Link to the Proposal Google doc


Thanks,
Wangda Tan


Re: MapReduce TeraSort fails on S3

2019-06-07 Thread Steve Loughran
(Prabhu and I will work on this online; if HADOOP-16058 is in, then it is
probably just a test setup problem.)

On Fri, Jun 7, 2019 at 3:18 PM Prabhu Joseph wrote:

> Hi,
>
>  MapReduce TeraSort Job fails on S3 with Output PathExistsException.
> Is this a known issue?
>
> Thanks,
> Prabhu Joseph
>
>
> [hrt_qa@hostname root]$ yarn jar
>
> /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples-3.1.1.7.0.0.0-115.jar
> terasort s3a:/bucket/INPUT s3a://bucket/OUTPUT
>
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of
> YARN_OPTS.
>
> 19/06/07 14:13:11 INFO terasort.TeraSort: starting
>
> 19/06/07 14:13:12 WARN impl.MetricsConfig: Cannot locate configuration:
> tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
>
> 19/06/07 14:13:12 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot
> period at 10 second(s).
>
> 19/06/07 14:13:12 INFO impl.MetricsSystemImpl: s3a-file-system metrics
> system started
>
> 19/06/07 14:13:14 INFO input.FileInputFormat: Total input files to process
> : 2
>
> Spent 396ms computing base-splits.
>
> Spent 3ms computing TeraScheduler splits.
>
> Computing input splits took 400ms
>
> Sampling 2 splits of 2
>
> Making 80 from 1 sampled records
>
> Computing parititions took 685ms
>
> Spent 1088ms computing partitions.
>
> 19/06/07 14:13:15 INFO client.RMProxy: Connecting to ResourceManager at
> hostname:8032
>
> 19/06/07 14:13:17 INFO mapreduce.JobResourceUploader: Disabling Erasure
> Coding for path: /user/hrt_qa/.staging/job_1559891760159_0011
>
> 19/06/07 14:13:17 INFO mapreduce.JobSubmitter: number of splits:2
>
> 19/06/07 14:13:17 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1559891760159_0011
>
> 19/06/07 14:13:17 INFO mapreduce.JobSubmitter: Executing with tokens: []
>
> 19/06/07 14:13:18 INFO conf.Configuration: resource-types.xml not found
>
> 19/06/07 14:13:18 INFO resource.ResourceUtils: Unable to find
> 'resource-types.xml'.
>
> 19/06/07 14:13:18 INFO impl.YarnClientImpl: Submitted application
> application_1559891760159_0011
>
> 19/06/07 14:13:18 INFO mapreduce.Job: The url to track the job:
> http://hostname:8088/proxy/application_1559891760159_0011/
>
> 19/06/07 14:13:18 INFO mapreduce.Job: Running job: job_1559891760159_0011
>
> 19/06/07 14:13:33 INFO mapreduce.Job: Job job_1559891760159_0011 running in
> uber mode : false
>
> 19/06/07 14:13:33 INFO mapreduce.Job:  map 0% reduce 0%
>
> 19/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed
> with state FAILED due to: Job setup failed :
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting
> job as Task committer attempt_1559891760159_0011_m_00_0: Destination
> path exists and committer conflict resolution mode is "fail"
>
> at
>
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>
> at
>
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>
> at
>
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>
> at
>
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> 19/06/07 14:13:34 INFO mapreduce.Job: Counters: 2
>
> Job Counters
>
> Total time spent by all maps in occupied slots (ms)=0
>
> Total time spent by all reduces in occupied slots (ms)=0
>
> 19/06/07 14:13:34 INFO terasort.TeraSort: done
>
> 19/06/07 14:13:34 INFO impl.MetricsSystemImpl: Stopping s3a-file-system
> metrics system...
>
> 19/06/07 14:13:34 INFO impl.MetricsSystemImpl: s3a-file-system metrics
> system stopped.
>
> 19/06/07 14:13:34 INFO impl.MetricsSystemImpl: s3a-file-system metrics
> system shutdown complete.
>
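
For anyone hitting the same PathExistsException: below is a minimal sketch of
two ways the conflict behaviour above can be avoided, assuming the stock S3A
staging committer option fs.s3a.committer.staging.conflict-mode (values
fail/append/replace) and a hypothetical bucket name. This is illustrative
only, not a claim about what this particular test setup should do.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TerasortOutputPrep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Option 1: tell the staging committer to overwrite an existing
    // destination instead of failing (the mode reported in the log is "fail").
    conf.set("fs.s3a.committer.staging.conflict-mode", "replace");

    // Option 2: make sure the destination does not exist before submitting
    // the job, which is what TeraSort runs on HDFS usually rely on.
    Path out = new Path("s3a://bucket/OUTPUT");  // hypothetical bucket
    FileSystem fs = out.getFileSystem(conf);
    if (fs.exists(out)) {
      fs.delete(out, true);  // recursive delete of the old output
    }
  }
}
{code}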


[jira] [Created] (MAPREDUCE-7216) TeraSort Job Fails on S3

2019-06-07 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7216:


 Summary: TeraSort Job Fails on S3
 Key: MAPREDUCE-7216
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TeraSort job fails on S3 with the exception below. TeraSort creates the output
path and writes the partition file into it, but DirectoryStagingCommitter
expects the output path not to exist.


{code}
19/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
state FAILED due to: Job setup failed : 
org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as 
Task committer attempt_1559891760159_0011_m_00_0: Destination path exists 
and committer conflict resolution mode is "fail"

at 
org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)

at 
org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)

at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)

at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)
{code}

Creating the partition file in /tmp or some other directory fixes the issue.
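
A minimal sketch of that direction, assuming the partition list is the
_partition.lst file TeraSort currently writes under the output directory; the
scratch location and helper below are hypothetical, not the actual patch.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionFilePlacement {
  /**
   * Sketch: build the partition-file path under a scratch directory instead
   * of new Path(outputDir, "_partition.lst"), so the S3A job destination
   * stays absent until the committer creates it.
   */
  public static Path partitionFileFor(Configuration conf) throws IOException {
    // Hypothetical scratch location; a real patch would pick a per-job staging dir.
    Path scratch = new Path("/tmp/terasort-" + System.currentTimeMillis());
    FileSystem fs = scratch.getFileSystem(conf);
    fs.mkdirs(scratch);
    return new Path(scratch, "_partition.lst");
  }
}
{code}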







Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2019-06-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/

[Jun 6, 2019 3:13:39 AM] (xyao) HDDS-1612. Add 'scmcli printTopology' shell 
command to print datanode
[Jun 6, 2019 9:08:18 AM] (stevel) HADOOP-16117. Update AWS SDK to 1.11.563.
[Jun 6, 2019 9:21:55 AM] (sunilg) YARN-9573. DistributedShell cannot specify 
LogAggregationContext.
[Jun 6, 2019 10:24:13 AM] (elek) HDDS-1458. Create a maven profile to run fault 
injection tests.
[Jun 6, 2019 11:52:49 AM] (stevel) HADOOP-16344. Make DurationInfo public 
unstable.
[Jun 6, 2019 1:23:37 PM] (31469764+bshashikant) HDDS-1621. writeData in 
ChunkUtils should not use
[Jun 6, 2019 1:59:01 PM] (wwei) YARN-9590. Correct incompatible, incomplete and 
redundant activities.
[Jun 6, 2019 2:00:00 PM] (elek) HDDS-1645. Change the version of Pico CLI to 
the latest 3.x release -
[Jun 6, 2019 4:49:31 PM] (stevel) Revert "HADOOP-16344. Make DurationInfo 
public unstable."
[Jun 6, 2019 5:13:36 PM] (hanishakoneru) HDDS-1605. Implement AuditLogging for 
OM HA Bucket write requests.
[Jun 6, 2019 5:20:28 PM] (inigoiri) HDFS-14527. Stop all DataNodes may result 
in NN terminate. Contributed
[Jun 6, 2019 6:06:48 PM] (nanda) HDDS-1201. Reporting Corruptions in Containers 
to SCM (#912)
[Jun 6, 2019 6:13:29 PM] (nanda) HDDS-1647 : Recon config tag does not show up 
on Ozone UI. (#914)
[Jun 6, 2019 6:17:59 PM] (nanda) HDDS-1652. HddsDispatcher should not shutdown 
volumeSet. Contributed by
[Jun 6, 2019 6:20:04 PM] (nanda) HDDS-1650. Fix Ozone tests leaking volume 
checker thread. Contributed by
[Jun 6, 2019 6:59:53 PM] (inigoiri) HDFS-14486. The exception classes in some 
throw statements do not
[Jun 6, 2019 7:14:47 PM] (xyao) HDDS-1490. Support configurable container 
placement policy through 'o…
[Jun 6, 2019 8:41:58 PM] (eyang) YARN-9581.  Fixed yarn logs cli to access RM2. 
Contributed
[Jun 7, 2019 1:27:41 AM] (aajisaka) MAPREDUCE-6794. Remove unused properties 
from TTConfig.java




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
 
   Unread field:TimelineEventSubDoc.java:[line 56] 
   Unread field:TimelineMetricSubDoc.java:[line 44] 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 
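
   (For context on the two WorkerId findings above, an illustrative sketch of
   an equals() that checks both null and the runtime type before casting; the
   class and field below are hypothetical stand-ins, not the MaWo source.)

{code}
// Hypothetical stand-in for the reported class; only the equals/hashCode
// shape matters here.
public final class WorkerIdExample {
  private final String hostname;

  public WorkerIdExample(String hostname) {
    this.hostname = hostname;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    // instanceof rejects null as well as non-WorkerIdExample arguments.
    if (!(other instanceof WorkerIdExample)) {
      return false;
    }
    return hostname.equals(((WorkerIdExample) other).hostname);
  }

  @Override
  public int hashCode() {
    return hostname.hashCode();
  }
}
{code}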

Failed junit tests :

   hadoop.ha.TestZKFailoverController 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler 
   hadoop.mapreduce.v2.app.TestRuntimeEstimators 
   hadoop.mapred.TestMRTimelineEventHandling 
   hadoop.yarn.service.TestServiceAM 
   hadoop.tools.TestHadoopArchiveLogsRunner 
   hadoop.ozone.container.common.impl.TestHddsDispatcher 
   hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis 
   hadoop.ozone.client.rpc.TestOzoneAtRestEncryption 
   hadoop.ozone.client.rpc.TestFailureHandlingByClient 
   hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException 
   hadoop.ozone.client.rpc.TestOzoneRpcClient 
   hadoop.ozone.client.rpc.TestSecureOzoneRpcClient 
   hadoop.hdds.scm.pipeline.TestRatisPipelineProvider 
   hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/artifact/out/diff-compile-javac-root.txt
  [332K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/artifact/out/diff-checkstyle-root.txt
  [17M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/artifact/out/diff-patch-hadolint.txt
  [8.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1160/a

Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2019-06-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient 
non-serializable instance field map In GlobalStorageStatistics.java:instance 
field map In GlobalStorageStatistics.java 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-compile-cc-root-jdk1.8.0_212.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-compile-javac-root-jdk1.8.0_212.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-patch-shellcheck.txt
  [72K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/whitespace-tabs.txt
  [1.2M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_212.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [280K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/345/art

[jira] [Created] (MAPREDUCE-7215) Remove unused properties from MRJobConfig.java

2019-06-07 Thread Wanqiang Ji (JIRA)
Wanqiang Ji created MAPREDUCE-7215:
--

 Summary: Remove unused properties from MRJobConfig.java
 Key: MAPREDUCE-7215
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7215
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Wanqiang Ji


Unused properties:
 * SPLIT_FILE
 * JOB_JOBTRACKER_ID
 * WORKDIR
 * HADOOP_WORK_DIR

Property that should be better used:
 * DEFAULT_JOB_AM_ACCESS_DISABLED


