Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/229/

[Mar 6, 2021 10:22:51 PM] (Konstantin Shvachko) HDFS-15808. Add metrics for FSNamesystem read/write lock hold long time. (#2668) Contributed by tomscut.

-1 overall

The following subsystems voted -1:
    docker

Powered by Apache Yetus
https://yetus.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/

[Mar 5, 2021 12:18:06 PM] (Peter Bacsko) YARN-10639. Queueinfo related capacity, should adjusted to weight mode. Contributed by Qi Zhu.
[Mar 5, 2021 12:50:45 PM] (Peter Bacsko) YARN-10642. Race condition: AsyncDispatcher can get stuck by the changes introduced in YARN-8995. Contributed by zhengchenyu.
[Mar 5, 2021 1:56:51 PM] (noreply) HADOOP-17563. Update Bouncy Castle to 1.68. (#2740)
[Mar 5, 2021 2:56:56 PM] (Peter Bacsko) YARN-10640. Adjust the queue Configured capacity to Configured weight number for weight mode in UI. Contributed by Qi Zhu.
[Mar 5, 2021 7:46:40 PM] (Eric Badger) YARN-10664. Allow parameter expansion in NM_ADMIN_USER_ENV. Contributed by Jim
[Mar 5, 2021 10:13:35 PM] (Peter Bacsko) YARN-10672. All testcases in TestReservations are flaky. Contributed By Szilard Nemeth.

-1 overall

The following subsystems voted -1:
    blanks pathlen unit xml

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

        Parsing Error(s):
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
            hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests :
        hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
        hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
        hadoop.tools.dynamometer.TestDynamometerInfra
        hadoop.tools.dynamometer.TestDynamometerInfra

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-compile-cc-root.txt [116K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-compile-javac-root.txt [368K]

    blanks:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/blanks-eol.txt [13M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/blanks-tabs.txt [2.0M]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-checkstyle-root.txt [16M]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-pathlen.txt [16K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-shellcheck.txt [28K]

    xml:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/xml.txt [24K]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/results-javadoc-javadoc-root.txt [1.1M]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [324K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [168K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer_hadoop-dynamometer-infra.txt [8.0K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/438/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt [24K]

Powered by Apache Yetus 0.13.0
https://yetus.apache.org
[jira] [Resolved] (HADOOP-17552) Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang
[ https://issues.apache.org/jira/browse/HADOOP-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki resolved HADOOP-17552.
---------------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid
> potential hang
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-17552
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17552
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.2.2
>            Reporter: Haoze Wu
>            Assignee: Haoze Wu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> We are doing some systematic fault injection testing in Hadoop 3.2.2, and
> when we try to run a client (e.g., `bin/hdfs dfs -ls /`) against our HDFS
> cluster (1 NameNode, 2 DataNodes), the client gets stuck forever. After some
> investigation, we believe it is a bug in `hadoop.ipc.Client`: the read method
> of `hadoop.ipc.Client$Connection$PingInputStream` keeps swallowing
> `java.net.SocketTimeoutException` because of the mistaken use of the
> `rpcTimeout` configuration in the `handleTimeout` method.
>
> *Reproduction*
>
> Start HDFS with the default configuration. Then execute a client (we used
> the command `bin/hdfs dfs -ls /` in the terminal). While HDFS is trying to
> accept the client’s socket, inject a socket error (java.net.SocketException
> or java.io.IOException), specifically at line 1402 (line 1403 or 1404 will
> also work).
>
> We prepared scripts for reproduction in a gist
> (https://gist.github.com/functioner/08bcd86491b8ff32860eafda8c140e24).
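The hang described above comes down to one decision: on a read timeout, `handleTimeout` swallows the `SocketTimeoutException` and retries whenever `rpcTimeout` is 0 (the old default), so a connection the server will never serve is retried forever; raising the default to 120000 ms lets the exception propagate. A minimal sketch of that decision, under stated assumptions — the class `RpcTimeoutSketch`, the helper `DeadStream`, and the method `pingsBeforeFailure` are invented for illustration and the retry cap exists only so the sketch terminates; the real logic lives in `hadoop.ipc.Client$Connection$PingInputStream`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

public class RpcTimeoutSketch {

    /** Always times out, like a socket the server accepted but never serves. */
    static class DeadStream extends InputStream {
        @Override
        public int read() throws IOException {
            throw new SocketTimeoutException("read timed out");
        }
    }

    /**
     * Sketch of the handleTimeout decision: when rpcTimeout == 0 the
     * SocketTimeoutException is swallowed and the read is retried (a ping
     * is sent instead); when rpcTimeout > 0 the timeout ends the wait.
     * Returns how many "pings" were sent before the (capped) read gave up.
     */
    static int pingsBeforeFailure(int rpcTimeout, int maxAttempts) {
        InputStream in = new DeadStream();
        int pings = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                in.read();        // would block up to the ping interval
                return pings;     // never reached: DeadStream always times out
            } catch (SocketTimeoutException e) {
                if (rpcTimeout > 0) {
                    return pings; // positive timeout: the client fails fast
                }
                pings++;          // rpcTimeout == 0: swallow and keep waiting
            } catch (IOException e) {
                return pings;
            }
        }
        return pings;             // cap only exists so the sketch terminates
    }

    public static void main(String[] args) {
        // Old default (0): every timeout is swallowed, so the client spins.
        System.out.println("pings with rpcTimeout=0:      " + pingsBeforeFailure(0, 5));
        // New default (120000): the first timeout is allowed to surface.
        System.out.println("pings with rpcTimeout=120000: " + pingsBeforeFailure(120000, 5));
    }
}
```

With `rpcTimeout=0` the sketch retries until its artificial cap (5 pings); with `rpcTimeout=120000` it stops on the first timeout (0 pings), which is why the new default turns an infinite hang into a bounded failure.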
> *Diagnosis*
>
> When the NameNode tries to accept a client’s socket, basically there are
> 4 steps:
> # accept the socket (line 1400)
> # configure the socket (line 1402-1404)
> # make the socket a Reader (after line 1404)
> # swallow the possible IOException in line 1350
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
>   public void run() {
>     while (running) {
>       SelectionKey key = null;
>       try {
>         getSelector().select();
>         Iterator<SelectionKey> iter = getSelector().selectedKeys().iterator();
>         while (iter.hasNext()) {
>           key = iter.next();
>           iter.remove();
>           try {
>             if (key.isValid()) {
>               if (key.isAcceptable())
>                 doAccept(key);
>             }
>           } catch (IOException e) { // line 1350
>           }
>           key = null;
>         }
>       } catch (OutOfMemoryError e) {
>         // ...
>       } catch (Exception e) {
>         // ...
>       }
>     }
>   }
>
>   void doAccept(SelectionKey key) throws InterruptedException, IOException,
>       OutOfMemoryError {
>     ServerSocketChannel server = (ServerSocketChannel) key.channel();
>     SocketChannel channel;
>     while ((channel = server.accept()) != null) { // line 1400
>       channel.configureBlocking(false);           // line 1402
>       channel.socket().setTcpNoDelay(tcpNoDelay); // line 1403
>       channel.socket().setKeepAlive(true);        // line 1404
>
>       Reader reader = getReader();
>       Connection c = connectionManager.register(channel,
>           this.listenPort, this.isOnAuxiliaryPort);
>       // If the connectionManager can't take it, close the connection.
>       if (c == null) {
>         if (channel.isOpen()) {
>           IOUtils.cleanup(null, channel);
>         }
>         connectionManager.droppedConnections.getAndIncrement();
>         continue;
>       }
>       key.attach(c); // so closeCurrentConnection can get the object
>       reader.addConnection(c);
>     }
>   }
> {code}
>
> When a SocketException occurs in line 1402 (or 1403 or 1404), the
> server.accept() in line 1400 has finished, so we expect the following
> behavior:
> # The server (NameNode) accepts this connection but basically writes
> nothing to it, because it was never registered with a Reader.
> # The client is aware that the connection has been established, and tries
> to read and write on this connection. After some time threshold, the client
> finds that it can’t read anything from this connection and exits with some
> exception or error.
>
> However, we do not observe behavior 2. The client just gets stuck forever
> (>10 min). We re-examined the default configuration in
> https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-common/core-default.xml
> and we
[jira] [Created] (HADOOP-17567) typo in MagicCommitTracker
Pierrick HYMBERT created HADOOP-17567:
-----------------------------------------

             Summary: typo in MagicCommitTracker
                 Key: HADOOP-17567
                 URL: https://issues.apache.org/jira/browse/HADOOP-17567
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.3.0
            Reporter: Pierrick HYMBERT
            Assignee: Steve Loughran


the test TestNeworkBinding should be TestNetworkBinding



--
This message was sent by Atlassian Jira
(v8.3.4#803005)