[ https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-10669: ------------------------------------ Comment: was deleted (was: Also saw this in a build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799 {code} Loading nested orc data (logging to /home/ubuntu/Impala/logs/data_loading/load-nested.log)... FAILED (Took: 2 min 16 sec) '/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f orc/def' failed. Tail of log: 2021-09-14 18:53:02,523 INFO:load_nested[348]:Executing: CREATE EXTERNAL TABLE supplier STORED AS orc TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE') AS SELECT * FROM tmp_supplier Traceback (most recent call last): File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in <module> load() File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load hive.execute(stmt) File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in execute return self._cursor.execute(sql, *args, **kwargs) File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 343, in execute self._wait_to_finish() # make execute synchronous File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 427, in _wait_to_finish raise OperationalError(resp.errorMessage) impala.error.OperationalError: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1631643014075_0039_2_00, diagnostics=[Task failed, taskId=task_1631643014075_0039_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1631643014075_0039_01_000002 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:07.711]Container [pid=106725,containerID=container_1631643014075_0039_01_000002] is running 14094336B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1631643014075_0039_01_000002 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 106725 106723 106725 106725 (bash) 0 0 11546624 742 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000002/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000002 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002/stderr |- 106735 106725 106725 106725 (java) 1780 265 2673709056 264843 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000002/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000002 application_1631643014075_0039 1 [2021-09-14 18:53:07.719]Container killed on request. Exit code is 143 [2021-09-14 18:53:07.719]Container exited with a non-zero exit code 143. ]], TaskAttempt 1 failed, info=[Container container_1631643014075_0039_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:16.803]Container [pid=106885,containerID=container_1631643014075_0039_01_000003] is running 21422080B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1631643014075_0039_01_000003 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 106885 106883 106885 106885 (bash) 0 0 11546624 752 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000003 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003/stderr |- 106895 106885 106885 106885 (java) 1507 224 2661277696 266622 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000003 application_1631643014075_0039 1 [2021-09-14 18:53:16.811]Container killed on request. Exit code is 143 [2021-09-14 18:53:16.811]Container exited with a non-zero exit code 143. ]], TaskAttempt 2 failed, info=[Container container_1631643014075_0039_01_000004 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:25.895]Container [pid=107002,containerID=container_1631643014075_0039_01_000004] is running 7229440B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1631643014075_0039_01_000004 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 107002 107000 107002 107002 (bash) 0 0 11546624 767 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000004/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000004 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004/stderr |- 107012 107002 107002 107002 (java) 1508 227 2689318912 263142 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000004/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000004 application_1631643014075_0039 1 [2021-09-14 18:53:25.904]Container killed on request. Exit code is 143 [2021-09-14 18:53:25.904]Container exited with a non-zero exit code 143. ]], TaskAttempt 3 failed, info=[Container container_1631643014075_0039_01_000005 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:34.986]Container [pid=107126,containerID=container_1631643014075_0039_01_000005] is running 12509184B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1631643014075_0039_01_000005 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 107126 107124 107126 107126 (bash) 0 0 11546624 756 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000005/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000005 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005/stderr |- 107136 107126 107126 107126 (java) 1509 250 2666205184 264442 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000005/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000005 application_1631643014075_0039 1 [2021-09-14 18:53:34.994]Container killed on request. Exit code is 143 [2021-09-14 18:53:34.994]Container exited with a non-zero exit code 143. ]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1631643014075_0039_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 {code}) > Loading nested ORC data is flaky during Docker-based tests > ---------------------------------------------------------- > > Key: IMPALA-10669 > URL: https://issues.apache.org/jira/browse/IMPALA-10669 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure > Affects Versions: Impala 4.0.0 > Reporter: Laszlo Gaal > Assignee: Laszlo Gaal > Priority: Major > > Docker-based tests (using {{docker/test-wirh-docker.py}} often fail in the > dataload phase when trying to load ORC tables with complex types. The failure > happens quite often (at least in about 50% of the runs), and when it happens, > the failure pattern is quite consistent: it is always a Tez container > overrunning its allotted memory. > The signature is: > {code} > 2021-04-18 13:32:19.551921 [2021-04-18 13:31:51.355]Container killed on > request. Exit code is 143 > 2021-04-18 13:32:19.551966 [2021-04-18 13:31:51.356]Container exited with a > non-zero exit code 143. > 2021-04-18 13:32:19.552181 ]], TaskAttempt 1 failed, info=[Container > container_1618776748992_0039_01_000003 finished with diagnostics set to > [Container failed, exitCode=-104. [2021-04-18 13:32:00.379]Container > [pid=11530,containerID=container_1618776748992_0039_01_000003] is running > 2785280B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container. > 2021-04-18 13:32:19.552224 Dump of the process-tree for > container_1618776748992_0039_01_000003 : > 2021-04-18 13:32:19.552298 |- PID PPID PGRPID SESSID CMD_NAME > USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) > RSSMEM_USAGE(PAGES) FULL_CMD_LINE > 2021-04-18 13:32:19.552753 |- 11540 11530 11530 11530 (java) 2048 85 > 2761297920 262152 > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java > -Xmx819m -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 > -Dtez.root.logger=INFO,CLA > -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp > org.apache.tez.runtime.task.TezChild localhost 38999 > container_1618776748992_0039_01_000003 application_1618776748992_0039 1 > 2021-04-18 13:32:19.553375 |- 11530 11528 11530 11530 (bash) 0 0 10010624 > 672 /bin/bash -c > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java > -Xmx819m -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 > -Dtez.root.logger=INFO,CLA > -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp > org.apache.tez.runtime.task.TezChild localhost 38999 > container_1618776748992_0039_01_000003 application_1618776748992_0039 1 > 1>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stdout > > 2>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stderr > > {code} > The failure has only been seen on AWS m5.12xl instances so far, which have > 192GB of RAM, all of which is available to the initial container doing the > compile/link and dataload phases of a test run. > The same code runs with no problems on m5.4xl (64GB RAM) and r5.4xl (128GB > RAM) instances during other build jobs. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org