Hi Jacques,
The TL;DR is that the OFBiz process ran out of memory and was killed by the
kernel. It hit the original 1600MB container limit, which triggers an
immediate rather than graceful kill. The limit has been raised to 2400MB.
The following shows the steps taken to look into why OFBiz had stopped,
after I had ssh'd to ofbiz-vm1, become root, and changed to the
/home/ofbizdocker/demo-trunk directory.
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk# date
Wed Mar 29 18:44:16 UTC 2023
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk# docker compose ps -a
NAME                 IMAGE                                         COMMAND                  SERVICE   CREATED        STATUS                     PORTS
demo-trunk-db-1      postgres:13                                   "docker-entrypoint.s…"   db        16 hours ago   Up 16 hours                5432/tcp
demo-trunk-ofbiz-1   ghcr.io/apache/ofbiz:trunk-plugins-snapshot   "/ofbiz/docker-entry…"   ofbiz     16 hours ago   Exited (137) 7 hours ago
The demo-trunk-ofbiz-1 container exited 7 hours ago, but there is not much
information to go on here.
Reviewing the logs with `docker compose logs` does not reveal anything
about why the demo-trunk-ofbiz-1 container shut down.
Next we'll try inspecting the stopped container:
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk# docker inspect
demo-trunk-ofbiz-1
[{
    "Id": "8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092",
    "Created": "2023-03-29T02:35:03.522931049Z",
    "Path": "/ofbiz/docker-entrypoint.sh",
    "Args": [
        "bin/ofbiz"
    ],
    "State": {
        "Status": "exited",
        "Running": false,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": true,
        "Dead": false,
        "Pid": 0,
        "ExitCode": 137,
        "Error": "",
        "StartedAt": "2023-03-29T02:35:04.063103982Z",
        "FinishedAt": "2023-03-29T11:18:12.827856157Z"
[... snip ...]
From the above we see the container was OOMKilled (killed by the kernel for
running out of memory).
It is likely that the OFBiz deployment attempted to exceed its 1600MB limit,
defined in the docker-compose.yml file.
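Rather than eyeballing the full JSON, the relevant State flags can be pulled out programmatically. A minimal sketch in Python, using a trimmed copy of the inspect output above (field values are taken verbatim from it; note that exit code 137 is 128 + 9, i.e. the process died from SIGKILL):

```python
import json

# Trimmed sample of the `docker inspect demo-trunk-ofbiz-1` output above.
inspect_output = '''
[{
  "Id": "8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092",
  "State": {
    "Status": "exited",
    "Running": false,
    "OOMKilled": true,
    "ExitCode": 137,
    "FinishedAt": "2023-03-29T11:18:12.827856157Z"
  }
}]
'''

state = json.loads(inspect_output)[0]["State"]

# 137 - 128 = 9 (SIGKILL); together with OOMKilled=true this points
# squarely at the kernel OOM killer rather than a graceful shutdown.
print(state["OOMKilled"], state["ExitCode"] - 128, state["FinishedAt"])
```

On the host itself, `docker inspect` can print just these fields directly with its `--format` option, which avoids the copy/paste step entirely.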
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk# cat docker-compose.yml
version: "2.4"
services:
  db:
    image: postgres:13
    mem_limit: 300M
    memswap_limit: 300M
    cpu_shares: 200
    restart: "no"
    volumes:
      - ./postgres-initdb.d:/docker-entrypoint-initdb.d
    env_file:
      - postgres.env
      - ofbiz-postgres.env
  ofbiz:
    image: ghcr.io/apache/ofbiz:trunk-plugins-snapshot
    mem_limit: 1600M
    memswap_limit: 1600M
    cpu_shares: 200
    ports:
      - 127.0.0.1:8009:8009
    volumes:
      - ./after-config-applied.d:/docker-entrypoint-hooks/after-config-applied.d
    env_file:
      - ofbiz-postgres.env
    environment:
      OFBIZ_HOST: demo-trunk.ofbiz.apache.org
      OFBIZ_ENABLE_AJP_PORT: 1
      OFBIZ_DATA_LOAD: demo
      OFBIZ_CONTENT_URL_PREFIX: https://demo-trunk.ofbiz.apache.org
Now that we know ofbiz was killed due to running out of memory at 11:18:12
UTC, we can check the logs again to see what requests were being made
around this time.
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk#
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,237 |-0.0.0.0-8009-exec-1
|ControlServlet |T| [[[ordermgr::ajaxSetUserPreference
(Domain:https://demo-trunk.ofbiz.apache.org)] Request Begun,
encoding=[UTF-8]- total:0.0,since last(Begin):0.0]]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,241 |-0.0.0.0-8009-exec-1
|ServiceDispatcher |T| Sync service
[order/preferenceGetSetPermission] finished in [1] milliseconds
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,244 |-0.0.0.0-8009-exec-1
|ServiceDispatcher |T| Sync service [order/setUserPreference]
finished in [3] milliseconds
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,244 |-0.0.0.0-8009-exec-1
|RequestHandler |I| Ran Event [service:#setUserPreference]
from [request], result is [success]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,244 |-0.0.0.0-8009-exec-1
|ServerHitBin |I| Visit delegatorName=default,
ServerHitBin delegatorName=default
demo-trunk-ofbiz-1 | 2023-03-29 11:18:05,246 |-0.0.0.0-8009-exec-1
|ControlServlet |T| [[[ordermgr::ajaxSetUserPreference
(Domain:https://demo-trunk.ofbiz.apache.org)] Request Done-
total:0.009,since last([ordermgr::ajaxSe...):0.009]]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,294 |-0.0.0.0-8009-exec-8
|ControlServlet |T| [[[ordermgr::createOrderConversation
(Domain:https://demo-trunk.ofbiz.apache.org)] Request Begun,
encoding=[UTF-8]- total:0.0,since last(Begin):0.0]]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,314 |-0.0.0.0-8009-exec-8
|ConfigXMLReader |I| controller loaded: 0.0s, 0 requests, 0
views in
file:/ofbiz/framework/common/webcommon/WEB-INF/handlers-controller.xml
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,315 |-0.0.0.0-8009-exec-8
|ConfigXMLReader |I| controller loaded: 0.006s, 52 requests,
21 views in
file:/ofbiz/framework/common/webcommon/WEB-INF/common-controller.xml
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,320 |-0.0.0.0-8009-exec-8
|ConfigXMLReader |I| controller loaded: 0.0s, 4 requests, 0
views in file:/ofbiz/applications/commonext/webapp/WEB-INF/controller.xml
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,376 |-0.0.0.0-8009-exec-8
|ConfigXMLReader |I| controller loaded: 0.004s, 347 requests,
134 views in
file:/ofbiz/applications/content/webapp/content/WEB-INF/controller.xml
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,380 |-0.0.0.0-8009-exec-8
|ConfigXMLReader |I| controller loaded: 0.077s, 380 requests,
153 views in
file:/ofbiz/applications/order/webapp/ordermgr/WEB-INF/controller.xml
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,384 |-0.0.0.0-8009-exec-8
|ServiceDispatcher |E| Incoming context (in runSync :
createOrderConversation) does not match expected requirements
demo-trunk-ofbiz-1 | org.apache.ofbiz.service.ServiceValidationException:
The following required parameter is missing: communicationEventPrpTypId
demo-trunk-ofbiz-1 | at
org.apache.ofbiz.service.ModelService.resolveRequiredValues(ModelService.java:1270)
~[ofbiz.jar:?]
demo-trunk-ofbiz-1 | at
org.apache.ofbiz.service.ModelService.validate(ModelService.java:1150)
~[ofbiz.jar:?]
.
.
.
demo-trunk-ofbiz-1 | at
org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:433)
~[tomcat-coyote-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
~[tomcat-coyote-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:926)
~[tomcat-coyote-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1791)
~[tomcat-coyote-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
~[tomcat-coyote-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
~[tomcat-util-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
~[tomcat-util-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
~[tomcat-util-9.0.72.jar:9.0.72]
demo-trunk-ofbiz-1 | at java.lang.Thread.run(Thread.java:833) ~[?:?]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,385 |-0.0.0.0-8009-exec-8
|ServiceDispatcher |T| [[Sync service failed...-
total:0.0,since last(Begin):0.0]] - 'order / createOrderConversation'
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,385 |-0.0.0.0-8009-exec-8
|TransactionUtil |I| Transaction rolled back
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,385 |-0.0.0.0-8009-exec-8
|TransactionUtil |W| Not committing transaction, status is No
Transaction (6)
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,385 |-0.0.0.0-8009-exec-8
|RequestHandler |I| Ran Event
[service:#createOrderConversation] from [request], result is [error]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,385 |-0.0.0.0-8009-exec-8
|RequestHandler |E| Request createOrderConversation caused
an error with the following message: [The following required parameter is
missing: communicationEventPrpTypId]
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,386 |-0.0.0.0-8009-exec-8
|RequestHandler |I| Sending redirect to: [
https://demo-trunk.ofbiz.apache.org:443/ordermgr/control/orderview?orderId=WSCO10012].
Hidden sessionId by default.
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,395 |-0.0.0.0-8009-exec-8
|ServerHitBin |I| Visit delegatorName=default,
ServerHitBin delegatorName=default
demo-trunk-ofbiz-1 | 2023-03-29 11:18:12,397 |-0.0.0.0-8009-exec-8
|ControlServlet |T| [[[ordermgr::createOrderConversation
(Domain:https://demo-trunk.ofbiz.apache.org)] Request Done-
total:0.103,since last([ordermgr::create...):0.103]]
So it looks like a request was made to createOrderConversation, which failed
and issued a redirect to
https://demo-trunk.ofbiz.apache.org:443/ordermgr/control/orderview?orderId=WSCO10012.
After that, the logs show nothing else.
Next we can take a look at /var/log/kern.log around 11:18:12 UTC to see if
we can find out why the ofbiz service was killed.
root@ofbiz-vm1:/home/ofbizdocker/demo-trunk# cd /var/log
root@ofbiz-vm1:/var/log# less kern.log
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039472] C2 CompilerThre invoked
oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039481] CPU: 0 PID: 3444767
Comm: C2 CompilerThre Not tainted 5.15.0-1028-aws #32~20.04.1-Ubuntu
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039484] Hardware name: Xen HVM
domU, BIOS 4.11.amazon 08/24/2006
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039486] Call Trace:
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039489] <TASK>
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039492] dump_stack_lvl+0x4a/0x63
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039498] dump_stack+0x10/0x16
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039501] dump_header+0x53/0x225
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039505]
oom_kill_process.cold+0xb/0x10
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039509]
out_of_memory+0x1dc/0x530
.
.
.
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039576] memory: usage 1638400kB,
limit 1638400kB, failcnt 0
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039578] memory+swap: usage
1638400kB, limit 1638400kB, failcnt 1441
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039579] kmem: usage 8120kB,
limit 9007199254740988kB, failcnt 0
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039581] Memory cgroup stats for
/docker/8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092:
.
.
.
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039598] Tasks state (memory
values in pages):
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039599] [ pid ] uid tgid
total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039604] [3442255] 1000 3442255
1744968 417268 4272128 0 0 java
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039607]
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092,mems_allowed=0,oom_memcg=/docker/8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092,task_memcg=/docker/8afc7a9b0a1f34c7a72f4b24f4bb9f2679233821f1a9697302719ac0a144b092,task=java,pid=3442255,uid=1000
Mar 29 11:18:12 ofbiz-vm1 kernel: [3983121.039805] Memory cgroup out of
memory: Killed process 3442255 (java) total-vm:6979872kB,
anon-rss:1624552kB, file-rss:44520kB, shmem-rss:0kB, UID:1000
pgtables:4172kB oom_score_adj:0
From the above logs we have the line:
"Memory cgroup out of memory: Killed process 3442255 (java)
total-vm:6979872kB, anon-rss:1624552kB, file-rss:44520kB, shmem-rss:0kB,
UID:1000 pgtables:4172kB oom_score_adj:0"
This tells us that process 3442255, a Java process, was killed.
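Those kB figures can be cross-checked against the "Tasks state" table further up, which reports memory in 4 kB pages. A small sketch (the kill message is copied verbatim from kern.log; the rss figure of 417268 pages is from the tasks table):

```python
import re

# The kill message from kern.log, verbatim.
line = ("Memory cgroup out of memory: Killed process 3442255 (java) "
        "total-vm:6979872kB, anon-rss:1624552kB, file-rss:44520kB, "
        "shmem-rss:0kB, UID:1000 pgtables:4172kB oom_score_adj:0")

# Pull every "name:<number>kB" pair out of the message.
fields = dict(re.findall(r"(\w[\w-]*):(\d+)kB", line))
anon_kb = int(fields["anon-rss"])
file_kb = int(fields["file-rss"])

# The "Tasks state" table lists the java task with rss=417268 pages;
# pages are 4 kB each, so this should equal anon-rss + file-rss.
rss_pages = 417268
print(rss_pages * 4 == anon_kb + file_kb)  # → True
```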
Further up in the log we have the line:
"memory: usage 1638400kB, limit 1638400kB, failcnt 0"
Basically, our ofbiz Java process was using 1600MB, hitting its memory
limit. This is surprising, though, as our ofbiz processes launch with the
parameters: -Xms128M -Xmx1024M
These parameters should limit heap usage to 1024MB. Perhaps other parts of
the JVM (metaspace, thread stacks, code cache, GC bookkeeping, direct
buffers) are using the remaining ~576MB to manage themselves?
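To put numbers on that, here's the arithmetic: the cgroup limit in kern.log is reported in kB, and the gap between it and the -Xmx heap cap is what everything outside the heap had to fit into:

```python
# Figures from kern.log and the JVM launch parameters above.
limit_kb = 1638400          # "memory: usage 1638400kB, limit 1638400kB"
limit_mb = limit_kb // 1024
heap_max_mb = 1024          # -Xmx1024M

print(limit_mb)                # → 1600, matching mem_limit: 1600M
print(limit_mb - heap_max_mb)  # → 576: the non-heap budget (metaspace,
                               #   thread stacks, code cache, etc.)
```

The JVM's own accounting of these non-heap regions can be inspected at runtime with Native Memory Tracking (launch with -XX:NativeMemoryTracking=summary, then `jcmd <pid> VM.native_memory summary`), which might help predict a sensible container limit.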
As a workaround I will increase the memory limit assigned to the docker
ofbiz container, but it would be good to be able to predict OFBiz's
expected memory usage.
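For reference, the workaround amounts to bumping the two limits for the ofbiz service in docker-compose.yml to the new 2400MB value mentioned at the top; the rest of the service definition is unchanged from the file shown above:

```yaml
  ofbiz:
    image: ghcr.io/apache/ofbiz:trunk-plugins-snapshot
    mem_limit: 2400M      # was 1600M
    memswap_limit: 2400M  # was 1600M
```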
Once OFBiz was up and running again, there were still problems connecting
to https://demo-trunk.ofbiz.apache.org/partymgr.
[Wed Mar 29 19:27:01.264143 2023] [proxy_ajp:error] [pid 3849696:tid
139767516272384] (120006)APR does not understand this error code: [client
185.191.171.12:2680] AH00878: read response failed from 127.0.0.1:8009
(localhost)
It seems that Apache HTTPd did not like the responses it was getting from
OFBiz's Tomcat AJP connector on port 8009.
I didn't know how to investigate this issue any further, so I ended up
removing and recreating both the database and ofbiz containers. Once this
was done, demo-trunk appeared to be working normally.
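For next time, one basic first check might be to confirm something is still accepting TCP connections on the AJP port before recreating everything. A sketch with a hypothetical helper (this only proves a listener is alive; AJP is a binary protocol, so a successful connect says nothing about whether the responses HTTPd receives are well-formed):

```python
import socket

def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On ofbiz-vm1 this would be: port_accepts_connections("127.0.0.1", 8009)
```

If the listener is alive but responses are still mangled, the next step would likely be Tomcat's own logs and HTTPd's mod_proxy_ajp debug logging.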
Any suggestions on how to investigate the AJP port issue, should similar
problems occur again, will be gratefully received.
Thanks,
Dan.