Looks like it was the OOM reaper killing the task. Thanks for pointing me in the right direction, Brian!
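
For anyone finding this thread later, here is a minimal sketch of how that log evidence could be checked. It assumes the kernel reports OOM kills with a "Killed process <pid> (<name>)" message, which is the common wording but can vary between kernel versions. The function names and the sample lines are hypothetical and not taken from the affected host; `journalctl -k` is only used as one way to read kernel messages.

```python
import re
import subprocess

# Matches the kernel's "Killed process <pid> (<name>)" message.
# Assumption: the running kernel uses this wording for OOM kills.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a (pid, process_name) tuple for each OOM-kill message found."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def read_kernel_log():
    """Read kernel messages from the journal (may need root privileges)."""
    output = subprocess.check_output(["journalctl", "-k", "--no-pager"])
    return output.decode("utf-8", "replace").splitlines()


# Hypothetical sample lines for illustration only.
SAMPLE = [
    "Dec 12 10:48:18 host kernel: Out of memory: Kill process 4242 (celery) score 905 or sacrifice child",
    "Dec 12 10:48:18 host kernel: Killed process 4242 (celery) total-vm:1000kB, anon-rss:900kB, file-rss:0kB",
    "Dec 12 10:48:19 host pulp[1403]: celery.worker.job:ERROR: WorkerLostError",
]

print(find_oom_kills(SAMPLE))
```

On a real host, `find_oom_kills(read_kernel_log())` would scan the kernel log instead of the sample. If the reported name or PID lines up with the worker that raised the WorkerLostError with signal 9, the OOM killer is the likely cause.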

David Gersting
Linux Systems Administrator
WVU Information Technology Services

On 12/12/16 1:13 PM, Brian Bouterse wrote:
> I think the celery worker is experiencing a segfault or maybe it's
> being killed by the OOM killer. If the OOM killer is killing it, there
> would be log evidence. A segfault in Python itself is unlikely, so if
> it is a segfault, it probably occurs while calling out to the system
> using subprocess, which Pulp does in various places. I haven't looked
> through the publish code of platform and rpm for subprocess usage,
> but that would probably hint at the problem. To really debug
> something like that you would want to capture a coredump. I think
> celery has the ability to capture coredumps, but I've never done it.
>
> The pulp-smash tests for publish showed they were working. Is it
> possible that this could be an environment issue? Is it possible to
> reproduce the issue on separate hardware to rule that out? If it is
> reproducible, I recommend opening a bug [0].
>
> [0]: https://pulp.plan.io/projects/pulp/issues/new
>
> -Brian
>
> On Mon, Dec 12, 2016 at 11:49 AM, David Gersting
> <[email protected]> wrote:
>
> > Hello everyone,
> >
> > I've been banging my head against the desk for a while on this one,
> > and could use the group's help.
> >
> > I have a rather large repo (OEL 6's base repo with 36,684 RPMs) that
> > I'm trying to mirror locally to speed up our OS patching, and every
> > time I try to publish the repo the task fails just after the
> > "Publishing Delta RPMs" step starts. After some digging it seems to
> > me that the worker is timing out. Has anyone else seen this and/or
> > know how I can fix it or increase the timeout for this task?
> >
> > I've attached the full shell output for anyone who wants it, but the
> > error message I'm seeing from the worker is:
> >
> > # journalctl --unit=pulp_worker-5
> > *SNIP*
> > Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) Task pulp.server.managers.repo.publish.publish[e3d25854-757c-40af-8979-d0b7287263ed] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
> > Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) Traceback (most recent call last):
> > Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1171, in mark_as_worker_lost
> > Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) human_status(exitcode)),
> > Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
> > Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. Se
> > Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776) "MongoClient opened before fork. Create MongoClient "
> > Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776)
> > Dec 12 10:48:22 *HOSTNAME* pulp[49191]: pulp.server.async.tasks:INFO: Task failed : [e3d25854-757c-40af-8979-d0b7287263ed]
> >
> > Any help would be much appreciated!
> >
> > --
> > David Gersting
> > Linux Systems Administrator
> > WVU Information Technology Services
> >
> > _______________________________________________
> > Pulp-list mailing list
> > [email protected]
> > https://www.redhat.com/mailman/listinfo/pulp-list
