If I run it by hand instead of via systemd, everything works fine. If I edit the .service file and change LimitNOFILE=infinity to LimitNOFILE=524288, everything works fine.
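
For anyone reading along, here is a rough sketch of why the limit might matter. It assumes the child is in a loop that calls close() once for every possible descriptor number up to the process limit, which is what the quoted pstack and Brian's question hint at but which is not confirmed anywhere in this thread. The helper names are made up for illustration, and 1073741816 is the fs.nr_open value reported later in this message.

```python
import os
import resource


def brute_force_close_calls(fd_limit, first_fd=3):
    # A naive "close everything" loop calls close() once per possible
    # descriptor number, whether or not that descriptor is open.
    # (Hypothetical helper; it only counts, it closes nothing.)
    return max(0, fd_limit - first_fd)


def actually_open_fds():
    # Linux-only alternative: /proc/self/fd lists just the descriptors
    # that exist. The listing itself briefly opens one extra descriptor.
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return []
    return sorted(int(name) for name in os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("descriptors open right now:", len(actually_open_fds()))
    for label, limit in [
        ("LimitNOFILE=524288", 524288),
        ("fs.nr_open value from this thread", 1073741816),
    ]:
        print(label, "->", brute_force_close_calls(limit), "close() calls")
```

If that assumption holds, the 524288 setting costs about half a million close() calls, while the larger limit costs about a billion, which would look like an infinite loop eating a CPU.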
Interestingly, in a root shell, I get "operation not permitted" when trying to do `ulimit -n unlimited`. The highest value I can set it to is 1073741816, which is the value of the fs.nr_open sysctl.

-Mat

On 7/21/20 12:45 PM, Bockelman, Brian wrote:
> Ohh - is that in the middle of "close() all FDs possible" code?
>
> Does "strace" show a lot of close() followed by EBADF? What's the process
> limit on FDs?
>
> Brian
>
>> On Jul 21, 2020, at 12:41 PM, Mátyás Selmeci <mat...@cs.wisc.edu> wrote:
>>
>> Here's the pstack of the child:
>>
>> #0 0x00007fd5c788fa17 in close () from /usr/lib64/libpthread.so.0
>> #1 0x00007fd5c83eb03e in CreateProcessForkit::exec() () from /usr/lib64/libcondor_utils_8_9_8.so
>> #2 0x00007fd5c83eb89c in CreateProcessForkit::fork_exec() () from /usr/lib64/libcondor_utils_8_9_8.so
>> #3 0x00007fd5c83f85cb in DaemonCore::Create_Process(char const*, ArgList const&, priv_state, int, int, int, Env const*, char const*, FamilyInfo*, Stream**, int*, int*, int, __sigset_t*, int, unsigned long*, int*, char const*, MyString*, FilesystemRemap*, long) () from /usr/lib64/libcondor_utils_8_9_8.so
>> #4 0x00007fd5c82da64b in ProcFamilyProxy::start_procd() () from /usr/lib64/libcondor_utils_8_9_8.so
>> #5 0x00007fd5c82db283 in ProcFamilyProxy::ProcFamilyProxy(char const*) () from /usr/lib64/libcondor_utils_8_9_8.so
>> #6 0x00007fd5c82d9e18 in ProcFamilyInterface::create(char const*) () from /usr/lib64/libcondor_utils_8_9_8.so
>> #7 0x00007fd5c83f9236 in DaemonCore::Create_Process(char const*, ArgList const&, priv_state, int, int, int, Env const*, char const*, FamilyInfo*, Stream**, int*, int*, int, __sigset_t*, int, unsigned long*, int*, char const*, MyString*, FilesystemRemap*, long) () from /usr/lib64/libcondor_utils_8_9_8.so
>> #8 0x0000000000416315 in daemon::RealStart() ()
>> #9 0x0000000000416f3a in Daemons::StartDaemonHere(daemon*) ()
>> #10 0x0000000000416fe3 in Daemons::StartAllDaemons() ()
>> #11 0x000000000040ebbe in main_init(int, char**) ()
>> #12 0x00007fd5c8403468 in dc_main(int, char**) () from /usr/lib64/libcondor_utils_8_9_8.so
>> #13 0x00007fd5c76d9042 in __libc_start_main () from /usr/lib64/libc.so.6
>> #14 0x000000000040b90e in _start ()
>>
>> On 7/21/20 12:34 PM, Bockelman, Brian wrote:
>>> Hi Mat,
>>>
>>> Could you do a "pstack" of the child condor_master process?
>>>
>>> Unfortunately, from your traceback, it looks like the master is simply
>>> waiting for the child to do something (either exec or error out) -- not
>>> too much info there.
>>>
>>> Brian
>>>
>>>> On Jul 21, 2020, at 12:05 PM, Mátyás Selmeci via HTCondor-devel
>>>> <htcondor-devel@cs.wisc.edu> wrote:
>>>>
>>>> Hey folks,
>>>>
>>>> I've got a problem running 8.9.8 on my Fedora 32 laptop (I'm using an
>>>> RPM Tim gave me from an NMI build): when I start condor, the master
>>>> forks and the child master gets into an infinite loop, eating an entire
>>>> CPU and not responding to SIGTERM. The last line in the MasterLog is:
>>>>
>>>> 07/21/20 11:46:56 (fd:1) (pid:233863) (D_DAEMONCORE) About to exec "/usr/sbin/condor_procd"
>>>>
>>>> SELinux is off. I attached my MasterLog with D_ALL:2 and
>>>> condor_config_val -summary (that feature's great). The traceback
>>>> at the end of MasterLog is me sending SIGABRT to both
>>>> condor_master processes.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> -Mat
>>>> <MasterLog.txt><summary.txt>
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@cs.wisc.edu
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel