(sorry for the late reply, this fell through the cracks..)
On 10.08.2011 11:48, Dave Page wrote:
On Thu, Aug 4, 2011 at 8:30 PM, Merlin Moncure<[email protected]> wrote:
On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
<[email protected]> wrote:
I created 100 identical pgagent jobs, with one step that simply does "SELECT
pg_sleep(10)". I then forced them all to run immediately, with "UPDATE
pgagent.pga_job SET jobnextrun=now();". pgagent crashed.
What happened is that the when all those jobs are launched at the same time,
the server ran into the max_connections limit, and pgagent didn't handle
that too well. JobThread::JobThread constructor does not check for NULL
result from DBConn::Get(), and passes a NULL connection to Job::Job, which
tries to reference it, leading to a segfault.
I propose the attached patch.
hm, in the event that happens, is that logged in the client somehow?
wouldn't you want to throw an exception or something like that?
I think the most straightforward way to handle this is to dump an
error into pgagent.pga_joblog when deleting the thread. Might be a
little ugly to pass the original error message back rather than a
generic one though. Can you take a look Heikki?
You mean something like the attached? Works for me, but inserting an
entry in joblog for each failed attempt might create a lot of entries
there, if the problem persists.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
diff --git a/job.cpp b/job.cpp
index e4d7784..9d02977 100644
--- a/job.cpp
+++ b/job.cpp
@@ -354,11 +354,13 @@ JobThread::JobThread(const wxString &jid)
jobid = jid;
DBconn *threadConn = DBconn::Get(DBconn::GetBasicConnectString(), serviceDBname);
- job = new Job(threadConn, jobid);
-
- if (job->Runnable())
- runnable = true;
+ if (threadConn)
+ {
+ job = new Job(threadConn, jobid);
+ if (job->Runnable())
+ runnable = true;
+ }
}
diff --git a/pgAgent.cpp b/pgAgent.cpp
index ce15e08..ff9d069 100644
--- a/pgAgent.cpp
+++ b/pgAgent.cpp
@@ -109,8 +109,23 @@ int MainRestartLoop(DBconn *serviceConn)
jt->Run();
foundJobToExecute = true;
}
+ else
+ {
+ // Failed to launch the thread. Insert an entry with
+ // "internal error" status in the joblog table, to leave
+ // a trace of fact that we tried to launch the job.
+ DBresult *res = serviceConn->Execute(
+ wxT("INSERT INTO pgagent.pga_joblog(jlgid, jlgjobid, jlgstatus) ")
+ wxT("VALUES (nextval('pgagent.pga_joblog_jlgid_seq'), ") + jobid + wxT(", 'i')"));
+ if (res)
+ delete res;
+
+ // A thread object that's started will destroy itself when
+ // it's finished, but one that never starts we'll have to
+ // destory ourselves.
+ delete jt;
+ }
res->MoveNext();
-
}
delete res;
--
Sent via pgadmin-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgadmin-hackers