(sorry for the late reply, this fell through the cracks..)

On 10.08.2011 11:48, Dave Page wrote:
On Thu, Aug 4, 2011 at 8:30 PM, Merlin Moncure<[email protected]>  wrote:
On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
<[email protected]>  wrote:
I created 100 identical pgagent jobs, with one step that simply does "SELECT
pg_sleep(10)". I then forced them all to run immediately, with "UPDATE
pgagent.pga_job SET jobnextrun=now();". pgagent crashed.

What happened is that the when all those jobs are launched at the same time,
the server ran into the max_connections limit, and pgagent didn't handle
that too well. JobThread::JobThread constructor does not check for NULL
result from DBConn::Get(), and passes a NULL connection to Job::Job, which
tries to reference it, leading to a segfault.

I propose the attached patch.

hm, in the event that happens, is that logged in the client somehow?
wouldn't you want to throw an exception or something like that?

I think the most straightforward way to handle this is to dump an
error into pgagent.pga_joblog when deleting the thread. Might be a
little ugly to pass the original error message back rather than a
generic one though. Can you take a look Heikki?

You mean something like the attached? Works for me, but inserting an entry in joblog for each failed attempt might create a lot of entries there, if the problem persists.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/job.cpp b/job.cpp
index e4d7784..9d02977 100644
--- a/job.cpp
+++ b/job.cpp
@@ -354,11 +354,13 @@ JobThread::JobThread(const wxString &jid)
 	jobid = jid;
 
 	DBconn *threadConn = DBconn::Get(DBconn::GetBasicConnectString(), serviceDBname);
-	job = new Job(threadConn, jobid);
-
-	if (job->Runnable())
-		runnable = true;
+	if (threadConn)
+	{
+		job = new Job(threadConn, jobid);
 
+		if (job->Runnable())
+			runnable = true;
+	}
 }
 
 
diff --git a/pgAgent.cpp b/pgAgent.cpp
index ce15e08..ff9d069 100644
--- a/pgAgent.cpp
+++ b/pgAgent.cpp
@@ -109,8 +109,23 @@ int MainRestartLoop(DBconn *serviceConn)
 					jt->Run();
 					foundJobToExecute = true;
 				}
+				else
+				{
+					// Failed to launch the thread. Insert an entry with
+					// "internal error" status in the joblog table, to leave
+					// a trace of fact that we tried to launch the job.
+					DBresult *res = serviceConn->Execute(
+						wxT("INSERT INTO pgagent.pga_joblog(jlgid, jlgjobid, jlgstatus) ")
+						wxT("VALUES (nextval('pgagent.pga_joblog_jlgid_seq'), ") + jobid + wxT(", 'i')"));
+					if (res)
+						delete res;
+
+					// A thread object that's started will destroy itself when
+					// it's finished, but one that never starts we'll have to
+					// destory ourselves.
+					delete jt;
+				}
 				res->MoveNext();
-
 			}
 
 			delete res;
-- 
Sent via pgadmin-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgadmin-hackers

Reply via email to