Re: [GENERAL] queueing via database table?

2007-01-03 Thread Vivek Khera


On Jan 3, 2007, at 2:00 AM, Steve Atkins wrote:


Holding a lock while generating the thumbnail doesn't
sound like a great idea, and I think that the select
for update will end up serialising the requests.

I'd add a "rendering" field, text, defaulting
to an empty string.

Then do a "select for update where ... and rendering = '' limit 1",
update the rendering field to the hostname of the box doing the
work and commit. Render the thumbnail. Delete the record.

That'll also give you an easy way to show status of which
box is rendering which scene.

Depending on what else you're putting into the where clause
a partial index on something for records where rendering=''
might be helpful.


this is more or less how we do it, so i second this.

we also use NOTIFY/LISTEN to "wake up" the job processors when new  
work is added.




smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] queueing via database table?

2007-01-03 Thread Scott Marlowe
On Tue, 2007-01-02 at 22:34 -0800, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm.  I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
> 
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
> 
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
> 
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
> 
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.

Here's what I'd do.  Create two sequences.  Sequence one is used to
assign ids to the thumbnail records when they're placed into the control
table.  The other is used to "check out" the records. 

A process selects nextval from sequence two, and then selects the
corresponding record info from the control table, and marks the record
as being in work.  When it's done, it marks it as done, and selects
another value from the second sequence and repeats the process.

Every now and then run a check program to look for thumbs that have been
missed or skipped and process them or assign them a new id from sequence
one to put them back into the queue.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] queueing via database table?

2007-01-03 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/03/07 00:34, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm.  I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
> 
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
> 
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
> 
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
> 
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.
> 
> Comments and suggestions welcome,

That's not what relational tables are good at.  Instead, use a
message queuing library with a file backing-store (so that if the
machine goes down for any reason, the messages are still in the queue).

In a message-passing system, a network-aware daemon manages a set of
named FIFO queues.  Some processes call in_q(), and other processes
(same machine, or not) call de_q().  If nothing is calling de_q(),
messages just pile up in the queue until such time as something
*does* start calling de_q().


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFm504S9HxQb37XmcRAmj6AKDWa7Sx15wygoTc+/wOfLZIpqi4awCg34SZ
rkq1IEjdqu1zx0B5QyFW/n0=
=V0oF
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] queueing via database table?

2007-01-03 Thread Gregory S. Williamson
Mark --

As others have indicated, there may be some blocking issues with the approach 
you outlined.

A variant I have seen used in the past uses a table with a unique id for the 
job, the work queue it is in, a status flag, priority and at least one time 
stamp (and perhaps space for a process id).

Each client that wants work issues a request (SELECT FOR UPDATE) to get the 
next job in its queue that has a status flag of "Available" ordered by priority 
or initial time of creation, etc.; update that entry with the current timestamp 
(and perhaps the process id of the client) and set the status flag to show the 
job is now being worked on all in one transaction.

This releases the job but now with a changed status flag so other processes 
pulling work from the same queue won't see it anymore.

When the job finishes it selects its entry and updates the status flag and 
timestamp (and probably clears its process id). Logic for how to bump a job to 
the next step can be embedded in the client or in another process, depending on 
your needs.

It is useful to have a daemon or some other process to sweep the queue table 
and at least send an alert about stale or frozen jobs.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC
-Original Message-
From:   [EMAIL PROTECTED] on behalf of Mark Harrison
Sent:   Tue 1/2/2007 10:34 PM
To: pgsql-general@postgresql.org
Cc: 
Subject:[GENERAL] queueing via database table?

I have a cluster of CPUs generating thumbnails for
a render farm.  I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table.  They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,
Mark

-- 
Mark Harrison
Pixar Animation Studios

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---
Click link below if it is SPAM [EMAIL PROTECTED]
"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=459b5025191744846743324&[EMAIL
 PROTECTED]&retrain=spam&template=history&history_page=1"
!DSPAM:459b5025191744846743324!
---






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] queueing via database table?

2007-01-03 Thread Dawid Kuroczko

On 1/3/07, Mark Harrison <[EMAIL PROTECTED]> wrote:

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table.  They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.


Well, this will block.  So it will mean that only one thumbnail
will be processed while running the transaction.

You may want to rather use SELECT FOR UPDATE NOWAIT,
probably "wrapped" into a PL/PgSQL function.  I did that and
I'm quite satisfied with this approach.

A simple implementation would be something like this:

CREATE OR REPLACE FUNCTION get_next() RETURNS int AS $$
 DECLARE
  r RECORD;
 BEGIN
   FOR r IN SELECT id FROM foo_table LIMIT 100 LOOP
  BEGIN
PERFORM id FROM foo_table WHERE id=r.id FOR UPDATE NOWAIT;
RETURN r.id;
  EXCEPTION
WHEN lock_not_available THEN -- do nothing
  END;
   END LOOP;
   RETURN NULL;
 END;
$$ LANGUAGE PLpgSQL;

Of course you should customize the query, and use better tuned limit.
I think good rule of the thumb size of LIMIT is twice the number of
simultaneous processing nodes working.  An ORDER BY might be
worh it or not, etc, etc.

Other approach might be using something like
 LOOP
   BEGIN
 SELECT id INTO i FROM foo_table LIMIT 1 OFFSET n FOR UPDATE NOWAIT;
 RETURN i;
   EXCEPTION
 WHEN lock_not_avaibale THEN -- do nothing;
   END;
   n := n + 1;
 END LOOP;

But I feel it will be slower most of the time.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] queueing via database table?

2007-01-03 Thread Richard Huxton

Steve Atkins wrote:


Holding a lock while generating the thumbnail doesn't
sound like a great idea, and I think that the select
for update will end up serialising the requests.

I'd add a "rendering" field, text, defaulting
to an empty string.

Then do a "select for update where ... and rendering = '' limit 1",
update the rendering field to the hostname of the box doing the
work and commit. Render the thumbnail. Delete the record.


Assuming each processing host keeps its connection open, I'd store the 
process-id instead (get via function pg_backend_pid()). Also have a 
separate status (pending|processing|done) and timestamps to track when 
each status is set. Only delete rows that have been marked "done" for a 
certain length of time.


This will let you spot when a host has stopped processing (e.g. crashed) 
and also let you measure throughput on particular hosts.


The other thing to be aware of is that queries of the SELECT FOR UPDATE 
LIMIT 1 form can return 0 rows. Open two psql connections and try it to 
see what happens. You'll need to have the processing clients retry the 
query in this case.


HTH
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] queueing via database table?

2007-01-02 Thread Steve Atkins


On Jan 2, 2007, at 10:34 PM, Mark Harrison wrote:


I have a cluster of CPUs generating thumbnails for
a render farm.  I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table.  They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,


Holding a lock while generating the thumbnail doesn't
sound like a great idea, and I think that the select
for update will end up serialising the requests.

I'd add a "rendering" field, text, defaulting
to an empty string.

Then do a "select for update where ... and rendering = '' limit 1",
update the rendering field to the hostname of the box doing the
work and commit. Render the thumbnail. Delete the record.

That'll also give you an easy way to show status of which
box is rendering which scene.

Depending on what else you're putting into the where clause
a partial index on something for records where rendering=''
might be helpful.

Cheers,
  Steve


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match