On May 2, 2009, at 05:37, Roy M. wrote:

On Sat, May 2, 2009 at 4:04 PM, Rocco Caputo <rcap...@pobox.com> wrote:
You seem to be seeking examples where POE isn't necessarily appropriate. I recommend looking for CPU-bound problems, since that's where POE doesn't directly help. Even in the CPU-bound case, a subthread or child process can externalize and parallelize the work. POE can continue doing non-blocking work in the meantime, and it can be notified when the side thread or process is done.


Thanks for your lengthy reply.
I think I had better explain my usage in more detail (sorry for not
having done so before).

Case:

1. I need to write a script to download files from 50 FTP servers every hour.

2. For each FTP server, I need to download 100-300 files.

3. I can accept that the connection to each FTP server is single-threaded
(since the max. time for operations on a single FTP server is quite
short, i.e. < 5 minutes)

Time constraint = 1 hour.
50 servers * 300 files = 15,000 files.
15,000 files * 5 minutes = 75,000 minutes.

75,000 minutes / 60 = 1,250 hours.

You need to fit 1,250 hours' worth of file transfer into one real-time hour.

You will need to run up to 1,250 simultaneous connections.
15,000 files / 1,250 connections = 12 files per connection.
12 files * 5 minutes = 60 minutes, so that works.

1,250 connections / 50 servers = 25 simultaneous connections per server.
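
The arithmetic above can be re-run with different inputs. The following is a minimal Python sketch (Python is used only for illustration; it is not POE code). The variable names are made up for this example, and the figures are the same worst-case numbers used in the estimate: 300 files per server and 5 minutes per file.

```python
import math

# Worst-case inputs taken from the estimate above.
servers = 50
files_per_server = 300
minutes_per_file = 5
window_minutes = 60          # everything must finish within one hour

total_files = servers * files_per_server                # 15,000 files
total_minutes = total_files * minutes_per_file          # 75,000 minutes
total_hours = total_minutes / 60                        # 1,250 hours

# Connections needed to squeeze the work into the window, rounded up.
connections = math.ceil(total_minutes / window_minutes)
files_per_connection = math.ceil(total_files / connections)
connections_per_server = math.ceil(connections / servers)

print(total_files, total_minutes, connections,
      files_per_connection, connections_per_server)
```

Changing `minutes_per_file` or `files_per_server` shows quickly how sensitive the connection count is to the assumptions.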

How large are the files, in octets?

File size divided by 5 minutes = data rate per file, in octets per minute. Multiply that by 1,250 to determine your required network capacity. Divide what you require by your actual network capacity to determine how many networks you'll need. If the result is greater than one, then you'll need to buy one or more networks. Remember to round the quotient up to the next integer; you don't want to be half a network short.
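
As a rough sketch of that calculation in Python (for illustration only): the function name, the 30,000,000-octet file size, and the 100 Mbit/s link are all hypothetical, since the real file sizes and network capacity have not been stated.

```python
import math

def networks_needed(file_size_octets, minutes_per_file,
                    connections, capacity_octets_per_minute):
    """Return how many networks are required, rounded up to a whole network."""
    rate_per_file = file_size_octets / minutes_per_file   # octets per minute
    required = rate_per_file * connections                # all transfers at once
    return math.ceil(required / capacity_octets_per_minute)

# Hypothetical figures: 30,000,000-octet files and a 100 Mbit/s link,
# i.e. 12,500,000 octets per second or 750,000,000 octets per minute.
example = networks_needed(30_000_000, 5, 1250, 750_000_000)
print(example)
```

The `math.ceil` call is the "round the quotient up" step: a requirement of 10.3 networks means eleven, not ten.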

Plan for growth.

4. So I tried to investigate POE, as an FTP component is available.

You should explore the number of network interfaces, disk channels, memory, and CPU cores you'll need with each technology option at your disposal. Divide machine capacities by resource requirements, to find the number of machines you'll need. Use the maximum number, which will reflect your tightest bottleneck. If the bottleneck is network interfaces, make sure you don't use more than your network can support.
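
A small Python sketch of that sizing rule follows, for illustration only. The function name and every number in the example are hypothetical placeholders; real values would come from measuring each technology option.

```python
import math

def machines_needed(required, per_machine):
    """Divide each requirement by per-machine capacity and take the maximum.

    Returns (machine_count, bottleneck_resource, per_resource_counts).
    """
    counts = {name: math.ceil(required[name] / per_machine[name])
              for name in required}
    bottleneck = max(counts, key=counts.get)
    return counts[bottleneck], bottleneck, counts

# Hypothetical requirements and per-machine capacities.
required = {"network_interfaces": 10, "disk_channels": 4,
            "memory_gb": 20, "cpu_cores": 6}
per_machine = {"network_interfaces": 2, "disk_channels": 2,
               "memory_gb": 16, "cpu_cores": 8}

machines, bottleneck, counts = machines_needed(required, per_machine)
print(machines, bottleneck)
```

With these made-up numbers the network interfaces are the tightest bottleneck, so they set the machine count.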

If budget is a bottleneck, choose the technology that minimizes hardware costs.

5. It is currently written as a simple single-threaded POE program. In
the `authenticated` method in POE::Component::Client::FTP, I need to
write the log to an external MSSQL server (not MySQL, as I wrongly typed
before), so I think this step will block.

The log seems to be per connection. You can avoid blocking by journaling log entries to files, then spooling the journals into your SQL server with a dedicated background process.
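
The journal-and-spool idea can be sketched as follows. This is Python for illustration, not POE code, and it makes several assumptions: the function names, the JSON-lines journal format, and the `ftp_log` table are invented for this example, and the standard-library `sqlite3` module stands in for the real MSSQL connection. The downloader only appends to a local file; a separate background process calls the spooler.

```python
import json
import os
import sqlite3
import tempfile
import time

def journal_entry(journal_path, server, event):
    """Append one log record as a JSON line (called by the downloader)."""
    record = {"ts": time.time(), "server": server, "event": event}
    with open(journal_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

def spool_journal(journal_path, conn):
    """Load the journal into the database (called by the background process)."""
    # Rename first, so downloaders start a fresh journal while we load this one.
    work_path = journal_path + ".spooling"
    os.replace(journal_path, work_path)
    with open(work_path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    conn.executemany(
        "INSERT INTO ftp_log (ts, server, event) VALUES (?, ?, ?)",
        [(r["ts"], r["server"], r["event"]) for r in rows],
    )
    conn.commit()
    os.remove(work_path)   # only removed after a successful commit
    return len(rows)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ftp.journal")
    journal_entry(path, "ftp01.example.com", "authenticated")
    db = sqlite3.connect(":memory:")   # stand-in for the MSSQL server
    db.execute("CREATE TABLE ftp_log (ts REAL, server TEXT, event TEXT)")
    print(spool_journal(path, db), "entries spooled")
```

A slow or unreachable SQL server then delays only the spooler, not the downloads. A production version would also need to cope with a writer that still holds the old journal open at the moment of the rename.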

Questions:

a. Is POE suitable for my jobs?

Your question implies additional constraints that you haven't mentioned.

POE is suitable for the current constraints and the work as specified, assuming that you have adequate hardware resources to achieve the work at all.

b. Do I need to use threads + POE in order to run 50 jobs together?

Threads are inappropriate for this task. You will almost certainly need more than one machine to do this work, which implies a multiprocess solution. Threads are redundant complexity, unless you can show a compelling need to share memory between downloaders.
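
One way to picture a multiprocess layout is to divide the server list into independent shares, one per worker process, with nothing shared between them. The Python sketch below is illustrative only: the `partition` helper, the host names, and the choice of four workers are all hypothetical.

```python
def partition(servers, workers):
    """Split the server list into `workers` round-robin shares."""
    return [servers[i::workers] for i in range(workers)]

# Hypothetical host names for the 50 FTP servers.
servers = [f"ftp{n:02d}.example.com" for n in range(1, 51)]

shares = partition(servers, 4)
for index, share in enumerate(shares):
    print(f"worker {index}: {len(share)} servers")
```

Each share could then be handed to its own process, on whichever machine has capacity, since the downloaders do not need to share memory.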

c. Within each worker/process, I have some blocking operations in the
callback which prevent it from switching between events quickly. What is
the normal way to handle this?

You have yet to show that the blocking is significant or cannot be worked around trivially. You have larger worries right now.

Your questions imply that you haven't thought the problem through. If you need to hire an analyst, just ask. I'm sure some who read this list are available.

--
Rocco Caputo - rcap...@pobox.com
