Hi everyone,

I am an almost complete and desperate newcomer, trying to learn how
Globus works and how I can use it to perform distributed data
mining tasks.
I have read the GT4 Programmer's Tutorial, as well as the GT4 Primer
and other documents, but I am still a little confused.

The general idea of my Master's thesis is to have a Grid that provides
data mining services, which the client discovers at runtime.
After the discovery phase, the client would then submit data mining
tasks to the discovered nodes.

At the moment, I am trying to figure out how I will use Globus in my
project. Here are my questions :)

- Is Globus meant to be used only as a "gateway" to PBS/LSF/SGE/Condor
clusters? Would I use the index service to discover registered
clusters and then choose one from the list (considering its
availability, number of CPUs, etc.)?
   - If so, what is the point of creating services and deploying them,
if I am going to run tasks by copying executable code to the target
machines and executing it there? If I understand correctly, I would
only use Globus as a way to aggregate different clusters.
   - If not, should I deploy N service instances (e.g. one for
each data mining algorithm) on each Globus node and then choose one
service using the index service?
- Can I get detailed information from the index service, such as CPU
count, architecture, MIPS, memory, etc.?
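
To make the "choose one from the list" step concrete, here is a rough
Python sketch of what I imagine on the client side. It does not call any
Globus API: the Node record and its fields (name, free_cpus, memory_mb,
architecture) are hypothetical stand-ins for whatever the index service
really reports, and the ranking rule (most free CPUs, then most memory)
is only an assumption of mine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical fields; a real index service may expose different attributes.
    name: str
    free_cpus: int
    memory_mb: int
    architecture: str


def choose_node(nodes: List[Node], min_cpus: int, arch: str) -> Optional[Node]:
    """Return the matching node with the most free CPUs, or None if none match."""
    candidates = [
        n for n in nodes
        if n.free_cpus >= min_cpus and n.architecture == arch
    ]
    if not candidates:
        return None
    # Rank by free CPUs first; break ties by preferring more memory.
    return max(candidates, key=lambda n: (n.free_cpus, n.memory_mb))


if __name__ == "__main__":
    # Made-up discovery results, only for illustration.
    discovered = [
        Node("cluster-a", free_cpus=8, memory_mb=16384, architecture="x86_64"),
        Node("cluster-b", free_cpus=16, memory_mb=8192, architecture="x86_64"),
        Node("cluster-c", free_cpus=32, memory_mb=32768, architecture="ia64"),
    ]
    best = choose_node(discovered, min_cpus=4, arch="x86_64")
    print(best.name if best else "no suitable node")
```

What I do not know is whether the list passed to choose_node would
describe whole clusters or individual deployed services.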

I am sorry if I did not make my questions clear enough. (please have
some patience with me :))

Thanks in advance.

Best regards,
Nuno Guerreiro
