Hi everyone,

I am an almost complete and desperate newcomer, trying to learn how Globus works and how I can use it to perform distributed data mining tasks. I have read the GT4 Programmer's Tutorial, as well as the GT4 Primer and other documents, but I am still a little confused.
The general idea of my Master's thesis is to have a Grid that provides data mining services, which the client discovers at runtime. After the discovery phase, the client would submit data mining tasks to the discovered nodes. At the moment, I am trying to figure out how I will use Globus in my project.

Here are my questions :) :

- Is Globus to be used only as a "gateway" to PBS/LSF/SGE/Condor clusters? Will I use the indexing services to discover registered clusters and then choose one from the list (considering its availability, number of CPUs, etc.)?
- If so, what is the point of creating services and deploying them, if I am going to execute tasks by copying executable code to the target machines and running it there? If I understand correctly, I would only use Globus as a way to aggregate different clusters.
- If not, should I deploy N instances of services (e.g. one for each data mining algorithm) on each Globus node and then choose one service using the index service?
- Can I get detailed information from the indexing services, such as CPU count, architecture, MIPS, memory, etc.?

I am sorry if I did not make my questions clear enough (please have some patience with me :)).

Thanks in advance.

Best regards,
Nuno Guerreiro
