This needed editing... take two: These seem like good questions to me. I would like to know if there is something for software analogous to the domain naming service for URLs--a "Software Naming Service." Does such a thing exist?
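As a rough illustration of the idea (not an existing tool), a "Software Naming Service" lookup could be sketched in Python as an inverted index from package name to the hosts that report having it, much as a DNS lookup maps a name to addresses. The host names, package names, versions and both function names below are hypothetical; in practice each host's inventory would have to come from its own package tool (for example `rpm -qa` or `dpkg -l`).

```python
from collections import defaultdict


def build_software_index(inventories):
    """Invert per-host package lists into package -> {host: version}."""
    index = defaultdict(dict)
    for host, packages in inventories.items():
        for name, version in packages.items():
            index[name][host] = version
    return dict(index)


def resolve(index, package):
    """Return the sorted list of hosts that report having `package`."""
    return sorted(index.get(package, {}))


# Hypothetical inventories, one dict per host (assumed data, for illustration only).
inventories = {
    "node01.example.org": {"fftw": "3.1.2", "gcc": "4.1.2"},
    "node02.example.org": {"gcc": "4.1.2"},
    "node03.example.org": {"fftw": "3.1.2", "blas": "3.0"},
}

index = build_software_index(inventories)
print(resolve(index, "fftw"))  # ['node01.example.org', 'node03.example.org']
```

The sketch leaves open the hard part of the question: who collects the inventories across a cluster or grid, and how package names from incompatible package systems are reconciled.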
There are several software package systems, such as rpm (Red Hat), pacman (open source, used by OSG), yast (SuSE), apt-get (Debian) and fink (Mac OS version of Debian's apt-get). But these are incompatible with one another. It seems that automated software indexing hasn't been abstracted beyond the level of the machine, to the level of a cluster, or to the grid. Each of these tools has query features. Now try querying what software is installed on a cluster, or a grid. What tools would you use? They don't seem to exist, or if they do, they haven't made it very far in Google's page ranking. There seems to be no Software Naming Service that could be queried and used, comparable to the Domain Naming Service.

While I'm on the subject of tools for the end user, what about a shell that abstracts the commands you run on a workstation to the grid level? Something that might be called the "gshell." Where is the grid equivalent of the path? Of ls? Or consider someone who wants to run a job on some collection of clusters but needs certain libraries, which may be installed on different machines out there, somewhere. Is there a grid equivalent of ldconfig? Or even of something deprecated, like LD_LIBRARY_PATH?

While I appreciate that the Globus Toolkit is intended to solve recurrent middleware problems, where is it being used to address the most recurring problem of all: getting users to use it?

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Nuno Guerreiro
Sent: Fri 4/4/2008 6:29 PM
To: [email protected]
Subject: [gt-user] Data Mining using Globus

Hi everyone,

I am an almost complete and desperate newcomer, trying to learn how Globus works and how I can use it to perform distributed data mining tasks. I have read the GT4 Programmer's Tutorial, as well as the GT4 Primer and other documents. I am still a little bit confused.

The general idea of my Master's thesis is to have a Grid providing data mining services and to discover them at runtime (on the client side).
After the discovery phase, the client would then submit data mining tasks to the discovered nodes. At the moment, I am trying to figure out how I will use Globus in my project.

Here are my questions :) :

- Is Globus to be used only as a "gateway" to PBS/LSF/SGE/Condor clusters? Will I use the indexing services to discover registered clusters and then choose one from the list (considering its availability, number of CPUs, etc.)?
- If so, what is the point of creating services and deploying them, if I am going to execute tasks by copying executable code to target machines and executing it there? If I understand correctly, I would only use Globus as a way to aggregate different clusters.
- If not, should I deploy N instances of services (e.g. one for each data mining algorithm) on each Globus node and then choose one service using the index service?
- Can I get detailed information from the indexing services, like CPU count, architecture, MIPS, memory, etc.?

I am sorry if I did not make my questions clear enough. (Please have some patience with me :))

Thanks in advance.

Best regards,
Nuno Guerreiro
