Thank you for your comments, Florian :) I'm still trying to find out how to connect all the pieces and, most importantly, which pieces I should choose. From what I've read, in theory, I could use Condor-G to submit a job to one of several Globus Gatekeepers, which would then submit that job to a local scheduler like PBS or Condor. This local scheduler would choose a machine, execute the job and send the results back to me.
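As far as I can tell, the first hop of that chain (Condor-G to a gatekeeper) would be expressed as a Condor-G submit description. Below is a minimal sketch in Python that just renders such a file as text. It assumes a pre-WS GRAM ("gt2") gatekeeper; the host name, the wrapper script and the input file names are made up for illustration, and the helper function itself is hypothetical, not part of Condor or Globus.

```python
def condor_g_submit(gatekeeper, jobmanager, executable, input_files):
    """Render a Condor-G submit description as a string.

    Assumption: the target is a pre-WS GRAM (gt2) gatekeeper whose
    jobmanager suffix names the local scheduler (e.g. pbs, condor).
    """
    lines = [
        # Grid universe: the job is handed to a remote grid resource
        "universe = grid",
        # Gatekeeper contact string plus the local scheduler behind it
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        # Program to run on the remote side
        f"executable = {executable}",
        # Extra files shipped along with the job
        f"transfer_input_files = {', '.join(input_files)}",
        # Where stdout, stderr and the job event log end up locally
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    print(condor_g_submit(
        "gatekeeper.example.org",
        "pbs",
        "run_analysis.sh",
        ["toolkit.jar", "dataset.csv"],
    ))
```

The printed text would be saved to a file and handed to condor_submit. Note that the gatekeeper is fixed in the file here, so this sketch says nothing about how one would be chosen automatically.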
I don't know if this is a realistic scenario, but here it goes:

- Using a data mining toolkit like Weka, I would develop an application that performs some data analysis.
- I would submit that application, plus the Weka toolkit, plus the data set to process, to Condor-G.
- Condor-G would then choose a Globus Gatekeeper and forward my request/job.
- The Globus Gatekeeper would then forward my request/job to a local scheduler.
- The local scheduler would execute my application, using the Weka toolkit, to process the data set.
- The results would be sent by the local scheduler to Globus, which would then forward them to Condor-G, making them available to me.

Is this a feasible option? Is Condor-G able to use a service like the Globus Index Service to choose a gatekeeper?

Thanks,

On Fri, Apr 4, 2008 at 11:55 PM, Lengyel, Florian <[EMAIL PROTECTED]> wrote:
> This needed editing... take two:
>
> These seem like good questions to me. I would like to know
> if there is something for software analogous to the
> domain naming service for URLs--a "Software Naming Service."
> Does such a thing exist?
>
> There are several software package systems, such as rpm (Red Hat),
> pacman (open source, used by OSG), yast (SuSE),
> apt-get (Debian) and fink (Mac OS version of Debian's apt-get).
> But these are incompatible. It seems that an automated software
> indexing service hasn't been abstracted beyond the
> level of the machine, to the level of a cluster, or to the grid.
>
> Each of these tools has query features. Now try querying what software
> is installed on a cluster, or a grid. What tools would you use? They don't
> seem to exist, or if they do, they haven't made it very far in Google's
> page ranking.
> There seems to be no Software Naming Service that could
> be queried and used, comparable to the Domain Naming Service.
>
> While I'm on the subject of tools for the end user, what about
> a shell that abstracts commands that you run from a workstation to the
> grid level? Something that might be called the "gshell."
>
> Where is the grid equivalent of the path? Of ls?
> Or for someone who wants to run a job on some collection of clusters,
> but needs certain libraries, which may be installed on different
> machines out there, somewhere. Is there a grid equivalent of
> ldconfig? Or even of something deprecated, like LD_LIBRARY_PATH?
>
> While I appreciate that the globus toolkit is intended to solve
> recurrent middleware problems, where is it being used to address
> the most recurring problem of all: getting users to use it?
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of Nuno Guerreiro
> Sent: Fri 4/4/2008 6:29 PM
> To: [email protected]
> Subject: [gt-user] Data Mining using Globus
>
> Hi everyone,
>
> I am an almost complete and desperate newcomer, trying to learn how
> Globus works and how I can use it to perform distributed data
> mining tasks.
> I have read the GT4 Programmers Tutorial, as well as the GT4 Primer and other
> documents. I am still a little bit confused.
>
> The general idea of my Master's thesis is to have a Grid providing data
> mining services and to discover them at runtime (on the client side).
> After the discovery phase, the client would then submit data mining
> tasks to the discovered nodes.
>
> At the moment, I am trying to figure out how I will use Globus in my
> project. Here are my questions :) :
>
> - Is Globus to be used only as a "gateway" to PBS/LSF/SGE/Condor
> clusters? Will I use the indexing services to discover registered
> clusters and then choose one from the list (considering its
> availability, number of CPUs, etc.)?
> - If so, what is the point of creating services and deploying them,
> if I am going to execute tasks by copying executable code to target
> machines and executing it there? If I understand correctly, I would only
> use Globus as a way to aggregate different clusters.
> - If not, should I deploy N instances of services (e.g. one for
> each data mining algorithm) on each Globus node and then choose one
> service using the index service?
> - Can I get detailed information from the indexing services, like CPU
> count, architecture, MIPS, memory, etc.?
>
> I am sorry if I did not make my questions clear enough. (Please have
> some patience with me :))
>
> Thanks in advance.
>
> Best regards,
> Nuno Guerreiro
