Re: [gt-user] Data Mining using Globus

Nuno Guerreiro Fri, 04 Apr 2008 16:27:12 -0700

Thank you for your comments, Florian :)

I'm still trying to find out how to connect all the pieces and, most
important of all, which pieces I should choose. From what I've read,
in theory, I could use Condor-G to submit a job to one of several
Globus Gatekeepers, which would then submit that job to a local
scheduler like PBS or Condor. This local scheduler would choose a
machine, execute the job and send the results back to me.


I don't know if this is a realistic scenario, but here it goes:

- Using a data mining toolkit like Weka, I would develop an
application that performs some data analysis.
- I would submit that application, plus the Weka toolkit, plus the
data set to process to Condor-G.
- Condor-G would then choose a Globus Gatekeeper and forward my request/job.
- The Globus Gatekeeper would then forward my request/job to a local scheduler.
- The local scheduler would execute my application, using the Weka
toolkit, to process the data set.
- The results would be sent by the local scheduler to Globus which
would then forward them to Condor-G, making them available to me.
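
To make the steps above concrete, here is a rough sketch of the kind of
Condor-G submit description I imagine for such a job, generated from Python.
It assumes a pre-WS (gt2) gatekeeper; the host name, the wrapper script
run_weka.sh and the file names are placeholders I made up, and I have not
tested this against a real gatekeeper.

```python
# Sketch only: builds the text of a Condor-G submit description for the
# scenario above. Host, jobmanager and file names are hypothetical.

def build_submit_description(gatekeeper, jobmanager, executable, input_files,
                             arguments=""):
    """Return a grid-universe (gt2) submit description as a string."""
    lines = [
        "universe = grid",
        # gt2 = pre-WS GRAM gatekeeper; the jobmanager suffix selects the
        # local scheduler behind it (e.g. pbs or condor).
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Ship the toolkit and the data set along with the job.
        f"transfer_input_files = {','.join(input_files)}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_submit_description(
        gatekeeper="gatekeeper.example.org",   # placeholder host
        jobmanager="pbs",
        executable="run_weka.sh",              # hypothetical wrapper script
        input_files=["weka.jar", "dataset.arff"],
        arguments="dataset.arff",
    ))
```

The printed text would be saved to a file and handed to condor_submit. Note
that the gatekeeper is fixed by hand here, which is exactly the part I would
like to have chosen automatically.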

Is this a feasible option?
Is Condor-G able to use a service like the Globus Index Service and
choose a gatekeeper?

Thanks,
On Fri, Apr 4, 2008 at 11:55 PM, Lengyel, Florian <[EMAIL PROTECTED]> wrote:
>
>
>
>
> This needed editing... take two:
>
>
>  These seem like good questions to me. I would like to know
>  if there is something for software analogous to the
>  domain naming service for URLs--a "Software Naming Service."
>  Does such a thing exist?
>
>
>
>  There are several software package systems, such as rpm (Red Hat),
>  pacman (open source, used by OSG), yast (SuSE),
>  apt-get (Debian) and fink (Mac OS version of Debian's apt-get).
>  But these are incompatible. It seems that an automated software
>  indexing service hasn't been abstracted beyond the level of the
>  machine, to the level of a cluster, or to the grid.
>
>  Each of these tools has query features. Now try querying what software
>  is installed on a cluster, or a grid.  What tools would you use? They don't
>  seem to exist, or if they do, they haven't made it very far in Google's
>  page ranking.
>  There seems to be no Software Naming Service that could
>  be queried and used, comparable to the Domain Naming Service.
>
>  While I'm on the subject of tools for the end user, what about
>  a shell that abstracts the commands you run on a workstation to the
>  grid level? Something that might be called the "gshell."
>
>
>  Where is the grid equivalent of the path? Of ls?
>  And what about someone who wants to run a job on some collection of
>  clusters but needs certain libraries, which may be installed on
>  different machines out there, somewhere? Is there a grid equivalent of
>  ldconfig? Or even of something deprecated, like LD_LIBRARY_PATH?
>
>  While I appreciate that the globus toolkit is intended to solve
>  recurrent middleware problems, where is it being used to address
>  the most recurring problem of all: getting users to use it?
>
>
>
>
>
>  -----Original Message-----
>  From: [EMAIL PROTECTED] on behalf of Nuno Guerreiro
>  Sent: Fri 4/4/2008 6:29 PM
>  To: [EMAIL PROTECTED]
>  Subject: [gt-user] Data Mining using Globus
>
>  Hi everyone,
>
>  I am an almost complete and desperate newcomer, trying to learn how
>  Globus works and how I can use it to perform distributed data
>  mining tasks.
>  I have read the GT4 Programmer's Tutorial, as well as the GT4 Primer
>  and other documents. I am still a little bit confused.
>
>  The general idea of my Master's thesis is to have a Grid providing data
>  mining services and to discover them at runtime (on the client side).
>  After the discovery phase, the client would then submit data mining
>  tasks to the discovered nodes.
>
>  At the moment, I am trying to figure out how I will use Globus in my
>  project. Here are my questions :) :
>
>  - Is Globus to be used only as a "gateway" to PBS/LSF/SGE/Condor
>  clusters? Will I use the indexing services to discover registered
>  clusters and then choose one from the list (considering its
>  availability, number of CPUs, etc)?
>     - If so, what is the point of creating services and deploying them,
>  if I am going to execute tasks by copying executable code to target
>  machines and executing it there? If I understand correctly, I would only
>  use Globus as a way to aggregate different clusters.
>     - If not, should I deploy N-instances of services (e.g. one for
>  each data mining algorithm) on each Globus node and then choose one
>  service using the index service?
>  - Can I get detailed information from the indexing services, like CPU
>  count, architecture, MIPS, Memory, etc?
>
>  I am sorry if I did not make my questions clear enough. (please have
>  some patience with me :))
>
>  Thanks in advance.
>
>  Best regards,
>  Nuno Guerreiro
>
>
>
>
