RE: [gt-user] Data Mining using Globus

Lengyel, Florian Fri, 04 Apr 2008 15:55:35 -0700

This needed editing... take two:

These seem like good questions to me. I would like to know
whether there is something for software analogous to the
Domain Name System (DNS) for host names, a "Software Naming Service."
Does such a thing exist?




There are several software packaging systems, such as rpm (Red Hat),
pacman (open source, used by OSG), yast (SuSE),
apt-get (Debian) and fink (a Mac OS port of Debian's apt-get).
But these are incompatible with one another. It seems that automated
software indexing has not been abstracted beyond the level of a single
machine to the level of a cluster or a grid.

Each of these systems can be queried about what is installed on a single
machine. Now try querying what software is installed on a cluster, or on a
grid. What tools would you use? They don't seem to exist, or if they do,
they haven't made it very far in Google's page ranking.
There seems to be no Software Naming Service that could be queried and
used the way the Domain Name System is.
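For illustration only, here is a minimal sketch of what the query side of such a service might look like. It assumes each node's package list has already been collected as "NAME VERSION" lines (on rpm-based nodes, for example, with `rpm -qa --qf '%{NAME} %{VERSION}\n'`). The SoftwareIndex class, its methods, and the host names are hypothetical; they are not part of Globus or of any existing tool.

```python
from typing import Dict, List, Tuple


def parse_listing(text: str) -> Dict[str, str]:
    """Parse 'NAME VERSION' lines (one package per line) into {name: version}."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            packages[parts[0]] = parts[1]
    return packages


class SoftwareIndex:
    """Toy cluster- or grid-wide index: which hosts have which software."""

    def __init__(self) -> None:
        # host name -> {package name: version}
        self._hosts: Dict[str, Dict[str, str]] = {}

    def register_host(self, host: str, listing: str) -> None:
        # Re-registering a host replaces its previous listing.
        self._hosts[host] = parse_listing(listing)

    def lookup(self, name: str) -> List[Tuple[str, str]]:
        """Return (host, version) pairs for a package, sorted by host name."""
        return sorted(
            (host, pkgs[name]) for host, pkgs in self._hosts.items() if name in pkgs
        )


if __name__ == "__main__":
    index = SoftwareIndex()
    index.register_host("node01.example.org", "gcc 4.1.2\nfftw 3.1.2\n")
    index.register_host("node02.example.org", "gcc 3.4.6\n")
    print(index.lookup("gcc"))
    # [('node01.example.org', '4.1.2'), ('node02.example.org', '3.4.6')]
    print(index.lookup("fftw"))  # [('node01.example.org', '3.1.2')]
```

A real service would also have to gather and refresh the listings and scale beyond one process; the sketch only shows that the lookup itself is a simple name-to-locations mapping, much like a DNS query.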

While I'm on the subject of tools for the end user, what about
a shell that lifts the commands you run on a workstation up to the
grid level? Something that might be called the "gshell."

Where is the grid equivalent of the search path (PATH)? Of ls?
And what about someone who wants to run a job on some collection of clusters
but needs certain libraries, which may be installed on different
machines somewhere out there? Is there a grid equivalent of
ldconfig? Or even of something as widely discouraged as LD_LIBRARY_PATH?
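To make the PATH and LD_LIBRARY_PATH analogy concrete, here is a minimal sketch, assuming a hypothetical catalogue that maps each site to the shared libraries known to be installed there. grid_which searches an ordered "grid path" of sites the way a shell searches PATH, and sites_with_all lists the sites that provide every library a job needs. The catalogue, site names, and both functions are illustrative only.

```python
from typing import Dict, List, Optional, Set

# Hypothetical catalogue: site name -> shared libraries known to be installed there.
Catalogue = Dict[str, Set[str]]


def grid_which(lib: str, grid_path: List[str], catalogue: Catalogue) -> Optional[str]:
    """Return the first site on grid_path that provides lib, like a PATH search."""
    for site in grid_path:
        if lib in catalogue.get(site, set()):
            return site
    return None


def sites_with_all(required: Set[str], grid_path: List[str], catalogue: Catalogue) -> List[str]:
    """Return the sites, in grid_path order, that provide every required library."""
    return [site for site in grid_path if required <= catalogue.get(site, set())]


if __name__ == "__main__":
    catalogue: Catalogue = {
        "cluster-a": {"libfftw3.so.3", "libblas.so.3"},
        "cluster-b": {"libblas.so.3", "liblapack.so.3"},
    }
    grid_path = ["cluster-a", "cluster-b"]
    print(grid_which("libblas.so.3", grid_path, catalogue))  # cluster-a
    print(grid_which("liblapack.so.3", grid_path, catalogue))  # cluster-b
    print(sites_with_all({"libblas.so.3", "liblapack.so.3"}, grid_path, catalogue))
    # ['cluster-b']
```

As with PATH, the order of the grid path decides which site wins when several provide the same library.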

While I appreciate that the Globus Toolkit is intended to solve
recurring middleware problems, where is it being used to address
the most recurring problem of all: getting users to use it?



-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Nuno Guerreiro
Sent: Fri 4/4/2008 6:29 PM
To: [email protected]
Subject: [gt-user] Data Mining using Globus
 
Hi everyone,

I am an almost complete and desperate newcomer, trying to learn how
Globus works and how I can use it to perform distributed data
mining tasks.
I have read the GT4 Programmer's Tutorial, as well as the GT4 Primer and other
documents. I am still a little bit confused.

The general idea of my Master's thesis is to have a Grid providing data
mining services that are discovered at runtime (on the client side).
After the discovery phase, the client would then submit data mining
tasks to the discovered nodes.

At the moment, I am trying to figure out how I will use Globus in my
project. Here are my questions :) :

- Is Globus meant to be used only as a "gateway" to PBS/LSF/SGE/Condor
clusters? Will I use the indexing services to discover registered
clusters and then choose one from the list (considering its
availability, number of CPUs, etc.)?
   - If so, what is the point of creating services and deploying them,
if I am going to execute tasks by copying executable code to target
machines and executing it there? If I understand correctly, I would only
use Globus as a way to aggregate different clusters.
   - If not, should I deploy N instances of services (e.g. one for
each data mining algorithm) on each Globus node and then choose one
service using the index service?
- Can I get detailed information from the indexing services, like CPU
count, architecture, MIPS, memory, etc.?
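As a minimal sketch of the selection step in the first question, assume the index service's answers have already been turned into simple records. The ClusterInfo fields and the choose_cluster policy below are hypothetical; they are not the schema the Globus Index Service actually publishes. The policy filters by architecture, free CPUs, and memory, then prefers the cluster with the most idle CPUs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterInfo:
    # Hypothetical record built from index-service query results.
    name: str
    arch: str
    free_cpus: int
    total_cpus: int
    memory_mb: int


def choose_cluster(
    candidates: List[ClusterInfo],
    arch: str,
    cpus_needed: int,
    min_memory_mb: int = 0,
) -> Optional[ClusterInfo]:
    """Pick the eligible cluster with the most idle CPUs, or None if none fits."""
    eligible = [
        c
        for c in candidates
        if c.arch == arch and c.free_cpus >= cpus_needed and c.memory_mb >= min_memory_mb
    ]
    if not eligible:
        return None
    # Most idle CPUs wins; ties are broken by total cluster size.
    return max(eligible, key=lambda c: (c.free_cpus, c.total_cpus))


if __name__ == "__main__":
    clusters = [
        ClusterInfo("pbs-site", "x86_64", 12, 64, 2048),
        ClusterInfo("condor-pool", "x86_64", 40, 200, 1024),
        ClusterInfo("sge-site", "ppc64", 80, 128, 4096),
    ]
    print(choose_cluster(clusters, "x86_64", 8).name)  # condor-pool
    print(choose_cluster(clusters, "x86_64", 8, min_memory_mb=2048).name)  # pbs-site
    print(choose_cluster(clusters, "x86_64", 100))  # None
```

Other policies (shortest queue, fastest CPUs) would only change the filter and the sort key.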

I am sorry if I did not make my questions clear enough. (Please have
some patience with me :))

Thanks in advance.

Best regards,
Nuno Guerreiro
