Re: Running UIMA on a cluster

Eddie Epstein Tue, 01 May 2012 10:58:17 -0700

The UIMA-AS framework doesn't have any support for deploying processes
across a cluster. SGE could be used to play that role.
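
As a rough illustration of letting SGE play the deployment role, the Python sketch below only builds a `qsub` array-job command that would start several copies of one UIMA-AS service. It does not run that command. It assumes the `deployAsyncService.sh` script shipped in the UIMA-AS `bin` directory, a deployment descriptor on a filesystem shared by all nodes, and a broker that is already running (for example on the head node). It also assumes a Grid Engine `qsub` that accepts `-N`, `-t`, `-V`, `-cwd` and `-b y`. The install path, descriptor path and job name are hypothetical, so check them against your site's setup.

```python
import shlex


def qsub_service_command(uima_home, descriptor, instances, job_name="uima-as-svc"):
    """Build (but do not run) a qsub command that starts `instances` copies of
    one UIMA-AS service as an SGE array job.

    Every array task runs deployAsyncService.sh with the same deployment
    descriptor, so all copies listen on the same broker queue. The paths are
    hypothetical and assumed to be on a filesystem shared by all nodes.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    return [
        "qsub",
        "-N", job_name,          # job name shown by qstat
        "-t", f"1-{instances}",  # array job: one task per service instance
        "-V",                    # pass the submitting shell's environment along
        "-cwd",                  # start in the current (shared) directory
        "-b", "y",               # submit a command directly instead of a job script
        f"{uima_home}/bin/deployAsyncService.sh",
        descriptor,
    ]


if __name__ == "__main__":
    # Hypothetical install and descriptor locations on shared storage.
    cmd = qsub_service_command("/shared/apache-uima-as", "/shared/MyServiceDeploy.xml", 8)
    print(shlex.join(cmd))  # inspect first; submit with subprocess.run(cmd, check=True)
```

The services keep running until stopped, so each one holds an SGE slot until the array job is removed with `qdel`. The client can then be started anywhere that can reach the broker.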


Because UIMA-AS services register with a JMS broker, and the UIMA-AS client
communicates with these services via the broker, it doesn't matter
where they run.
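
The point that location doesn't matter can be seen in a toy model. The sketch below is not the UIMA-AS API. It replaces the JMS broker with two in-process Python queues and the services with threads. Its only purpose is to show that the client and the services address the broker's queues, never each other. The number of services, and where they live, is therefore invisible to the client.

```python
import queue
import threading

STOP = None  # sentinel telling a simulated service to shut down


def service(name, requests, replies):
    """Simulated analysis service: it only knows the broker's queues,
    not where the client (or any other service) is running."""
    while True:
        doc = requests.get()
        if doc is STOP:
            break
        # Stand-in for annotator work: count whitespace-separated tokens.
        replies.put((doc, len(doc.split()), name))


def run(docs, n_services=3):
    """Client side: enqueue work on the 'broker' and collect the replies."""
    requests, replies = queue.Queue(), queue.Queue()  # toy stand-in for the broker
    workers = [
        threading.Thread(target=service, args=(f"svc-{i}", requests, replies))
        for i in range(n_services)
    ]
    for w in workers:
        w.start()
    for d in docs:
        requests.put(d)
    results = [replies.get() for _ in docs]  # replies may arrive in any order
    for _ in workers:
        requests.put(STOP)  # one sentinel per service
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for doc, tokens, svc in run(["a b c", "hello world", "one"]):
        print(f"{svc}: {doc!r} -> {tokens} tokens")
```

In the real setup the queues live in the broker process. That is what allows the services to run on other machines while the client-side logic stays the same.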

Eddie

On Fri, Apr 27, 2012 at 5:48 PM, John David Osborne <ozb...@uab.edu> wrote:
> Very helpful responses from you and Thomas, thanks guys!  The README in
> the 2.3.1 documentation is very useful.
>
> I'm still confused about one thing, and I am dreading the answer. How does
> UIMA-AS play with pre-existing tools like SGE? I'm under the impression
> that it is basically going to ignore SGE and try to start jobs on the
> compute nodes by itself. Is everybody more or less running UIMA on
> dedicated clusters?
>
> I'm in a situation where I'm looking to run on a cluster shared pretty
> much university-wide, for which SGE is the main (probably only) job
> submission method.
>
>  -John
>
>
>
> On 4/27/12 2:59 PM, "Eric Riebling" <e...@cs.cmu.edu> wrote:
>
>>We've had success deploying annotators on cluster nodes (using UIMA-AS
>>deployment descriptors) registered to a UIMA-AS broker running on the
>>head node. If the cluster uses shared data folders, you only need to
>>put the code in one place for it to 'appear' on all nodes.
>>
>>Then we run a collection reader and CAS consumer on the head node,
>>with the amount of scale-out specified on the command line of
>>runRemoteAsyncAE.sh, something like this:
>>
>>   $UIMA_HOME/bin/runRemoteAsyncAE.sh -c (path.to)XmiCollectionReader.xml tcp://localhost:61616 (name of deployed service) -p (number of nodes) -o output_foldername
>>
>>With enough scale-out, the limiting factor becomes the speed of the CR
>>and CC on the head node.  This is the briefest explanation I can give,
>>not sure it's a 'best practice' but it works. :)
>>
>>On 4/27/2012 3:35 PM, John David Osborne wrote:
>>> Hello,
>>>
>>> Is there any best practice documentation out there for running
>>> UIMA/UIMA-AS on a cluster? I have only run single machine instances of
>>> UIMA (mostly through Eclipse) and have not investigated the ability to
>>> perform multiple simultaneous analyses in order to process large
>>> document collections.
>>>
>>> It's not clear to me how UIMA would operate in a cluster environment. Do
>>> people really do message passing using JMS? I'm guessing this is the case,
>>> as I'm not seeing references to MPICH, SGE or other things I am more used
>>> to. I've looked through some of the documentation (including all of the
>>> Overview & SDK Setup guide) but am not finding anything helpful. I've also
>>> tried googling, but I am not getting much except this:
>>> http://comments.gmane.org/gmane.comp.apache.uima.general/2131, which makes
>>> me think it is possible.
>>>
>>> Currently, given my level of confusion, I think it may be best to have
>>> multiple instances of UIMA on a cluster and just submit jobs processing
>>> discrete document sets to our SGE cluster, ignoring whatever scaling
>>> features are actually present in UIMA, since the document processing I
>>> plan to do is data parallel.
>>>
>>> -John
>>>
>>>
>>
>>--
>>Eric Riebling                 Senior Systems Programmer
>>http://ericriebling.com       CMU Language Technologies Institute
>>
>
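
John's fallback uses no UIMA-AS at all: each discrete document set is submitted as an ordinary SGE job. That only needs a deterministic way for each job to pick its share of the corpus. The sketch below assumes an array job submitted with `qsub -t 1-N` (step 1), so that Grid Engine sets `SGE_TASK_ID` and `SGE_TASK_LAST` for each task. The `CORPUS_DIR` variable and the hand-off to a local UIMA pipeline are hypothetical placeholders.

```python
import os


def task_index(var, default=1):
    """Read an SGE array-task variable; SGE sets it to 'undefined' for
    non-array jobs, and it is absent when run outside SGE."""
    raw = os.environ.get(var, "")
    return int(raw) if raw.isdigit() else default


def my_share(paths, task_id, n_tasks):
    """Documents for array task `task_id` (1-based) out of `n_tasks`.

    Round-robin over a sorted list: shares are disjoint, together cover
    every document, and differ in size by at most one.
    """
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    ordered = sorted(paths)  # every task must agree on the same order
    return ordered[task_id - 1::n_tasks]


if __name__ == "__main__":
    task_id = task_index("SGE_TASK_ID")
    n_tasks = task_index("SGE_TASK_LAST")
    corpus_dir = os.environ.get("CORPUS_DIR", ".")  # hypothetical variable
    docs = [
        os.path.join(corpus_dir, name)
        for name in os.listdir(corpus_dir)
        if os.path.isfile(os.path.join(corpus_dir, name))
    ]
    for path in my_share(docs, task_id, n_tasks):
        print(path)  # placeholder: hand `path` to a local UIMA pipeline here
```

A contiguous split would work just as well. What matters is that every task sorts the same list, so the shares never overlap.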
