The UIMA-AS framework doesn't have any support for deploying processes across a cluster. SGE could be used to play that role.
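
As a rough sketch of that idea (not a tested recipe), SGE can be asked to start the service processes and UIMA-AS takes over from there. The helper below only prints a qsub command for an array job in which each task runs deployAsyncService.sh. It assumes $UIMA_HOME lives on a filesystem shared by all nodes, and that the inputQueue brokerURL in the deployment descriptor names the broker host as the compute nodes see it (not localhost). The job name and the descriptor path are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the qsub command that would start N UIMA-AS service
# instances as one SGE array job. Nothing is submitted here; pipe the
# output to sh to actually run it.
# Hypothetical names: the job name "uima-as-service" and the descriptor path.
build_qsub_cmd() {
    tasks="$1"        # number of service instances (SGE array tasks)
    descriptor="$2"   # UIMA-AS deployment descriptor on shared storage
    # -t 1-N  : array job with N tasks
    # -b y    : treat the command as a binary/script path, not a job file
    # -cwd    : run in the submission directory
    # -V      : export the submitting environment (including UIMA_HOME)
    printf 'qsub -N uima-as-service -t 1-%s -b y -cwd -V %s %s\n' \
        "$tasks" '$UIMA_HOME/bin/deployAsyncService.sh' "$descriptor"
}

build_qsub_cmd 8 /shared/deploy/MyAnnotator_deploy.xml
```

Each array task would be one service instance registered with the broker, and it should keep running until the job is removed (for example with qdel) or hits a queue time limit. The client side is unchanged by this.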
Because UIMA-AS services register with a JMS broker, and the UIMA-AS client communicates with these services via the broker, it doesn't matter where they run.

Eddie

On Fri, Apr 27, 2012 at 5:48 PM, John David Osborne <ozb...@uab.edu> wrote:
> Very helpful responses from you and Thomas, thanks guys! The README in
> the 2.3.1 documentation is very useful.
>
> I'm still confused about one thing, and I am dreading the answer. How does
> UIMA-AS play with pre-existing tools like SGE? I'm under the impression
> that it is basically going to ignore SGE and try to start jobs on the
> compute nodes by itself. Is everybody running UIMA on dedicated clusters
> more or less?
>
> I'm in a situation where I'm looking to run on a cluster shared pretty
> much University wide for which SGE is the main (probably only) job
> submission method.
>
> -John
>
> On 4/27/12 2:59 PM, "Eric Riebling" <e...@cs.cmu.edu> wrote:
>
>>We've had success deploying annotators on cluster nodes (using UIMA-AS
>>deployment descriptors) registered to a UIMA-AS broker running on the
>>head node. If the cluster uses shared data folders, you only need to
>>put the code in one place for it to 'appear' on all nodes.
>>
>>Then we run a collection reader and CAS consumer on the head node,
>>with the amount of scale-out specified on the command line of
>>runRemoteAsyncAE.sh, something like this:
>>
>>  $UIMA_HOME/bin/runRemoteAsyncAE.sh -c (path.to)XmiCollectionReader.xml tcp://localhost:61616 (name of deployed service) -p (number of nodes) -o output_foldername
>>
>>With enough scale-out, the limiting factor becomes the speed of the CR
>>and CC on the head node. This is the briefest explanation I can give,
>>not sure it's a 'best practice' but it works. :)
>>
>>On 4/27/2012 3:35 PM, John David Osborne wrote:
>>> Hello,
>>>
>>> Is there any best practice documentation out there for running
>>> UIMA/UIMA-AS on a cluster? I have only run single machine instances of
>>> UIMA (mostly through Eclipse) and have not investigated the ability to
>>> perform multiple simultaneous analyses in order to process large
>>> document collections.
>>>
>>> It's not clear to me how UIMA would operate in a cluster environment, do
>>> people really do message passing using JMS? I'm guessing this is the
>>> case as I am not seeing references to MPICH, SGE or other things I am
>>> more used to. I've looked through some of the documentation (including
>>> all the Overview & SDK setup) but am not finding anything helpful. I've
>>> also tried googling but I am not getting much except this:
>>> http://comments.gmane.org/gmane.comp.apache.uima.general/2131 which
>>> makes me think it is possible.
>>>
>>> Currently with my level of confusion I think it may be best to have
>>> multiple instances of UIMA on a cluster and just submit jobs processing
>>> discrete document sets to our SGE cluster and ignore whatever scaling
>>> features are actually present in UIMA since the document processing I
>>> plan to do is data parallel.
>>>
>>> -John
>>
>>--
>>Eric Riebling Senior Systems Programmer
>>http://ericriebling.com CMU Language Technologies Institute