Re: DUCC- process_dd

2015-05-01 Thread Eddie Epstein
Reshu,

UIMA-AS configurations are normally used in DUCC as Services for
interactive applications or to support Jobs. They can be used in Jobs, but
typically are not.

There is also a difference in the inputs between Job processes and
Services. A Service will normally receive a CAS containing the artifact to
be analyzed. A Job process will receive a CAS containing a reference to the
artifact, or even to a collection of artifacts; this is important for Job
scale-out, because it keeps the Job's Collection Reader from becoming a
bottleneck.
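
The difference between shipping artifacts and shipping references can be sketched outside of UIMA. The following is only an illustration in plain Python, not DUCC or UIMA code: the function names (read_references, process_reference, run_job) and the word-count "analysis" are made up for the example. The driver side hands out file paths only, and each worker opens the file it was given.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def read_references(input_dir):
    """Driver side: emit only cheap references (paths), never file contents."""
    return sorted(str(p) for p in Path(input_dir).glob("*.txt"))


def process_reference(path):
    """Worker side: open the referenced artifact and analyze it here."""
    text = Path(path).read_text(encoding="utf-8")
    # Hypothetical stand-in for a real analysis pipeline: count words.
    return path, len(text.split())


def run_job(input_dir, workers=4):
    """Distribute references to workers; the reading cost is paid by the workers."""
    refs = read_references(input_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_reference, refs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.txt").write_text("one two three", encoding="utf-8")
        Path(d, "b.txt").write_text("four five", encoding="utf-8")
        for path, count in run_job(d).items():
            print(Path(path).name, count)
```

Because the driver never reads file contents, its cost per work item stays small no matter how large the artifacts are, which is the property described above for a Job's Collection Reader.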

I suggest starting with one of the sample applications and adapting it to
your needs. We can help if you give some details about the format of the
input and output data.

Eddie

Re: DUCC- process_dd

2015-04-30 Thread reshu.agarwal

Eddie,

I was using this same scenario and, by trial and error, comparing it with
UIMA AS to get a more scalable pipeline, since I think UIMA AS can also do
this. But with UIMA AS I have been unable to match the processing time of
the default DUCC configuration you mentioned.

Can you help me with this? I just want to scale using the best
configuration of UIMA AS and DUCC, which can be done using process_dd.
But how?

Thanks in advance.

Reshu.

On 05/01/2015 03:28 AM, Eddie Epstein wrote:

The simplest way of vertically scaling a Job process is to specify the
analysis pipeline using core UIMA descriptors and then using
--process_thread_count to specify how many copies of the pipeline to
deploy, each in a different thread. No use of UIMA-AS at all. Please check
out the Raw Text Processing sample application that comes with DUCC.
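
The idea of "N copies of the pipeline, each in a different thread" can be sketched as follows. This is a plain Python illustration, not DUCC code: the Pipeline class, run_with_copies and the upper-casing "analysis" are hypothetical stand-ins. The point it shows is that every thread owns its own pipeline instance, so the pipeline itself does not need to be thread-safe.

```python
import queue
import threading


class Pipeline:
    """Stand-in for one copy of an analysis pipeline (not thread-safe)."""

    def __init__(self, copy_id):
        self.copy_id = copy_id
        self.processed = 0  # touched only by the thread that owns this copy

    def process(self, item):
        self.processed += 1
        return item.upper()  # hypothetical "analysis"


def run_with_copies(items, thread_count):
    """Process items with thread_count pipeline copies, one per thread."""
    work = queue.Queue()
    for item in items:
        work.put(item)

    results = []
    results_lock = threading.Lock()
    pipelines = [Pipeline(i) for i in range(thread_count)]

    def worker(pipeline):
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            out = pipeline.process(item)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=worker, args=(p,)) for p in pipelines]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, pipelines


if __name__ == "__main__":
    results, pipelines = run_with_copies(["a", "b", "c", "d", "e"], thread_count=3)
    print(sorted(results))
    print([p.processed for p in pipelines])
```

The order of results and the split of work between copies vary from run to run; only the totals are fixed.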

Re: DUCC- process_dd

2015-04-28 Thread reshu.agarwal


Oh, I misunderstood this. I thought this would scale both my aggregate
and the AEs.


I want to scale the aggregate as well as the individual AEs. Is there any
way of doing this in UIMA AS/DUCC?



On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:

In an async aggregate you scale the individual AEs, not the aggregate as a
whole. The configuration below should do that. Are there any warnings from
dd2spring at startup with your configuration?

<analysisEngine async="true">
  <delegates>
    <analysisEngine key="ChunkerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="NEDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="StemmerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="ConsumerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
  </delegates>
</analysisEngine>

Jerry
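
One way to check what a deployment descriptor actually asks for is to read the per-delegate scaleout values back out of the XML. The sketch below uses only the Python standard library; the DESCRIPTOR string is a trimmed copy of the descriptor from this thread (input queue and import omitted), and delegate_scaleout is a made-up helper name. It reports what the XML requests, not what UIMA-AS or DUCC deploys at run time. Treating a delegate without a scaleout element as 1 instance is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace used by the deployment descriptor quoted in this thread.
NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# Trimmed copy of the descriptor from the thread (queue and import omitted).
DESCRIPTOR = """\
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">
  <deployment protocol="jms" provider="activemq">
    <service>
      <analysisEngine async="true" key="FlowControllerAgg">
        <delegates>
          <analysisEngine key="ChunkerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="NEDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="StemmerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="ConsumerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
        </delegates>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
"""


def delegate_scaleout(xml_text):
    """Return {delegate key: requested numberOfInstances} for the top aggregate."""
    root = ET.fromstring(xml_text)
    top = root.find("u:deployment/u:service/u:analysisEngine", NS)
    if top is None:
        raise ValueError("no top-level analysisEngine element found")
    counts = {}
    for delegate in top.findall("u:delegates/u:analysisEngine", NS):
        scaleout = delegate.find("u:scaleout", NS)
        # Assumption: a delegate without a scaleout element counts as 1.
        counts[delegate.get("key")] = (
            int(scaleout.get("numberOfInstances")) if scaleout is not None else 1
        )
    return counts


if __name__ == "__main__":
    counts = delegate_scaleout(DESCRIPTOR)
    for key, n in counts.items():
        print(key, n)
    print("total delegate instances:", sum(counts.values()))
```

Read this way, the descriptor requests 4 delegates with 5 instances each, 20 delegate instances in total, rather than 5 copies of the aggregate each holding 20.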

DUCC- process_dd

2015-04-28 Thread reshu.agarwal

Hi,

I am trying to scale my processing pipeline to run in a DUCC environment,
using a UIMA AS deployment descriptor as process_dd. When I scale using the
configuration below, the threads started are not as expected:



<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">

  <name>Uima v3 Deployment Descripter</name>
  <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed Flow
    Controller</description>

  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="5"/>
    <service>
      <inputQueue endpoint="UIMA_Queue_test"
          brokerURL="tcp://localhost:61617?jms.useCompression=true"
          prefetch="0"/>
      <topDescriptor>
        <import
            location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml"/>
      </topDescriptor>
      <analysisEngine async="true" key="FlowControllerAgg"
          internalReplyQueueScaleout="10" inputQueueScaleout="10">
        <scaleout numberOfInstances="5"/>
        <delegates>
          <analysisEngine key="ChunkerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="NEDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="StemmerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="ConsumerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
        </delegates>
      </analysisEngine>
    </service>
  </deployment>

</analysisEngineDeploymentDescription>


I expected 5 threads of FlowControllerAgg, each of which would have 5 more
threads for each of ChunkerDescriptor, NEDescriptor, StemmerDescriptor and
ConsumerDescriptor.


But I don't think that is actually happening in the case of DUCC.

Thanks in advance.

Reshu.