Re: DUCC- process_dd
Reshu, UIMA-AS configurations are normally used in DUCC as Services, either for interactive applications or to support Jobs. They can be used in Jobs, but typically are not. There is also a difference in the inputs between Job processes and Services. A Service will normally receive a CAS containing the artifact to be analyzed. A Job process will receive a CAS containing a reference to the artifact, or even to a collection of artifacts; this is important for Job scale-out, because it keeps the Job's Collection Reader from becoming a bottleneck. I suggest starting with one of the sample applications and adapting it to your needs. We can help if you give some details about the format of the input and output data.

Eddie

On Fri, May 1, 2015 at 12:31 AM, reshu.agarwal <reshu.agar...@orkash.com> wrote:
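Eddie's point above, that a Job process receives a CAS carrying only a reference to an artifact (or a batch of artifacts), can be sketched outside of UIMA. This is a conceptual illustration only, not the UIMA or DUCC API; the `expand` method, the `store` map, and the batch/document names are all invented for the sketch.

```java
import java.util.List;
import java.util.Map;

public class WorkItems {
    // Conceptual sketch only -- NOT the UIMA API. The reader hands workers a
    // small reference; each worker expands it into the actual artifacts, so
    // the reader never touches artifact contents and cannot become an I/O
    // bottleneck.
    static List<String> expand(String batchRef, Map<String, List<String>> store) {
        // In a real Job this might list a directory or run a query; "store"
        // stands in for that external source here.
        return store.getOrDefault(batchRef, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> store =
                Map.of("batch-1", List.of("doc1.txt", "doc2.txt"));
        // The CAS sent to a worker would carry only "batch-1", not the texts.
        System.out.println(expand("batch-1", store).size()); // prints 2
    }
}
```

The design point is that the expensive part (fetching and analyzing artifact contents) happens in the scaled-out workers, while the single reader only enumerates cheap references.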
Re: DUCC- process_dd
Eddie, I was working with this same scenario, experimenting by trial and error to compare it with UIMA-AS, since I think UIMA-AS can also scale the pipeline this way. But I am unable to match the processing time of DUCC's default configuration, the one you mentioned, using UIMA-AS. Can you help me with this? I just want to scale using the best combination of UIMA-AS and DUCC, which should be possible with process_dd. But how? Thanks in advance. Reshu.

On 05/01/2015 03:28 AM, Eddie Epstein wrote:

The simplest way of vertically scaling a Job process is to specify the analysis pipeline using core UIMA descriptors and then use --process_thread_count to specify how many copies of the pipeline to deploy, each in a different thread. No use of UIMA-AS at all. Please check out the Raw Text Processing sample application that comes with DUCC.
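The --process_thread_count model Eddie describes, several copies of the same pipeline, one per thread, pulling from a shared supply of work items, can be sketched with plain Java threads. This is a hedged illustration of the threading pattern only, not DUCC's actual implementation; the class and method names are invented.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadScaledPipeline {
    // Sketch of vertical scaling: threadCount pipeline copies, each in its
    // own thread, draining one shared work queue (as with DUCC's
    // --process_thread_count). Returns how many items were processed.
    static int runAll(int threadCount, int workItems) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < workItems; i++) queue.add(i);

        AtomicInteger processed = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                // Each thread would own its own pipeline instance, so the
                // copies need no synchronization with each other.
                Integer item;
                while ((item = queue.poll()) != null) {
                    processed.incrementAndGet(); // stand-in for "analyze item"
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(4, 100)); // prints 100
    }
}
```

Because every thread holds a private pipeline copy, memory grows with the thread count; that trade-off is the usual reason to tune --process_thread_count against the process memory size.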
Re: DUCC- process_dd
Oh! I misunderstood this. I thought this would scale both my Aggregate and its AEs. I want to scale the aggregate as well as the individual AEs. Is there any way of doing this in UIMA-AS/DUCC?

On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:

In an async aggregate you scale the individual AEs, not the aggregate as a whole. The configuration below should do that. Are there any warnings from dd2spring at startup with your configuration?

<analysisEngine async="true">
  <delegates>
    <analysisEngine key="ChunkerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="NEDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="StemmerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="ConsumerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
  </delegates>
</analysisEngine>

Jerry
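Jerry's point, that in an async aggregate the scaleout applies per delegate rather than to the aggregate, implies a simple thread budget: with his fragment, each aggregate gets one pool of threads per delegate. A small sketch (again a model for illustration, not UIMA-AS internals; `Delegate` and `totalDelegateThreads` are invented names) makes the arithmetic explicit.

```java
import java.util.List;

public class AsyncScaleout {
    // Toy model of a deployment descriptor's <delegates> section: each
    // delegate key with its numberOfInstances (its private thread pool size).
    record Delegate(String key, int instances) {}

    // Total delegate threads in one async aggregate: the sum of the per-
    // delegate pools. The aggregate itself is not replicated.
    static int totalDelegateThreads(List<Delegate> delegates) {
        return delegates.stream().mapToInt(Delegate::instances).sum();
    }

    public static void main(String[] args) {
        List<Delegate> dd = List.of(
                new Delegate("ChunkerDescriptor", 5),
                new Delegate("NEDescriptor", 5),
                new Delegate("StemmerDescriptor", 5),
                new Delegate("ConsumerDescriptor", 5));
        System.out.println(totalDelegateThreads(dd)); // prints 20
    }
}
```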
DUCC- process_dd
Hi, I was trying to scale my processing pipeline to run in a DUCC environment using a UIMA-AS deployment descriptor via process_dd. When I tried to scale with the configuration below, the threads started were not as expected:

<analysisEngineDeploymentDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>Uima v3 Deployment Descripter</name>
  <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed Flow Controller</description>
  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="5"/>
    <service>
      <inputQueue endpoint="UIMA_Queue_test" brokerURL="tcp://localhost:61617?jms.useCompression=true" prefetch="0"/>
      <topDescriptor>
        <import location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml"/>
      </topDescriptor>
      <analysisEngine async="true" key="FlowControllerAgg" internalReplyQueueScaleout="10" inputQueueScaleout="10">
        <scaleout numberOfInstances="5"/>
        <delegates>
          <analysisEngine key="ChunkerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="NEDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="StemmerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="ConsumerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
        </delegates>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>

There should be 5 instances of FlowControllerAgg, where each instance has 5 threads for each of ChunkerDescriptor, NEDescriptor, StemmerDescriptor, and ConsumerDescriptor. But I don't think that is actually happening under DUCC. Thanks in advance. Reshu.
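One practical way to check a claim like "the threads started were not as expected" is to count live threads inside the service JVM, for example from a debug hook or by taking a thread dump with jstack and filtering by name. The sketch below counts threads whose names match a pattern; the pattern is a placeholder you would replace with whatever naming you actually observe in a thread dump, since UIMA-AS thread names are not assumed here.

```java
import java.util.regex.Pattern;

public class ThreadCounter {
    // Count live JVM threads whose names match the given pattern. Comparing
    // this count (or a jstack dump) against the scaleout values in the
    // deployment descriptor shows how many pipeline threads really exist.
    static long countThreads(Pattern namePattern) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> namePattern.matcher(t.getName()).find())
                .count();
    }

    public static void main(String[] args) {
        // "main" is a placeholder pattern; in a real service you would use a
        // fragment of the thread names seen in your own thread dump.
        System.out.println(countThreads(Pattern.compile("main")) >= 1); // prints true
    }
}
```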