Re: DUCC doesn't use all available machines

2014-11-30 Thread Simon Hafner
2014-11-30 7:25 GMT-06:00 Eddie Epstein :
> On Sat, Nov 29, 2014 at 4:46 PM, Simon Hafner  wrote:
>
>> I've thrown some numbers at it (doubling each) and it's running at a
>> comfortable 125 procs. However, at about 6.1k of 6.5k items, the procs
>> drop down to 30.
>>
>
> 125 processes at 8 threads each = 1000 active pipelines. How many CPU
> cores are these 1000 pipelines running on?
Only 60.
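The mismatch Eddie is pointing at can be sketched numerically: 125 job processes at 8 pipeline threads each is 1000 active pipelines on 60 cores. The helper below is illustrative only, not part of DUCC.

```python
# Sketch: how oversubscribed are the cores in the exchange above?
# Numbers come from the thread; the function itself is illustrative.

def pipelines_per_core(processes: int, threads_per_process: int, cores: int) -> float:
    """Active analytic pipelines competing for each physical core."""
    return processes * threads_per_process / cores

# 125 procs x 8 threads on 60 cores:
print(f"{pipelines_per_core(125, 8, 60):.1f} pipelines per core")
```

As Eddie notes elsewhere in these threads, when analytic threads are CPU bound, running many more pipelines than real cores tends to slow the overall document processing rate rather than speed it up.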


Re: DUCC doesn't use all available machines

2014-11-29 Thread Simon Hafner
I've thrown some numbers at it (doubling each) and it's running at a
comfortable 125 procs. However, at about 6.1k of 6.5k items, the procs
drop down to 30.

2014-11-28 18:13 GMT-06:00 Eddie Epstein :
> Now you are hitting a limit configured in ducc.properties:
>
>   # Max number of work-item CASes for each job
>   ducc.threads.limit = 500
>
> 62 job process * 8 threads per process = 496 max concurrent work items.
> This was put in to limit the memory required by the job driver. This value
> can probably be pushed up in the range of 700-800 before the job driver
> will go OOM. There are configuration parameters to increase JD memory:
>
>   # Memory size in MB allocated for each JD
>   ducc.jd.share.quantum = 450
>   # JD max heap size. Should be smaller than the JD share quantum
>   ducc.driver.jvm.args = -Xmx400M -DUimaAsCasTracking
>
> DUCC would have to be restarted for the JD size parameters to take effect.
>
> One of the current DUCC development items is to significantly reduce the
> memory needed per work item, and raise the default limit for concurrent
> work items by two or three orders of magnitude.
>
>
>
> On Fri, Nov 28, 2014 at 6:40 PM, Simon Hafner  wrote:
>
>> I've put the fudge to 12000, and it jumped immediately to 62 procs.
>> However, it doesn't spawn new procs even though it has about 6k items
>> left.
>>
>> 2014-11-17 15:30 GMT-06:00 Jim Challenger :
>> > It is also possible that RM "prediction" has decided that additional
>> > processes are not needed.  It
>> > appears that there were likely 64 work items dispatched, plus the 6
>> > completed, leaving only
>> > 30 that were "idle".  If these work items appeared to be completing
>> quickly,
>> > the RM would decide
>> > that scale-up would be wasteful and not do it.
>> >
>> > Very gory details if you're interested:
>> > The time to start a new process is measured by the RM based on the
>> > observed initialization time of the processes plus an estimate of how
>> > long it would take to get a new process actually running.  A fudge-factor
>> > is added on top of this because in a large operation it is wasteful to
>> > start processes (with associated preemptions) that only end up doing a
>> > "few" work items.  All is subjective and configurable.
>> >
>> > The average time-per-work item is also reported to the RM.
>> >
>> > The RM then looks at the number of work items remaining and the
>> > estimated time needed to process this work based on the above, and if
>> > it determines that the job will be completed before new processes can
>> > be scaled up and initialized, it does not scale up.
>> >
>> > For short jobs, this can be a bit inaccurate, but those jobs are short :)
>> >
>> > For longer jobs, the time-per-work-item becomes increasingly accurate so
>> the
>> > RM prediction tends
>> > to improve and ramp-up WILL occur if the work-item time turns out to be
>> > larger than originally
>> > thought.  (Our experience is that work-item times are mostly uniform with
>> > occasional outliers, but
>> > the prediction seems to work well).
>> >
>> > Relevant configuration parameters in ducc.properties:
>> > # Predict when a job will end and avoid expanding if not needed. Set to
>> > false to disable prediction.
>> >ducc.rm.prediction = true
>> > # Add this fudge factor (milliseconds) to the expansion target when using
>> > prediction
>> >ducc.rm.prediction.fudge = 12
>> >
>> > You can observe this in the rm log, see the example below.  I'm
>> preparing a
>> > guide to this log; for now,
>> > the net of these two log lines is: the projection for the job in question
>> > (job 208927) is that 16 processes
>> > are needed to complete this job, even though the job could use 20
>> > processes at full expansion - the BaseCap - so a max of 16 will be
>> > scheduled for it, subject to fair-share constraint.
>> >
>> > 17 Nov 2014 15:07:38,880  INFO RM.RmJob - */getPrjCap/* 208927  bobuser
>> O 2
>> > T 343171 NTh 128 TI 143171 TR 6748.601431980907 R 1.8967e-02 QR 5043 P
>> 6509
>> > F 0 ST 1416254363603*/return 16/*
>> > 17 Nov 2014 15:07:38,880  INFO RM.RmJob - */initJobCap/* 208927 bobuser
>> O 2
>> > */Base cap:/* 20 Expected future cap: 16 potential cap 16 actual cap 16
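Eddie's arithmetic above (62 job processes × 8 threads = 496 concurrent work items against ducc.threads.limit = 500) can be sketched as a one-line cap. The helper name is illustrative, not a DUCC API.

```python
# Sketch of the job-driver cap described above: concurrent work-item CASes
# are bounded by ducc.threads.limit, so the number of job processes that can
# do useful work is limit // threads-per-process.  Illustrative only.

def max_useful_procs(threads_limit: int, threads_per_process: int) -> int:
    """Job processes beyond this would exceed the work-item CAS limit."""
    return threads_limit // threads_per_process

# Default limit of 500 with 8 pipeline threads per process:
print(max_useful_procs(500, 8))  # 62 procs -> 496 concurrent work items
```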

Re: DUCC doesn't use all available machines

2014-11-28 Thread Simon Hafner
I've put the fudge to 12000, and it jumped immediately to 62 procs.
However, it doesn't spawn new procs even though it has about 6k items
left.

2014-11-17 15:30 GMT-06:00 Jim Challenger :
> It is also possible that RM "prediction" has decided that additional
> processes are not needed.  It
> appears that there were likely 64 work items dispatched, plus the 6
> completed, leaving only
> 30 that were "idle".  If these work items appeared to be completing quickly,
> the RM would decide
> that scale-up would be wasteful and not do it.
>
> Very gory details if you're interested:
> The time to start a new process is measured by the RM based on the
> observed initialization time of the processes plus an estimate of how long
> it would take to get a new process actually running.  A fudge-factor is
> added on top of this because in a large operation it is wasteful to start
> processes (with associated preemptions) that only end up doing a "few"
> work items.  All is subjective and configurable.
>
> The average time-per-work item is also reported to the RM.
>
> The RM then looks at the number of work items remaining and the estimated
> time needed to process this work based on the above, and if it determines
> that the job will be completed before new processes can be scaled up and
> initialized, it does not scale up.
>
> For short jobs, this can be a bit inaccurate, but those jobs are short :)
>
> For longer jobs, the time-per-work-item becomes increasingly accurate so the
> RM prediction tends
> to improve and ramp-up WILL occur if the work-item time turns out to be
> larger than originally
> thought.  (Our experience is that work-item times are mostly uniform with
> occasional outliers, but
> the prediction seems to work well).
>
> Relevant configuration parameters in ducc.properties:
> # Predict when a job will end and avoid expanding if not needed. Set to
> false to disable prediction.
>ducc.rm.prediction = true
> # Add this fudge factor (milliseconds) to the expansion target when using
> prediction
>ducc.rm.prediction.fudge = 12
>
> You can observe this in the rm log, see the example below.  I'm preparing a
> guide to this log; for now,
> the net of these two log lines is: the projection for the job in question
> (job 208927) is that 16 processes
> are needed to complete this job, even though the job could use 20 processes
> at full expansion - the BaseCap - so a max of 16 will be scheduled for it,
> subject to fair-share constraint.
>
> 17 Nov 2014 15:07:38,880  INFO RM.RmJob - */getPrjCap/* 208927  bobuser O 2
> T 343171 NTh 128 TI 143171 TR 6748.601431980907 R 1.8967e-02 QR 5043 P 6509
> F 0 ST 1416254363603*/return 16/*
> 17 Nov 2014 15:07:38,880  INFO RM.RmJob - */initJobCap/* 208927 bobuser O 2
> */Base cap:/* 20 Expected future cap: 16 potential cap 16 actual cap 16
>
> Jim
>
>
> On 11/17/14, 3:44 PM, Eddie Epstein wrote:
>>
>> DuccRawTextSpec.job specifies that each job process (JP)
>> run 8 analytic pipeline threads. So for this job with 100 work
>> items, no more than 13 JPs would ever be started.
>>
>> After successful initialization of the first JP, DUCC begins scaling
>> up the number of JPs using doubling. During JP scale up the
>> scheduler monitors the work item completion rate, compares that
>> with the JP initialization time, and stops scaling up JPs when
>> starting more JPs will not make the job run any faster.
>>
>> Of course JP scale up is also limited by the job's "fair share"
>> of resources relative to total resources available for all preemptable
>> jobs.
>>
>> To see more JPs, increase the number and/or size of the input text files,
>> or decrease the number of pipeline threads per JP.
>>
>> Note that it can be counter productive to run "too many" pipeline
>> threads per machine. Assuming analytic threads are 100% CPU bound,
>> running more threads than real cores will often slow down the overall
>> document processing rate.
>>
>>
>> On Mon, Nov 17, 2014 at 6:48 AM, Simon Hafner 
>> wrote:
>>
>>> I fired the DuccRawTextSpec.job on a cluster consisting of three
>>> machines, with 100 documents. The scheduler only runs the processes on
>>> two machines instead of all three. Can I mess with a few config
>>> variables to make it use all three?
>>>
>>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:1
>>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:2
>>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:4
>>> id:22 state:Running total:100 done:1 error:0 retry:0 procs:8
>>> id:22 state:Running total:100 done:6 error:0 retry:0 procs:8
>>>
>
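Jim's prediction logic can be sketched as a single comparison: expand only if the projected remaining work outlasts the time to bring up a new process plus the fudge. The helper and its parameter names are illustrative, not DUCC code; the real knobs are ducc.rm.prediction and ducc.rm.prediction.fudge.

```python
# Sketch of RM "prediction": skip scale-up when the remaining work will
# finish before a new process could initialize.  Illustrative only; the
# fudge default here mirrors the 12000 ms value tried in this thread.

def should_expand(items_remaining: int,
                  avg_item_ms: float,
                  current_pipelines: int,
                  init_ms: float,
                  fudge_ms: float = 12000) -> bool:
    """True if the job is projected to outlast process start-up plus fudge."""
    projected_remaining_ms = items_remaining * avg_item_ms / max(current_pipelines, 1)
    return projected_remaining_ms > init_ms + fudge_ms

# 30 idle items completing quickly across 64 pipelines: no expansion.
print(should_expand(30, 500, 64, 60000))      # False
# Thousands of slow items: ramp-up occurs.
print(should_expand(6000, 5000, 64, 60000))   # True
```

This also illustrates Jim's point that ramp-up WILL occur later if the per-item time turns out larger than first estimated: the same comparison flips once avg_item_ms grows.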


Re: Ducc: Rename failed

2014-11-28 Thread Simon Hafner
2014-11-28 14:18 GMT-06:00 Eddie Epstein :
> To debug, please add the following option to the job submission:
> --all_in_one local
>
> This will run all the code in a single process on the machine doing the
> submit. Hopefully the log file and/or console will be more informative.
Yes, that helped. It was a missing classpath.


Re: Ducc: Rename failed

2014-11-28 Thread Simon Hafner
2014-11-28 10:45 GMT-06:00 Eddie Epstein :
> DuccCasCC component has presumably created
> /home/ducc/analysis/txt.processed/5911.txt_0_processed.zip_temp and written
> to it?
I don't know, the _temp file doesn't exist anymore.

> Did you run this sample job in something other than cluster mode?
I get the same error running on a single machine.


Ducc: Rename failed

2014-11-28 Thread Simon Hafner
When running DUCC in cluster mode, I get "Rename failed". The file
mentioned in the error message exists in the txt.processed/ directory.
The mount is via nfs (rw,sync,insecure).

org.apache.uima.resource.ResourceProcessException: Received Exception
In Message From Service on Queue:ducc.jd.queue.75 Broker:
tcp://10.0.0.164:61617?jms.useCompression=true Cas
Identifier:18acd63:149f6f562d3:-7fa6 Exception:{3}
at 
org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendAndReceiveCAS(BaseUIMAAsynchronousEngineCommon_impl.java:2230)
at 
org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendAndReceiveCAS(BaseUIMAAsynchronousEngineCommon_impl.java:2049)
at org.apache.uima.ducc.jd.client.WorkItem.run(WorkItem.java:145)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.uima.aae.error.UimaEEServiceException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at 
org.apache.uima.adapter.jms.activemq.JmsOutputChannel.sendReply(JmsOutputChannel.java:932)
at 
org.apache.uima.aae.controller.BaseAnalysisEngineController.handleAction(BaseAnalysisEngineController.java:1172)
at 
org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.takeAction(PrimitiveAnalysisEngineController_impl.java:1145)
at 
org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError(ProcessCasErrorHandler.java:405)
at 
org.apache.uima.aae.error.ErrorHandlerChain.handle(ErrorHandlerChain.java:57)
at 
org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.process(PrimitiveAnalysisEngineController_impl.java:1065)
at 
org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:121)
at 
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient(ProcessRequestHandler_impl.java:543)
at 
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1050)
at 
org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:78)
at 
org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:728)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:535)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:495)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:467)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1058)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:952)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 
org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:129)
... 1 more
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
at org.apache.uima.ducc.sampleapps.DuccCasCC.process(DuccCasCC.java:117)
at 
org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:309)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:569)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.(ASB_impl.java:411)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:266)
at 
org.apache.uima.aae.controller.PrimitiveAnalysisEngineControll
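From the stack trace and file names, DuccCasCC apparently writes output to a "<name>_temp" file and then renames it into place. Java's File.renameTo() returns only a boolean with no reason for failure, which is consistent with the bare "Rename failed" seen here; over NFS a rename can also fail when client caches are stale or the target already exists. A Python analogue of the same pattern, illustrative only (write_atomically is a hypothetical name), which raises a descriptive error instead of silently returning false:

```python
# Sketch of the write-temp-then-rename pattern, with an error that explains
# itself.  Illustrative only; this is not DUCC or sample-app code.

import os

def write_atomically(path: str, data: bytes) -> None:
    tmp = path + "_temp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # make sure data is on disk before the rename
    os.replace(tmp, path)      # atomic on POSIX; raises with a reason on failure
```

The descriptive exception is exactly the kind of detail that running with --all_in_one local (as suggested above) helped surface in this thread.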

Re: DUCC org.apache.uima.util.InvalidXMLException and no logs

2014-11-28 Thread Simon Hafner
2014-11-27 11:44 GMT-06:00 Eddie Epstein :
> Those are the only two log files? Should be a ducc.log (probably with no
> more info than on the console), and either one or both of the job driver
> logfiles: jd.out.log and jobid-JD-jdnode-jdpid.log. If for some reason the
> job driver failed to start, check the job driver agent log (the agent
> managing the System/JobDriver reservation) for more info on what happened.

The job driver logs do not exist. I rebooted the machine and now it
works. I'll take a look at the agent log next time.


DUCC org.apache.uima.util.InvalidXMLException and no logs

2014-11-26 Thread Simon Hafner
When launching the Raw Text example application, it fails with the
following error:

[ducc@ip-10-0-0-164 analysis]$ MyAppDir=$PWD MyInputDir=$PWD/txt
MyOutputDir=$PWD/txt.processed ~/ducc_install/bin/ducc_submit -f
DuccRawTextSpec.job
Job 50 submitted
id:50 location:5991@ip-10-0-0-164
id:50 state:WaitingForDriver
id:50 state:Completing total:-1 done:0 error:0 retry:0 procs:0
id:50 state:Completed total:-1 done:0 error:0 retry:0 procs:0
id:50 rationale:job driver exception occurred:
org.apache.uima.util.InvalidXMLException at
org.apache.uima.ducc.common.uima.UimaUtils.getXMLInputSource(UimaUtils.java:246)

However, there are no logs with a stacktrace or similar, how do I get
hold of one? The only files in the log directory are:

[ducc@ip-10-0-0-164 analysis]$ cat logs/50/specified-by-user.properties
#Thu Nov 27 03:00:57 UTC 2014
working_directory=/home/ducc/analysis
process_descriptor_CM=org.apache.uima.ducc.sampleapps.DuccTextCM
driver_descriptor_CR=org.apache.uima.ducc.sampleapps.DuccJobTextCR
cancel_on_interrupt=
process_descriptor_CC_overrides=UseBinaryCompression\=true
process_descriptor_CC=org.apache.uima.ducc.sampleapps.DuccCasCC
log_directory=/home/ducc/analysis/logs
wait_for_completion=
classpath=/home/ducc/analysis/lib/*
process_thread_count=8
driver_descriptor_CR_overrides=BlockSize\=10 SendToLast\=true
InputDirectory\=/home/ducc/analysis/txt
OutputDirectory\=/home/ducc/analysis/txt.processed
process_per_item_time_max=20
process_descriptor_AE=/home/ducc/analysis/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml
description=DUCC raw text sample application
process_jvm_args=-Xmx3G -XX\:+UseCompressedOops
-Djava.util.logging.config.file\=/home/ducc/analysis/ConsoleLogger.properties
scheduling_class=normal
process_memory_size=4
specification=DuccRawTextSpec.job

[ducc@ip-10-0-0-164 analysis]$ cat logs/50/job-specification.properties
#Thu Nov 27 03:00:57 UTC 2014
working_directory=/home/ducc/analysis
process_descriptor_CM=org.apache.uima.ducc.sampleapps.DuccTextCM
process_failures_limit=20
driver_descriptor_CR=org.apache.uima.ducc.sampleapps.DuccJobTextCR
cancel_on_interrupt=
process_descriptor_CC_overrides=UseBinaryCompression\=true
process_descriptor_CC=org.apache.uima.ducc.sampleapps.DuccCasCC
classpath_order=ducc-before-user
log_directory=/home/ducc/analysis/logs
submitter_pid_at_host=5991@ip-10-0-0-164
wait_for_completion=
classpath=/home/ducc/analysis/lib/*
process_thread_count=8
driver_descriptor_CR_overrides=BlockSize\=10 SendToLast\=true
InputDirectory\=/home/ducc/analysis/txt
OutputDirectory\=/home/ducc/analysis/txt.processed
process_initialization_failures_cap=99
process_per_item_time_max=20
process_descriptor_AE=/home/ducc/analysis/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml
description=DUCC raw text sample application
process_jvm_args=-Xmx3G -XX\:+UseCompressedOops
-Djava.util.logging.config.file\=/home/ducc/analysis/ConsoleLogger.properties
scheduling_class=normal
environment=HOME\=/home/ducc LANG\=en_US.UTF-8 USER\=ducc
process_memory_size=4
user=ducc
specification=DuccRawTextSpec.job
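An InvalidXMLException from UimaUtils.getXMLInputSource often means a descriptor could not be located or parsed (bad path, unreadable file, or a classpath problem). For descriptors given as file paths, like process_descriptor_AE above, a quick local sanity check can rule out the simple cases. The checker below is a hypothetical debugging aid, not part of DUCC.

```python
# Sketch: verify a file-path UIMA descriptor exists and is well-formed XML
# before submitting.  Illustrative helper; it does not validate against the
# UIMA descriptor schema, only basic existence and XML syntax.

import os
import xml.etree.ElementTree as ET

def check_descriptor(path: str) -> str:
    if not os.path.isfile(path):
        return f"missing: {path}"
    try:
        ET.parse(path)
    except ET.ParseError as e:
        return f"malformed XML: {e}"
    return "ok"
```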


Re: DUCC stuck Waiting for Resources - new install on CentOS 6.5 VM

2014-11-18 Thread Simon Hafner
How many shares does your agent have available?

2014-11-18 14:37 GMT-06:00 Dan Heinze :
> I've read the "DUCC stuck Waiting for Resources on Amazon..." thread.
> I have a similar problem.  I did my first install of DUCC yesterday on a
> CentOS 6.5 VM with 9GB RAM.  No problems with the install. ./start_ducc -s
> seems to work fine, but when I look at ducc-mon Reservations, I find that
> Job Driver is stuck "Waiting for Resources", I have given it hours, but it
> just stays stuck there.  Also, nothing is being written to the logs... the
> ${DUCC_HOME}/logs directory is empty.  Any help will be appreciated.
>
> -Dan
>


DUCC doesn't use all available machines

2014-11-17 Thread Simon Hafner
I fired the DuccRawTextSpec.job on a cluster consisting of three
machines, with 100 documents. The scheduler only runs the processes on
two machines instead of all three. Can I mess with a few config
variables to make it use all three?

id:22 state:Running total:100 done:0 error:0 retry:0 procs:1
id:22 state:Running total:100 done:0 error:0 retry:0 procs:2
id:22 state:Running total:100 done:0 error:0 retry:0 procs:4
id:22 state:Running total:100 done:1 error:0 retry:0 procs:8
id:22 state:Running total:100 done:6 error:0 retry:0 procs:8


Re: DUCC 1.1.0- How to Run two DUCC version on same machines with different user

2014-11-17 Thread Simon Hafner
2014-11-17 0:00 GMT-06:00 reshu.agarwal :
> I want to run two DUCC version i.e. 1.0.0 and 1.1.0 on same machines with
> different user. Can this be possible?
Yes, that should be possible. You'll have to make sure there are no
port conflicts; I'd guess the ActiveMQ port is hardcoded while the rest
might be randomly assigned. Just set that port manually and watch for
any errors during startup to see which other components have hardcoded
port numbers.

Personally, I'd just fire up a VM with qemu or VirtualBox.
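For a side-by-side install, the broker and web-server ports are the obvious ones to separate. The fragment below is illustrative: the property names should be checked against the ducc.properties shipped with your own DUCC version, and the port values are arbitrary.

```properties
# Hypothetical ducc.properties overrides for a second DUCC install on the
# same machines; verify names against your own resources/ducc.properties.
ducc.broker.port = 62617    # first install typically uses 61617
ducc.ws.port = 42134        # keep the web server off the first install's port
```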


Re: DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-14 Thread Simon Hafner
So to run effectively, I would need more memory, because the job wants
two shares? Yes: with a larger node it works. What would be a
reasonable memory size for a DUCC node?

2014-11-14 9:38 GMT-06:00 Lou DeGenaro :
> Simon,
>
> Congratulations!  You found a bug in DUCC's Web Server.  It was incorrectly
> rounding up when reporting the number of shares for a machine.  This issue
> is addressed by Jira 4104 .
>
> Lou.
>
> On Fri, Nov 14, 2014 at 7:49 AM, Jim Challenger  wrote:
>
>> Simon,
>> It looks like the problem is the amount of RAM on your machine. It's
>> going to be hard to get any meaningful work running on < about 8G.
>>
>> Here's what to do to get the test job to run on your 4G machine:
>> 1.  In the resources folder, edit ducc.properties and change this:
>>   ducc.jd.host.memory.size=2GB
>>  to this:
>>   ducc.jd.host.memory.size=1GB
>>
>>  This is the amount of RAM that DUCC reserves for itself to manage
>> its "head" processes.
>>
>> 2.  In the examples/simple folder, edit 1.job and change this:
>>  process_memory_size2
>>  to this:
>>  process_memory_size1
>>
>>  This is the amount of memory in GB that the sample 1.job is
>> requesting.
>>
>>  3.  Stop ducc and restart it so the ducc processes reset the
>> jd.host.memory size from the new ducc.properties.
>>
>>  4.  Rerun 1.job and all should be well.
>>
>>   Here are the gory details from the RM log, if you're interested.  In
>> the RM log, I see these lines.
>>
>> 13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines N/A
>>  Name   Order Active Shares Unused Shares Memory (MB)
>> Jobs
>> - - - ---
>> -- ...
>> .us-west-2.compute.internal 3 2 13955 7 [1]
>>
>> This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by
>> the reservation/job "7", and that you have 1GB free.  The reason you have
>> only 3GB **usable** is that usually the hardware/opsys will reserve a small
>> part of the installed RAM for itself, so the reported RAM is a tad
>> smaller.  To avoid overcommitting the system, we use the reported value,
>> not the installed value.  Most or all of the jobs here will easily
>> overwhelm even the largest machines if we don't do this.
>>
>> Next,  these lines show the actual schedule the RM is trying to build.
>> Dormant:
>> IDJobName   User Class Shares
>> Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
>>J_8 Test_job_1   ducc normal  0
>>  2   0   2  2 15   15 true 8
>>
>> Reserved:
>> IDJobName   User Class Shares
>> Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
>>R_7 Job_Driver System JobDriver  1
>>2   2   0  2  00 0 1
>>
>> This confirms that the DUCC reservation "7" occupies 2G, and that job "8"
>> is requesting 2G but is "dormant", i.e. waiting for resources.  Since there
>> is only 3G available on this machine, job 8 will wait.
>>
>> Best,
>> Jim
>>
>>
>>
>>
>>
>>
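Jim's arithmetic reduces to a simple fit test: the job can be scheduled only if its request fits in the usable RAM left over after the Job Driver reservation. The helper below is illustrative and ignores share quantization and fair-share details.

```python
# Sketch of the scheduling arithmetic in Jim's explanation: a 4GB node
# reports about 3GB usable, the JD reservation holds 2GB, and 1.job asks
# for 2GB, so the job stays "dormant".  Illustrative only.

def job_can_schedule(usable_gb: int, jd_reserved_gb: int, job_request_gb: int) -> bool:
    """True if the job's memory request fits after the JD reservation."""
    return usable_gb - jd_reserved_gb >= job_request_gb

print(job_can_schedule(3, 2, 2))   # False: job 8 waits for resources
print(job_can_schedule(3, 1, 1))   # True: after shrinking JD and job to 1GB
```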


Re: DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-13 Thread Simon Hafner
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares   normal gbo  0   0   0
  0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares vshares   0   1   0   0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares nshares   0   1   0   0
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class high
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  high
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class normal
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
 8 Scheduling job in class  normal : 0 shares given, order 2
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler -
traverseNodepoolsForExpansion N/A --- stop_here_dx 8
13 Nov 2014 22:04:14,926  INFO RM.NodePool - doExpansion N/A NP:
--default-- Expansions in this order: 8:notfound
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 8  --default--   0   0  0 2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 7  --default--   1   1  0 2
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler - doEvictions
N/A --default-- NeededByOrder before any eviction: [0, 0, 0, 0]
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation N/A vMachines:   0   1   0   0
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation N/A Nodepools:--default--
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation N/A Nodepool   User PureFS  NSh
Counted Needed  O Class: normal
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation   8  --default--   ducc  00
0  0  2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation N/A Nodepool   User PureFS  NSh
Counted Needed  O Class: JobDriver
13 Nov 2014 22:04:14,928  INFO RM.NodepoolScheduler -
insureFullEviction N/A No needy jobs, defragmentation bypassed.
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule N/A
--- Scheduler returns ---
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule N/A
 Expanded:


Shrunken:
   

Stable:
   

Dormant:
IDJobName   User  Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   J_8 Test_job_1   ducc normal
  0 2   0   2  2 15   15 true 8

Reserved:
IDJobName   User  Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   R_7 Job_Driver System  JobDriver
  1 2   2   0  2  000 1


13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule N/A

13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState
 N/A Schedule sent to Orchestrator
13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState N/A
Reservation 7
Existing[1]: .us-west-2.compute.internal.1^0
Additions[0]:
Removals[0]:
Job 8
Existing[0]:
Additions[0]:
Removals[0]:

13 Nov 2014 22:04:14,946  INFO RM.ResourceManagerComponent -
runScheduler N/A  30 --- Scheduling loop returns


2014-11-13 12:12 GMT-06:00 Eddie Epstein :
> Simon,
>
> The DUCC resource manager logs into rm.log. Did you look there for reasons
> the resources are not being allocated?
>
> Eddie
>
> On Wed, No

Re: DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-12 Thread Simon Hafner
4 shares total, 2 in use.

2014-11-12 5:06 GMT-06:00 Lou DeGenaro :
> Try looking at your DUCC's web server.  On the System -> Machines page
> do you see any shares not inuse?
>
> Lou.
>
> On Wed, Nov 12, 2014 at 5:51 AM, Simon Hafner  wrote:
>> I've set up DUCC according to
>> https://cwiki.apache.org/confluence/display/UIMA/DUCC
>>
>> ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job
>>
>> the job is stuck at WaitingForResources.
>>
>> 12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
>> process N/A ... Agent Collecting User Processes
>> 12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
>> copyAllUserReservations N/A +++ Copying User Reservations
>> - List Size:0
>> 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
>>N/A ** User Process Map Size After
>> copyAllUserReservations:0
>> 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
>>N/A ** User Process Map Size After
>> copyAllUserRougeProcesses:0
>> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call N/A
>> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
>>N/A 
>> **
>> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
>> process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
>> Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
>> Low Swap Threshold Defined in ducc.properties:0
>> 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
>> reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
>> ID:13
>> 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
>> reportIncomingStateForThisNode N/A
>> JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource
>> State:Allocated isDeallocated:false
>> 12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
>> N/A +++ Copied User Reservations - List Size:0
>> 12 Nov 2014 10:37:33,405  INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
>>  N/A PID:8168 Swap Usage:0
>> 12 Nov 2014 10:37:33,913  INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
>> collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
>> 12 Nov 2014 10:37:33,913  INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
>> --- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
>> Usage Allowed:-108574720 Time to Collect Swap Usage:0
>>
>> I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
>> Linux (looks centos based).
>>
>> To install maven (not in the repos)
>>
>> #! /bin/bash
>>
>> TEMPORARY_DIRECTORY="$(mktemp -d)"
>> DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"
>>
>> echo 'Downloading Maven to: ' "$DOWNLOAD_TO"
>>
>> wget -O "$DOWNLOAD_TO"
>> http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
>>
>> echo 'Extracting Maven'
>> tar xzf $DOWNLOAD_TO -C $TEMPORARY_DIRECTORY
>> rm $DOWNLOAD_TO
>>
>> echo 'Configuring Environment'
>>
>> mv $TEMPORARY_DIRECTORY/apache-maven-* /usr/local/maven
>> echo -e 'export M2_HOME=/usr/local/maven\nexport
>> PATH=${M2_HOME}/bin:${PATH}' > /etc/profile.d/maven.sh
>> source /etc/profile.d/maven.sh
>>
>> echo 'The maven version: ' `mvn -version` ' has been installed.'
>> echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
>> echo 'Removing the temporary directory...'
>> rm -r "$TEMPORARY_DIRECTORY"
>> echo 'Your Maven Installation is Complete.'


DUCC stuck at WaitingForResources on an Amazon Linux

2014-11-12 Thread Simon Hafner
I've set up DUCC according to
https://cwiki.apache.org/confluence/display/UIMA/DUCC

ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

the job is stuck at WaitingForResources.

12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
copyAllUserReservations N/A +++ Copying User Reservations
- List Size:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ** User Process Map Size After
copyAllUserReservations:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A ** User Process Map Size After
copyAllUserRougeProcesses:0
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call N/A
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
   N/A 
**
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
Low Swap Threshold Defined in ducc.properties:0
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
ID:13
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A
JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource
State:Allocated isDeallocated:false
12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
N/A +++ Copied User Reservations - List Size:0
12 Nov 2014 10:37:33,405  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
 N/A PID:8168 Swap Usage:0
12 Nov 2014 10:37:33,913  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
12 Nov 2014 10:37:33,913  INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
--- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
Usage Allowed:-108574720 Time to Collect Swap Usage:0

I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
Linux (looks centos based).

To install maven (not in the repos)

#! /bin/bash

TEMPORARY_DIRECTORY="$(mktemp -d)"
DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"

echo 'Downloading Maven to: ' "$DOWNLOAD_TO"

wget -O "$DOWNLOAD_TO" \
  http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

echo 'Extracting Maven'
tar xzf "$DOWNLOAD_TO" -C "$TEMPORARY_DIRECTORY"
rm "$DOWNLOAD_TO"

echo 'Configuring Environment'

mv "$TEMPORARY_DIRECTORY"/apache-maven-* /usr/local/maven
echo -e 'export M2_HOME=/usr/local/maven\nexport PATH=${M2_HOME}/bin:${PATH}' > /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh

echo 'The maven version: ' "$(mvn -version)" ' has been installed.'
echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
echo 'Removing the temporary directory...'
rm -r "$TEMPORARY_DIRECTORY"
echo 'Your Maven Installation is Complete.'