[jira] [Created] (AURORA-1594) End-to-end test is broken

2016-01-25 Thread Bill Farner (JIRA)
Bill Farner created AURORA-1594:
---

 Summary: End-to-end test is broken
 Key: AURORA-1594
 URL: https://issues.apache.org/jira/browse/AURORA-1594
 Project: Aurora
  Issue Type: Bug
  Components: Scheduler
Reporter: Bill Farner
Priority: Blocker


{noformat}
+ aurora job create devcluster/vagrant/test/http_example 
/vagrant/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
 WARN]
WARNING: endpoint, expected_response, and expected_response_code are deprecated 
and will be removed
in the next release. Please consult updated documentation.

 INFO] Creating job http_example
 WARN] Could not connect to scheduler: No schedulers detected in devcluster!
 WARN] Could not connect to scheduler: No schedulers detected in devcluster!
Job creation failed due to error:
java.lang.IllegalArgumentException: Multiple entries with same key: 
ITaskConfig{job=IJobKey{role=vagrant, environment=test, name=http_example}, 
owner=IIdentity{role=null, user=vagrant}, environment=null, jobName=null, 
isService=true, numCpus=0.4, ramMb=32, diskMb=64, priority=0, 
maxTaskFailures=1, production=false, tier=null, constraints=[], 
requestedPorts=[http], taskLinks={http=http://%host%:%port:http%}, 
contactEmail=vagrant@localhost, 
executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
"test", "health_check_config": {"expected_response_code": 0, "endpoint": 
"/health", "health_checker": {"http": {"expected_response_code": 0, "endpoint": 
"/health", "expected_response": "ok"}}, "initial_interval_secs": 5.0, 
"expected_response": "ok", "max_consecutive_failures": 0, "timeout_secs": 1.0, 
"interval_secs": 1.0}, "name": "http_example", "service": true, 
"max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", 
"enable_hooks": false, "cluster": "devcluster", "task": {"processes": 
[{"daemon": false, "name": "stage_server", "ephemeral": false, "max_failures": 
1, "min_duration": 5, "cmdline": "cp 
/vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": false}, 
{"daemon": false, "name": "run_server", "ephemeral": false, "max_failures": 1, 
"min_duration": 5, "cmdline": "python http_example.py {{thermos.ports[http]}}", 
"final": false}], "name": "http_example", "finalization_wait": 30, 
"max_failures": 1, "max_concurrency": 0, "resources": {"disk": 67108864, "ram": 
33554432, "cpu": 0.4}, "constraints": [{"order": ["stage_server", 
"run_server"]}]}, "production": false, "role": "vagrant", "contact": 
"vagrant@localhost", "announce": {"primary_port": "http", "portmap": {"aurora": 
"http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", 
"port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}}, 
metadata=[], container=IContainer{setField=MESOS, 
value=IMesosContainer{}}}=org.apache.aurora.scheduler.storage.db.views.DbTaskConfig@7b345c31
 and ITaskConfig{job=IJobKey{role=vagrant, environment=test, 
name=http_example}, owner=IIdentity{role=null, user=vagrant}, environment=null, 
jobName=null, isService=true, numCpus=0.4, ramMb=32, diskMb=64, priority=0, 
maxTaskFailures=1, production=false, tier=null, constraints=[], 
requestedPorts=[http], taskLinks={http=http://%host%:%port:http%}, 
contactEmail=vagrant@localhost, 
executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
"test", "health_check_config": {"expected_response_code": 0, "endpoint": 
"/health", "health_checker": {"http": {"expected_response_code": 0, "endpoint": 
"/health", "expected_response": "ok"}}, "initial_interval_secs": 5.0, 
"expected_response": "ok", "max_consecutive_failures": 0, "timeout_secs": 1.0, 
"interval_secs": 1.0}, "name": "http_example", "service": true, 
"max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", 
"enable_hooks": false, "cluster": "devcluster", "task": {"processes": 
[{"daemon": false, "name": "stage_server", "ephemeral": false, "max_failures": 
1, "min_duration": 5, "cmdline": "cp 
/vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": false}, 
{"daemon": false, "name": "run_server", "ephemeral": false, "max_failures": 1, 
"min_duration": 5, "cmdline": "python http_example.py {{thermos.ports[http]}}", 
"final": false}], "name": "http_example", "finalization_wait": 30, 
"max_failures": 1, "max_concurrency": 0, "resources": {"disk": 67108864, "ram": 
33554432, "cpu": 0.4}, "constraints": [{"order": ["stage_server", 
"run_server"]}]}, "production": false, "role": "vagrant", "contact": 
"vagrant@localhost", "announce": {"primary_port": "http", "portmap": {"aurora": 
"http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", 
"port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}}, 
metadata=[], container=IContainer{setField=MESOS, 
value=IMesosContainer{}}}=org.apache.aurora.scheduler.storage.db.views.DbTaskConfig@7ac8690c.
 To index multiple 

[jira] [Updated] (AURORA-1594) End-to-end test is broken

2016-01-25 Thread Bill Farner (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Farner updated AURORA-1594:

Description: 
{noformat}
+ aurora job create devcluster/vagrant/test/http_example 
/vagrant/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
 WARN]
WARNING: endpoint, expected_response, and expected_response_code are deprecated 
and will be removed
in the next release. Please consult updated documentation.

 INFO] Creating job http_example
 WARN] Could not connect to scheduler: No schedulers detected in devcluster!
 WARN] Could not connect to scheduler: No schedulers detected in devcluster!
Job creation failed due to error:
java.lang.IllegalArgumentException: Multiple entries with same key: 
ITaskConfig{job=IJobKey{role=vagrant, environment=test, name=http_example}, 
owner=IIdentity{role=null, user=vagrant}, environment=null, jobName=null, 
isService=true, numCpus=0.4, ramMb=32, diskMb=64, priority=0, 
maxTaskFailures=1, production=false, tier=null, constraints=[], 
requestedPorts=[http], taskLinks={http=http://%host%:%port:http%}, 
contactEmail=vagrant@localhost, 
executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
"test", "health_check_config": {"expected_response_code": 0, "endpoint": 
"/health", "health_checker": {"http": {"expected_response_code": 0, "endpoint": 
"/health", "expected_response": "ok"}}, "initial_interval_secs": 5.0, 
"expected_response": "ok", "max_consecutive_failures": 0, "timeout_secs": 1.0, 
"interval_secs": 1.0}, "name": "http_example", "service": true, 
"max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", 
"enable_hooks": false, "cluster": "devcluster", "task": {"processes": 
[{"daemon": false, "name": "stage_server", "ephemeral": false, "max_failures": 
1, "min_duration": 5, "cmdline": "cp 
/vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": false}, 
{"daemon": false, "name": "run_server", "ephemeral": false, "max_failures": 1, 
"min_duration": 5, "cmdline": "python http_example.py {{thermos.ports[http]}}", 
"final": false}], "name": "http_example", "finalization_wait": 30, 
"max_failures": 1, "max_concurrency": 0, "resources": {"disk": 67108864, "ram": 
33554432, "cpu": 0.4}, "constraints": [{"order": ["stage_server", 
"run_server"]}]}, "production": false, "role": "vagrant", "contact": 
"vagrant@localhost", "announce": {"primary_port": "http", "portmap": {"aurora": 
"http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", 
"port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}}, 
metadata=[], container=IContainer{setField=MESOS, 
value=IMesosContainer{}}}=org.apache.aurora.scheduler.storage.db.views.DbTaskConfig@7b345c31
 and ITaskConfig{job=IJobKey{role=vagrant, environment=test, 
name=http_example}, owner=IIdentity{role=null, user=vagrant}, environment=null, 
jobName=null, isService=true, numCpus=0.4, ramMb=32, diskMb=64, priority=0, 
maxTaskFailures=1, production=false, tier=null, constraints=[], 
requestedPorts=[http], taskLinks={http=http://%host%:%port:http%}, 
contactEmail=vagrant@localhost, 
executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
"test", "health_check_config": {"expected_response_code": 0, "endpoint": 
"/health", "health_checker": {"http": {"expected_response_code": 0, "endpoint": 
"/health", "expected_response": "ok"}}, "initial_interval_secs": 5.0, 
"expected_response": "ok", "max_consecutive_failures": 0, "timeout_secs": 1.0, 
"interval_secs": 1.0}, "name": "http_example", "service": true, 
"max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", 
"enable_hooks": false, "cluster": "devcluster", "task": {"processes": 
[{"daemon": false, "name": "stage_server", "ephemeral": false, "max_failures": 
1, "min_duration": 5, "cmdline": "cp 
/vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": false}, 
{"daemon": false, "name": "run_server", "ephemeral": false, "max_failures": 1, 
"min_duration": 5, "cmdline": "python http_example.py {{thermos.ports[http]}}", 
"final": false}], "name": "http_example", "finalization_wait": 30, 
"max_failures": 1, "max_concurrency": 0, "resources": {"disk": 67108864, "ram": 
33554432, "cpu": 0.4}, "constraints": [{"order": ["stage_server", 
"run_server"]}]}, "production": false, "role": "vagrant", "contact": 
"vagrant@localhost", "announce": {"primary_port": "http", "portmap": {"aurora": 
"http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", 
"port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}}, 
metadata=[], container=IContainer{setField=MESOS, 
value=IMesosContainer{}}}=org.apache.aurora.scheduler.storage.db.views.DbTaskConfig@7ac8690c.
 To index multiple values under a key, use Multimaps.index.
+ collect_result
+ [[ 1 = 0 ]]
+ echo '!!! FAIL (something returned non-zero) for [[ $RETCODE = 0 ]]'
{noformat}
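
The trailing hint in the exception ("To index multiple values under a key, use Multimaps.index") appears to be the message Guava's `Maps.uniqueIndex` produces when two elements map to equal keys. The sketch below reproduces that failure mode in Python purely as an illustration. `unique_index`, `multi_index` and the sample rows are hypothetical stand-ins, not Aurora or Guava code.

```python
from collections import defaultdict


def unique_index(values, key_fn):
    """Build a one-to-one index; reject two values that share a key."""
    index = {}
    for value in values:
        key = key_fn(value)
        if key in index:
            raise ValueError(
                "Multiple entries with same key: %r=%r and %r=%r"
                % (key, index[key], key, value))
        index[key] = value
    return index


def multi_index(values, key_fn):
    """Build a one-to-many index; values sharing a key are grouped."""
    index = defaultdict(list)
    for value in values:
        index[key_fn(value)].append(value)
    return dict(index)


# Hypothetical rows: two distinct stored rows carrying equal task configs,
# analogous to the two DbTaskConfig instances named in the error above.
rows = [
    {"row_id": 1, "config": ("vagrant", "test", "http_example")},
    {"row_id": 2, "config": ("vagrant", "test", "http_example")},
]

# Grouping tolerates the duplicate key.
grouped = multi_index(rows, lambda row: row["config"])

# The unique index does not.
try:
    unique_index(rows, lambda row: row["config"])
    unique_failed = False
except ValueError:
    unique_failed = True
```

Whether the right fix is to group the values or to stop producing two rows for one config is a separate question that this sketch does not answer.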

Stack trace:

[jira] [Commented] (AURORA-1052) Populate Labels in TaskConfig

2016-01-25 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115697#comment-15115697
 ] 

Stephan Erb commented on AURORA-1052:
-

I'll try to look into that in a couple of days. 

Idea would be to have a command line flag defaulting to 
`org.apache.aurora.metadata` that will be used as a prefix for any metadata 
entry mapped to a label. This prefix could be changed globally by the cluster 
administrator to `com.myorganization` but could also be set to empty if he 
wants to leave it up to the user to decide.

What do you think?
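
A minimal sketch of the prefix idea, in Python for brevity. The function name `metadata_to_labels` is hypothetical, and joining the prefix to the key with a dot is an assumption, since the proposal does not specify a separator.

```python
DEFAULT_PREFIX = "org.apache.aurora.metadata"


def metadata_to_labels(metadata, prefix=DEFAULT_PREFIX):
    """Map metadata key/value pairs to label key/value pairs.

    Assumption: a non-empty prefix is joined to the key with a dot; an
    empty prefix leaves the key exactly as the user wrote it.
    """
    labels = {}
    for key, value in metadata.items():
        label_key = "%s.%s" % (prefix, key) if prefix else key
        labels[label_key] = value
    return labels


metadata = {"team": "infra"}

default_labels = metadata_to_labels(metadata)
custom_labels = metadata_to_labels(metadata, prefix="com.myorganization")
unprefixed_labels = metadata_to_labels(metadata, prefix="")
```

With an empty prefix the user controls the full label key, which is the "leave it up to the user" case described above.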

> Populate Labels in TaskConfig
> -
>
> Key: AURORA-1052
> URL: https://issues.apache.org/jira/browse/AURORA-1052
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Stephan Erb
>Priority: Minor
>  Labels: newbie
>
> Mesos has introduced labels on tasks (MESOS-2120). These correspond to what 
> Aurora calls metadata. 
> We should therefore set task labels according to our metadata information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1052) Populate Labels in TaskConfig

2016-01-25 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115712#comment-15115712
 ] 

Stephan Erb commented on AURORA-1052:
-

I'd think the end user would not need an additional interface. He could simply 
use an appropriate metadata key himself.

> Populate Labels in TaskConfig
> -
>
> Key: AURORA-1052
> URL: https://issues.apache.org/jira/browse/AURORA-1052
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Stephan Erb
>Priority: Minor
>  Labels: newbie
>
> Mesos has introduced labels on tasks (MESOS-2120). These correspond to what 
> Aurora calls metadata. 
> We should therefore set task labels according to our metadata information.





[jira] [Issue Comment Deleted] (AURORA-1052) Populate Labels in TaskConfig

2016-01-25 Thread Stephan Erb (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Erb updated AURORA-1052:

Comment: was deleted

(was: I'll try to look into that in a couple of days. 

Idea would be to have a command line flag defaulting to 
`org.apache.aurora.metadata` that will be used as a prefix for any metadata 
entry mapped to a label. This prefix could be changed globally by the cluster 
administrator to `com.myorganization` but could also be set to empty if he 
wants to leave it up to the user to decide.

What do you think?)

> Populate Labels in TaskConfig
> -
>
> Key: AURORA-1052
> URL: https://issues.apache.org/jira/browse/AURORA-1052
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Stephan Erb
>Priority: Minor
>  Labels: newbie
>
> Mesos has introduced labels on tasks (MESOS-2120). These correspond to what 
> Aurora calls metadata. 
> We should therefore set task labels according to our metadata information.





[jira] [Assigned] (AURORA-1254) Remove UpdateConfig.restart_threshold

2016-01-25 Thread John Sirois (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sirois reassigned AURORA-1254:
---

Assignee: John Sirois

> Remove UpdateConfig.restart_threshold
> -
>
> Key: AURORA-1254
> URL: https://issues.apache.org/jira/browse/AURORA-1254
> Project: Aurora
>  Issue Type: Task
>  Components: Client
>Reporter: Bill Farner
>Assignee: John Sirois
>Priority: Minor
>
> This field has been deprecated as it no longer does anything.





[jira] [Commented] (AURORA-1052) Populate Labels in TaskConfig

2016-01-25 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115677#comment-15115677
 ] 

Zhitao Li commented on AURORA-1052:
---

One more comment: the Mesos community is planning to populate Docker labels 
from Mesos labels in https://issues.apache.org/jira/browse/MESOS-4446, so 
following #2 probably makes more sense to avoid conflicts.

> Populate Labels in TaskConfig
> -
>
> Key: AURORA-1052
> URL: https://issues.apache.org/jira/browse/AURORA-1052
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Stephan Erb
>Priority: Minor
>  Labels: newbie
>
> Mesos has introduced labels on tasks (MESOS-2120). These correspond to what 
> Aurora calls metadata. 
> We should therefore set task labels according to our metadata information.





[jira] [Commented] (AURORA-1052) Populate Labels in TaskConfig

2016-01-25 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115698#comment-15115698
 ] 

Stephan Erb commented on AURORA-1052:
-

I'll try to look into that in a couple of days. 

Idea would be to have a command line flag defaulting to 
`org.apache.aurora.metadata` that will be used as a prefix for any metadata 
entry mapped to a label. This prefix could be changed globally by the cluster 
administrator to `com.myorganization` but could also be set to empty if he 
wants to leave it up to the user to decide.

What do you think?

> Populate Labels in TaskConfig
> -
>
> Key: AURORA-1052
> URL: https://issues.apache.org/jira/browse/AURORA-1052
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Stephan Erb
>Priority: Minor
>  Labels: newbie
>
> Mesos has introduced labels on tasks (MESOS-2120). These correspond to what 
> Aurora calls metadata. 
> We should therefore set task labels according to our metadata information.





[jira] [Assigned] (AURORA-1563) Deprecate endpoint, expected_response and expected_response_code from HealthCheckConfig

2016-01-25 Thread John Sirois (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sirois reassigned AURORA-1563:
---

Assignee: John Sirois

> Deprecate endpoint, expected_response and expected_response_code from 
> HealthCheckConfig
> ---
>
> Key: AURORA-1563
> URL: https://issues.apache.org/jira/browse/AURORA-1563
> Project: Aurora
>  Issue Type: Story
>Reporter: Dmitriy Shirchenko
>Assignee: John Sirois
> Fix For: 0.12.0
>
>
> For example, remove deprecated code from health_checker.py and config.py 
> which supports 2 ways of getting attributes listed in the title of this task.
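
To illustrate the "2 ways" mentioned above: the executor config quoted in the AURORA-1594 log carries these settings both at the top level of `health_check_config` and nested under `health_checker.http`. The sketch below is a hypothetical reader that prefers the nested form and falls back to the top-level form with a warning. It is not the code in health_checker.py or config.py, and `http_check_setting` is an invented name.

```python
import warnings

DEPRECATED_KEYS = ("endpoint", "expected_response", "expected_response_code")


def http_check_setting(health_check_config, key):
    """Return one HTTP health check setting from a config dict.

    Prefers the nested health_checker.http form and falls back to the
    deprecated top-level form, emitting a DeprecationWarning.
    """
    nested = health_check_config.get("health_checker", {}).get("http", {})
    if key in nested:
        return nested[key]
    if key in DEPRECATED_KEYS and key in health_check_config:
        warnings.warn("top-level %s is deprecated" % key, DeprecationWarning)
        return health_check_config[key]
    raise KeyError(key)


new_style = {"health_checker": {"http": {"endpoint": "/health"}}}
old_style = {"endpoint": "/health"}

current_endpoint = http_check_setting(new_style, "endpoint")

# Capture the warning so the fallback path can be shown without noise.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_endpoint = http_check_setting(old_style, "endpoint")
```

Removing the deprecated support would amount to deleting the fallback branch, so that only the nested form is accepted.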





[jira] [Assigned] (AURORA-1594) End-to-end test is broken

2016-01-25 Thread Bill Farner (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Farner reassigned AURORA-1594:
---

Assignee: Bill Farner

> End-to-end test is broken
> -
>
> Key: AURORA-1594
> URL: https://issues.apache.org/jira/browse/AURORA-1594
> Project: Aurora
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Bill Farner
>Assignee: Bill Farner
>Priority: Blocker
>
> {noformat}
> + aurora job create devcluster/vagrant/test/http_example 
> /vagrant/src/test/sh/org/apache/aurora/e2e/http/http_example.aurora
>  WARN]
> WARNING: endpoint, expected_response, and expected_response_code are 
> deprecated and will be removed
> in the next release. Please consult updated documentation.
>  INFO] Creating job http_example
>  WARN] Could not connect to scheduler: No schedulers detected in devcluster!
>  WARN] Could not connect to scheduler: No schedulers detected in devcluster!
> Job creation failed due to error:
>   java.lang.IllegalArgumentException: Multiple entries with same key: 
> ITaskConfig{job=IJobKey{role=vagrant, environment=test, name=http_example}, 
> owner=IIdentity{role=null, user=vagrant}, environment=null, jobName=null, 
> isService=true, numCpus=0.4, ramMb=32, diskMb=64, priority=0, 
> maxTaskFailures=1, production=false, tier=null, constraints=[], 
> requestedPorts=[http], taskLinks={http=http://%host%:%port:http%}, 
> contactEmail=vagrant@localhost, 
> executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
> "test", "health_check_config": {"expected_response_code": 0, "endpoint": 
> "/health", "health_checker": {"http": {"expected_response_code": 0, 
> "endpoint": "/health", "expected_response": "ok"}}, "initial_interval_secs": 
> 5.0, "expected_response": "ok", "max_consecutive_failures": 0, 
> "timeout_secs": 1.0, "interval_secs": 1.0}, "name": "http_example", 
> "service": true, "max_task_failures": 1, "cron_collision_policy": 
> "KILL_EXISTING", "enable_hooks": false, "cluster": "devcluster", "task": 
> {"processes": [{"daemon": false, "name": "stage_server", "ephemeral": false, 
> "max_failures": 1, "min_duration": 5, "cmdline": "cp 
> /vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": 
> false}, {"daemon": false, "name": "run_server", "ephemeral": false, 
> "max_failures": 1, "min_duration": 5, "cmdline": "python http_example.py 
> {{thermos.ports[http]}}", "final": false}], "name": "http_example", 
> "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, 
> "resources": {"disk": 67108864, "ram": 33554432, "cpu": 0.4}, "constraints": 
> [{"order": ["stage_server", "run_server"]}]}, "production": false, "role": 
> "vagrant", "contact": "vagrant@localhost", "announce": {"primary_port": 
> "http", "portmap": {"aurora": "http"}}, "lifecycle": {"http": 
> {"graceful_shutdown_endpoint": "/quitquitquit", "port": "health", 
> "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}}, metadata=[], 
> container=IContainer{setField=MESOS, 
> value=IMesosContainer{}}}=org.apache.aurora.scheduler.storage.db.views.DbTaskConfig@7b345c31
>  and ITaskConfig{job=IJobKey{role=vagrant, environment=test, 
> name=http_example}, owner=IIdentity{role=null, user=vagrant}, 
> environment=null, jobName=null, isService=true, numCpus=0.4, ramMb=32, 
> diskMb=64, priority=0, maxTaskFailures=1, production=false, tier=null, 
> constraints=[], requestedPorts=[http], 
> taskLinks={http=http://%host%:%port:http%}, contactEmail=vagrant@localhost, 
> executorConfig=IExecutorConfig{name=AuroraExecutor, data={"environment": 
> "test", "health_check_config": {"expected_response_code": 0, "endpoint": 
> "/health", "health_checker": {"http": {"expected_response_code": 0, 
> "endpoint": "/health", "expected_response": "ok"}}, "initial_interval_secs": 
> 5.0, "expected_response": "ok", "max_consecutive_failures": 0, 
> "timeout_secs": 1.0, "interval_secs": 1.0}, "name": "http_example", 
> "service": true, "max_task_failures": 1, "cron_collision_policy": 
> "KILL_EXISTING", "enable_hooks": false, "cluster": "devcluster", "task": 
> {"processes": [{"daemon": false, "name": "stage_server", "ephemeral": false, 
> "max_failures": 1, "min_duration": 5, "cmdline": "cp 
> /vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .", "final": 
> false}, {"daemon": false, "name": "run_server", "ephemeral": false, 
> "max_failures": 1, "min_duration": 5, "cmdline": "python http_example.py 
> {{thermos.ports[http]}}", "final": false}], "name": "http_example", 
> "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, 
> "resources": {"disk": 67108864, "ram": 33554432, "cpu": 0.4}, "constraints": 
> [{"order": ["stage_server", "run_server"]}]}, "production": false, "role": 
> "vagrant", "contact": "vagrant@localhost", "announce": {"primary_port": 
> "http", 

[jira] [Commented] (AURORA-1258) Improve procedure for adding instances to a job

2016-01-25 Thread Maxim Khutornenko (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116278#comment-15116278
 ] 

Maxim Khutornenko commented on AURORA-1258:
---

Final details are captured here: http://markmail.org/message/2smaej5n5e54li3g

> Improve procedure for adding instances to a job
> ---
>
> Key: AURORA-1258
> URL: https://issues.apache.org/jira/browse/AURORA-1258
> Project: Aurora
>  Issue Type: Story
>  Components: Reliability, Usability
>Reporter: Joe Smith
>Assignee: Maxim Khutornenko
>
> The current process for adding instances to a job is highly manual, and 
> potentially dangerous.
> 1. Take a config for a job with 10 instances, update it to 20 instances.
> 2. The batch size will be increased, and users will need to specify shards 10 
> to 19.
> 3. After this update is complete, users will need to manually update shards 
> 0-9 again.
> There may be other changes pulled in as part of this update other than just 
> increasing the number of instances, which could further complicate things.
> One possible improvement would be to change the updater from 
> 'under-provision' where it kills instances first, then schedules new 
> instances, to an 'over-provision' where it adds on new instances, then 
> backpedals and kills the old instances.
> Overall, a single command or process for a user to take an already-existing 
> job and increase the number of instances would reduce overhead and 
> fat-fingering.
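
The difference between the two updater strategies described above can be sketched as a small simulation. This is a hypothetical model, not the Aurora updater: `capacity_range` is an invented helper that assumes instances are replaced in fixed-size batches and that each step completes before the next begins.

```python
def capacity_range(strategy, instance_count, batch_size):
    """Return (lowest, highest) running-instance counts seen during an update."""
    running = lowest = highest = instance_count
    for start in range(0, instance_count, batch_size):
        batch = min(batch_size, instance_count - start)
        if strategy == "under-provision":
            running -= batch          # kill old instances first
            lowest = min(lowest, running)
            running += batch          # then schedule replacements
        elif strategy == "over-provision":
            running += batch          # add new instances first
            highest = max(highest, running)
            running -= batch          # then kill the old ones
        else:
            raise ValueError("unknown strategy: %s" % strategy)
    return lowest, highest


under = capacity_range("under-provision", instance_count=10, batch_size=3)
over = capacity_range("over-provision", instance_count=10, batch_size=3)
```

In this model, under-provisioning dips below the configured instance count during the update, while over-provisioning temporarily needs extra capacity instead.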





[jira] [Commented] (AURORA-1593) PubSubEventModule fails to dispatch events to TaskHistoryPruner on startup

2016-01-25 Thread John Sirois (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116291#comment-15116291
 ] 

John Sirois commented on AURORA-1593:
-

Just noting that the interaction here is more complex than it may first appear, 
since the prunes are executed via a DelayExecutor, which is gated (work units 
are queued) during storage start (log recovery).
Still digging a bit to make sure I have this all sussed pre-Zameer's change, 
with Zameer's change and with my change.
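
For readers unfamiliar with the term, "gated (work units are queued)" can be pictured with the sketch below. It is a hypothetical, single-threaded illustration in Python, not Aurora's DelayExecutor: `GatedExecutor` and its method names are invented, and delays and threading are left out.

```python
class GatedExecutor:
    """Queues work units while closed; runs them in order once opened."""

    def __init__(self):
        self._open = False
        self._queue = []

    def execute(self, work):
        # While the gate is closed, remember the work instead of running it.
        if self._open:
            work()
        else:
            self._queue.append(work)

    def open(self):
        # Open the gate and flush everything queued so far, in order.
        self._open = True
        pending, self._queue = self._queue, []
        for work in pending:
            work()


log = []
executor = GatedExecutor()
executor.execute(lambda: log.append("prune-1"))
executor.execute(lambda: log.append("prune-2"))
ran_before_open = list(log)   # nothing has run yet

executor.open()               # e.g. once storage has finished starting
executor.execute(lambda: log.append("prune-3"))
```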

> PubSubEventModule fails to dispatch events to TaskHistoryPruner on startup
> --
>
> Key: AURORA-1593
> URL: https://issues.apache.org/jira/browse/AURORA-1593
> Project: Aurora
>  Issue Type: Bug
>Reporter: Zameer Manji
>Assignee: John Sirois
>
> On latest master I see several exceptions that look like:
> {noformat}
> E0122 22:59:19.272 [AsyncProcessor-7, PubsubEventModule:84] Failed to 
> dispatch event to public void 
> org.apache.aurora.scheduler.pruning.TaskHistoryPruner.recordStateChange(org.apache.aurora.scheduler.events.PubsubEvent$TaskStateChange):
>  java.lang.IllegalStateException
> java.lang.IllegalStateException: null
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:159) 
> ~[guava-19.0.jar:na]
> at 
> org.apache.aurora.scheduler.pruning.TaskHistoryPruner.recordStateChange(TaskHistoryPruner.java:117)
>  ~[aurora-116.jar:na]
> at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source) 
> ~[na:na]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_66-Tw8r9b2]
> at java.lang.reflect.Method.invoke(Method.java:497) 
> ~[na:1.8.0_66-Tw8r9b2]
> at 
> com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:95)
>  ~[guava-19.0.jar:na]
> at 
> com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:154)
>  ~[guava-19.0.jar:na]
> at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:80) 
> ~[guava-19.0.jar:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_66-Tw8r9b2]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66-Tw8r9b2]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66-Tw8r9b2]
> {noformat}
> The problem is that {{TaskHistoryPruner}} assumes it is started before the 
> event bus starts sending events to the service. This appears to not be the 
> case.
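
The ordering problem described above can be reproduced in miniature. The sketch below is a hypothetical Python analogue, not the TaskHistoryPruner code: `Pruner`, `deliver` and the event strings are invented, and the `RuntimeError` stands in for the failed `Preconditions.checkState` seen in the stack trace.

```python
class Pruner:
    """Hypothetical subscriber that must be started before it sees events."""

    def __init__(self):
        self._started = False
        self.recorded = []

    def start(self):
        self._started = True

    def record_state_change(self, event):
        # Analogue of a checkState() precondition on the service state.
        if not self._started:
            raise RuntimeError("subscriber received an event before start()")
        self.recorded.append(event)


def deliver(subscriber, events):
    """Dispatch events, collecting failures instead of propagating them."""
    failures = []
    for event in events:
        try:
            subscriber.record_state_change(event)
        except RuntimeError as error:
            failures.append((event, error))
    return failures


# Events arrive before start(): every dispatch fails.
early = Pruner()
early_failures = deliver(early, ["task-1 FINISHED"])

# Events arrive after start(): dispatch succeeds.
ready = Pruner()
ready.start()
ready_failures = deliver(ready, ["task-1 FINISHED"])
```

The sketch only shows why the assumption matters. It does not suggest which side, the subscriber or the event bus, should change.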





[jira] [Commented] (AURORA-1593) PubSubEventModule fails to dispatch events to TaskHistoryPruner on startup

2016-01-25 Thread John Sirois (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116365#comment-15116365
 ] 

John Sirois commented on AURORA-1593:
-

OK - still sorting through the {{DelayExecutor}} gating, but it does not look 
like it gates any {{TaskStateChange}} events - the events consumed by 
{{TaskHistoryPruner}}.

> PubSubEventModule fails to dispatch events to TaskHistoryPruner on startup
> --
>
> Key: AURORA-1593
> URL: https://issues.apache.org/jira/browse/AURORA-1593
> Project: Aurora
>  Issue Type: Bug
>Reporter: Zameer Manji
>Assignee: John Sirois
>
> On latest master I see several exceptions that look like:
> {noformat}
> E0122 22:59:19.272 [AsyncProcessor-7, PubsubEventModule:84] Failed to 
> dispatch event to public void 
> org.apache.aurora.scheduler.pruning.TaskHistoryPruner.recordStateChange(org.apache.aurora.scheduler.events.PubsubEvent$TaskStateChange):
>  java.lang.IllegalStateException
> java.lang.IllegalStateException: null
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:159) 
> ~[guava-19.0.jar:na]
> at 
> org.apache.aurora.scheduler.pruning.TaskHistoryPruner.recordStateChange(TaskHistoryPruner.java:117)
>  ~[aurora-116.jar:na]
> at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source) 
> ~[na:na]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_66-Tw8r9b2]
> at java.lang.reflect.Method.invoke(Method.java:497) 
> ~[na:1.8.0_66-Tw8r9b2]
> at 
> com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:95)
>  ~[guava-19.0.jar:na]
> at 
> com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:154)
>  ~[guava-19.0.jar:na]
> at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:80) 
> ~[guava-19.0.jar:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_66-Tw8r9b2]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66-Tw8r9b2]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66-Tw8r9b2]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66-Tw8r9b2]
> {noformat}
> The problem is that {{TaskHistoryPruner}} assumes it is started before the 
> event bus starts sending events to the service. This appears to not be the 
> case.


