[jira] [Created] (AURORA-1534) Announce Aurora with the Same JSON as Aurora Jobs
Paul Cavallaro created AURORA-1534:
--
Summary: Announce Aurora with the Same JSON as Aurora Jobs
Key: AURORA-1534
URL: https://issues.apache.org/jira/browse/AURORA-1534
Project: Aurora
Issue Type: Task
Affects Versions: 0.9.0
Reporter: Paul Cavallaro
Priority: Minor

Currently Aurora announces itself under /aurora/services in ZooKeeper, but the JSON it writes is not the same as the JSON written for normal Aurora Jobs. Namely, it does not include a 'shard' key:value pair.

Making the two the same would be useful: code that watches Aurora Jobs could then be reused to watch the Aurora Leader (for accessing the HTTP API, Thrift API, etc.).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
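To illustrate the mismatch described in AURORA-1534, here is a minimal sketch of a watcher parsing announced members. Only the 'shard' key comes from the issue text; the other field names (serviceEndpoint, additionalEndpoints, status), the hosts, the ports, and both function names are assumptions made for illustration.

```python
import json

# Hypothetical member payloads. Only the 'shard' key is named in the issue;
# every other field and value here is an assumption.
JOB_MEMBER = json.dumps({
    "serviceEndpoint": {"host": "10.0.0.5", "port": 31000},
    "additionalEndpoints": {},
    "status": "ALIVE",
    "shard": 0,
})
SCHEDULER_MEMBER = json.dumps({
    "serviceEndpoint": {"host": "10.0.0.9", "port": 8081},
    "additionalEndpoints": {},
    "status": "ALIVE",
})


def parse_member_strict(raw):
    """Parse a member the way a job watcher might, requiring 'shard'."""
    data = json.loads(raw)
    endpoint = data["serviceEndpoint"]
    # Raises KeyError when 'shard' is absent, as for the scheduler entry.
    return endpoint["host"], endpoint["port"], data["shard"]


def parse_member_tolerant(raw):
    """Workaround a watcher needs today: treat a missing 'shard' as None."""
    data = json.loads(raw)
    endpoint = data["serviceEndpoint"]
    return endpoint["host"], endpoint["port"], data.get("shard")
```

If the scheduler wrote the same JSON as jobs do, the strict parser could be reused unchanged and the tolerant variant would be unnecessary.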
[jira] [Resolved] (AURORA-1367) 0.10.0 deprecations
[ https://issues.apache.org/jira/browse/AURORA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zameer Manji resolved AURORA-1367.
--
Resolution: Fixed

> 0.10.0 deprecations
> ---
>
> Key: AURORA-1367
> URL: https://issues.apache.org/jira/browse/AURORA-1367
> Project: Aurora
> Issue Type: Epic
> Reporter: Bill Farner
> Assignee: Zameer Manji
>
> Features/behaviors that are scheduled to be deprecated in the 0.10.0 release.
[jira] [Reopened] (AURORA-1367) 0.10.0 deprecations
[ https://issues.apache.org/jira/browse/AURORA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zameer Manji reopened AURORA-1367:
--

> 0.10.0 deprecations
> ---
>
> Key: AURORA-1367
> URL: https://issues.apache.org/jira/browse/AURORA-1367
> Project: Aurora
> Issue Type: Epic
> Reporter: Bill Farner
> Assignee: Zameer Manji
>
> Features/behaviors that are scheduled to be deprecated in the 0.10.0 release.
[jira] [Commented] (AURORA-1252) Deprecate UpdateConfig restart_threshold setting
[ https://issues.apache.org/jira/browse/AURORA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986249#comment-14986249 ]

Zameer Manji commented on AURORA-1252:
--

I moved this to the 0.11.0 release.

> Deprecate UpdateConfig restart_threshold setting
>
> Key: AURORA-1252
> URL: https://issues.apache.org/jira/browse/AURORA-1252
> Project: Aurora
> Issue Type: Task
> Components: Client
> Reporter: Bill Farner
> Priority: Minor
>
> Display a warning when a configuration specifies restart_threshold in an
> update configuration, as it is no longer used.
[jira] [Commented] (AURORA-1150) Deprecate JSON output in the client
[ https://issues.apache.org/jira/browse/AURORA-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986247#comment-14986247 ]

Zameer Manji commented on AURORA-1150:
--

I moved this to the 0.11.0 release.

> Deprecate JSON output in the client
> ---
>
> Key: AURORA-1150
> URL: https://issues.apache.org/jira/browse/AURORA-1150
> Project: Aurora
> Issue Type: Story
> Components: Client, Technical Debt
> Reporter: Bill Farner
> Priority: Minor
>
> The client currently provides a crutch to extract JSON from Aurora, which
> should go away once we have a formal API that produces human-friendly JSON.
[jira] [Commented] (AURORA-785) Remove client-side update code
[ https://issues.apache.org/jira/browse/AURORA-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986244#comment-14986244 ]

Zameer Manji commented on AURORA-785:
-

I moved this to the 0.11.0 release.

> Remove client-side update code
> --
>
> Key: AURORA-785
> URL: https://issues.apache.org/jira/browse/AURORA-785
> Project: Aurora
> Issue Type: Task
> Components: Client, Technical Debt
> Reporter: Bill Farner
>
> Once scheduler-driven updates are considered stable, make that the only
> update path from the client. Remove update orchestration code from the
> client.
[jira] [Commented] (AURORA-1254) Remove UpdateConfig.restart_threshold
[ https://issues.apache.org/jira/browse/AURORA-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986242#comment-14986242 ]

Zameer Manji commented on AURORA-1254:
--

I moved this to the 0.11.0 release.

> Remove UpdateConfig.restart_threshold
> -
>
> Key: AURORA-1254
> URL: https://issues.apache.org/jira/browse/AURORA-1254
> Project: Aurora
> Issue Type: Task
> Components: Client
> Reporter: Bill Farner
> Priority: Minor
>
> This field has been deprecated as it no longer does anything.
[jira] [Commented] (AURORA-1247) Remove JobUpdateSettings.maxWaitToInstanceRunningMs
[ https://issues.apache.org/jira/browse/AURORA-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986235#comment-14986235 ]

Zameer Manji commented on AURORA-1247:
--

I moved this to the 0.11.0 release.

> Remove JobUpdateSettings.maxWaitToInstanceRunningMs
> ---
>
> Key: AURORA-1247
> URL: https://issues.apache.org/jira/browse/AURORA-1247
> Project: Aurora
> Issue Type: Story
> Reporter: Bill Farner
>
> Field was deprecated in 0.8.0.
[jira] [Commented] (AURORA-1528) New snooze feature for cron jobs
[ https://issues.apache.org/jira/browse/AURORA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986192#comment-14986192 ]

Bill Farner commented on AURORA-1528:
-

With this use case as an example, my major concern is that we could give the user a false sense of security that we are protecting them from corrupting (their external) data during an operation like a schema migration. Given that risk, and the disclaimers/warnings and knobs we would need to include to enable this, I don't think it's something we should pursue.

> New snooze feature for cron jobs
>
> Key: AURORA-1528
> URL: https://issues.apache.org/jira/browse/AURORA-1528
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Thomas Sun
> Assignee: Thomas Sun
> Priority: Minor
>
> Something that would fulfill my need: for the next
> d:h:m:s (days:hours:minutes:seconds) from NOW,
> do not run (or skip running) the job, even though it may be scheduled to run
> x times within that time range.
> If a job is scheduled to run at T0, T6, T12, T18 UTC every day, and I did a
> snooze 0:8:0:0 for this job at UTC T16, then the next 2 runs (T18 today and T0
> tomorrow) of the job should be skipped.
> Another useful interface for me would be snooze x, where x is the number of runs
> from NOW to skip.
> Design:
> https://docs.google.com/document/d/1NEMb9Qbc9T9ZBlOifkRjB2OYGnO_pNURNQcsSLXArSk/edit
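The snooze arithmetic in the AURORA-1528 description can be checked with a small sketch. This is not taken from the linked design document: the function names, the fixed list of daily run hours, the example date, and the choice to treat a run landing exactly at the end of the snooze window as skipped are all assumptions, chosen so that the reporter's example yields two skipped runs.

```python
from datetime import datetime, timedelta


def parse_snooze(spec):
    """Parse a 'd:h:m:s' snooze spec such as '0:8:0:0' into a timedelta."""
    days, hours, minutes, seconds = (int(part) for part in spec.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def skipped_runs(now, snooze, run_hours):
    """Return the daily runs that fall in the window (now, now + snooze].

    run_hours lists the hours of the day (UTC) at which the job fires.
    The window end is inclusive by assumption.
    """
    end = now + snooze
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    skipped = []
    while day <= end:
        for hour in run_hours:
            run = day + timedelta(hours=hour)
            if now < run <= end:
                skipped.append(run)
        day += timedelta(days=1)
    return skipped


# The reporter's example: runs at T0, T6, T12, T18; snooze 0:8:0:0 at T16.
example_now = datetime(2015, 11, 2, 16, 0, 0)  # arbitrary date, 16:00 UTC
example = skipped_runs(example_now, parse_snooze("0:8:0:0"), [0, 6, 12, 18])
```

With these assumptions, `example` holds the 18:00 run on the same day and the 00:00 run on the following day, matching the two skipped runs in the description.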
[jira] [Commented] (AURORA-1528) New snooze feature for cron jobs
[ https://issues.apache.org/jira/browse/AURORA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985946#comment-14985946 ]

Joshua Cohen commented on AURORA-1528:
--

[~dabaitu] Unfortunately it's a bit more complex than just having the quartz scheduler pause/resume the job. The fact that the cron job is snoozed would need to be written to Aurora's storage so that, in the event of a failover, the snooze is respected by the new leader.

[~wfarner] This still seems like nice-to-have functionality to me, though I admit the use case is probably fairly limited. It's a question of whether the complexity of maintaining this outweighs the complexity of requiring people to manage it outside of Aurora.

> New snooze feature for cron jobs
>
> Key: AURORA-1528
> URL: https://issues.apache.org/jira/browse/AURORA-1528
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Thomas Sun
> Assignee: Thomas Sun
> Priority: Minor
>
> Something that would fulfill my need: for the next
> d:h:m:s (days:hours:minutes:seconds) from NOW,
> do not run (or skip running) the job, even though it may be scheduled to run
> x times within that time range.
> If a job is scheduled to run at T0, T6, T12, T18 UTC every day, and I did a
> snooze 0:8:0:0 for this job at UTC T16, then the next 2 runs (T18 today and T0
> tomorrow) of the job should be skipped.
> Another useful interface for me would be snooze x, where x is the number of runs
> from NOW to skip.
> Design:
> https://docs.google.com/document/d/1NEMb9Qbc9T9ZBlOifkRjB2OYGnO_pNURNQcsSLXArSk/edit
[jira] [Updated] (AURORA-1531) upgrade psutil to 3.2.2 from 2.1.3
[ https://issues.apache.org/jira/browse/AURORA-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Shirchenko updated AURORA-1531:
---
Labels: Uber (was: uber)

> upgrade psutil to 3.2.2 from 2.1.3
> --
>
> Key: AURORA-1531
> URL: https://issues.apache.org/jira/browse/AURORA-1531
> Project: Aurora
> Issue Type: Task
> Reporter: Dmitriy Shirchenko
> Assignee: Bill Farner
> Priority: Minor
> Labels: Uber
[jira] [Updated] (AURORA-1529) Wrong command to `Build a client executable`
[ https://issues.apache.org/jira/browse/AURORA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Shirchenko updated AURORA-1529:
---
Labels: Uber (was: uber)

> Wrong command to `Build a client executable`
>
> Key: AURORA-1529
> URL: https://issues.apache.org/jira/browse/AURORA-1529
> Project: Aurora
> Issue Type: Bug
> Reporter: Dmitriy Shirchenko
> Assignee: Dmitriy Shirchenko
> Priority: Trivial
> Labels: Uber
[jira] [Updated] (AURORA-1531) upgrade psutil to 3.2.2 from 2.1.3
[ https://issues.apache.org/jira/browse/AURORA-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Shirchenko updated AURORA-1531:
---
Labels: uber (was: )

> upgrade psutil to 3.2.2 from 2.1.3
> --
>
> Key: AURORA-1531
> URL: https://issues.apache.org/jira/browse/AURORA-1531
> Project: Aurora
> Issue Type: Task
> Reporter: Dmitriy Shirchenko
> Assignee: Bill Farner
> Priority: Minor
> Labels: uber
[jira] [Resolved] (AURORA-1533) Transient connection errors can leave client in irrecoverable state
[ https://issues.apache.org/jira/browse/AURORA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Khutornenko resolved AURORA-1533.
---
Resolution: Fixed

> Transient connection errors can leave client in irrecoverable state
> ---
>
> Key: AURORA-1533
> URL: https://issues.apache.org/jira/browse/AURORA-1533
> Project: Aurora
> Issue Type: Bug
> Reporter: Stephan Erb
> Assignee: Stephan Erb
> Priority: Minor
>
> During a cluster update, some of our schedulers returned an unknown error to
> connecting clients ([relevant
> code|https://github.com/apache/aurora/blob/b712d577364f6b1613b54ba696bac4ddc255ae58/src/main/python/apache/aurora/client/api/scheduler_client.py#L268]).
> Long-running clients failed to recover from these errors as the code
> assumed the connection was already established. Subsequent scheduling calls
> thus failed with the following exception:
> {code}
> File "venv/local/lib/python2.7/site-packages/apache/aurora/client/api/__init__.py" in query_no_configs
>   140. raise self.ThriftInternalError(e.args[0])
> Exception Type: ThriftInternalError
> Exception Value: Error during thrift call getTasksWithoutConfigs to testcluster: 'NoneType' object has no attribute 'getTasksWithoutConfigs'
> {code}
> Background: We are using the Python client to dispatch calls to Aurora from
> within a long-running web service. The connection is kept open as long as the
> web service is running.
[jira] [Updated] (AURORA-1533) Transient connection errors can leave client in irrecoverable state
[ https://issues.apache.org/jira/browse/AURORA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Khutornenko updated AURORA-1533:
--
Assignee: Stephan Erb

> Transient connection errors can leave client in irrecoverable state
> ---
>
> Key: AURORA-1533
> URL: https://issues.apache.org/jira/browse/AURORA-1533
> Project: Aurora
> Issue Type: Bug
> Reporter: Stephan Erb
> Assignee: Stephan Erb
> Priority: Minor
>
> During a cluster update, some of our schedulers returned an unknown error to
> connecting clients ([relevant
> code|https://github.com/apache/aurora/blob/b712d577364f6b1613b54ba696bac4ddc255ae58/src/main/python/apache/aurora/client/api/scheduler_client.py#L268]).
> Long-running clients failed to recover from these errors as the code
> assumed the connection was already established. Subsequent scheduling calls
> thus failed with the following exception:
> {code}
> File "venv/local/lib/python2.7/site-packages/apache/aurora/client/api/__init__.py" in query_no_configs
>   140. raise self.ThriftInternalError(e.args[0])
> Exception Type: ThriftInternalError
> Exception Value: Error during thrift call getTasksWithoutConfigs to testcluster: 'NoneType' object has no attribute 'getTasksWithoutConfigs'
> {code}
> Background: We are using the Python client to dispatch calls to Aurora from
> within a long-running web service. The connection is kept open as long as the
> web service is running.
[jira] [Commented] (AURORA-1533) Transient connection errors can leave client in irrecoverable state
[ https://issues.apache.org/jira/browse/AURORA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985191#comment-14985191 ]

Stephan Erb commented on AURORA-1533:
-

Review request: https://reviews.apache.org/r/39854/

> Transient connection errors can leave client in irrecoverable state
> ---
>
> Key: AURORA-1533
> URL: https://issues.apache.org/jira/browse/AURORA-1533
> Project: Aurora
> Issue Type: Bug
> Reporter: Stephan Erb
> Priority: Minor
>
> During a cluster update, some of our schedulers returned an unknown error to
> connecting clients ([relevant
> code|https://github.com/apache/aurora/blob/b712d577364f6b1613b54ba696bac4ddc255ae58/src/main/python/apache/aurora/client/api/scheduler_client.py#L268]).
> Long-running clients failed to recover from these errors as the code
> assumed the connection was already established. Subsequent scheduling calls
> thus failed with the following exception:
> {code}
> File "venv/local/lib/python2.7/site-packages/apache/aurora/client/api/__init__.py" in query_no_configs
>   140. raise self.ThriftInternalError(e.args[0])
> Exception Type: ThriftInternalError
> Exception Value: Error during thrift call getTasksWithoutConfigs to testcluster: 'NoneType' object has no attribute 'getTasksWithoutConfigs'
> {code}
> Background: We are using the Python client to dispatch calls to Aurora from
> within a long-running web service. The connection is kept open as long as the
> web service is running.
[jira] [Created] (AURORA-1533) Transient connection errors can leave client in irrecoverable state
Stephan Erb created AURORA-1533:
---
Summary: Transient connection errors can leave client in irrecoverable state
Key: AURORA-1533
URL: https://issues.apache.org/jira/browse/AURORA-1533
Project: Aurora
Issue Type: Bug
Reporter: Stephan Erb
Priority: Minor

During a cluster update, some of our schedulers returned an unknown error to connecting clients ([relevant code|https://github.com/apache/aurora/blob/b712d577364f6b1613b54ba696bac4ddc255ae58/src/main/python/apache/aurora/client/api/scheduler_client.py#L268]). Long-running clients failed to recover from these errors as the code assumed the connection was already established. Subsequent scheduling calls thus failed with the following exception:
{code}
File "venv/local/lib/python2.7/site-packages/apache/aurora/client/api/__init__.py" in query_no_configs
  140. raise self.ThriftInternalError(e.args[0])
Exception Type: ThriftInternalError
Exception Value: Error during thrift call getTasksWithoutConfigs to testcluster: 'NoneType' object has no attribute 'getTasksWithoutConfigs'
{code}
Background: We are using the Python client to dispatch calls to Aurora from within a long-running web service. The connection is kept open as long as the web service is running.
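The failure mode in AURORA-1533 (a failed connection attempt leaving a None client that later calls dereference) can be illustrated with a minimal sketch. This is not the code from scheduler_client.py and not the fix under review: ReconnectingClient, TransientConnectionError, FakeScheduler, and flaky_connect are hypothetical names, and the approach shown (treat the client as connected only once it is non-None, and drop it again on a transient error) is one assumed way to let a long-running process recover.

```python
class TransientConnectionError(Exception):
    """Hypothetical error raised when a scheduler cannot be reached."""


class ReconnectingClient(object):
    """Wrapper that retries the connection on the next call after a failure."""

    def __init__(self, connect):
        # connect: hypothetical factory returning a connected client object.
        self._connect = connect
        self._client = None

    def _get_client(self):
        # Connected state is derived from the client itself, so a failed
        # attempt can never leave us "connected" with a None client.
        if self._client is None:
            self._client = self._connect()
        return self._client

    def call(self, method, *args):
        try:
            return getattr(self._get_client(), method)(*args)
        except TransientConnectionError:
            # Drop the client so the next call reconnects from scratch.
            self._client = None
            raise


class FakeScheduler(object):
    """Stand-in for a scheduler client, used only for this illustration."""

    def getTasksWithoutConfigs(self):
        return "ok"


attempts = []


def flaky_connect():
    """Fail on the first attempt, then succeed, mimicking a cluster update."""
    attempts.append(1)
    if len(attempts) == 1:
        raise TransientConnectionError("scheduler unavailable")
    return FakeScheduler()
```

A caller wrapping `flaky_connect` would see the first `call("getTasksWithoutConfigs")` raise `TransientConnectionError`, while the second call reconnects and succeeds instead of failing on a `NoneType` attribute lookup.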