date:20161123

[jira] [Assigned] (AURORA-1827) Fix SLA percentile calculation

2016-11-23 Thread Reza Motamedi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza Motamedi reassigned AURORA-1827:
-

Assignee: Reza Motamedi

> Fix SLA percentile calculation 
> ---
>
> Key: AURORA-1827
> URL: https://issues.apache.org/jira/browse/AURORA-1827
> Project: Aurora
>  Issue Type: Story
>Reporter: Reza Motamedi
>Assignee: Reza Motamedi
>Priority: Trivial
>  Labels: newbie, sla
>
> The calculation of mttX (median-time-to-X) depends on the computation of 
> percentile values. The current implementation does not handle small sample 
> sizes well. For instance, for the sample set {50, 150}, the 50th percentile 
> is reported as 50, although 100 seems a more appropriate return value.
> One solution is to modify `SlaUtil` to interpolate between neighboring 
> samples when the sample size is small or when the index corresponding to a 
> percentile value is not an integer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (AURORA-118) Add percentiles to @Timed, or write a new decorator to add percentiles

2016-11-23 Thread Reza Motamedi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza Motamedi reassigned AURORA-118:


Assignee: Reza Motamedi

> Add percentiles to @Timed, or write a new decorator to add percentiles
> --
>
> Key: AURORA-118
> URL: https://issues.apache.org/jira/browse/AURORA-118
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Bill Farner
>Assignee: Reza Motamedi
>Priority: Minor
>  Labels: newbie
>
> The @Timed annotation is really nice for 'sprinkling on' instrumentation, but 
> doesn't expose percentiles.  We've seen several areas where a long tail of 
> slow operations caused major performance issues, so spotting these with 
> percentiles would be very helpful.
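To give a rough idea of what percentile reporting could look like, here is a minimal Java sketch. It times an operation, keeps a bounded window of recent samples, and reports interpolated percentiles over that window. It is not Aurora's `@Timed` interceptor: the class name `PercentileTimer`, the fixed-size window, and the interpolation scheme are all assumptions chosen for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Callable;

/**
 * Sketch only: times operations and reports percentiles over a bounded
 * window of the most recent samples. Hypothetical class, not Aurora's
 * @Timed interceptor.
 */
final class PercentileTimer {
  private final int windowSize;
  private final Deque<Long> samplesNanos = new ArrayDeque<>();

  PercentileTimer(int windowSize) {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be positive");
    }
    this.windowSize = windowSize;
  }

  /** Runs the operation and records its wall-clock duration, even if it throws. */
  <T> T time(Callable<T> operation) throws Exception {
    long start = System.nanoTime();
    try {
      return operation.call();
    } finally {
      record(System.nanoTime() - start);
    }
  }

  /** Records one sample, evicting the oldest once the window is full. */
  synchronized void record(long nanos) {
    if (samplesNanos.size() == windowSize) {
      samplesNanos.removeFirst();
    }
    samplesNanos.addLast(nanos);
  }

  /**
   * Returns the p-th percentile (0..100) of the windowed samples, linearly
   * interpolating between the two closest ranks; NaN if nothing was recorded.
   */
  synchronized double percentileNanos(double p) {
    if (p < 0 || p > 100) {
      throw new IllegalArgumentException("percentile out of range: " + p);
    }
    if (samplesNanos.isEmpty()) {
      return Double.NaN;
    }
    long[] sorted = samplesNanos.stream().mapToLong(Long::longValue).sorted().toArray();
    double rank = (p / 100.0) * (sorted.length - 1);
    int lower = (int) Math.floor(rank);
    int upper = (int) Math.ceil(rank);
    return sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower]);
  }
}
```

Sorting the whole window on every read is only reasonable for small windows. A production version would more likely use a histogram or a sampling reservoir, so that memory use and read cost stay bounded.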




[jira] [Commented] (AURORA-1825) Enable async logging by default

2016-11-23 Thread Zameer Manji (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691758#comment-15691758
 ] 

Zameer Manji commented on AURORA-1825:
--

Locally I removed the expensive parts of our logback config with:
{noformat}
diff --git c/src/main/resources/logback.xml w/src/main/resources/logback.xml
index 84c175c..6206806 100644
--- c/src/main/resources/logback.xml
+++ w/src/main/resources/logback.xml
@@ -23,7 +23,7 @@ limitations under the License.
 System.err
 
   
-%.-1level%date{MMdd HH:mm:ss.SSS} [%thread, %class{0}:%line] %message %xThrowable%n
+%.-1level%date{MMdd HH:mm:ss.SSS} [%thread] %message %xThrowable%n
   
 
   

{noformat}

Before:
{noformat}
Benchmark                                               (numPendingTasks)  (numTasksToDelete)   Mode  Cnt  Score   Error  Units
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                1000  thrpt   10  2.510 ± 0.557  ops/s
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                   1  thrpt   10  0.272 ± 0.030  ops/s
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                   5  thrpt   10  0.053 ± 0.011  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run               1000                 N/A  thrpt   10  2.446 ± 0.698  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run                  1                 N/A  thrpt   10  0.246 ± 0.018  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run                  5                 N/A  thrpt   10  0.041 ± 0.006  ops/s
{noformat}

After:

{noformat}
Benchmark                                               (numPendingTasks)  (numTasksToDelete)   Mode  Cnt  Score   Error  Units
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                1000  thrpt   10  8.640 ± 1.431  ops/s
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                   1  thrpt   10  0.892 ± 0.066  ops/s
StateManagerBenchmarks.DeleteTasksBenchmark.run                       N/A                   5  thrpt   10  0.172 ± 0.010  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run               1000                 N/A  thrpt   10  4.837 ± 1.511  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run                  1                 N/A  thrpt   10  0.510 ± 0.315  ops/s
StateManagerBenchmarks.InsertPendingTasksBenchmark.run                  5                 N/A  thrpt   10  0.079 ± 0.052  ops/s
{noformat}

I picked this benchmark because it logs a lot in the critical path.

We could probably fix this problem by removing the line number and replacing the 
class name with the logger name. The net result would be no line numbers, but 
much faster logging.
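The hypothetical Java class below illustrates why caller-location patterns are costly. It contrasts a logger-name prefix, which is a fixed string chosen when the logger is created, with a class-and-line prefix, which has to capture a stack trace on every call. This is a sketch of the general idea behind patterns like `%class` and `%line`, not Logback's code.

```java
/**
 * Sketch only: contrasts a cheap logger-name prefix with an expensive
 * caller-location prefix. Hypothetical class, not Logback's code.
 */
final class CallerLocationDemo {
  private final String loggerName;

  CallerLocationDemo(String loggerName) {
    this.loggerName = loggerName;
  }

  /** Cheap: the logger name is a fixed string chosen when the logger is created. */
  String formatWithLoggerName(String message) {
    return "[" + loggerName + "] " + message;
  }

  /** Expensive: captures the whole call stack on every call just to find the caller. */
  String formatWithCallerLocation(String message) {
    StackTraceElement[] stack = new Throwable().getStackTrace();
    // Frame 0 is this method; frame 1 is the code that asked for the log line.
    StackTraceElement caller = stack.length > 1 ? stack[1] : stack[0];
    String className = caller.getClassName();
    String simpleName = className.substring(className.lastIndexOf('.') + 1);
    return "[" + simpleName + ":" + caller.getLineNumber() + "] " + message;
  }
}
```

The cost of the second method grows with stack depth and is paid on every log event, which is consistent with the large throughput difference in the benchmarks above for a code path that logs heavily.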

> Enable async logging by default
> ---
>
> Key: AURORA-1825
> URL: https://issues.apache.org/jira/browse/AURORA-1825
> Project: Aurora
>  Issue Type: Task
>Reporter: Zameer Manji
>Assignee: Jing Chen
>Priority: Minor
>
> Based on my experience while working on AURORA-1823 and [~StephanErb]'s recent 
> work on logging, I think it would be best if we enabled async logging.
> For example, if one attempts to parallelize the work inside 
> {{StateManagerImpl}}, there isn't much benefit, because all of the state 
> transitions are logged and all of the threads contend for the logging lock.




[jira] [Commented] (AURORA-1825) Enable async logging by default

2016-11-23 Thread Mehrdad Nurolahzade (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691616#comment-15691616
 ] 

Mehrdad Nurolahzade commented on AURORA-1825:
-

Just as a side note:

It would be interesting to see benchmarks of logging with expensive, 
reflection-based patterns like class name or line number removed. I did not 
find anything about this in the Logback documentation, but the Log4j 
documentation, for example, explicitly warns against using such patterns in 
performance-critical systems: 
[https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html]

{quote}WARNING Generating the caller class information is slow. Thus, use 
should be avoided unless execution speed is not an issue.{quote}
{quote}WARNING Generating caller location information is extremely slow and 
should be avoided unless execution speed is not an issue.{quote}





[jira] [Commented] (AURORA-1825) Enable async logging by default

2016-11-23 Thread Stephan Erb (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691562#comment-15691562
 ] 

Stephan Erb commented on AURORA-1825:
-

I would love to see a benchmark showing that async logging is worthwhile before 
we go down that route. Asynchronous logging can be a real pain when debugging 
crashes and spurious bugs.
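Here is a minimal Java sketch of the async approach under discussion. It assumes a single background writer thread fed by an unbounded queue. The class `AsyncLogSink` is hypothetical; a real configuration would use Logback's own `AsyncAppender`. Callers only pay for an enqueue rather than the sink write, so they no longer serialize on the appender. The cost is that anything still in the queue is lost if the process dies before `close()` drains it, which is the debugging concern raised in this comment.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Sketch only: asynchronous logging via a queue and one background writer.
 * Hypothetical class; a real setup would use Logback's AsyncAppender.
 */
final class AsyncLogSink implements AutoCloseable {
  // Sentinel compared by identity, so a logged "<shutdown>" string never matches it.
  private static final String SHUTDOWN = new String("<shutdown>");

  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread writer;

  AsyncLogSink(Consumer<String> sink) {
    writer = new Thread(() -> {
      try {
        while (true) {
          String line = queue.take();
          if (line == SHUTDOWN) {
            return;
          }
          sink.accept(line); // the slow, possibly lock-protected write
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "async-log-writer");
    // Daemon: the JVM may exit without draining the queue if close() is never called.
    writer.setDaemon(true);
    writer.start();
  }

  /** Callers only pay for an enqueue, not for the sink write itself. */
  void log(String line) {
    queue.add(line);
  }

  /** Writes everything queued so far, then stops the writer thread. */
  @Override
  public void close() {
    queue.add(SHUTDOWN);
    try {
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```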





[jira] [Assigned] (AURORA-1825) Enable async logging by default

2016-11-23 Thread Jing Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Chen reassigned AURORA-1825:
-

Assignee: Jing Chen





[jira] [Created] (AURORA-1829) Expose stats on preemptor BiCache expirations

2016-11-23 Thread Mehrdad Nurolahzade (JIRA)

Mehrdad Nurolahzade created AURORA-1829:
---

 Summary: Expose stats on preemptor BiCache expirations
 Key: AURORA-1829
 URL: https://issues.apache.org/jira/browse/AURORA-1829
 Project: Aurora
  Issue Type: Story
  Components: Scheduler
Reporter: Mehrdad Nurolahzade
Priority: Minor


We are currently collecting stats for the size of preemptor {{BiCache}} 
({{reservation_cache_size}}). We could additionally collect cache expiration 
stats to monitor overall preemption effectiveness. 

Currently, we have no visibility into whether reservations made by preemption 
are actually consumed or simply expire.
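Here is a minimal Java sketch of the proposed stats. It assumes reservations expire after a fixed TTL and that the clock is injectable for testing. `ReservationCache`, `consume`, and the two counters are hypothetical names; this is not Aurora's `BiCache`.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Sketch only: a TTL cache that counts consumed vs. expired reservations.
 * Hypothetical class, not Aurora's BiCache.
 */
final class ReservationCache<K, V> {
  private static final class Entry<T> {
    final T value;
    final long expiresAtMillis;

    Entry(T value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long ttlMillis;
  private final LongSupplier clockMillis;
  private long consumed;
  private long expired;

  ReservationCache(long ttlMillis, LongSupplier clockMillis) {
    this.ttlMillis = ttlMillis;
    this.clockMillis = clockMillis;
  }

  synchronized void put(K key, V value) {
    entries.put(key, new Entry<>(value, clockMillis.getAsLong() + ttlMillis));
  }

  /** Removes and returns a live reservation (counted as consumed), or null. */
  synchronized V consume(K key) {
    evictExpired();
    Entry<V> entry = entries.remove(key);
    if (entry == null) {
      return null;
    }
    consumed++;
    return entry.value;
  }

  synchronized int size() {
    evictExpired();
    return entries.size();
  }

  synchronized long consumedCount() {
    return consumed;
  }

  synchronized long expiredCount() {
    evictExpired();
    return expired;
  }

  // Drops every entry whose deadline has passed and counts it as expired.
  private void evictExpired() {
    long now = clockMillis.getAsLong();
    Iterator<Entry<V>> it = entries.values().iterator();
    while (it.hasNext()) {
      if (it.next().expiresAtMillis <= now) {
        it.remove();
        expired++;
      }
    }
  }
}
```

The ratio of `expiredCount()` to `consumedCount()` is one possible effectiveness signal. A high ratio would suggest that preemption frees up slots that pending tasks never claim.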




[jira] [Commented] (AURORA-1827) Fix SLA percentile calculation

2016-11-23 Thread Zameer Manji (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691040#comment-15691040
 ] 

Zameer Manji commented on AURORA-1827:
--

I upgraded us to Guava 20. It has a 
[Quantiles|http://google.github.io/guava/releases/20.0/api/docs/com/google/common/math/Quantiles.html]
 class and a 
[Stats|http://google.github.io/guava/releases/20.0/api/docs/com/google/common/math/Stats.html]
 class that could be very helpful here.





[jira] [Created] (AURORA-1828) Expose stats on the number of offers evaluated before a task is assigned

2016-11-23 Thread Mehrdad Nurolahzade (JIRA)

Mehrdad Nurolahzade created AURORA-1828:
---

 Summary: Expose stats on the number of offers evaluated before a task is assigned
 Key: AURORA-1828
 URL: https://issues.apache.org/jira/browse/AURORA-1828
 Project: Aurora
  Issue Type: Story
  Components: Scheduler
Reporter: Mehrdad Nurolahzade
Priority: Minor


Expose stats on the number of offers evaluated before a task is assigned by 
{{TaskAssigner}}.

The number of invocations of the {{SchedulingFilterImpl.filter()}} method 
exposes the number of offers examined per unit of time, but it does not tell us 
how many offers are examined before a task is assigned in 
{{TaskSchedulerImpl.maybeAssign()}}.
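Here is a minimal Java sketch of the missing stat. It assumes offers are scanned in order until one passes a filter. `OfferScanStats` and `findMatch` are hypothetical and do not correspond to `TaskAssigner` or `SchedulingFilterImpl`. The point is only that the count has to be taken per assignment attempt, not per filter call.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/**
 * Sketch only: counts how many offers are examined before one matches and
 * keeps simple aggregates over those counts. Hypothetical class.
 */
final class OfferScanStats {
  private long assignments;
  private long failedAttempts;
  private long offersExaminedForAssignments;
  private long maxOffersExamined;

  /** Scans offers in order and returns the first one the filter accepts. */
  synchronized <O> Optional<O> findMatch(List<O> offers, Predicate<O> filter) {
    long examined = 0;
    for (O offer : offers) {
      examined++;
      if (filter.test(offer)) {
        assignments++;
        offersExaminedForAssignments += examined;
        maxOffersExamined = Math.max(maxOffersExamined, examined);
        return Optional.of(offer);
      }
    }
    // No offer matched: count the attempt separately so it does not skew the mean.
    failedAttempts++;
    return Optional.empty();
  }

  synchronized double meanOffersPerAssignment() {
    return assignments == 0 ? 0.0 : (double) offersExaminedForAssignments / assignments;
  }

  synchronized long maxOffersPerAssignment() {
    return maxOffersExamined;
  }

  synchronized long failedAttempts() {
    return failedAttempts;
  }
}
```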




[jira] [Updated] (AURORA-1826) Expose Thrift server request workload stats

2016-11-23 Thread Mehrdad Nurolahzade (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehrdad Nurolahzade updated AURORA-1826:

Labels: newbie  (was: )

> Expose Thrift server request workload stats
> ---
>
> Key: AURORA-1826
> URL: https://issues.apache.org/jira/browse/AURORA-1826
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Mehrdad Nurolahzade
>Priority: Minor
>  Labels: newbie
>
> Current Thrift server stats expose the number and timing of requests received 
> by the server. However, they do not reflect the size of the requests, which 
> limits our ability to get an accurate view of the workload currently handled 
> by the scheduler. 
> For example, every call to {{restartShards()}} is recorded as one event, 
> even though one request might restart only a single shard while another 
> might restart 1K shards. Factoring in the request workload would help us 
> interpret the timing information better.




[jira] [Updated] (AURORA-1827) Fix SLA percentile calculation

2016-11-23 Thread Joshua Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Cohen updated AURORA-1827:
-
Labels: newbie sla  (was: sla)





[jira] [Created] (AURORA-1827) Fix SLA percentile calculation

2016-11-23 Thread Reza Motamedi (JIRA)

Reza Motamedi created AURORA-1827:
-

 Summary: Fix SLA percentile calculation 
 Key: AURORA-1827
 URL: https://issues.apache.org/jira/browse/AURORA-1827
 Project: Aurora
  Issue Type: Story
Reporter: Reza Motamedi
Priority: Trivial


The calculation of mttX (median-time-to-X) depends on the computation of 
percentile values. The current implementation does not handle small sample 
sizes well. For instance, for the sample set {50, 150}, the 50th percentile is 
reported as 50, although 100 seems a more appropriate return value.

One solution is to modify `SlaUtil` to interpolate between neighboring samples 
when the sample size is small or when the index corresponding to a percentile 
value is not an integer. 
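For reference, here is a minimal Java sketch of one way to get 100 for the {50, 150} example: linear interpolation between the two closest ranks. The sorted samples are indexed at the fractional rank `p / 100 * (n - 1)`, and the two neighboring samples are blended. With {50, 150} and p = 50, the rank is 0.5, giving 50 + 0.5 × (150 - 50) = 100. The class name is hypothetical and this is not `SlaUtil`'s current code. Guava's `Quantiles` class, mentioned elsewhere in this thread, may make hand-rolling this unnecessary.

```java
import java.util.Arrays;

/**
 * Sketch only: percentile with linear interpolation between closest ranks.
 * Hypothetical class, not Aurora's SlaUtil.
 */
final class InterpolatedPercentile {
  private InterpolatedPercentile() {}

  /** Returns the p-th percentile (0 <= p <= 100); the input array is not modified. */
  static double percentile(double[] samples, double p) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    if (p < 0 || p > 100) {
      throw new IllegalArgumentException("percentile out of range: " + p);
    }
    double[] sorted = samples.clone();
    Arrays.sort(sorted);

    // Fractional position of the percentile within the sorted samples.
    double rank = (p / 100.0) * (sorted.length - 1);
    int lower = (int) Math.floor(rank);
    int upper = (int) Math.ceil(rank);
    double fraction = rank - lower;

    // Blend the two neighbors; when rank is an integer, lower == upper.
    return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
  }
}
```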




[jira] [Created] (AURORA-1826) Expose Thrift server request workload stats

2016-11-23 Thread Mehrdad Nurolahzade (JIRA)

Mehrdad Nurolahzade created AURORA-1826:
---

 Summary: Expose Thrift server request workload stats
 Key: AURORA-1826
 URL: https://issues.apache.org/jira/browse/AURORA-1826
 Project: Aurora
  Issue Type: Story
  Components: Scheduler
Reporter: Mehrdad Nurolahzade
Priority: Minor


Current Thrift server stats expose the number and timing of requests received 
by the server. However, they do not reflect the size of the requests, which 
limits our ability to get an accurate view of the workload currently handled by 
the scheduler. 

For example, every call to {{restartShards()}} is recorded as one event, even 
though one request might restart only a single shard while another might 
restart 1K shards. Factoring in the request workload would help us interpret 
the timing information better.
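Here is a minimal Java sketch of tracking request size alongside request count. It assumes each call can report how many work items (for example, shards) it covers. `RequestWorkloadStats` and its methods are hypothetical, not Aurora's existing Thrift stats.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Sketch only: per-method request counts plus the number of work items
 * (e.g. shards) each request covered. Hypothetical class.
 */
final class RequestWorkloadStats {
  private final Map<String, LongAdder> requests = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> workItems = new ConcurrentHashMap<>();

  /** Records one call to {@code method} that touched {@code items} work items. */
  void record(String method, long items) {
    // The two counters are updated separately, so a concurrent reader may
    // briefly see them out of step; acceptable for monitoring in this sketch.
    requests.computeIfAbsent(method, m -> new LongAdder()).increment();
    workItems.computeIfAbsent(method, m -> new LongAdder()).add(items);
  }

  long requestCount(String method) {
    return sum(requests, method);
  }

  long workItemCount(String method) {
    return sum(workItems, method);
  }

  /** Average request size; 0 if the method has not been called. */
  double meanItemsPerRequest(String method) {
    long count = requestCount(method);
    return count == 0 ? 0.0 : (double) workItemCount(method) / count;
  }

  private static long sum(Map<String, LongAdder> counters, String method) {
    LongAdder adder = counters.get(method);
    return adder == null ? 0 : adder.sum();
  }
}
```

Take the `restartShards()` example above: one call for 1 shard and one for 1,000 shards. The request count is 2, but the work-item count is 1,001. Dividing total latency by work items gives a per-shard figure that the request count alone cannot provide.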




[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-23 Thread Stephan Erb (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689949#comment-15689949
 ] 

Stephan Erb commented on AURORA-1780:
-

Fix is on master. Thanks!

{code}
commit 4797dfe33ba08183fa9596a46ac8be51a64e08bb
Author: Renan DelValle 
Date:   Wed Nov 23 13:08:51 2016 +0100

Filter out calls to fromResource for resources that Aurora does not support yet to avoid crashing

Added filters whenever fromResource is called for a Protos.Resource in 
order to avoid Aurora crashing.
Previously only bagFromMesosResources was using the SUPPORTED_RESOURCE 
filter.

Reviewed at https://reviews.apache.org/r/53923/

 src/main/java/org/apache/aurora/scheduler/resources/ResourceManager.java | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)
{code}
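The idea behind the fix is to filter out unsupported resources before mapping them, so that an unknown name is skipped instead of failing a `requireNonNull` check. The Java sketch below shows that idea. The enum, `MesosResource`, `fromName`, and `supportedOnly` are simplified, hypothetical stand-ins for Aurora's `ResourceType` and the Mesos protobufs; the real change lives in `ResourceManager.java`, per the commit above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/**
 * Sketch only: skip unknown resource names instead of throwing.
 * Simplified stand-ins, not Aurora's ResourceType or Mesos protobufs.
 */
final class ResourceFiltering {
  enum KnownResource { CPUS, MEM, DISK, PORTS, GPUS }

  static final class MesosResource {
    final String name;
    final double value;

    MesosResource(String name, double value) {
      this.name = name;
      this.value = value;
    }
  }

  /** Maps a resource name to a known type, or empty if it is unsupported. */
  static Optional<KnownResource> fromName(String name) {
    return Arrays.stream(KnownResource.values())
        .filter(r -> r.name().equalsIgnoreCase(name))
        .findFirst();
  }

  /** Keeps only the resources the scheduler understands. */
  static List<MesosResource> supportedOnly(List<MesosResource> offered) {
    return offered.stream()
        .filter(r -> fromName(r.name).isPresent())
        .collect(Collectors.toList());
  }
}
```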

> Offers with unknown resources types to Aurora crash the scheduler
> -
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
>  Issue Type: Bug
> Environment: vagrant
>Reporter: Renan DelValle
>Assignee: Renan DelValle
> Fix For: 0.17.0
>
>
> Taking offers from agents that have resources unknown to Aurora causes the 
> scheduler to crash.
> Steps to reproduce:
> {code}
> vagrant up
> sudo service mesos-slave stop
> echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
> sudo rm -f /var/lib/mesos/meta/slaves/latest
> sudo service mesos-slave start
> {code}
> Wait a few moments for the offer to be made to Aurora:
> {code}
> I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"
> I0922 02:42:30.585597  2999 log.cpp:577] Attempting to append 109 bytes to the log
> I0922 02:42:30.585654  2999 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 4
> I0922 02:42:30.585747  2999 replica.cpp:537] Replica received write request for position 4 from (10)@192.168.33.7:8083
> I0922 02:42:30.586858  2999 leveldb.cpp:341] Persisting action (125 bytes) to leveldb took 1.086601ms
> I0922 02:42:30.586897  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587020  2999 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
> I0922 02:42:30.587785  2999 leveldb.cpp:341] Persisting action (127 bytes) to leveldb took 746999ns
> I0922 02:42:30.587805  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587811  2999 replica.cpp:697] Replica learned APPEND action at position 4
> I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
> Sep 22, 2016 2:42:38 AM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "test"
> type: SCALAR
> scalar {
>   value: 200.0
> }
> role: "*"
>   at java.util.Objects.requireNonNull(Objects.java:228)
>   at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
>   at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
>   at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
>   at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
>   at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
>   at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.ru

[jira] [Resolved] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-23 Thread Stephan Erb (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Erb resolved AURORA-1780.
-
Resolution: Fixed

> Offers with unknown resources types to Aurora crash the scheduler
> -
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
>  Issue Type: Bug
> Environment: vagrant
>Reporter: Renan DelValle
>Assignee: Renan DelValle
> Fix For: 0.17.0
>
>
> Taking offers from agents that have resources unknown to Aurora causes the 
> scheduler to crash.
> Steps to reproduce:
> {code}
> vagrant up
> sudo service mesos-slave stop
> echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
> sudo rm -f /var/lib/mesos/meta/slaves/latest
> sudo service mesos-slave start
> {code}
> Wait a few moments for the offer to be made to Aurora:
> {code}
> I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"
> I0922 02:42:30.585597  2999 log.cpp:577] Attempting to append 109 bytes to the log
> I0922 02:42:30.585654  2999 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 4
> I0922 02:42:30.585747  2999 replica.cpp:537] Replica received write request for position 4 from (10)@192.168.33.7:8083
> I0922 02:42:30.586858  2999 leveldb.cpp:341] Persisting action (125 bytes) to leveldb took 1.086601ms
> I0922 02:42:30.586897  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587020  2999 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
> I0922 02:42:30.587785  2999 leveldb.cpp:341] Persisting action (127 bytes) to leveldb took 746999ns
> I0922 02:42:30.587805  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587811  2999 replica.cpp:697] Replica learned APPEND action at position 4
> I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
> Sep 22, 2016 2:42:38 AM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "test"
> type: SCALAR
> scalar {
>   value: 200.0
> }
> role: "*"
>   at java.util.Objects.requireNonNull(Objects.java:228)
>   at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
>   at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
>   at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
>   at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
>   at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
>   at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
>   at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
>   at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at jav
