[
https://issues.apache.org/jira/browse/GRIFFIN-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
William Guo resolved GRIFFIN-293.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 541
[https://github.com/apache/griffin/pull/541]
> [Service] livy.need.queue=true
> ------------------------------
>
> Key: GRIFFIN-293
> URL: https://issues.apache.org/jira/browse/GRIFFIN-293
> Project: Griffin
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Nevena Veljkovic
> Priority: Critical
> Fix For: 0.6.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> While running Griffin in several production environments, with 10 jobs
> scheduled to start at the same hour, minute, and second, we found that
> when 2 or more Griffin jobs start concurrently, not all of them are
> submitted and run to completion: the last job was submitted multiple
> times and the others were never submitted.
> Example: two jobs, "beta_node_metrics_fact" and
> "beta_node_master_dimension_device", were started 1 millisecond apart:
> {code:java}
> 2019-09-28 14:00:37.090 INFO 2732 --- [ryBean_Worker-4]
> o.a.g.c.j.SparkSubmitJob [203] : {
> "measure.type" : "griffin",
> "id" : 60560,
> "name" : "beta_node_metrics_fact",
> 2019-09-28 14:00:37.091 INFO 2732 --- [ryBean_Worker-5]
> o.a.g.c.j.SparkSubmitJob [203] : {
> "measure.type" : "griffin",
> "id" : 63751,
> "name" : "beta_node_master_dimension_device",
> {code}
> Livy received 2 submissions, and both contained
> "beta_node_master_dimension_device".
> That is why we decided to use the setting "livy.need.queue=true".
> During testing we found that queueing did not work at all, because
> LivyTaskSubmitHelper's member sparkSubmitJob was never instantiated:
>
> [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/job/LivyTaskSubmitHelper.java#L64]
> We fixed this and continued testing.
> We then found that curConcurrentTaskNum is not decremented when tasks
> finish (state SUCCESS or DEAD):
>
> [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/job/JobServiceImpl.java#L632-L633]
> We fixed this as well.
>
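The counter problem described above can be illustrated with a minimal sketch. This is not Griffin's code: the class name BoundedSubmitQueueSketch, its methods, and the maxConcurrent limit are hypothetical, and only the counter name curConcurrentTaskNum and the terminal states SUCCESS and DEAD are taken from the issue text. It assumes a queue that submits a waiting job only while a slot is free, so a slot has to be released whenever a task reaches a terminal state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a bounded submit queue; not Griffin's LivyTaskSubmitHelper. */
public class BoundedSubmitQueueSketch {
    private final int maxConcurrent;                        // assumed limit on running tasks
    private final Queue<String> waiting = new ArrayDeque<>();
    private final Set<String> runningJobs = new HashSet<>();
    private final List<String> submitted = new ArrayList<>();
    private int curConcurrentTaskNum = 0;                   // counter name borrowed from the issue

    public BoundedSubmitQueueSketch(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Queue a job and submit it immediately if a slot is free. */
    public synchronized void enqueue(String jobName) {
        waiting.add(jobName);
        drain();
    }

    /** Release a slot when a running task reaches a terminal state. */
    public synchronized void onStateChange(String jobName, String state) {
        boolean terminal = "SUCCESS".equals(state) || "DEAD".equals(state);
        // Only decrement for jobs that actually hold a slot, so a repeated
        // terminal report cannot push the counter below its true value.
        if (terminal && runningJobs.remove(jobName)) {
            curConcurrentTaskNum--;
            drain();
        }
    }

    /** Submit waiting jobs in FIFO order while slots are available. */
    private void drain() {
        while (curConcurrentTaskNum < maxConcurrent && !waiting.isEmpty()) {
            String next = waiting.poll();
            runningJobs.add(next);
            submitted.add(next);   // stand-in for the real submission call
            curConcurrentTaskNum++;
        }
    }

    public synchronized int running() {
        return curConcurrentTaskNum;
    }

    public synchronized List<String> submitted() {
        return new ArrayList<>(submitted);
    }

    public static void main(String[] args) {
        BoundedSubmitQueueSketch queue = new BoundedSubmitQueueSketch(1);
        queue.enqueue("beta_node_metrics_fact");
        queue.enqueue("beta_node_master_dimension_device");
        System.out.println("submitted so far: " + queue.submitted());
        queue.onStateChange("beta_node_metrics_fact", "SUCCESS");
        System.out.println("submitted after first finished: " + queue.submitted());
    }
}
```

In this sketch, if the decrement in onStateChange is left out, the second job stays in the waiting queue forever once the limit is reached, which is the kind of stall a counter that never decreases would produce.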