[jira] [Comment Edited] (SPARK-18935) Use Mesos "Dynamic Reservation" resource for Spark

Stavros Kontopoulos (JIRA) Thu, 28 Sep 2017 07:13:31 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-18935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184025#comment-16184025
 ]


Stavros Kontopoulos edited comment on SPARK-18935 at 9/28/17 2:12 PM:
----------------------------------------------------------------------

I can reproduce it locally in client mode by following the example; I am 
looking into it.

Btw, I tested this a while ago with the latest Spark version (master) in 
cluster mode (if I recall correctly) and got stuck here:

{noformat}
I0829 14:05:56.342872 21756 master.cpp:6532] Sending status update TASK_ERROR 
for task 1 of framework f0a1a46a-e404-4faa-87f7-29479f30b57e-0009 'Total 
resources cpus(spark-prive)(allocated: spark-prive):1; cpus(*)(allocated: 
spark-prive):1; mem(spark-prive)(allocated: spark-prive):1024; 
mem(*)(allocated: spark-prive):384 required by task and its executor is more 
than available disk(spark-prive, )(allocated: spark-prive):1000; 
ports(spark-prive, )(allocated: spark-prive):[31000-32000]; cpus(spark-prive, 
)(allocated: spark-prive):1; mem(spark-prive, )(allocated: spark-prive):1024; 
cpus(*)(allocated: spark-prive):1; mem(*)(allocated: spark-prive):976; 
disk(*)(allocated: spark-prive):9000'
{noformat}


To reserve resources I used:

{code:java}
curl -i -d slaveId=cf885682-8a28-4e82-b5db-a01277edfafc-S0 \
  -d resources='[
    {"name":"disk","type":"SCALAR","scalar":{"value":1000},"role":"spark-prive","reservation":{"principal":""}},
    {"name":"ports","type":"RANGES","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"spark-prive","reservation":{"principal":""}},
    {"name":"cpus","type":"SCALAR","scalar":{"value":1},"role":"spark-prive","reservation":{"principal":""}},
    {"name":"mem","type":"SCALAR","scalar":{"value":1024},"role":"spark-prive","reservation":{"principal":""}}
  ]' \
  -X POST http://172.17.0.1:5050/master/reserve
{code}
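For anyone who wants to vary the principal while testing, the same request body can be generated instead of hand-edited. This is only a sketch: the helper names `reserved` and `reserve_body` are made up for illustration, the field layout is copied from the curl command above, and the body is URL-encoded here (curl -d sends it as-is), on the assumption that the master decodes form-encoded bodies. Nothing is sent to the master.

```python
import json
from urllib.parse import urlencode, parse_qs


def reserved(name, role, principal="", *, value=None, ranges=None):
    """Build one resource entry shaped like those in the curl command (hypothetical helper)."""
    entry = {"name": name, "role": role, "reservation": {"principal": principal}}
    if ranges is not None:
        entry["type"] = "RANGES"
        entry["ranges"] = {"range": [{"begin": b, "end": e} for b, e in ranges]}
    else:
        entry["type"] = "SCALAR"
        entry["scalar"] = {"value": value}
    return entry


def reserve_body(slave_id, resources):
    """Form-encode slaveId and the JSON resource list (hypothetical helper)."""
    return urlencode({"slaveId": slave_id, "resources": json.dumps(resources)})


resources = [
    reserved("disk", "spark-prive", value=1000),
    reserved("ports", "spark-prive", ranges=[(31000, 32000)]),
    reserved("cpus", "spark-prive", value=1),
    reserved("mem", "spark-prive", value=1024),
]
body = reserve_body("cf885682-8a28-4e82-b5db-a01277edfafc-S0", resources)
print(body)
```

Passing a non-empty `principal` to `reserved` gives the variant of the request that still needs to be tested.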

The error comes from here: 
[https://github.com/apache/mesos/blob/11ee081ee578ea12e85799e00c5fe8b89eb6ea5f/src/master/validation.cpp#L1339].
I haven't checked with a principal yet, but it seems to be optional if you 
have no authorization 
(https://github.com/apache/mesos/commit/efbdef8dfd96ff08c1342b171ef89dcb266bdce7).
However, the comments here require it: 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L274-L278

The error claims that the task and its executor require more resources than 
are available on the slave, but by the numbers the requested resources are 
less than what is available. That does not make sense unless there is a 
mismatch for the principal (empty vs "") or something. I will test. Since the 
numbers are fine, it seems the failure is somewhere here, where the resources 
are compared: 
https://github.com/apache/mesos/blob/d47641039f5e2dd18af007250ef7ae2a34258a2d/src/common/resources.cpp#L438
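
To make the mismatch hypothesis concrete, here is a toy model of a containment check. It is not the Mesos implementation; it only assumes that two resources are comparable when name, role, and reservation info all match. The keys are my reading of the log above: the task's resources print as "cpus(spark-prive)" (modelled as no reservation info, `None`), while the offer prints "cpus(spark-prive, )" (modelled as an empty principal, `""`). Ports are left out to keep it scalar-only.

```python
def contains(available, required):
    """True if each required entry is covered by an available entry with the same key."""
    return all(available.get(key, 0) >= amount for key, amount in required.items())


def strip_reservation(resources):
    """Drop the reservation part of each key and sum amounts per (name, role)."""
    totals = {}
    for (name, role, _reservation), amount in resources.items():
        totals[(name, role)] = totals.get((name, role), 0) + amount
    return totals


# Key layout (an assumption): (name, role, reservation principal or None).
required = {
    ("cpus", "spark-prive", None): 1,
    ("cpus", "*", None): 1,
    ("mem", "spark-prive", None): 1024,
    ("mem", "*", None): 384,
}
available = {
    ("disk", "spark-prive", ""): 1000,
    ("cpus", "spark-prive", ""): 1,
    ("mem", "spark-prive", ""): 1024,
    ("cpus", "*", None): 1,
    ("mem", "*", None): 976,
    ("disk", "*", None): 9000,
}

by_identity = contains(available, required)
by_amount = contains(strip_reservation(available), strip_reservation(required))
print(by_identity, by_amount)
```

Under these assumptions the amounts alone fit, but the identity-based check fails, which would produce exactly this kind of "more than available" error even though the numbers are fine.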





> Use Mesos "Dynamic Reservation" resource for Spark
> --------------------------------------------------
>
>                 Key: SPARK-18935
>                 URL: https://issues.apache.org/jira/browse/SPARK-18935
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: jackyoh
>
> I'm running spark on Apache Mesos
> Please follow these steps to reproduce the issue:
> 1. First, run Mesos resource reserve:
> curl -i -d slaveId=c24d1cfb-79f3-4b07-9f8b-c7b19543a333-S0 -d 
> resources='[{"name":"cpus","type":"SCALAR","scalar":{"value":20},"role":"spark","reservation":{"principal":""}},{"name":"mem","type":"SCALAR","scalar":{"value":4096},"role":"spark","reservation":{"principal":""}}]'
>  -X POST http://192.168.1.118:5050/master/reserve
> 2. Then run spark-submit command:
> ./spark-submit --class org.apache.spark.examples.SparkPi --master 
> mesos://192.168.1.118:5050 --conf spark.mesos.role=spark  
> ../examples/jars/spark-examples_2.11-2.0.2.jar 10000
> And the console will keep logging the same warning message, as shown below: 
> 16/12/19 22:33:28 WARN TaskSchedulerImpl: Initial job has not accepted any 
> resources; check your cluster UI to ensure that workers are registered and 
> have sufficient resources



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
