[ https://issues.apache.org/jira/browse/SPARK-18935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184025#comment-16184025 ]
Stavros Kontopoulos edited comment on SPARK-18935 at 9/28/17 2:12 PM:
----------------------------------------------------------------------

I can reproduce it locally in client mode by following the example, and I am looking into this.

Btw, I tested this a while ago with the latest Spark version (master) in cluster mode (if I recall correctly) and got stuck here:
{noformat}
I0829 14:05:56.342872 21756 master.cpp:6532] Sending status update TASK_ERROR for task 1 of framework f0a1a46a-e404-4faa-87f7-29479f30b57e-0009 'Total resources cpus(spark-prive)(allocated: spark-prive):1; cpus(*)(allocated: spark-prive):1; mem(spark-prive)(allocated: spark-prive):1024; mem(*)(allocated: spark-prive):384 required by task and its executor is more than available disk(spark-prive, )(allocated: spark-prive):1000; ports(spark-prive, )(allocated: spark-prive):[31000-32000]; cpus(spark-prive, )(allocated: spark-prive):1; mem(spark-prive, )(allocated: spark-prive):1024; cpus(*)(allocated: spark-prive):1; mem(*)(allocated: spark-prive):976; disk(*)(allocated: spark-prive):9000'
{noformat}
To reserve resources I used:
{code:java}
curl -i -d slaveId=cf885682-8a28-4e82-b5db-a01277edfafc-S0 \
  -d resources='[{"name":"disk","type":"SCALAR","scalar":{"value":1000},"role":"spark-prive","reservation":{"principal":""}},{"name":"ports","type":"RANGES","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"spark-prive","reservation":{"principal":""}},{"name":"cpus","type":"SCALAR","scalar":{"value":1},"role":"spark-prive","reservation":{"principal":""}},{"name":"mem","type":"SCALAR","scalar":{"value":1024},"role":"spark-prive","reservation":{"principal":""}}]' \
  -X POST http://172.17.0.1:5050/master/reserve
{code}
The error comes from here: [https://github.com/apache/mesos/blob/11ee081ee578ea12e85799e00c5fe8b89eb6ea5f/src/master/validation.cpp#L1339].
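For readers who find the one-line JSON hard to follow, here is a minimal Python sketch that assembles the same four resource entries and a form body with the same two fields (slaveId and resources). The helper names scalar, ranges and reserve_form are hypothetical and only for illustration. The sketch only builds the data and does not contact a Mesos master. Note that urlencode percent-encodes the values, whereas curl -d sends them as written.

```python
import json
from urllib.parse import urlencode


def scalar(name, value, role, principal=""):
    """Build one SCALAR resource entry reserved for `role` (hypothetical helper)."""
    return {
        "name": name,
        "type": "SCALAR",
        "scalar": {"value": value},
        "role": role,
        "reservation": {"principal": principal},
    }


def ranges(name, begin, end, role, principal=""):
    """Build one RANGES resource entry reserved for `role` (hypothetical helper)."""
    return {
        "name": name,
        "type": "RANGES",
        "ranges": {"range": [{"begin": begin, "end": end}]},
        "role": role,
        "reservation": {"principal": principal},
    }


def reserve_form(slave_id, resources):
    """Return a form-encoded body with the slaveId and resources fields."""
    return urlencode({"slaveId": slave_id, "resources": json.dumps(resources)})


ROLE = "spark-prive"

# Same entries, in the same order, as in the curl command above.
resources = [
    scalar("disk", 1000, ROLE),
    ranges("ports", 31000, 32000, ROLE),
    scalar("cpus", 1, ROLE),
    scalar("mem", 1024, ROLE),
]

body = reserve_form("cf885682-8a28-4e82-b5db-a01277edfafc-S0", resources)

# Pretty-print the payload so each reservation is easy to inspect.
print(json.dumps(resources, indent=2))
```

Every entry carries an empty principal string, which is the detail the rest of the comment is concerned with.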
I haven't checked with a principal yet, but it seems to be optional if you have no authorization (https://github.com/apache/mesos/commit/efbdef8dfd96ff08c1342b171ef89dcb266bdce7). However, the comments here require it: https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L274-L278

The error itself indicates that, for some reason, the resources requested by the task exceed those available on the slave. I checked the numbers, and that does not make sense unless there is a mismatch for the principal (empty vs "") or something similar. I will test.

Since the numbers are fine, it seems the failure is somewhere here, where the resources are compared: https://github.com/apache/mesos/blob/d47641039f5e2dd18af007250ef7ae2a34258a2d/src/common/resources.cpp#L438


was (Author: skonto):
I can reproduce it locally by following the example, and I am looking at it.

Btw, I tested this a while ago with the latest Spark version (master) in cluster mode (if I recall correctly) and got stuck here:
{noformat}
I0829 14:05:56.342872 21756 master.cpp:6532] Sending status update TASK_ERROR for task 1 of framework f0a1a46a-e404-4faa-87f7-29479f30b57e-0009 'Total resources cpus(spark-prive)(allocated: spark-prive):1; cpus(*)(allocated: spark-prive):1; mem(spark-prive)(allocated: spark-prive):1024; mem(*)(allocated: spark-prive):384 required by task and its executor is more than available disk(spark-prive, )(allocated: spark-prive):1000; ports(spark-prive, )(allocated: spark-prive):[31000-32000]; cpus(spark-prive, )(allocated: spark-prive):1; mem(spark-prive, )(allocated: spark-prive):1024; cpus(*)(allocated: spark-prive):1; mem(*)(allocated: spark-prive):976; disk(*)(allocated: spark-prive):9000'
{noformat}
To reserve resources I used:
{code:java}
curl -i -d slaveId=cf885682-8a28-4e82-b5db-a01277edfafc-S0 \
  -d resources='[{"name":"disk","type":"SCALAR","scalar":{"value":1000},"role":"spark-prive","reservation":{"principal":""}},{"name":"ports","type":"RANGES","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"spark-prive","reservation":{"principal":""}},{"name":"cpus","type":"SCALAR","scalar":{"value":1},"role":"spark-prive","reservation":{"principal":""}},{"name":"mem","type":"SCALAR","scalar":{"value":1024},"role":"spark-prive","reservation":{"principal":""}}]' \
  -X POST http://172.17.0.1:5050/master/reserve
{code}
The error comes from here: [https://github.com/apache/mesos/blob/11ee081ee578ea12e85799e00c5fe8b89eb6ea5f/src/master/validation.cpp#L1339].

I haven't checked with a principal yet, but it seems to be optional if you have no authorization (https://github.com/apache/mesos/commit/efbdef8dfd96ff08c1342b171ef89dcb266bdce7). However, the comments here require it: https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L274-L278

The error itself indicates that, for some reason, the resources requested by the task exceed those available on the slave. I checked the numbers, and that does not make sense unless there is a mismatch for the principal (empty vs "") or something similar. I will test.

Since the numbers are fine, it seems the failure is somewhere here, where the resources are compared: https://github.com/apache/mesos/blob/d47641039f5e2dd18af007250ef7ae2a34258a2d/src/common/resources.cpp#L438


> Use Mesos "Dynamic Reservation" resource for Spark
> --------------------------------------------------
>
>                 Key: SPARK-18935
>                 URL: https://issues.apache.org/jira/browse/SPARK-18935
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: jackyoh
>
> I'm running Spark on Apache Mesos.
> Please follow these steps to reproduce the issue:
> 1. First, reserve Mesos resources:
> curl -i -d slaveId=c24d1cfb-79f3-4b07-9f8b-c7b19543a333-S0 -d resources='[{"name":"cpus","type":"SCALAR","scalar":{"value":20},"role":"spark","reservation":{"principal":""}},{"name":"mem","type":"SCALAR","scalar":{"value":4096},"role":"spark","reservation":{"principal":""}}]' -X POST http://192.168.1.118:5050/master/reserve
> 2. Then run the spark-submit command:
> ./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://192.168.1.118:5050 --conf spark.mesos.role=spark ../examples/jars/spark-examples_2.11-2.0.2.jar 10000
> The console then keeps logging the same warning message, shown below:
> 16/12/19 22:33:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
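
The mismatch suspected in the comment can be made visible by comparing the two resource lists from the TASK_ERROR message entry by entry. The following is a toy sketch in Python, not the comparison Mesos performs in resources.cpp: it assumes that the text in the first pair of parentheses identifies the reservation (role, plus an optional principal after a comma) and that a required entry is only satisfied by an available entry with exactly the same name and reservation text. The names parse and unmatched are hypothetical.

```python
import re

# name(reservation text)(allocated: role):value, as printed in the log line.
ENTRY = re.compile(r"^(\w+)\(([^)]*)\)\(allocated: [^)]*\):(.+)$")

REQUIRED = (
    "cpus(spark-prive)(allocated: spark-prive):1; "
    "cpus(*)(allocated: spark-prive):1; "
    "mem(spark-prive)(allocated: spark-prive):1024; "
    "mem(*)(allocated: spark-prive):384"
)

AVAILABLE = (
    "disk(spark-prive, )(allocated: spark-prive):1000; "
    "ports(spark-prive, )(allocated: spark-prive):[31000-32000]; "
    "cpus(spark-prive, )(allocated: spark-prive):1; "
    "mem(spark-prive, )(allocated: spark-prive):1024; "
    "cpus(*)(allocated: spark-prive):1; "
    "mem(*)(allocated: spark-prive):976; "
    "disk(*)(allocated: spark-prive):9000"
)


def parse(resources):
    """Map (name, reservation text) to the raw value string of each entry."""
    parsed = {}
    for entry in resources.split("; "):
        match = ENTRY.match(entry.strip())
        if match is None:
            raise ValueError("unrecognized resource entry: " + entry)
        name, reservation, value = match.groups()
        parsed[(name, reservation)] = value
    return parsed


def unmatched(required, available):
    """List required entries that have no exactly matching available entry.

    Assumption of this toy model: the reservation text must match character
    for character, and the required values are scalars.
    """
    have = parse(available)
    missing = []
    for key, value in parse(required).items():
        if key not in have or float(have[key]) < float(value):
            missing.append(key)
    return missing


for name, reservation in unmatched(REQUIRED, AVAILABLE):
    print(name, "requested under", repr(reservation), "has no exact match")
```

Under these assumptions the unreserved cpus(*) and mem(*) requests are covered, while cpus(spark-prive) and mem(spark-prive) find no counterpart, because the slave offers them as "spark-prive, " (with an empty principal). That is consistent with the numbers being sufficient while the comparison still fails, but it remains a hypothesis until tested against Mesos itself.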