Re: Using dynamic allocation and shuffle service in Standalone Mode

Yuval Itzchakov Tue, 08 Mar 2016 11:57:01 -0800

Great.
Thanks a lot Silvio.

On Tue, Mar 8, 2016, 21:39 Andrew Or <and...@databricks.com> wrote:


> Hi Yuval, if you start the Workers with
> `spark.shuffle.service.enabled=true`, each worker will start a shuffle
> service automatically. There is no need to start the shuffle services
> yourself separately.
>
> -Andrew
>
> 2016-03-08 11:21 GMT-08:00 Silvio Fiorito <silvio.fior...@granturing.com>:
>
>> There’s a script to start it up under sbin, start-shuffle-service.sh. Run
>> that on each of your worker nodes.
>>
>> *From: *Yuval Itzchakov <yuva...@gmail.com>
>> *Sent: *Tuesday, March 8, 2016 2:17 PM
>> *To: *Silvio Fiorito <silvio.fior...@granturing.com>;
>> user@spark.apache.org
>> *Subject: *Re: Using dynamic allocation and shuffle service in
>> Standalone Mode
>>
>>
>> Actually, I assumed that setting the flag in the spark job would turn on
>> the shuffle service in the workers. I now understand that assumption was
>> wrong.
>>
>> Is there any way to set the flag via the driver? Or must I manually set
>> it via spark-env.sh on each worker?
>>
>>
>> On Tue, Mar 8, 2016, 20:14 Silvio Fiorito <silvio.fior...@granturing.com>
>> wrote:
>>
>>> You’ve started the external shuffle service on all worker nodes,
>>> correct? Can you confirm they’re still running and haven’t exited?
>>>
>>> *From: *Yuval.Itzchakov <yuva...@gmail.com>
>>> *Sent: *Tuesday, March 8, 2016 12:41 PM
>>> *To: *user@spark.apache.org
>>> *Subject: *Using dynamic allocation and shuffle service in Standalone
>>> Mode
>>>
>>>
>>> Hi,
>>> I'm using Spark 1.6.0, and according to the documentation, dynamic
>>> allocation and spark shuffle service should be enabled.
>>>
>>> When I submit a spark job via the following:
>>>
>>> spark-submit \
>>> --master **** \
>>> --deploy-mode cluster \
>>> --executor-cores 3 \
>>> --conf "spark.streaming.backpressure.enabled=true" \
>>> --conf "spark.dynamicAllocation.enabled=true" \
>>> --conf "spark.dynamicAllocation.minExecutors=2" \
>>> --conf "spark.dynamicAllocation.maxExecutors=24" \
>>> --conf "spark.shuffle.service.enabled=true" \
>>> --conf "spark.executor.memory=8g" \
>>> --conf "spark.driver.memory=10g" \
>>> --class SparkJobRunner \
>>> /opt/clicktale/entityCreator/com.clicktale.ai.entity-creator-assembly-0.0.2.jar
>>>
>>> I'm seeing error logs from the workers being unable to connect to the
>>> shuffle service:
>>>
>>> 16/03/08 17:33:15 ERROR storage.BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
>>> java.io.IOException: Failed to connect to ****
>>>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>>>         at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181)
>>>         at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141)
>>>         at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:211)
>>>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>         at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:208)
>>>         at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
>>>         at org.apache.spark.executor.Executor.<init>(Executor.scala:85)
>>>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>>>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
>>>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>>>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>>>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> I verified all relevant ports are open. Has anyone else experienced such
>>> a failure?
>>>
>>> Yuval.
>>>
>
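
For anyone finding this thread later: a minimal sketch of the worker-side setting Andrew describes, assuming a standard Standalone install where each worker sources conf/spark-env.sh at startup. SPARK_WORKER_OPTS is the documented place for worker-only -D properties; the restart commands in the comments are an assumption about how your cluster is launched, so adjust them to your setup.

```shell
# conf/spark-env.sh on each worker node (sketch; merge with your existing settings).
# Passes the property to the Worker JVM so that the worker starts the
# external shuffle service itself when it comes up.
export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS:-} -Dspark.shuffle.service.enabled=true"

# Restart the workers afterwards so the setting takes effect, e.g. from the
# master node (assumes the stock sbin scripts and a populated conf/slaves):
#   sbin/stop-slaves.sh && sbin/start-slaves.sh
```

The job itself still needs spark.shuffle.service.enabled=true and spark.dynamicAllocation.enabled=true, as in the spark-submit command quoted above.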
