Hi Xiangrui,
I tried both branch-1.1 and master, and the job still gets stuck right
after the TaskSetManager starts the tasks:
14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor 8: ip-10-226-199-225.us-west-2.compute.internal (PROCESS_LOCAL)
14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 28055875 bytes in 162 ms
14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 0: ip-10-249-53-62.us-west-2.compute.internal (PROCESS_LOCAL)
14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 28055875 bytes in 178 ms

It's been 10 minutes with no progress on a relatively small dataset. I'll
let it run overnight and update in the morning. Is there somewhere I should
look to see what is happening? I tried sshing into an executor and looking
at /root/spark/logs, but there wasn't anything informative there.

I'm sure countByValue works fine, but my use of a HashMap was only an
example. In my actual task, I'm loading a Trie data structure to perform
efficient string matching between a dataset of locations and strings that
may contain mentions of those locations.

This seems like a common pattern: processing input with a relatively
memory-intensive object like a Trie. I hope I'm not missing something
obvious. Do you know of any example code like my use case?
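
For reference, here's a rough sketch of the pattern I'm after. The
PrefixTrie class and the s3n://texts path are just illustrative stand-ins,
not real code from my job:

import org.apache.spark.{SparkConf, SparkContext}

// Stand-in for my Trie; a real implementation would use a proper trie
// instead of a Set, but the shape of the computation is the same.
class PrefixTrie(words: Seq[String]) extends Serializable {
  private val set = words.toSet
  def matches(line: String): Seq[String] =
    set.filter(w => line.contains(w)).toSeq
}

val sc = new SparkContext(new SparkConf().setAppName("trie-match"))
val locations = sc.textFile("s3n://geonames").collect() // location names
// Broadcast the big object once instead of capturing it in every task
// closure, which is what I suspect is inflating my serialized task size.
val trie = sc.broadcast(new PrefixTrie(locations))
val texts = sc.textFile("s3n://texts") // hypothetical input
val mentions = texts.map(line => (line, trie.value.matches(line)))
mentions.take(5).foreach(println)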

Thanks!

- jerry




On Fri, Aug 15, 2014 at 10:02 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Just saw you used toArray on an RDD. That copies all data to the
> driver and it is deprecated. countByValue is what you need:
>
> val samples = sc.textFile("s3n://geonames")
> val counts = samples.countByValue()
> val result = samples.map(l => (l, counts.getOrElse(l, 0L)))
>
> Could you also try to use the latest branch-1.1 or master with the
> default akka.frameSize setting? The serialized task size should be
> small because we now use broadcast RDD objects.
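>
> If the counts map is large, you can also broadcast it explicitly so it
> isn't re-serialized into every task closure. A sketch, reusing the
> variables above:
>
> val bCounts = sc.broadcast(counts)
> val result = samples.map(l => (l, bCounts.value.getOrElse(l, 0L)))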
>
> -Xiangrui
>
> On Fri, Aug 15, 2014 at 5:11 PM, jerryye <jerr...@gmail.com> wrote:
> > Hi Xiangrui,
> > You were right, I had to use --driver-memory instead of setting it in
> > spark-defaults.conf.
> >
> > However, now my job just hangs with the following message:
> > 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 29433434 bytes in 202 ms
> > 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 1: ip-10-226-198-31.us-west-2.compute.internal (PROCESS_LOCAL)
> > 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 29433434 bytes in 203 ms
> >
> > Any ideas on where else to look?
> >
> >
> > On Fri, Aug 15, 2014 at 3:29 PM, Xiangrui Meng wrote:
> >
> >> Did you verify the driver memory in the Executor tab of the WebUI? I
> >> think you need `--driver-memory 8g` with spark-shell or spark-submit
> >> instead of setting it in spark-defaults.conf.
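> >>
> >> For example (the path assumes a standard Spark layout):
> >>
> >> ./bin/spark-shell --driver-memory 8g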
> >>
> >> On Fri, Aug 15, 2014 at 12:41 PM, jerryye wrote:
> >>
> >> > Setting spark.driver.memory has no effect. It's still hanging trying
> >> > to compute result.count when I'm sampling greater than 35%, regardless
> >> > of what value of spark.driver.memory I set.
> >> >
> >> > Here are my settings:
> >> > export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
> >> > export SPARK_MEM=10g
> >> >
> >> > in conf/spark-defaults:
> >> > spark.driver.memory 1500
> >> > spark.serializer org.apache.spark.serializer.KryoSerializer
> >> > spark.kryoserializer.buffer.mb 500
> >> > spark.executor.memory 58315m
> >> > spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
> >> > spark.executor.extraClassPath /root/ephemeral-hdfs/conf