Re: SparkContext startup time out
I am bumping into this problem as well. I am trying to move to akka 2.3.x from 2.2.x in order to port to Scala 2.11 - only akka 2.3.x is available for Scala 2.11. All akka 2.2.x versions work fine, and all akka 2.3.x versions throw the following exception in "new SparkContext". Still investigating why..

java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at akka.remote.Remoting.start(Remoting.scala:180)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)

On Fri, May 30, 2014 at 6:33 AM, Pierre B <pierre.borckm...@realimpactanalytics.com> wrote:

> I was annoyed by this as well.
> It appears that just permuting the order of dependency inclusion solves
> this problem:
>
> first spark, then your cdh hadoop distro.
>
> HTH,
>
> Pierre
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-tp1753p6582.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
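Pierre's workaround can be sketched as an sbt build fragment. This is illustrative only: the artifact versions below are hypothetical stand-ins, not taken from the thread, and the point is purely the ordering of the two entries.

```scala
// build.sbt (sketch) -- listing spark-core before the CDH Hadoop client means
// Spark's transitive dependencies (e.g. its akka build) win when sbt/ivy
// resolves version conflicts, instead of the ones pulled in by the CDH distro.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.0.0",            // first: Spark
  "org.apache.hadoop" %  "hadoop-client" % "2.3.0-cdh5.0.0"    // then: your CDH Hadoop distro
)
```
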
Re: setting inputMetrics in HadoopRDD#compute()
Reynold you're totally right, as discussed offline -- I didn't think about the limit use case when I wrote this. Sandy, is it easy to fix this as part of your patch to use StatisticsData? If not, I can fix it in a separate patch.

On Sat, Jul 26, 2014 at 12:12 PM, Reynold Xin wrote:

> That makes sense, Sandy.
>
> When you add the patch, can you make sure you comment inline on why the
> fallback is needed?
>
> On Sat, Jul 26, 2014 at 11:46 AM, Sandy Ryza wrote:
>
>> I'm working on a patch that switches this stuff out with the Hadoop
>> FileSystem StatisticsData, which will both give an accurate count and
>> allow us to get metrics while the task is in progress. A hitch is that it
>> relies on https://issues.apache.org/jira/browse/HADOOP-10688, so we still
>> might want a fallback for versions of Hadoop that don't have this API.
>>
>> On Sat, Jul 26, 2014 at 10:47 AM, Reynold Xin wrote:
>>
>>> There is one piece of information that'd be useful to know, which is the
>>> source of the input. Even in the presence of an IOException, the input
>>> metrics still specifies the task is reading from Hadoop.
>>>
>>> However, I'm slightly confused by this -- I think usually we'd want to
>>> report the number of bytes read, rather than the total input size. For
>>> example, if there is a limit (only read the first 5 records), the actual
>>> number of bytes read is much smaller than the total split size.
>>>
>>> Kay, am I mis-interpreting this?
>>>
>>> On Sat, Jul 26, 2014 at 7:42 AM, Ted Yu wrote:
>>>
>>>> Hi,
>>>> Starting at line 203:
>>>>
>>>>     try {
>>>>       /* bytesRead may not exactly equal the bytes read by a task: split
>>>>        * boundaries aren't always at record boundaries, so tasks may need
>>>>        * to read into other splits to complete a record. */
>>>>       inputMetrics.bytesRead = split.inputSplit.value.getLength()
>>>>     } catch {
>>>>       case e: java.io.IOException =>
>>>>         logWarning("Unable to get input size to set InputMetrics for task", e)
>>>>     }
>>>>     context.taskMetrics.inputMetrics = Some(inputMetrics)
>>>>
>>>> If there is an IOException, context.taskMetrics.inputMetrics is set by
>>>> wrapping inputMetrics - as if there wasn't any error.
>>>>
>>>> I wonder if the above code should distinguish the error condition.
>>>>
>>>> Cheers
Re: setting inputMetrics in HadoopRDD#compute()
That makes sense, Sandy.

When you add the patch, can you make sure you comment inline on why the fallback is needed?

On Sat, Jul 26, 2014 at 11:46 AM, Sandy Ryza wrote:

> I'm working on a patch that switches this stuff out with the Hadoop
> FileSystem StatisticsData, which will both give an accurate count and
> allow us to get metrics while the task is in progress. A hitch is that it
> relies on https://issues.apache.org/jira/browse/HADOOP-10688, so we still
> might want a fallback for versions of Hadoop that don't have this API.
>
> On Sat, Jul 26, 2014 at 10:47 AM, Reynold Xin wrote:
>
>> There is one piece of information that'd be useful to know, which is the
>> source of the input. Even in the presence of an IOException, the input
>> metrics still specifies the task is reading from Hadoop.
>>
>> However, I'm slightly confused by this -- I think usually we'd want to
>> report the number of bytes read, rather than the total input size. For
>> example, if there is a limit (only read the first 5 records), the actual
>> number of bytes read is much smaller than the total split size.
>>
>> Kay, am I mis-interpreting this?
>>
>> On Sat, Jul 26, 2014 at 7:42 AM, Ted Yu wrote:
>>
>>> Hi,
>>> Starting at line 203:
>>>
>>>     try {
>>>       /* bytesRead may not exactly equal the bytes read by a task: split
>>>        * boundaries aren't always at record boundaries, so tasks may need
>>>        * to read into other splits to complete a record. */
>>>       inputMetrics.bytesRead = split.inputSplit.value.getLength()
>>>     } catch {
>>>       case e: java.io.IOException =>
>>>         logWarning("Unable to get input size to set InputMetrics for task", e)
>>>     }
>>>     context.taskMetrics.inputMetrics = Some(inputMetrics)
>>>
>>> If there is an IOException, context.taskMetrics.inputMetrics is set by
>>> wrapping inputMetrics - as if there wasn't any error.
>>>
>>> I wonder if the above code should distinguish the error condition.
>>>
>>> Cheers
Re: setting inputMetrics in HadoopRDD#compute()
I'm working on a patch that switches this stuff out with the Hadoop FileSystem StatisticsData, which will both give an accurate count and allow us to get metrics while the task is in progress. A hitch is that it relies on https://issues.apache.org/jira/browse/HADOOP-10688, so we still might want a fallback for versions of Hadoop that don't have this API.

On Sat, Jul 26, 2014 at 10:47 AM, Reynold Xin wrote:

> There is one piece of information that'd be useful to know, which is the
> source of the input. Even in the presence of an IOException, the input
> metrics still specifies the task is reading from Hadoop.
>
> However, I'm slightly confused by this -- I think usually we'd want to
> report the number of bytes read, rather than the total input size. For
> example, if there is a limit (only read the first 5 records), the actual
> number of bytes read is much smaller than the total split size.
>
> Kay, am I mis-interpreting this?
>
> On Sat, Jul 26, 2014 at 7:42 AM, Ted Yu wrote:
>
>> Hi,
>> Starting at line 203:
>>
>>     try {
>>       /* bytesRead may not exactly equal the bytes read by a task: split
>>        * boundaries aren't always at record boundaries, so tasks may need
>>        * to read into other splits to complete a record. */
>>       inputMetrics.bytesRead = split.inputSplit.value.getLength()
>>     } catch {
>>       case e: java.io.IOException =>
>>         logWarning("Unable to get input size to set InputMetrics for task", e)
>>     }
>>     context.taskMetrics.inputMetrics = Some(inputMetrics)
>>
>> If there is an IOException, context.taskMetrics.inputMetrics is set by
>> wrapping inputMetrics - as if there wasn't any error.
>>
>> I wonder if the above code should distinguish the error condition.
>>
>> Cheers
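The fallback Sandy describes can be sketched generically: probe reflectively for the newer per-thread statistics API, and fall back to the split length when it is absent. This is a minimal illustration of the pattern only; `Stats`, `threadBytesRead`, and `splitLength` are hypothetical stand-ins, not the actual Hadoop class or method names.

```scala
import scala.util.Try

// Stand-in for a statistics object on a "new" Hadoop version, where a
// per-thread byte counter (a la HADOOP-10688's StatisticsData) exists.
class Stats {
  def threadBytesRead(): Long = 42L
}

// Probe for the method reflectively, so the calling code still links and runs
// against "old" versions where the method does not exist; fall back to the
// (less accurate) total split length in that case.
def bytesReadOrFallback(stats: AnyRef, splitLength: => Long): Long =
  Try {
    val m = stats.getClass.getMethod("threadBytesRead")
    m.invoke(stats).asInstanceOf[Long]
  }.getOrElse(splitLength)
```

With a `new Stats` this returns the per-thread count (42 here); with any object lacking the method, `getMethod` throws and the split length is used instead.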
Re: setting inputMetrics in HadoopRDD#compute()
There is one piece of information that'd be useful to know, which is the source of the input. Even in the presence of an IOException, the input metrics still specifies the task is reading from Hadoop.

However, I'm slightly confused by this -- I think usually we'd want to report the number of bytes read, rather than the total input size. For example, if there is a limit (only read the first 5 records), the actual number of bytes read is much smaller than the total split size.

Kay, am I mis-interpreting this?

On Sat, Jul 26, 2014 at 7:42 AM, Ted Yu wrote:

> Hi,
> Starting at line 203:
>
>     try {
>       /* bytesRead may not exactly equal the bytes read by a task: split
>        * boundaries aren't always at record boundaries, so tasks may need
>        * to read into other splits to complete a record. */
>       inputMetrics.bytesRead = split.inputSplit.value.getLength()
>     } catch {
>       case e: java.io.IOException =>
>         logWarning("Unable to get input size to set InputMetrics for task", e)
>     }
>     context.taskMetrics.inputMetrics = Some(inputMetrics)
>
> If there is an IOException, context.taskMetrics.inputMetrics is set by
> wrapping inputMetrics - as if there wasn't any error.
>
> I wonder if the above code should distinguish the error condition.
>
> Cheers
setting inputMetrics in HadoopRDD#compute()
Hi,
Starting at line 203:

    try {
      /* bytesRead may not exactly equal the bytes read by a task: split
       * boundaries aren't always at record boundaries, so tasks may need
       * to read into other splits to complete a record. */
      inputMetrics.bytesRead = split.inputSplit.value.getLength()
    } catch {
      case e: java.io.IOException =>
        logWarning("Unable to get input size to set InputMetrics for task", e)
    }
    context.taskMetrics.inputMetrics = Some(inputMetrics)

If there is an IOException, context.taskMetrics.inputMetrics is still set by wrapping inputMetrics - as if there wasn't any error.

I wonder if the above code should distinguish the error condition.

Cheers
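One way Ted's question could be addressed is to attach the metrics only when the split length was actually obtained, so a caller can tell the error case apart from a real zero-byte read. This is a self-contained sketch of that option, not the fix the thread settled on (Reynold notes elsewhere in the thread that even on failure the metrics usefully record the Hadoop source); `InputMetricsStub` is a hypothetical stand-in for Spark's real metrics type.

```scala
import java.io.IOException

// Hypothetical stand-in for the task's input metrics.
case class InputMetricsStub(bytesRead: Long)

// Return Some(metrics) only when the split length could be read; return None
// on IOException instead of metrics that silently carry no size information.
def computeInputMetrics(getLength: () => Long): Option[InputMetricsStub] =
  try {
    Some(InputMetricsStub(bytesRead = getLength()))
  } catch {
    case e: IOException =>
      // The error condition is now visible to callers as an absent value.
      None
  }
```

Usage: `computeInputMetrics(() => split.inputSplit.value.getLength())` yields `Some(...)` on success and `None` when `getLength` throws an `IOException`.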