setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Ted Yu
Hi, Starting at line 203:

  try {
    /* bytesRead may not exactly equal the bytes read by a task: split boundaries aren't
     * always at record boundaries, so tasks may need to read into other splits to complete
     * a record. */
    inputMetrics.bytesRead = split.inputSpli…
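The comment quoted above describes standard Hadoop input-split behavior: splits are byte ranges, so a record can straddle a boundary, and by convention each task finishes the record it started while skipping the partial record at its own start. A minimal, self-contained Python sketch (not Spark or Hadoop code — the function name and newline-delimited format are illustrative assumptions) of why bytes read can exceed the split length:

```python
def read_split_records(data: bytes, start: int, end: int) -> list[bytes]:
    """Read newline-delimited records for the byte-range split [start, end).

    Mirrors the usual Hadoop record-reader convention: a split that does
    not begin at offset 0 skips its (partial) first record, and every
    split reads past its end boundary to finish the record it started.
    """
    pos = start
    if start != 0:
        # The partial record at our start belongs to the previous split.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos < end:  # only *start* records inside our own range...
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop].rstrip(b"\n"))
        pos = stop  # ...but may read past `end` to complete one
    return records

data = b"alpha\nbravo\ncharlie\n"
# A split boundary at byte 8 falls inside "bravo": split 0 reads the whole
# record (12 bytes read for an 8-byte split), split 1 skips the remainder.
print(read_split_records(data, 0, 8))   # [b'alpha', b'bravo']
print(read_split_records(data, 8, 20))  # [b'charlie']
```

This is exactly why `bytesRead` set from the split's length is only an approximation of the bytes a task actually reads.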

Re: setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Reynold Xin
There is one piece of information that'd be useful to know, which is the source of the input. Even in the presence of an IOException, the input metrics still report that the task is reading from Hadoop. However, I'm slightly confused by this -- I think usually we'd want to report the number of bytes…
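One way to keep the reported byte count meaningful in the failure case this message touches on is to accumulate it incrementally as data is consumed, so that a mid-stream IOException leaves a partial but truthful count instead of zero. A hypothetical Python sketch (the class and function names are illustrative, not Spark's API):

```python
class InputMetrics:
    """Stand-in for a task's input metrics accumulator."""
    def __init__(self):
        self.bytes_read = 0

def consume(chunks, metrics):
    """Accumulate bytes_read chunk by chunk; a failure mid-stream still
    leaves the count of everything read so far."""
    try:
        for chunk in chunks:
            if chunk is None:          # stand-in for an IOException
                raise IOError("read failed mid-stream")
            metrics.bytes_read += len(chunk)
    except IOError:
        pass  # a real task would fail/retry; the partial count survives

m = InputMetrics()
consume([b"abc", b"defg", None, b"hij"], m)
print(m.bytes_read)  # 7: the two chunks read before the failure
```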

Re: setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Sandy Ryza
I'm working on a patch that switches this stuff out with the Hadoop FileSystem StatisticsData, which will both give an accurate count and allow us to get metrics while the task is in progress. A hitch is that it relies on https://issues.apache.org/jira/browse/HADOOP-10688, so we still might want a…
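The shape of the "accurate counter with a fallback" idea can be sketched without any Hadoop dependency: poll a live filesystem byte counter while the task runs, and if that counter is unavailable (e.g. the cluster's Hadoop version predates the thread-local statistics added in HADOOP-10688), fall back to a one-shot estimate such as the split length. All names below are hypothetical illustrations, not Sandy's actual patch:

```python
def bytes_read_callback(get_thread_stats, split_length):
    """Return a zero-arg callable that reports bytes read.

    get_thread_stats: returns a live stats dict, or None when the
    per-thread statistics API is unsupported on this Hadoop version.
    split_length: coarse fallback estimate for the older-Hadoop case.
    """
    stats = get_thread_stats()
    if stats is not None:
        return lambda: stats["bytesRead"]  # accurate, can be polled mid-task
    return lambda: split_length            # coarse, one-shot fallback

live = bytes_read_callback(lambda: {"bytesRead": 4096}, split_length=1 << 20)
fallback = bytes_read_callback(lambda: None, split_length=1 << 20)
print(live())      # 4096
print(fallback())  # 1048576
```

The callback form matters for the "metrics while the task is in progress" goal: a heartbeat thread can invoke it repeatedly, whereas a value captured once at task start cannot be refreshed.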

Re: setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Reynold Xin
That makes sense, Sandy. When you add the patch, can you make sure you comment inline on why the fallback is needed? On Sat, Jul 26, 2014 at 11:46 AM, Sandy Ryza wrote: > I'm working on a patch that switches this stuff out with the Hadoop > FileSystem StatisticsData, which will both give an a…

Re: setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Kay Ousterhout
Reynold, you're totally right, as discussed offline -- I didn't think about the limit use case when I wrote this. Sandy, is it easy to fix this as part of your patch to use StatisticsData? If not, I can fix it in a separate patch. On Sat, Jul 26, 2014 at 12:12 PM, Reynold Xin wrote: > That mak…

Re: SparkContext startup time out

2014-07-26 Thread Anand Avati
I am bumping into this problem as well. I am trying to move to akka 2.3.x from 2.2.x in order to port to Scala 2.11 -- only akka 2.3.x is available for Scala 2.11. All 2.2.x akka versions work fine, and all 2.3.x akka versions give the following exception in "new SparkContext". Still investigating why... java.util.…