> On 30 Dec 2015, at 13:19, alvarobrandon <alvarobran...@gmail.com> wrote:
> 
> Hello:
> 
> Is there any way of monitoring the number of bytes or blocks read and written
> by a Spark application? I'm running Spark on YARN and I want to measure
> how I/O-intensive a set of applications is. The closest thing I have seen is
> the HDFS DataNode logs in YARN, but they don't seem to contain reads and
> writes specific to Spark applications.
> 
> 2015-12-21 18:29:15,347 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
> cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
> a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
> BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
> 2619119
> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
> 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
> 
> Is there any trace of this kind of operation to be found in any log?


1. The HDFS NameNode and DataNodes all collect metrics on their use, with
org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the most
interesting for I/O; see the JMX probe sketch below.
2. FileSystem.Statistics is a static structure collecting data on operations
and bytes for each thread in a client process; see the FileSystem.Statistics
sketch below.
3. The HDFS input streams also support some read statistics (ReadStatistics
via HdfsDataInputStream.getReadStatistics()); see the ReadStatistics sketch below.
4. Recent versions of HDFS are also adding HTrace support, to trace
end-to-end performance.

I'd start with FileSystem.Statistics; if that's not already being collected
across Spark jobs, it should be possible to pull the counters yourself from
inside the tasks, along the lines of the sketch below.
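
A rough sketch of that collection from inside the executors (the input path is
a placeholder, and Statistics.getThreadStatistics() only exists in reasonably
recent Hadoop 2.x clients, so check your version):

import org.apache.hadoop.fs.FileSystem
import scala.collection.JavaConverters._

// In a spark-shell session, where sc is the SparkContext:
val perTaskBytes = sc.textFile("hdfs:///some/input")   // placeholder path
  .mapPartitions { iter =>
    iter.foreach(_ => ())   // drain the iterator so the split is actually read
    // Per-thread counters for the "hdfs" scheme inside this executor JVM.
    val bytesRead = FileSystem.getAllStatistics.asScala
      .filter(_.getScheme == "hdfs")
      .map(_.getThreadStatistics.getBytesRead)
      .sum
    Iterator.single(bytesRead)
  }
  .collect()

println(s"bytes read per task: ${perTaskBytes.mkString(", ")}")

Note the thread counters are cumulative, so on an executor that runs many tasks
on the same thread you would want to snapshot them before and after consuming
the partition and report the difference.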

