Hi,

In Spark's in-memory execution, when nothing is cached, elements are
processed in an iterator-based streaming style [
http://www.slideshare.net/liancheng/dtcc-14-spark-runtime-internals?next_slideshow=1
]

I have two questions:


   1. if elements are read one line at a time from HDFS (disk) and then
   transformed according to the RDD operations, how is this efficient?
   2. which class in the Spark source does this? I'm expecting a loop of
   roughly this shape:

           for (line <- iterator_over_a_partition)
               apply_transformation(read_hdfs_line(line))

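To make the question concrete, here is a minimal sketch of the streaming
style I mean, using plain Scala iterators and no Spark at all (all names and
data are made up for illustration): each "transformation" just wraps the
parent iterator, so elements flow through one at a time and no intermediate
collection is materialized.

```scala
object IteratorPipeline {
  // Stand-in for reading the lines of one HDFS partition (hypothetical data).
  def readPartition(): Iterator[String] =
    Iterator("1", "2", "3")

  // Each step lazily wraps the previous iterator; nothing runs yet.
  def pipeline(): Iterator[Int] =
    readPartition()
      .map(_.toInt)       // analogous to rdd.map
      .filter(_ % 2 == 1) // analogous to rdd.filter

  def main(args: Array[String]): Unit = {
    // Elements are only read and transformed when the iterator is consumed.
    println(pipeline().toList) // List(1, 3)
  }
}
```

Is this, roughly, what Spark does per partition, and if so, where in the
source does the per-element loop live?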

Thanks,
