Using Spark 1.6.2, I want to understand what "Duration" really means (and why the task is slow).
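I am running a simple SELECT COUNT against a Parquet file stored in HDFS. The task row reads:

Locality Level: NODE_LOCAL
Executor ID / Host: 1 / DATA02
Launch Time: 2018/02/19 09:54:27
Duration: 5 s
GC Time: 30 ms
Input Size / Records: 8.8 MB (hadoop) / 3010830
Write Time: 8 ms
Shuffle Write Size / Records: 77.2 KB / 1666

Does this mean "it took 5 seconds to read about 8 MB from HDFS"?

Thomas Decaux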