Hi,

We are running Spark jobs on an Alluxio cluster that serves 13
gigabytes of data, 99% of which is in memory. I was hoping to speed
up the Spark jobs by reading the in-memory data from Alluxio, but found
that the Alluxio local hit rate is only 1.68%, while the remote hit rate
is 98.32%. By monitoring network I/O across all worker nodes with the
"dstat" command, I found that only two nodes sent or received about 1 GB
during the whole process, and that this 1 GB of traffic occurred during
the Spark shuffle stage. Are there any metrics I could check, or any
configuration I could tune?


Best,

Jerry
