Hi all,

I have a Spark 0.9 cluster with 16 nodes.

I wrote a Spark application to read data from an HBase table, which has 86
regions spread across 20 RegionServers.

I submitted the Spark app in Spark standalone mode and found that there
were 86 executors running on just 3 nodes, and it took about 30 minutes to
read data from the table. In this case, I noticed from the Spark master UI
that the Locality Level of all executors was "PROCESS_LOCAL".

Later I ran the same app again (without any code changes) and found that
those 86 executors were running on 16 nodes, and this time it took just 4
minutes to read data from the same HBase table. In this case, I noticed
that the Locality Level of most executors was "NODE_LOCAL".

After testing multiple times, I found the two cases above occur randomly.

So I have 2 questions:
1) Why do the two cases above occur randomly when I submit the same
application multiple times?
2) Does the spread of executors influence the locality level?

Thank you.
