I found a huge performance regression (roughly 1/20 of the original throughput) in my application after
Spark git commit 0441515f221146756800dc583b225bdec8a6c075.
Applying the following patch fixes my issue:
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala
b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 214a8c8..ebec21d 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -145,7 +145,7 @@ private[spark] class Executor(
}
}
- override def run() {
+ override def run() : Unit = SparkHadoopUtil.get.runAsSparkUser { () =>
val startTime = System.currentTimeMillis()
SparkEnv.set(env)
Thread.currentThread.setContextClassLoader(replClassLoader)
runAsSparkUser wraps the task execution in 'UserGroupInformation.doAs()', and
with this patch my application runs OK;
without it, performance was very poor. The application hotspot was
JNIHandleBlock::alloc_handle (JVM code), with a very high CPI (cycles per
instruction; below 1 is OK, here above 10).
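For reference, a minimal stand-in sketch of what the doAs path does: Hadoop's UserGroupInformation.doAs() ultimately delegates to the JDK's javax.security.auth.Subject.doAs, which runs the action under the subject's access-control context. The class and string below are made up for illustration; only Subject and PrivilegedAction are real JDK APIs:

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    public static void main(String[] args) {
        // Empty Subject for illustration; UGI would populate it with the
        // Spark user's principals and credentials.
        Subject subject = new Subject();

        // Run the "task" under the subject's access-control context,
        // roughly what UGI.doAs() does around each executor task.
        String result = Subject.doAs(subject,
                (PrivilegedAction<String>) () -> "task ran under subject");
        System.out.println(result);
    }
}
```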
My application passes large arrays (more than 80K elements) to native C code
through JNI.
Why does "UserGroupInformation.doAs()" have such a large performance impact in
this situation?
Thanks,
Zhonghui