[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
[ https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaliy Savkin updated SPARK-46198: --- Description: When a computation is based on a cached DataFrame, I expect to see no shuffle reads, but they happen under certain circumstances.

*Reproduction*
{code:scala}
val ctx: SQLContext = ??? // init context
val root = "s3a://af-data-eu-west-1-stg-parquet/vitalii-test-coalesce"

def populateAndRead(tag: String): DataFrame = {
  val path = s"$root/numbers_$tag"
  // One-time population of the test data:
  // import ctx.implicits._
  // import org.apache.spark.sql.functions.lit
  // (0 to 10 * 1000 * 1000)
  //   .toDF("id")
  //   .withColumn(tag, lit(tag.toUpperCase))
  //   .repartition(100)
  //   .write
  //   .option("header", "true")
  //   .mode("ignore")
  //   .csv(path)
  ctx.read.option("header", "true").csv(path).withColumnRenamed("id", tag + "_id")
}

val dfa = populateAndRead("a1")
val dfb = populateAndRead("b1")
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .cache()
println(res.count())
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}
Relevant configs
{code:none}
spark.executor.instances=10
spark.executor.cores=7
spark.executor.memory=40g
spark.executor.memoryOverhead=5g
spark.shuffle.service.enabled=true
spark.sql.adaptive.enabled=false
spark.sql.autoBroadcastJoinThreshold=-1
{code}
The Spark plan shows that the cache is used
{code:none}
== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand (27)
+- Coalesce (26)
   +- InMemoryTableScan (1)
      +- InMemoryRelation (2)
         +- Union (25)
            :- * SortMergeJoin Inner (13)
            :  :- * Sort (7)
            :  :  +- Exchange (6)
            :  :     +- * Project (5)
            :  :        +- * Filter (4)
            :  :           +- Scan csv (3)
            :  +- * Sort (12)
            :     +- Exchange (11)
            :        +- * Project (10)
            :           +- * Filter (9)
            :              +- Scan csv (8)
            +- * SortMergeJoin Inner (24)
               :- * Sort (18)
               :  +- Exchange (17)
               :     +- * Project (16)
               :        +- * Filter (15)
               :           +- Scan csv (14)
               +- * Sort (23)
                  +- Exchange (22)
                     +- * Project (21)
                        +- * Filter (20)
                           +- Scan csv (19)
{code}
But when running on YARN, the csv job has shuffle reads. !shuffle.png!

*Additional info*
- I was unable to reproduce it with local Spark.
- If {{.withColumnRenamed("id", tag + "_id")}} is dropped and the join conditions are changed to just {{"id"}}, the issue disappears!
- This behaviour is stable - it is not a result of failed instances.

*Production impact*
Without the cache, saving data in production takes much longer (30 seconds vs 18 seconds). To avoid the shuffle reads, we had to add a {{repartition}} step before {{cache}} as a workaround, which reduced the time from 18 seconds to 10.
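The repartition workaround mentioned in the report can be sketched as below. This is an illustrative snippet only, not the reporter's exact production code: the partition count (100) is an assumption, and the column names are taken from the reproduction above.

{code:scala}
// Hypothetical sketch of the workaround: impose an explicit partitioning
// before caching, so the later write reads cached blocks rather than
// re-reading shuffle data.
val res =
  dfa.join(dfb, dfa("a1_id") === dfb("b1_id"))
    .unionByName(dfa.join(dfb, dfa("a1") === dfb("b1")))
    .repartition(100)   // partition count is an assumption; tune for the cluster
    .cache()

println(res.count())    // materializes the cache before the write job runs
res.coalesce(1).write.mode("overwrite").csv(s"$root/numbers")
{code}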
[jira] [Created] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
Vitaliy Savkin created SPARK-46198: -- Summary: Unexpected Shuffle Read when using cached DataFrame Key: SPARK-46198 URL: https://issues.apache.org/jira/browse/SPARK-46198 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Vitaliy Savkin Attachments: shuffle.png

When a computation is based on a cached DataFrame, I expect to see no shuffle reads, but they happen under certain circumstances. The full reproduction, configs, and physical plan are given in the description above.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46198) Unexpected Shuffle Read when using cached DataFrame
[ https://issues.apache.org/jira/browse/SPARK-46198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaliy Savkin updated SPARK-46198: --- Attachment: shuffle.png
[jira] [Updated] (SPARK-26587) Deadlock between SparkUI thread and Driver thread
[ https://issues.apache.org/jira/browse/SPARK-26587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaliy Savkin updated SPARK-26587: --- Description: About once a month (~1000 runs) one of our Spark applications freezes at startup. jstack says that there is a deadlock. Please see locks 0x802c00c0 and 0x8271bb98 in the stack traces below.
{noformat}
"Driver":
    at java.lang.Package.getSystemPackage(Package.java:540)
    - waiting to lock <0x802c00c0> (a java.util.HashMap)
    at java.lang.ClassLoader.getPackage(ClassLoader.java:1625)
    at java.net.URLClassLoader.getAndVerifyPackage(URLClassLoader.java:394)
    at java.net.URLClassLoader.definePackageInternal(URLClassLoader.java:420)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:452)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    - locked <0x82789598> (a org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:221)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:210)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    - locked <0x82789540> (a org.apache.spark.sql.internal.NonClosableMutableURLClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2516)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
    - locked <0x8271bb98> (a org.apache.hadoop.conf.Configuration)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
    at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2189)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2702)
    at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
    at java.net.URL.getURLStreamHandler(URL.java:1142)
    at java.net.URL.<init>(URL.java:599)
    at java.net.URL.<init>(URL.java:490)
    at java.net.URL.<init>(URL.java:439)
    at java.net.JarURLConnection.parseSpecs(JarURLConnection.java:175)
    at java.net.JarURLConnection.<init>(JarURLConnection.java:158)
    at sun.net.www.protocol.jar.JarURLConnection.<init>(JarURLConnection.java:81)
    at sun.net.www.protocol.jar.Handler.openConnection(Handler.java:41)
    at java.net.URL.openConnection(URL.java:979)
    at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:238)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:216)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:210)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    - locked <0x82789540> (a org.apache.spark.sql.internal.NonClosableMutableURLClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:262)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
    - locked <0x8302a120> (a org.apache.spark.sql.hive.HiveExternalCatalog)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
{noformat}
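The traces show the classic lock-ordering pattern: the Driver holds the Hadoop {{Configuration}} monitor while waiting for the class-loading {{HashMap}} lock, and another thread holds them in the opposite order. As an illustration only, the pattern can be reproduced deterministically in a few lines; the thread names are hypothetical stand-ins for the real Driver and SparkUI threads, and the two plain objects stand in for the {{HashMap}} and {{Configuration}} monitors:

{code:scala}
import java.lang.management.ManagementFactory
import java.util.concurrent.CountDownLatch

object DeadlockDemo extends App {
  val lockA = new Object // stands in for the HashMap locked in Package.getSystemPackage
  val lockB = new Object // stands in for the org.apache.hadoop.conf.Configuration monitor
  val bothHoldFirst = new CountDownLatch(2)

  def contend(first: AnyRef, second: AnyRef, name: String): Unit = {
    val t = new Thread(() => {
      first.synchronized {
        bothHoldFirst.countDown()
        bothHoldFirst.await()     // wait until both threads hold their first lock
        second.synchronized {}    // now blocks forever: the other thread holds it
      }
    }, name)
    t.setDaemon(true)             // let the JVM exit despite the deadlock
    t.start()
  }

  contend(lockA, lockB, "Driver")
  contend(lockB, lockA, "SparkUI") // opposite acquisition order => deadlock

  Thread.sleep(500)               // give both threads time to block
  val ids = ManagementFactory.getThreadMXBean.findDeadlockedThreads
  println(s"deadlocked threads: ${if (ids == null) 0 else ids.length}")
}
{code}

{{ThreadMXBean.findDeadlockedThreads}} is the same JVM facility that lets jstack flag the deadlock in the report.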
[jira] [Created] (SPARK-26587) Deadlock between SparkUI thread and Driver thread
Vitaliy Savkin created SPARK-26587: -- Summary: Deadlock between SparkUI thread and Driver thread Key: SPARK-26587 URL: https://issues.apache.org/jira/browse/SPARK-26587 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Environment: EMR 5.9.0 Reporter: Vitaliy Savkin

About once a month (~1000 runs) one of our Spark applications freezes. jstack says that there is a deadlock; please see locks 0x802c00c0 and 0x8271bb98 in the stack traces in the description above.