I don't believe mine is a regression, but it is related to thread safety on Hadoop Configuration objects. Should I start a new thread?

On Jul 15, 2014 12:55 AM, "Patrick Wendell" <pwend...@gmail.com> wrote:
> Andrew is your issue also a regression from 1.0.0 to 1.0.1? The
> immediate priority is addressing regressions between these two
> releases.
>
> On Mon, Jul 14, 2014 at 9:05 PM, Andrew Ash <and...@andrewash.com> wrote:
> > I'm not sure either of those PRs will fix the concurrent adds to
> > Configuration issue I observed. I've got a stack trace and writeup I'll
> > share in an hour or two (traveling today).
> >
> > On Jul 14, 2014 9:50 PM, "scwf" <wangf...@huawei.com> wrote:
> >
> >> hi, Cody
> >> i met this issue days ago and posted a PR for it
> >> (https://github.com/apache/spark/pull/1385).
> >> It's very strange: if I synchronize on conf it deadlocks, but it is ok
> >> when I synchronize on initLocalJobConfFuncOpt.
> >>
> >>> Here's the entire jstack output.
> >>>
> >>> On Mon, Jul 14, 2014 at 4:44 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>>
> >>> Hey Cody,
> >>>
> >>> This jstack seems truncated, would you mind giving the entire stack
> >>> trace? For the second thread, for instance, we can't see where the
> >>> lock is being acquired.
> >>>
> >>> - Patrick
> >>>
> >>> On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger
> >>> <cody.koenin...@mediacrossing.com> wrote:
> >>> > Hi all, just wanted to give a heads up that we're seeing a
> >>> > reproducible deadlock with Spark 1.0.1 with Hadoop 2.3.0-mr1-cdh5.0.2.
> >>> >
> >>> > If JIRA is a better place for this, apologies in advance - figured
> >>> > talking about it on the mailing list was friendlier than randomly
> >>> > (re)opening JIRA tickets.
> >>> >
> >>> > I know Gary had mentioned some issues with 1.0.1 on the mailing
> >>> > list; once we got a thread dump I wanted to follow up.
> >>> > The thread dump shows the deadlock occurs in the synchronized block
> >>> > of code that was changed in HadoopRDD.scala for the SPARK-1097 issue.
> >>> >
> >>> > Relevant portions of the thread dump are summarized below; we can
> >>> > provide the whole dump if it's useful.
> >>> >
> >>> > Found one Java-level deadlock:
> >>> > =============================
> >>> > "Executor task launch worker-1":
> >>> >   waiting to lock monitor 0x00007f250400c520 (object 0x00000000fae7dc30,
> >>> >   a org.apache.hadoop.conf.Configuration),
> >>> >   which is held by "Executor task launch worker-0"
> >>> > "Executor task launch worker-0":
> >>> >   waiting to lock monitor 0x00007f2520495620 (object 0x00000000faeb4fc8,
> >>> >   a java.lang.Class),
> >>> >   which is held by "Executor task launch worker-1"
> >>> >
> >>> > "Executor task launch worker-1":
> >>> >   at org.apache.hadoop.conf.Configuration.reloadConfiguration(Configuration.java:791)
> >>> >   - waiting to lock <0x00000000fae7dc30> (a org.apache.hadoop.conf.Configuration)
> >>> >   at org.apache.hadoop.conf.Configuration.addDefaultResource(Configuration.java:690)
> >>> >   - locked <0x00000000faca6ff8> (a java.lang.Class for org.apache.hadoop.conf.Configuration)
> >>> >   at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:34)
> >>> >   at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:110)
> >>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> >>> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >>> >   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> >>> >   at java.lang.Class.newInstance0(Class.java:374)
> >>> >   at java.lang.Class.newInstance(Class.java:327)
> >>> >   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
> >>> >   at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
> >>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
> >>> >   - locked <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
> >>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
> >>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
> >>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> >>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
> >>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
> >>> >
> >>> > ...elided...
> >>> >
> >>> > "Executor task launch worker-0" daemon prio=10 tid=0x0000000001e71800
> >>> > nid=0x2d97 waiting for monitor entry [0x00007f24d2bf1000]
> >>> >   java.lang.Thread.State: BLOCKED (on object monitor)
> >>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2362)
> >>> >   - waiting to lock <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
> >>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
> >>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
> >>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> >>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
> >>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
> >>
> >> --
> >> Best Regards
> >> Fei Wang
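The dump above is a classic lock-order inversion: worker-0 holds a Configuration instance monitor (from the synchronized block added for SPARK-1097) and wants the FileSystem class lock, while worker-1 holds the FileSystem class lock (taken in loadFileSystems during static initialization) and, via Configuration.addDefaultResource → reloadConfiguration, wants the Configuration instance monitor. One way to break the cycle, in the spirit of the PR mentioned above, is to funnel every conf-creating path through a single global lock so the two monitors are never acquired in opposite orders. A minimal sketch of that idea follows; the class and member names (ConfLockSketch, CONF_LOCK, runWorkers) are illustrative stand-ins, not Spark's or Hadoop's actual API:

```java
// Hypothetical sketch: serialize all JobConf creation/mutation behind ONE
// global monitor, so no thread can hold the Configuration instance while
// another holds the FileSystem class lock with each waiting on the other.
public class ConfLockSketch {

    // One coarse lock guarding every conf-mutating path.
    private static final Object CONF_LOCK = new Object();
    private static int confsCreated = 0;

    // Stand-in for HadoopRDD.getJobConf(): in the real deadlock, this path
    // synchronized on the Configuration instance and then called into
    // FileSystem.get(), which needed the FileSystem class lock.
    static void getJobConf() {
        synchronized (CONF_LOCK) {
            confsCreated++;
            try {
                Thread.sleep(10); // simulate conf mutation / FS class init
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Run two "executor task launch workers" through the guarded section
    // and report how many completed.
    static int runWorkers() {
        synchronized (CONF_LOCK) { confsCreated = 0; }
        Thread w0 = new Thread(ConfLockSketch::getJobConf, "worker-0");
        Thread w1 = new Thread(ConfLockSketch::getJobConf, "worker-1");
        w0.start();
        w1.start();
        try {
            w0.join();
            w1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return confsCreated;
    }

    public static void main(String[] args) {
        System.out.println("workers completed: " + runWorkers());
    }
}
```

Because both workers only ever acquire CONF_LOCK, there is no second monitor to invert against; the coarse lock trades some parallelism in JobConf creation for a guaranteed absence of this particular cycle.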