[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272272#comment-17272272 ] Ajith S commented on SPARK-26961: - Scala issue: https://github.com/scala/bug/issues/11429 > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Assignee: Ajith S >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually finishes in minutes; however, we recently found it > taking days to run, and we could only kill it when this happened. > An investigation showed all worker containers could not connect to the driver > after start, and the driver was hanging. Using jstack, we found a Java-level deadlock. > > *Jstack output for the deadlock part is shown below:* > > Found one Java-level deadlock: > ============================= > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > =================================================== > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.<init>(URL.java:599) > at java.net.URL.<init>(URL.java:490) > at java.net.URL.<init>(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > "ForkJoinPool-1-work
[jira] [Created] (SPARK-30621) Dynamic Pruning thread propagates the localProperties to task
Ajith S created SPARK-30621: --- Summary: Dynamic Pruning thread propagates the localProperties to task Key: SPARK-30621 URL: https://issues.apache.org/jira/browse/SPARK-30621 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ajith S Local properties set via sparkContext are not available as TaskContext properties when executing parallel jobs and thread pools have idle threads. Explanation: When executing parallel jobs via SubqueryBroadcastExec, the {{relationFuture}} is evaluated via a separate thread. The threads inherit the {{localProperties}} from sparkContext as they are its child threads. These threads are controlled via the executionContext (thread pools). Each thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. In scenarios where the thread pool has idle threads that are reused for a subsequent new query, the thread local properties will not be inherited from the spark context (thread properties are inherited only on thread creation) and hence end up with old or no properties set. This will cause taskset properties to be missing when properties are transferred by the child thread (a minimal sketch of this behavior follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
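A minimal, self-contained sketch of the behavior described above, assuming only that Spark's {{localProperties}} is an {{InheritableThreadLocal}}; the names here are illustrative, not Spark internals. A pooled thread copies the caller's properties only when the thread is created, so a reused idle thread keeps the stale value:

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object StalePropsDemo extends App {
  // Stand-in for SparkContext's localProperties: a child thread copies the
  // parent's value only at thread-creation time.
  val props = new InheritableThreadLocal[String] {
    override def initialValue(): String = "unset"
  }

  // A single-thread pool stands in for the executionContext whose idle
  // thread outlives the first query.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

  props.set("query-1")
  Await.result(Future { println(s"first job sees: ${props.get}") }, Duration.Inf)  // query-1

  props.set("query-2")  // a subsequent "query" from the same caller thread
  Await.result(Future { println(s"second job sees: ${props.get}") }, Duration.Inf) // still query-1
}
{code}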
[jira] [Commented] (SPARK-30556) Copy sparkContext.localproperties to child thread in SubqueryExec.executionContext
[ https://issues.apache.org/jira/browse/SPARK-30556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021937#comment-17021937 ] Ajith S commented on SPARK-30556: - Yes, it exists in lower versions like 2.3.x too > Copy sparkContext.localproperties to child thread > in SubqueryExec.executionContext > - > > Key: SPARK-30556 > URL: https://issues.apache.org/jira/browse/SPARK-30556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Major > Fix For: 3.0.0 > > > Local properties set via sparkContext are not available as TaskContext > properties when executing jobs and thread pools have idle threads which are > reused > Explanation: > When executing SubqueryExec, the {{relationFuture}} is evaluated via a separate thread. > The threads inherit the {{localProperties}} from sparkContext as they are its > child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. This will cause taskset properties to > be missing when properties are transferred by the child thread via > {{sparkContext.runJob/submitJob}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30556) Copy sparkContext.localproperties to child thread in SubqueryExec.executionContext
[ https://issues.apache.org/jira/browse/SPARK-30556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021936#comment-17021936 ] Ajith S commented on SPARK-30556: - Raised a backport PR for branch 2.4: [https://github.com/apache/spark/pull/27340] > Copy sparkContext.localproperties to child thread > in SubqueryExec.executionContext > - > > Key: SPARK-30556 > URL: https://issues.apache.org/jira/browse/SPARK-30556 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Major > Fix For: 3.0.0 > > > Local properties set via sparkContext are not available as TaskContext > properties when executing jobs and thread pools have idle threads which are > reused > Explanation: > When executing SubqueryExec, the {{relationFuture}} is evaluated via a separate thread. > The threads inherit the {{localProperties}} from sparkContext as they are its > child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. This will cause taskset properties to > be missing when properties are transferred by the child thread via > {{sparkContext.runJob/submitJob}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30360) Avoid Redact classpath entries in History Server UI
[ https://issues.apache.org/jira/browse/SPARK-30360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-30360: Description: Currently the Spark history server displays the classpath entries in the Environment tab with classpaths redacted; this is because the EventLog file has the entry values redacted while writing. But when the same is seen from a running application UI, it is not redacted. Redacting classpath entries is not needed and can be avoided (was: Currently the Spark history server displays the classpath entries in the Environment tab with classpaths redacted; this is because the EventLog file has the entry values redacted while writing. But when the same is seen from a running application UI, it is not redacted. ) > Avoid Redact classpath entries in History Server UI > --- > > Key: SPARK-30360 > URL: https://issues.apache.org/jira/browse/SPARK-30360 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Major > Fix For: 3.0.0 > > > Currently the Spark history server displays the classpath entries in the > Environment tab with classpaths redacted; this is because the EventLog file has > the entry values redacted while writing. But when the same is seen from a running > application UI, it is not redacted. Redacting classpath entries is > not needed and can be avoided -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30360) Avoid Redact classpath entries in History Server UI
[ https://issues.apache.org/jira/browse/SPARK-30360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-30360: Summary: Avoid Redact classpath entries in History Server UI (was: Redact classpath entries in Spark UI) > Avoid Redact classpath entries in History Server UI > --- > > Key: SPARK-30360 > URL: https://issues.apache.org/jira/browse/SPARK-30360 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Major > Fix For: 3.0.0 > > > Currently the Spark history server displays the classpath entries in the > Environment tab with classpaths redacted; this is because the EventLog file has > the entry values redacted while writing. But when the same is seen from a running > application UI, it is not redacted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22590) Broadcast thread propagates the localProperties to task
[ https://issues.apache.org/jira/browse/SPARK-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-22590: Affects Version/s: 3.0.0 2.4.4 > Broadcast thread propagates the localProperties to task > --- > > Key: SPARK-22590 > URL: https://issues.apache.org/jira/browse/SPARK-22590 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0, 2.4.4, 3.0.0 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > Attachments: TestProps.scala > > > Local properties set via sparkContext are not available as TaskContext > properties when executing parallel jobs and thread pools have idle threads > Explanation: > When executing parallel jobs via {{BroadcastExchangeExec}}, the > {{relationFuture}} is evaluated via a separate thread. The threads inherit > the {{localProperties}} from sparkContext as they are its child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle > threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. This will cause taskset properties to > be missing when properties are transferred by the child thread via > {{sparkContext.runJob/submitJob}} > Attached is a test-case to simulate this behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30556) SubqueryExec passes local properties to SubqueryExec.executionContext
Ajith S created SPARK-30556: --- Summary: SubqueryExec passes local properties to SubqueryExec.executionContext Key: SPARK-30556 URL: https://issues.apache.org/jira/browse/SPARK-30556 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4, 3.0.0 Reporter: Ajith S Local properties set via sparkContext are not available as TaskContext properties when executing jobs and thread pools have idle threads which are reused Explanation: When executing SubqueryExec, the {{relationFuture}} is evaluated via a separate thread. The threads inherit the {{localProperties}} from sparkContext as they are its child threads. These threads are controlled via the executionContext (thread pools). Each thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. In scenarios where the thread pool has idle threads that are reused for a subsequent new query, the thread local properties will not be inherited from the spark context (thread properties are inherited only on thread creation) and hence end up with old or no properties set. This will cause taskset properties to be missing when properties are transferred by the child thread via {{sparkContext.runJob/submitJob}} (a sketch of the fix pattern follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
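A rough sketch of the fix pattern implied by this issue's title: capture the caller's properties eagerly on the submitting thread and re-install them inside the pooled thread before the job body runs. Note that {{getProps}}/{{setProps}} below are hypothetical stand-ins for SparkContext's internal localProperties accessors, injected as parameters only to keep the sketch self-contained; this is not the actual Spark patch.

{code:scala}
import java.util.Properties
import scala.concurrent.{ExecutionContext, Future}

// getProps/setProps are hypothetical stand-ins for SparkContext's internal
// localProperties accessors (illustration only, not real Spark API).
def submitWithProperties[T](getProps: () => Properties, setProps: Properties => Unit)
                           (body: => T)(implicit ec: ExecutionContext): Future[T] = {
  val captured = getProps() // captured on the submitting (parent) thread
  Future {
    setProps(captured)      // re-installed on the (possibly reused) pooled thread
    body                    // sparkContext.runJob/submitJob now sees current properties
  }
}
{code}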
[jira] [Updated] (SPARK-22590) Broadcast thread propagates the localProperties to task
[ https://issues.apache.org/jira/browse/SPARK-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-22590: Summary: Broadcast thread propagates the localProperties to task (was: SparkContext's local properties missing from TaskContext properties) > Broadcast thread propagates the localProperties to task > --- > > Key: SPARK-22590 > URL: https://issues.apache.org/jira/browse/SPARK-22590 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > Attachments: TestProps.scala > > > Local properties set via sparkContext are not available as TaskContext > properties when executing parallel jobs and thread pools have idle threads > Explanation: > When executing parallel jobs via {{BroadcastExchangeExec}}, the > {{relationFuture}} is evaluated via a separate thread. The threads inherit > the {{localProperties}} from sparkContext as they are its child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle > threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. This will cause taskset properties to > be missing when properties are transferred by the child thread via > {{sparkContext.runJob/submitJob}} > Attached is a test-case to simulate this behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22590) SparkContext's local properties missing from TaskContext properties
[ https://issues.apache.org/jira/browse/SPARK-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-22590: Description: Local properties set via sparkContext are not available as TaskContext properties when executing parallel jobs and thread pools have idle threads Explanation: When executing parallel jobs via {{BroadcastExchangeExec}}, the {{relationFuture}} is evaluated via a separate thread. The threads inherit the {{localProperties}} from sparkContext as they are its child threads. These threads are controlled via the executionContext (thread pools). Each thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. In scenarios where the thread pool has idle threads that are reused for a subsequent new query, the thread local properties will not be inherited from the spark context (thread properties are inherited only on thread creation) and hence end up with old or no properties set. This will cause taskset properties to be missing when properties are transferred by the child thread via {{sparkContext.runJob/submitJob}} Attached is a test-case to simulate this behavior was: Local properties set via sparkContext are not available as TaskContext properties when executing parallel jobs and thread pools have idle threads Explanation: When executing parallel jobs via {{BroadcastExchangeExec}} or {{SubqueryExec}}, the {{relationFuture}} is evaluated via a separate thread. The threads inherit the {{localProperties}} from sparkContext as they are its child threads. These threads are controlled via the executionContext (thread pools). Each thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle threads. In scenarios where the thread pool has idle threads that are reused for a subsequent new query, the thread local properties will not be inherited from the spark context (thread properties are inherited only on thread creation) and hence end up with old or no properties set. This will cause taskset properties to be missing when properties are transferred by the child thread via {{sparkContext.runJob/submitJob}} Attached is a test-case to simulate this behavior > SparkContext's local properties missing from TaskContext properties > --- > > Key: SPARK-22590 > URL: https://issues.apache.org/jira/browse/SPARK-22590 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > Attachments: TestProps.scala > > > Local properties set via sparkContext are not available as TaskContext > properties when executing parallel jobs and thread pools have idle threads > Explanation: > When executing parallel jobs via {{BroadcastExchangeExec}}, the > {{relationFuture}} is evaluated via a separate thread. The threads inherit > the {{localProperties}} from sparkContext as they are its child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle > threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. 
This will cause taskset properties to > be missing when properties are transferred by child thread via > {{sparkContext.runJob/submitJob}} > Attached is a test-case to simulate this behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-22590) SparkContext's local properties missing from TaskContext properties
[ https://issues.apache.org/jira/browse/SPARK-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S reopened SPARK-22590: - Adding Fix > SparkContext's local properties missing from TaskContext properties > --- > > Key: SPARK-22590 > URL: https://issues.apache.org/jira/browse/SPARK-22590 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > Attachments: TestProps.scala > > > Local properties set via sparkContext are not available as TaskContext > properties when executing parallel jobs and thread pools have idle threads > Explanation: > When executing parallel jobs via {{BroadcastExchangeExec}} or > {{SubqueryExec}}, the {{relationFuture}} is evaluated via a separate thread. > The threads inherit the {{localProperties}} from sparkContext as they are its > child threads. > These threads are controlled via the executionContext (thread pools). Each > thread pool has a default {{keepAliveSeconds}} of 60 seconds for idle > threads. > In scenarios where the thread pool has idle threads that are reused for a > subsequent new query, the thread local properties will not be inherited from > the spark context (thread properties are inherited only on thread creation) and hence > end up with old or no properties set. This will cause taskset properties to > be missing when properties are transferred by the child thread via > {{sparkContext.runJob/submitJob}} > Attached is a test-case to simulate this behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S reopened SPARK-23626: - Old PR was closed due to inactivity. Reopening with a new PR to conclude this > DAGScheduler blocked due to JobSubmitted event > --- > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3, 2.4.3, 3.0.0 >Reporter: Ajith S >Priority: Major > > DAGScheduler becomes a bottleneck in the cluster when multiple JobSubmitted > events have to be processed, as DAGSchedulerEventProcessLoop is single threaded > and will block other events in the queue, like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus blocks all subsequent events from being processed. > > I see multiple JIRAs referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly, in my cluster some jobs' partition calculation is time consuming > (similar to the stack at SPARK-2647), which slows down the > DAGSchedulerEventProcessLoop and results in user jobs slowing down, even if > their tasks finish within seconds, as TaskCompletion events are processed > at a slower rate due to the blockage (a toy illustration of this follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
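To make the bottleneck concrete, here is a toy single-threaded event loop (not the real DAGSchedulerEventProcessLoop, just an illustration, runnable in a spark-shell or REPL): one expensive JobSubmitted-style event delays every cheap event queued behind it.

{code:scala}
import java.util.concurrent.LinkedBlockingQueue

sealed trait Event
final case class JobSubmitted(partitionCalcMillis: Long) extends Event
case object TaskCompletion extends Event

val queue = new LinkedBlockingQueue[Event]()
val loop = new Thread(() => while (true) {
  queue.take() match {
    case JobSubmitted(cost) => Thread.sleep(cost) // stands in for stage/partition computation
    case TaskCompletion     => println("task completion handled")
  }
})
loop.setDaemon(true)
loop.start()

queue.put(JobSubmitted(10000)) // an expensive submission...
queue.put(TaskCompletion)      // ...delays this cheap event by ~10 seconds
{code}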
[jira] [Comment Edited] (SPARK-30517) Support SHOW TABLES EXTENDED
[ https://issues.apache.org/jira/browse/SPARK-30517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015706#comment-17015706 ] Ajith S edited comment on SPARK-30517 at 1/15/20 8:01 AM: -- [~srowen] [~dongjoon] [~vanzin] Please let me know your opinions about the proposal. I would like to work on it if it's acceptable was (Author: ajithshetty): [~srowen] [~dongjoon] [~vanzin] Please let me know about your opinion about proposal. I would like to work if its acceptable > Support SHOW TABLES EXTENDED > > > Key: SPARK-30517 > URL: https://issues.apache.org/jira/browse/SPARK-30517 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Major > > {{The intention is to support show tables with an additional column 'type', where > type can be MANAGED, EXTERNAL or VIEW, using which a user can query only tables of > the required types, like listing only views or only external tables (using a > 'where' clause over the 'type' column).}} > {{Usecase example:}} > {{Currently it's not possible to list all the VIEWs, but other technologies > like Hive support it using 'SHOW VIEWS'; MySQL supports it using a more > complex query 'SHOW FULL TABLES WHERE table_type = 'VIEW';'}} > Decided to take the MySQL approach as it provides more flexibility for querying. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30517) Support SHOW TABLES EXTENDED
[ https://issues.apache.org/jira/browse/SPARK-30517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015706#comment-17015706 ] Ajith S commented on SPARK-30517: - [~srowen] [~dongjoon] [~vanzin] Please let me know about your opinion about proposal. I would like to work if its acceptable > Support SHOW TABLES EXTENDED > > > Key: SPARK-30517 > URL: https://issues.apache.org/jira/browse/SPARK-30517 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Major > > {{The intention is to support show tables with an additional column 'type', where > type can be MANAGED, EXTERNAL or VIEW, using which a user can query only tables of > the required types, like listing only views or only external tables (using a > 'where' clause over the 'type' column).}} > {{Usecase example:}} > {{Currently it's not possible to list all the VIEWs, but other technologies > like Hive support it using 'SHOW VIEWS'; MySQL supports it using a more > complex query 'SHOW FULL TABLES WHERE table_type = 'VIEW';'}} > Decided to take the MySQL approach as it provides more flexibility for querying. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30517) Support SHOW TABLES EXTENDED
Ajith S created SPARK-30517: --- Summary: Support SHOW TABLES EXTENDED Key: SPARK-30517 URL: https://issues.apache.org/jira/browse/SPARK-30517 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Ajith S {{The intention is to support show tables with an additional column 'type', where type can be MANAGED, EXTERNAL or VIEW, using which a user can query only tables of the required types, like listing only views or only external tables (using a 'where' clause over the 'type' column).}} {{Usecase example:}} {{Currently it's not possible to list all the VIEWs, but other technologies like Hive support it using 'SHOW VIEWS'; MySQL supports it using a more complex query 'SHOW FULL TABLES WHERE table_type = 'VIEW';'}} Decided to take the MySQL approach as it provides more flexibility for querying (a hypothetical usage sketch follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
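A hypothetical sketch of how the proposed column could be consumed. This syntax does not exist in Spark yet; it assumes the proposal above lands and a 'type' column appears in the SHOW TABLES EXTENDED output, mirroring MySQL's SHOW FULL TABLES WHERE table_type = 'VIEW'.

{code:scala}
// Hypothetical, pending the proposal above: filter the extended listing
// on the proposed 'type' column.
spark.sql("SHOW TABLES EXTENDED").where("type = 'VIEW'").show()     // only views
spark.sql("SHOW TABLES EXTENDED").where("type = 'EXTERNAL'").show() // only external tables
{code}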
[jira] [Comment Edited] (SPARK-30484) Job History Storage Tab does not display RDD Table
[ https://issues.apache.org/jira/browse/SPARK-30484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013656#comment-17013656 ] Ajith S edited comment on SPARK-30484 at 1/12/20 6:36 AM: -- This is not an issue. As SparkListenerBlockUpdated is filtered by default for performance reasons, set spark.eventLog.logBlockUpdates.enabled=true to view storage information was (Author: ajithshetty): As SparkListenerBlockUpdated is filtered by default for performance reasons, set spark.eventLog.logBlockUpdates.enabled=true to view storage information > Job History Storage Tab does not display RDD Table > -- > > Key: SPARK-30484 > URL: https://issues.apache.org/jira/browse/SPARK-30484 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > scala> import org.apache.spark.storage.StorageLevel._ > import org.apache.spark.storage.StorageLevel._ > scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd") > rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at > <console>:27 > scala> rdd.persist(MEMORY_ONLY_SER) > res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27 > scala> rdd.count > res1: Long = 100 > > scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", > "name") > df: org.apache.spark.sql.DataFrame = [count: int, name: string] > scala> df.persist(DISK_ONLY) > res2: df.type = [count: int, name: string] > scala> df.count > res3: Long = 3 > Open Storage Tab under Incomplete Jobs in Job History Page > UI will not display the RDD Table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30484) Job History Storage Tab does not display RDD Table
[ https://issues.apache.org/jira/browse/SPARK-30484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013656#comment-17013656 ] Ajith S commented on SPARK-30484: - As SparkListenerBlockUpdated is filtered by default for performance reasons, set spark.eventLog.logBlockUpdates.enabled=true to view storage information > Job History Storage Tab does not display RDD Table > -- > > Key: SPARK-30484 > URL: https://issues.apache.org/jira/browse/SPARK-30484 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > scala> import org.apache.spark.storage.StorageLevel._ > import org.apache.spark.storage.StorageLevel._ > scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd") > rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at > <console>:27 > scala> rdd.persist(MEMORY_ONLY_SER) > res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27 > scala> rdd.count > res1: Long = 100 > > scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", > "name") > df: org.apache.spark.sql.DataFrame = [count: int, name: string] > scala> df.persist(DISK_ONLY) > res2: df.type = [count: int, name: string] > scala> df.count > res3: Long = 3 > Open Storage Tab under Incomplete Jobs in Job History Page > UI will not display the RDD Table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
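For reference, one way to apply the advice above when building the session (the same config can be passed via --conf on spark-submit or set in spark-defaults.conf; the app name here is arbitrary):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("storage-tab-demo") // arbitrary
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.logBlockUpdates.enabled", "true") // off by default for performance
  .getOrCreate()
{code}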
[jira] [Commented] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner
[ https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013651#comment-17013651 ] Ajith S commented on SPARK-30488: - From the log, I see `sbt` classes in the deadlocked threads; this is related to internal classloaders in sbt, which was fixed in sbt 1.3.3 by marking classloaders as parallel capable: [https://github.com/sbt/sbt/pull/5131]. Also a similar issue: '[https://github.com/sbt/sbt/issues/5116]' [~rohit21agrawal] Thanks for reporting this. One question: can you also please mention how the sparkContext was created? > Deadlock between block-manager-slave-async-thread-pool and spark context > cleaner > > > Key: SPARK-30488 > URL: https://issues.apache.org/jira/browse/SPARK-30488 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Rohit Agrawal >Priority: Major > > Deadlock happens while cleaning up the spark context. Here is the full thread > dump: > > > 2020-01-10T20:13:16.2884057Z Full thread dump Java HotSpot(TM) 64-Bit > Server VM (25.221-b11 mixed mode): > 2020-01-10T20:13:16.2884392Z > 2020-01-10T20:13:16.2884660Z "SIGINT handler" #488 daemon prio=9 os_prio=2 > tid=0x111fa000 nid=0x4794 waiting for monitor entry > [0x1c86e000] > 2020-01-10T20:13:16.2884807Z java.lang.Thread.State: BLOCKED (on object > monitor) > 2020-01-10T20:13:16.2884879Z at java.lang.Shutdown.exit(Shutdown.java:212) > 2020-01-10T20:13:16.2885693Z - waiting to lock <0xc0155de0> (a > java.lang.Class for java.lang.Shutdown) > 2020-01-10T20:13:16.2885840Z at > java.lang.Terminator$1.handle(Terminator.java:52) > 2020-01-10T20:13:16.2885965Z at sun.misc.Signal$1.run(Signal.java:212) > 2020-01-10T20:13:16.2886329Z at java.lang.Thread.run(Thread.java:748) > 2020-01-10T20:13:16.2886430Z > 2020-01-10T20:13:16.2886752Z "Thread-3" #108 prio=5 os_prio=0 > tid=0x111f7800 nid=0x48cc waiting for monitor entry > [0x2c33f000] > 2020-01-10T20:13:16.2886881Z java.lang.Thread.State: BLOCKED (on object > monitor) > 2020-01-10T20:13:16.2886999Z at > org.apache.hadoop.util.ShutdownHookManager.getShutdownHooksInOrder(ShutdownHookManager.java:273) > 2020-01-10T20:13:16.2887107Z at > org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:121) > 2020-01-10T20:13:16.2887212Z at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95) > 2020-01-10T20:13:16.2887421Z > 2020-01-10T20:13:16.2887798Z "block-manager-slave-async-thread-pool-81" #486 > daemon prio=5 os_prio=0 tid=0x111fe800 nid=0x2e34 waiting for monitor > entry [0x2bf3d000] > 2020-01-10T20:13:16.2889192Z java.lang.Thread.State: BLOCKED (on object > monitor) > 2020-01-10T20:13:16.2889305Z at > java.lang.ClassLoader.loadClass(ClassLoader.java:404) > 2020-01-10T20:13:16.2889405Z - waiting to lock <0xc1f359f0> (a > sbt.internal.LayeredClassLoader) > 2020-01-10T20:13:16.2889482Z at > java.lang.ClassLoader.loadClass(ClassLoader.java:411) > 2020-01-10T20:13:16.2889582Z - locked <0xca33e4c8> (a > sbt.internal.ManagedClassLoader$ZombieClassLoader) > 2020-01-10T20:13:16.2889659Z at > java.lang.ClassLoader.loadClass(ClassLoader.java:357) > 2020-01-10T20:13:16.2890881Z at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:58) > 2020-01-10T20:13:16.2891006Z at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57) > 
2020-01-10T20:13:16.2891142Z at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57) > 2020-01-10T20:13:16.2891260Z at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:86) > 2020-01-10T20:13:16.2891375Z at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > 2020-01-10T20:13:16.2891624Z at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > 2020-01-10T20:13:16.2891737Z at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 2020-01-10T20:13:16.2891833Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 2020-01-10T20:13:16.2891925Z at java.lang.Thread.run(Thread.java:748) > 2020-01-10T20:13:16.2891967Z > 2020-01-10T20:13:16.2892066Z "pool-31-thread-16" #335 prio=5 os_prio=0 > tid=0x153b2000 nid=0x1aac waiting on condition [0x4b2ff000] > 2020-01-10T20:13:16.2892147Z java.lang.Thread.State: WAITING (parking) > 2020-01
[jira] [Commented] (SPARK-30440) Flaky test: org.apache.spark.scheduler.TaskSetManagerSuite.reset
[ https://issues.apache.org/jira/browse/SPARK-30440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009433#comment-17009433 ] Ajith S commented on SPARK-30440: - Found a race in the test case between reviveOffers in org.apache.spark.scheduler.TaskSchedulerImpl#submitTasks and org.apache.spark.scheduler.TaskSetManager#resourceOffer; made a PR for the same: [https://github.com/apache/spark/pull/27115] > Flaky test: org.apache.spark.scheduler.TaskSetManagerSuite.reset > > > Key: SPARK-30440 > URL: https://issues.apache.org/jira/browse/SPARK-30440 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Priority: Major > > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116126/testReport] > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116159/testReport/] > {noformat} > org.apache.spark.scheduler.TaskSetManagerSuite.reset > Error Details: org.scalatest.exceptions.TestFailedException: task0.isDefined was > true, but task1.isDefined was false > Stack Trace: sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedException: task0.isDefined was true, but > task1.isDefined was false > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.scheduler.TaskSetManagerSuite.$anonfun$new$107(TaskSetManagerSuite.scala:1933) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56) > at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) > at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) > at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:56) > at > org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) > at scala.collection.immutable.List.foreach(List.scala:392) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) > at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) > at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) > at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) > at org.scalatest.Suite.run(Suite.scala:1124) > at org.scalatest.Suite.run$(Suite.scala:1106) > at > 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) > at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) > at org.scalatest.SuperEngine.runImpl(Engine.scala:518) > at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) > at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:56) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.co
[jira] [Created] (SPARK-30406) OneForOneStreamManager use AtomicLong
Ajith S created SPARK-30406: --- Summary: OneForOneStreamManager use AtomicLong Key: SPARK-30406 URL: https://issues.apache.org/jira/browse/SPARK-30406 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ajith S Compound operations, as well as increments and decrements on primitive fields, are not atomic. When a volatile primitive field is incremented or decremented, we run into data loss if threads interleave in the steps of the update (a minimal sketch follows below). Refer: [https://wiki.sei.cmu.edu/confluence/display/java/VNA02-J.+Ensure+that+compound+operations+on+shared+variables+are+atomic] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
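A minimal sketch of the difference (illustrative only, not the OneForOneStreamManager code itself): @volatile gives visibility across threads, but the read-modify-write of += can still interleave, while AtomicLong performs the increment as one atomic operation.

{code:scala}
import java.util.concurrent.atomic.AtomicLong

object Counters {
  // Visible to all threads, but `+= 1` is a read-modify-write: two threads
  // can read the same value and one increment is lost.
  @volatile var unsafeCounter: Long = 0L
  def unsafeNext(): Long = { unsafeCounter += 1; unsafeCounter }

  // The whole increment happens atomically.
  val safeCounter = new AtomicLong(0L)
  def safeNext(): Long = safeCounter.incrementAndGet()
}
{code}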
[jira] [Created] (SPARK-30405) ArrayKeyIndexType should use Arrays.hashCode
Ajith S created SPARK-30405: --- Summary: ArrayKeyIndexType should use Arrays.hashCode Key: SPARK-30405 URL: https://issues.apache.org/jira/browse/SPARK-30405 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 2.3.4 Reporter: Ajith S hashCode on an array returns the array's identity hash and does not reflect the array's content. This can be corrected by using Arrays.hashCode(array) (a minimal illustration follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
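A minimal illustration of the point above (plain Scala over Java arrays, not the ArrayKeyIndexType code itself):

{code:scala}
import java.util.Arrays

val a = Array[Byte](1, 2, 3)
val b = Array[Byte](1, 2, 3)

// Identity hash: equal contents, (almost certainly) different hash codes,
// so arrays used directly as hash-map keys will miss.
println(a.hashCode == b.hashCode)                  // false in practice

// Content-based hash: stable across arrays with equal elements.
println(Arrays.hashCode(a) == Arrays.hashCode(b))  // true
{code}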
[jira] [Created] (SPARK-30382) start-thriftserver throws ClassNotFoundException
Ajith S created SPARK-30382: --- Summary: start-thriftserver throws ClassNotFoundException Key: SPARK-30382 URL: https://issues.apache.org/jira/browse/SPARK-30382 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ajith S start-thriftserver.sh --help throws {code} . Thrift server options: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextFactory at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:167) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:82) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.spi.LoggerContextFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 3 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25061) Spark SQL Thrift Server fails to not pick up hiveconf passing parameter
[ https://issues.apache.org/jira/browse/SPARK-25061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004736#comment-17004736 ] Ajith S commented on SPARK-25061: - I could reproduce this, and as per the documentation, https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html, --hiveconf can be used to pass Hive properties to the Thrift server. Raising a PR to fix this. > Spark SQL Thrift Server fails to not pick up hiveconf passing parameter > > > Key: SPARK-25061 > URL: https://issues.apache.org/jira/browse/SPARK-25061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zineng Yuan >Priority: Major > > The Spark Thrift server should use the passed parameter value and overwrite the > same conf from hive-site.xml. For example, the server should overwrite what > exists in hive-site.xml. > ./sbin/start-thriftserver.sh --master yarn-client ... > --hiveconf > "hive.server2.authentication.kerberos.principal=" ... > > hive.server2.authentication.kerberos.principal > hive/_HOST@ > > However, the server takes what is in hive-site.xml. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25030) SparkSubmit.doSubmit will not return result if the mainClass submitted creates a Timer()
[ https://issues.apache.org/jira/browse/SPARK-25030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003932#comment-17003932 ] Ajith S commented on SPARK-25030: - [~jiangxb1987] is there a reproducer for this? I just tried this and it seems to work fine. > SparkSubmit.doSubmit will not return result if the mainClass submitted > creates a Timer() > > > Key: SPARK-25030 > URL: https://issues.apache.org/jira/browse/SPARK-25030 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Xingbo Jiang >Priority: Major > Labels: bulk-closed > > Creating a Timer() in the mainClass submitted to SparkSubmit makes it unable to > fetch the result; it is very easy to reproduce the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30361) Monitoring URL do not redact information about environment
Ajith S created SPARK-30361: --- Summary: Monitoring URL do not redact information about environment Key: SPARK-30361 URL: https://issues.apache.org/jira/browse/SPARK-30361 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ajith S UI and event logs redact sensitive information. But the monitoring URL, https://spark.apache.org/docs/latest/monitoring.html#rest-api , specifically /applications/[app-id]/environment does not, which is a security issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30360) Redact classpath entries in Spark UI
Ajith S created SPARK-30360: --- Summary: Redact classpath entries in Spark UI Key: SPARK-30360 URL: https://issues.apache.org/jira/browse/SPARK-30360 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.0.0 Reporter: Ajith S Currently the Spark history server displays the classpath entries in the Environment tab with classpaths redacted; this is because the EventLog file has the entry values redacted while writing. But when the same is seen from a running application UI, it is not redacted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server
[ https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985021#comment-16985021 ] Ajith S commented on SPARK-27719: - Currently our production also encounters this issue. I would like to work on this as per the suggestion of [~hao.li]; is the idea acceptable, [~dongjoon]? > Set maxDisplayLogSize for spark history server > -- > > Key: SPARK-27719 > URL: https://issues.apache.org/jira/browse/SPARK-27719 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: hao.li >Priority: Minor > > Sometimes a very large event log may be useless, and parsing it may waste many > resources. > It may be useful to avoid parsing large event logs by setting a configuration > spark.history.fs.maxDisplayLogSize. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29174) LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source
[ https://issues.apache.org/jira/browse/SPARK-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933203#comment-16933203 ] Ajith S commented on SPARK-29174: - Had checked with the original author about this: https://github.com/apache/spark/pull/18975#issuecomment-523261355 > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source > --- > > Key: SPARK-29174 > URL: https://issues.apache.org/jira/browse/SPARK-29174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > *'using' does not work for insert overwrite to a local directory but works for > insert overwrite to an HDFS directory* > > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite directory > '/user/trash2/' using parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.448 seconds) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' using parquet select * from trash1 a where a.country='PAK'; > Error: org.apache.spark.sql.catalyst.parser.ParseException: > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, > pos 0) > > == SQL == > insert overwrite local directory '/opt/trash2/' using parquet select * from > trash1 a where a.country='PAK' > ^^^ (state=,code=0) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > | | | > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28848) insert overwrite local directory stored as parquet does not create snappy.parquet data file at local directory path
[ https://issues.apache.org/jira/browse/SPARK-28848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S resolved SPARK-28848. - Resolution: Duplicate Will be fixed as part of SPARK-28659 > insert overwrite local directory stored as parquet does not create > snappy.parquet data file at local directory path > --- > > Key: SPARK-28848 > URL: https://issues.apache.org/jira/browse/SPARK-28848 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > {code} > 0: jdbc:hive2://10.18.18.214:23040/func> insert overwrite local directory > '/opt/trash4/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.368 seconds) > {code} > Data file at local directory path: > {code} > vm1:/opt/trash4 # ll > total 12 > -rw-r--r-- 1 root root 8 Aug 22 14:30 ._SUCCESS.crc > -rw-r--r-- 1 root root 16 Aug 22 14:30 > .part-1-2b17ec6a-ef7e-4b45-927e-f93b88ff4f65-c000.crc > -rw-r--r-- 1 root root 0 Aug 22 14:30 _SUCCESS > -rw-r--r-- 1 root root 619 Aug 22 14:30 > part-1-2b17ec6a-ef7e-4b45-927e-f93b88ff4f65-c000 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28848) insert overwrite local directory stored as parquet does not create snappy.parquet data file at local directory path
[ https://issues.apache.org/jira/browse/SPARK-28848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913027#comment-16913027 ] Ajith S commented on SPARK-28848: - Thanks for reporting. Will look into this. On initial thoughts, it looks like in org.apache.spark.sql.hive.execution.HiveFileFormat#prepareWrite, OutputWriterFactory.getFileExtension may not be passing the right file extension in case of 'stored as' > insert overwrite local directory stored as parquet does not create > snappy.parquet data file at local directory path > --- > > Key: SPARK-28848 > URL: https://issues.apache.org/jira/browse/SPARK-28848 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > 0: jdbc:hive2://10.18.18.214:23040/func> insert overwrite local directory > '/opt/trash4/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.368 seconds) > Data file at local directory path: > vm1:/opt/trash4 # ll > total 12 > -rw-r--r-- 1 root root 8 Aug 22 14:30 ._SUCCESS.crc > -rw-r--r-- 1 root root 16 Aug 22 14:30 > .part-1-2b17ec6a-ef7e-4b45-927e-f93b88ff4f65-c000.crc > -rw-r--r-- 1 root root 0 Aug 22 14:30 _SUCCESS > -rw-r--r-- 1 root root 619 Aug 22 14:30 > part-1-2b17ec6a-ef7e-4b45-927e-f93b88ff4f65-c000 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28726) Spark with DynamicAllocation always got connection reset by peers
[ https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907112#comment-16907112 ] Ajith S commented on SPARK-28726: - As I see it, this is the driver trying to clean up RDDs, broadcasts etc. from the expiring executor while the executor has already gone down, which is why such exceptions are logged as warnings. Does the issue occur with higher timeouts too? (A configuration sketch for testing this follows below.) > Spark with DynamicAllocation always got connection reset by peers > - > > Key: SPARK-28726 > URL: https://issues.apache.org/jira/browse/SPARK-28726 > Project: Spark > Issue Type: Wish > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: angerszhu >Priority: Major > > When using Spark with dynamic allocation, we set the idle time to 5s > We always got netty 'Connection reset by peer' exceptions > > I suspect it's because the idle time of 5s we set is too small; it causes the > BlockManager to call netty IO after the executor has been removed because of the timeout, > but the driver's BlockManager is not notified in time > {code:java} > 19/08/14 00:00:46 WARN > org.apache.spark.network.server.TransportChannelHandler: "Exception in > connection from /host:port" > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > -- > 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: > "Error trying to remove broadcast 67 from block manager BlockManagerId(967, > host, port, None)" > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > -- > 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator > 162174" > 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed > to remove shuffle 22 - Connection reset by peer" > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
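For the timeout question above, a minimal sketch of how the relevant knob is raised; spark.dynamicAllocation.executorIdleTimeout and the companion keys are real Spark conf settings, 60s is the documented default, and the right value is workload-dependent:
{code:scala}
// Hedged sketch: give the driver's cleanup RPCs time to finish before an idle
// executor is reclaimed. A 5s idle timeout is aggressive; 60s is the default.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // external shuffle service is required by dynamic allocation
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
{code}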
[jira] [Commented] (SPARK-28696) create database _; allowing in Spark but Hive throws Parse Exception
[ https://issues.apache.org/jira/browse/SPARK-28696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906881#comment-16906881 ] Ajith S commented on SPARK-28696: - [~hyukjin.kwon] Okay, should we disallow table names starting with _ rather than throwing a path error, as in SPARK-28697? > create database _; allowing in Spark but Hive throws Parse Exception > > > Key: SPARK-28696 > URL: https://issues.apache.org/jira/browse/SPARK-28696 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > In Spark > {code} > spark-sql> create database _; > Time taken: 0.062 seconds > spark-sql> show databases; > _ > adap > adaptive > adaptive_tc8 > {code} > In Hive > {code} > 0: jdbc:hive2://10.18.98.147:21066/> create database _; > Error: Error while compiling statement: FAILED: ParseException line 1:16 > cannot recognize input near '_--0' '' '' in create database > statement (state=42000,code=4) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28696) create database _; allowing in Spark but Hive throws Parse Exception
[ https://issues.apache.org/jira/browse/SPARK-28696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905196#comment-16905196 ] Ajith S commented on SPARK-28696: - I think we should disallow identifiers that start with _ for create database and create table. We can partially see its effect in SPARK-28697: when the table name starts with _ (like _sampleTable), the FileFormat assumes it to be a hidden folder and does not list it, which causes unusual behavior. > create database _; allowing in Spark but Hive throws Parse Exception > > > Key: SPARK-28696 > URL: https://issues.apache.org/jira/browse/SPARK-28696 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > In Spark > spark-sql> create database _; > Time taken: 0.062 seconds > spark-sql> show databases; > _ > adap > adaptive > adaptive_tc8 > In Hive > 0: jdbc:hive2://10.18.98.147:21066/> create database _; > Error: Error while compiling statement: FAILED: ParseException line 1:16 > cannot recognize input near '_--0' '' '' in create database > statement (state=42000,code=4) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905144#comment-16905144 ] Ajith S edited comment on SPARK-28697 at 8/12/19 1:28 PM: -- !screenshot-1.png! Here, due to org.apache.hadoop.fs.FileSystem#globStatus(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.PathFilter), it returns null matches when the folder name starts with _. This is in fact due to Hadoop code: org.apache.hadoop.mapred.FileInputFormat#hiddenFileFilter disallows paths that start with _ was (Author: ajithshetty): !screenshot-1.png! Here, due to org.apache.hadoop.fs.FileSystem#globStatus(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.PathFilter), it returns null matches when the folder name starts with _ > select * from _; throws InvalidInputException and says path does not exists > at HDFS side > - > > Key: SPARK-28697 > URL: https://issues.apache.org/jira/browse/SPARK-28697 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > spark-sql> create database func1; > Time taken: 0.095 seconds > spark-sql> use func1; > Time taken: 0.031 seconds > spark-sql> create table _(id int); > Time taken: 0.351 seconds > spark-sql> insert into _ values(1); > Time taken: 3.148 seconds > spark-sql> select * from _; > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: > hdfs://hacluster/user/sparkhive/warehouse/func1.db/_ > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > But at HDFS side it is present > vm1:/opt/HA/C10/install/hadoop/nodemanager/bin # ./hdfs dfs -ls > /user/sparkhive/warehouse/func1.db > Found 2 items > drwxr-xr-x - root hadoop 0 2019-08-12 20:02 > /user/sparkhive/warehouse/func1.db/_ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
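For reference, the filter named in the comment above behaves as below; this is a Scala paraphrase of org.apache.hadoop.mapred.FileInputFormat#hiddenFileFilter (the original is Java), not new logic:
{code:scala}
// Hadoop treats any path whose name starts with "_" or "." as hidden and
// silently excludes it from input listings, which is why a table directory
// literally named "_" is never matched.
import org.apache.hadoop.fs.{Path, PathFilter}

val hiddenFileFilter: PathFilter = new PathFilter {
  override def accept(p: Path): Boolean = {
    val name = p.getName
    !name.startsWith("_") && !name.startsWith(".")
  }
}
{code}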
[jira] [Comment Edited] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905144#comment-16905144 ] Ajith S edited comment on SPARK-28697 at 8/12/19 12:44 PM: --- !screenshot-1.png! Here, due to org.apache.hadoop.fs.FileSystem#globStatus(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.PathFilter), it returns null matches when the folder name starts with _ was (Author: ajithshetty): !screenshot-1.png! Here, due to org.apache.hadoop.fs.FileSystem#globStatus(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.PathFilter), it returns 0 matches when the folder name starts with _ > select * from _; throws InvalidInputException and says path does not exists > at HDFS side > - > > Key: SPARK-28697 > URL: https://issues.apache.org/jira/browse/SPARK-28697 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > spark-sql> create database func1; > Time taken: 0.095 seconds > spark-sql> use func1; > Time taken: 0.031 seconds > spark-sql> create table _(id int); > Time taken: 0.351 seconds > spark-sql> insert into _ values(1); > Time taken: 3.148 seconds > spark-sql> select * from _; > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: > hdfs://hacluster/user/sparkhive/warehouse/func1.db/_ > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > But at HDFS side it is present > vm1:/opt/HA/C10/install/hadoop/nodemanager/bin # ./hdfs dfs -ls > /user/sparkhive/warehouse/func1.db > Found 2 items > drwxr-xr-x - root hadoop 0 2019-08-12 20:02 > /user/sparkhive/warehouse/func1.db/_ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905144#comment-16905144 ] Ajith S commented on SPARK-28697: - !screenshot-1.png! Here, due to org.apache.hadoop.fs.FileSystem#globStatus(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.PathFilter), it returns 0 matches when the folder name starts with _ > select * from _; throws InvalidInputException and says path does not exists > at HDFS side > - > > Key: SPARK-28697 > URL: https://issues.apache.org/jira/browse/SPARK-28697 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > spark-sql> create database func1; > Time taken: 0.095 seconds > spark-sql> use func1; > Time taken: 0.031 seconds > spark-sql> create table _(id int); > Time taken: 0.351 seconds > spark-sql> insert into _ values(1); > Time taken: 3.148 seconds > spark-sql> select * from _; > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: > hdfs://hacluster/user/sparkhive/warehouse/func1.db/_ > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > But at HDFS side it is present > vm1:/opt/HA/C10/install/hadoop/nodemanager/bin # ./hdfs dfs -ls > /user/sparkhive/warehouse/func1.db > Found 2 items > drwxr-xr-x - root hadoop 0 2019-08-12 20:02 > /user/sparkhive/warehouse/func1.db/_ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
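A minimal sketch of observing the zero-match behavior directly against a local filesystem (the /tmp path is illustrative; whether globStatus reports null or an empty array can vary with the Hadoop version):
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}

val hidden = new PathFilter {
  override def accept(p: Path): Boolean =
    !p.getName.startsWith("_") && !p.getName.startsWith(".")
}
val fs = FileSystem.getLocal(new Configuration())
fs.mkdirs(new Path("/tmp/func1.db/_"))                           // directory exists on disk...
val matches = fs.globStatus(new Path("/tmp/func1.db/_"), hidden)
// ...yet no matches come back, because the name "_" fails the filter above.
{code}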
[jira] [Updated] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-28697: Attachment: screenshot-1.png > select * from _; throws InvalidInputException and says path does not exists > at HDFS side > - > > Key: SPARK-28697 > URL: https://issues.apache.org/jira/browse/SPARK-28697 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > spark-sql> create database func1; > Time taken: 0.095 seconds > spark-sql> use func1; > Time taken: 0.031 seconds > spark-sql> create table _(id int); > Time taken: 0.351 seconds > spark-sql> insert into _ values(1); > Time taken: 3.148 seconds > spark-sql> select * from _; > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: > hdfs://hacluster/user/sparkhive/warehouse/func1.db/_ > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > But at HDFS side it is present > vm1:/opt/HA/C10/install/hadoop/nodemanager/bin # ./hdfs dfs -ls > /user/sparkhive/warehouse/func1.db > Found 2 items > drwxr-xr-x - root hadoop 0 2019-08-12 20:02 > /user/sparkhive/warehouse/func1.db/_ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905138#comment-16905138 ] Ajith S edited comment on SPARK-28697 at 8/12/19 12:35 PM: --- Found this even in single node, local filesystem case. select * from _; select * from _table1; Below is the stack 19/08/12 18:00:18 ERROR SparkSQLDriver: Failed in [select * from _] org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/root1/spark/install/spark-3.0.0-SNAPSHOT-bin-custom-spark/bin/spark-warehouse/_ at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:297) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:366) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:372) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:399) at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:52) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:368) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:273) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) org.apache.hadoop.mapred.InvalidInputException: Input path does no
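The repro from this comment, condensed into a spark-shell sketch (table name kept exactly as in the report):
{code:scala}
// Hedged repro sketch for the single-node, local-filesystem case above.
spark.sql("CREATE DATABASE func1")
spark.sql("USE func1")
spark.sql("CREATE TABLE _(id INT)")
spark.sql("INSERT INTO _ VALUES (1)")
spark.sql("SELECT * FROM _").show() // InvalidInputException: Input path does not exist
{code}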
[jira] [Comment Edited] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905138#comment-16905138 ] Ajith S edited comment on SPARK-28697 at 8/12/19 12:35 PM: --- Found this even in single node, local filesystem case. Will work on it select * from _; select * from _table1; Below is the stack 19/08/12 18:00:18 ERROR SparkSQLDriver: Failed in [select * from _] org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/root1/spark/install/spark-3.0.0-SNAPSHOT-bin-custom-spark/bin/spark-warehouse/_ at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:297) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:366) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:372) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:399) at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:52) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:368) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:273) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) org.apache.hadoop.mapred.InvalidInputException: In
[jira] [Commented] (SPARK-28697) select * from _; throws InvalidInputException and says path does not exists at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905138#comment-16905138 ] Ajith S commented on SPARK-28697: - Found this even in single node, local filesystem case. Below is the stack 19/08/12 18:00:18 ERROR SparkSQLDriver: Failed in [select * from _] org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/root1/spark/install/spark-3.0.0-SNAPSHOT-bin-custom-spark/bin/spark-warehouse/_ at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:297) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:961) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:366) at org.apache.spark.rdd.RDD.collect(RDD.scala:960) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:372) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:399) at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:52) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:368) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:273) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/root1/spark/install/spark-3.0.0-SNAPSHOT-bin-custom-spark/bin/spark-war
[jira] [Commented] (SPARK-28696) create database _; allowing in Spark but Hive throws Parse Exception
[ https://issues.apache.org/jira/browse/SPARK-28696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905127#comment-16905127 ] Ajith S commented on SPARK-28696: - Thanks for reporting. I can see this is due to https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L1694 where a bare "_" is a valid identifier. To keep it compatible with Hive, I can fix this. Any thoughts, [~srowen] [~dongjoon]? > create database _; allowing in Spark but Hive throws Parse Exception > > > Key: SPARK-28696 > URL: https://issues.apache.org/jira/browse/SPARK-28696 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > In Spark > spark-sql> create database _; > Time taken: 0.062 seconds > spark-sql> show databases; > _ > adap > adaptive > adaptive_tc8 > In Hive > 0: jdbc:hive2://10.18.98.147:21066/> create database _; > Error: Error while compiling statement: FAILED: ParseException line 1:16 > cannot recognize input near '_--0' '' '' in create database > statement (state=42000,code=4) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
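Pending an actual grammar change, a hypothetical validation sketch of the Hive-compatible rule being proposed; validateIdentifier is an illustrative name, not a Spark internal:
{code:scala}
// Hypothetical check: reject identifiers that start with "_", since such names
// also collide with Hadoop's hidden-file convention (see SPARK-28697).
def validateIdentifier(name: String): Unit =
  if (name.startsWith("_")) {
    throw new IllegalArgumentException(
      s"Invalid identifier '$name': names may not start with '_'")
  }

validateIdentifier("adaptive_tc8") // fine: '_' in the middle is allowed
validateIdentifier("_")            // throws
{code}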
[jira] [Created] (SPARK-28676) Avoid Excessive logging from ContextCleaner
Ajith S created SPARK-28676: --- Summary: Avoid Excessive logging from ContextCleaner Key: SPARK-28676 URL: https://issues.apache.org/jira/browse/SPARK-28676 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.3, 2.3.3, 3.0.0 Reporter: Ajith S In high-workload environments, ContextCleaner seems to log excessively at INFO level, which does not give much information. In one particular case we see that the ``INFO ContextCleaner: Cleaned accumulator`` message accounts for 25-30% of the generated logs. We can log this cleanup information at DEBUG level instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
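The proposed change, sketched with plain slf4j (Spark's Logging trait wraps slf4j; this is a hedged sketch, not a verbatim diff of ContextCleaner):
{code:scala}
import org.slf4j.LoggerFactory

val log = LoggerFactory.getLogger("org.apache.spark.ContextCleaner")
val accId = 162174L // accumulator id borrowed from the log lines quoted earlier

// Before: every cleaned accumulator emits an INFO line, which can dominate logs.
log.info(s"Cleaned accumulator $accId")
// After (proposed): same message at DEBUG, visible only when explicitly enabled.
log.debug(s"Cleaned accumulator $accId")
{code}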
[jira] [Updated] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-23626: Affects Version/s: 2.4.3 > DAGScheduler blocked due to JobSubmitted event > --- > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3, 3.0.0, 2.4.3 >Reporter: Ajith S >Priority: Major > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
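The starvation described in this issue can be illustrated with a self-contained sketch (not Spark code; DAGSchedulerEventProcessLoop follows the same single-consumer pattern):
{code:scala}
import java.util.concurrent.LinkedBlockingQueue

sealed trait Event
case class JobSubmitted(partitionCalcMillis: Long) extends Event // expensive: stage/partition computation
case object TaskCompletion extends Event                         // cheap, but queued behind everything

val queue = new LinkedBlockingQueue[Event]()
val loop = new Thread(() => while (true) queue.take() match {
  case JobSubmitted(cost) => Thread.sleep(cost) // stands in for slow partition calculation
  case TaskCompletion     => ()                 // processed only after every earlier event
})
loop.setDaemon(true)
loop.start()

queue.put(JobSubmitted(5000)) // one slow submission...
queue.put(TaskCompletion)     // ...delays completions from jobs that already finished
{code}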
[jira] [Updated] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-23626: Summary: DAGScheduler blocked due to JobSubmitted event (was: Spark DAGScheduler scheduling performance hindered on JobSubmitted Event) > DAGScheduler blocked due to JobSubmitted event > --- > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3, 3.0.0 >Reporter: Ajith S >Priority: Major > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23626) Spark DAGScheduler scheduling performance hindered on JobSubmitted Event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-23626: Labels: (was: bulk-closed) > Spark DAGScheduler scheduling performance hindered on JobSubmitted Event > > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3, 3.0.0 >Reporter: Ajith S >Priority: Major > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27907) HiveUDAF with 0 rows throw NPE
[ https://issues.apache.org/jira/browse/SPARK-27907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27907: Description: When query returns zero rows, the HiveUDAFFunction throws NPE CASE 1: create table abc(a int) select histogram_numeric(a,2) from abc // NPE Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost, executor driver): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:471) at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:315) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.eval(interfaces.scala:543) at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$5(AggregationIterator.scala:231) at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:122) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1350) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) CASE 2: create table abc(a int) insert into abc values (1) select histogram_numeric(a,2) from abc where a=3 //NPE Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor driver): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:477) at org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:315) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:570) at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$6(AggregationIterator.scala:254) at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) at 
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:94) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(T
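The two failing cases in the description above, distilled into a runnable sketch (spark-shell with Hive support, since histogram_numeric is a Hive UDAF):
{code:scala}
spark.sql("CREATE TABLE abc(a INT)")
spark.sql("SELECT histogram_numeric(a, 2) FROM abc").show()             // CASE 1: empty table, NPE in eval
spark.sql("INSERT INTO abc VALUES (1)")
spark.sql("SELECT histogram_numeric(a, 2) FROM abc WHERE a = 3").show() // CASE 2: zero matching rows, NPE in serialize
{code}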
[jira] [Updated] (SPARK-27907) HiveUDAF with 0 rows throw NPE
[ https://issues.apache.org/jira/browse/SPARK-27907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27907: Summary: HiveUDAF with 0 rows throw NPE (was: HiveUDAF with 0 rows throw NPE when try to serialize) > HiveUDAF with 0 rows throw NPE > -- > > Key: SPARK-27907 > URL: https://issues.apache.org/jira/browse/SPARK-27907 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.3, 3.0.0, 2.4.3, 3.1.0 >Reporter: Ajith S >Priority: Major > > When query returns zero rows, the HiveUDAFFunction.seralize throws NPE > create table abc(a int) > insert into abc values (1) > insert into abc values (2) > select histogram_numeric(a,2) from abc where a=3 //NPE > Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor > driver): java.lang.NullPointerException > at > org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:477) > at > org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:315) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:570) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$6(AggregationIterator.scala:254) > at > org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:94) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:122) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1350) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27907) HiveUDAF with 0 rows throw NPE when try to serialize
Ajith S created SPARK-27907: --- Summary: HiveUDAF with 0 rows throw NPE when try to serialize Key: SPARK-27907 URL: https://issues.apache.org/jira/browse/SPARK-27907 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3, 2.3.3, 3.0.0, 3.1.0 Reporter: Ajith S When query returns zero rows, the HiveUDAFFunction.seralize throws NPE create table abc(a int) insert into abc values (1) insert into abc values (2) select histogram_numeric(a,2) from abc where a=3 //NPE Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor driver): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:477) at org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:315) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:570) at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$6(AggregationIterator.scala:254) at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:94) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:122) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1350) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23626) Spark DAGScheduler scheduling performance hindered on JobSubmitted Event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-23626: Affects Version/s: 3.0.0 > Spark DAGScheduler scheduling performance hindered on JobSubmitted Event > > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3, 3.0.0 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23626) Spark DAGScheduler scheduling performance hindered on JobSubmitted Event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-23626: Affects Version/s: 2.3.3 > Spark DAGScheduler scheduling performance hindered on JobSubmitted Event > > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1, 2.3.3 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-23626) Spark DAGScheduler scheduling performance hindered on JobSubmitted Event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S reopened SPARK-23626: - Resolution in progress > Spark DAGScheduler scheduling performance hindered on JobSubmitted Event > > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.2.1 >Reporter: Ajith S >Priority: Major > Labels: bulk-closed > > DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted > events has to be processed as DAGSchedulerEventProcessLoop is single threaded > and it will block other tasks in queue like TaskCompletion. > The JobSubmitted event is time consuming depending on the nature of the job > (Example: calculating parent stage dependencies, shuffle dependencies, > partitions) and thus it blocks all the events to be processed. > > I see multiple JIRA referring to this behavior > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly in my cluster some jobs partition calculation is time consuming > (Similar to stack at SPARK-2647) hence it slows down the spark > DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if > its tasks are finished within seconds, as TaskCompletion Events are processed > at a slower rate due to blockage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27264) spark sql released all executor but the job is not done
[ https://issues.apache.org/jira/browse/SPARK-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800323#comment-16800323 ] Ajith S commented on SPARK-27264: - Can you add a snippet to reproduce this.? > spark sql released all executor but the job is not done > --- > > Key: SPARK-27264 > URL: https://issues.apache.org/jira/browse/SPARK-27264 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 2.4.0 > Environment: Azure HDinsight spark 2.4 on Azure storage SQL: Read and > Join some data and finally write result to a Hive metastore; query executed > on jupyterhub; while the pre-migration cluster is a jupyter (non-hub) >Reporter: Mike Chan >Priority: Major > > I have a spark sql that used to execute < 10 mins now running at 3 hours > after a cluster migration and need to deep dive on what it's actually doing. > I'm new to spark and please don't mind if I'm asking something unrelated. > Increased spark.executor.memory but no luck. Env: Azure HDinsight spark 2.4 > on Azure storage SQL: Read and Join some data and finally write result to a > Hive metastore > The sparl.sql ends with below code: > .write.mode("overwrite").saveAsTable("default.mikemiketable") > Application Behavior: Within the first 15 mins, it loads and complete most > tasks (199/200); left only 1 executor process alive and continually to > shuffle read / write data. Because now it only leave 1 executor, we need to > wait 3 hours until this application finish. > [!https://i.stack.imgur.com/6hqvh.png!|https://i.stack.imgur.com/6hqvh.png] > Left only 1 executor alive > [!https://i.stack.imgur.com/55162.png!|https://i.stack.imgur.com/55162.png] > Not sure what's the executor doing: > [!https://i.stack.imgur.com/TwhuX.png!|https://i.stack.imgur.com/TwhuX.png] > From time to time, we can tell the shuffle read increased: > [!https://i.stack.imgur.com/WhF9A.png!|https://i.stack.imgur.com/WhF9A.png] > Therefore I increased the spark.executor.memory to 20g, but nothing changed. > From Ambari and YARN I can tell the cluster has many resources left. > [!https://i.stack.imgur.com/pngQA.png!|https://i.stack.imgur.com/pngQA.png] > Release of almost all executor > [!https://i.stack.imgur.com/pA134.png!|https://i.stack.imgur.com/pA134.png] > Any guidance is greatly appreciated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27219) Misleading exceptions in transport code's SASL fallback path
[ https://issues.apache.org/jira/browse/SPARK-27219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797435#comment-16797435 ] Ajith S commented on SPARK-27219: - So do we just log a simple warning with a one-line message and print the stack at a finer (DEBUG, TRACE) log level? > Misleading exceptions in transport code's SASL fallback path > > > Key: SPARK-27219 > URL: https://issues.apache.org/jira/browse/SPARK-27219 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Priority: Minor > > There are a couple of code paths in the SASL fallback handling that result in > misleading exceptions printed to logs. One of them is if a timeout occurs > during authentication; for example: > {noformat} > 19/03/15 11:21:37 WARN crypto.AuthClientBootstrap: New auth protocol failed, > trying SASL. > java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout > waiting for task. > at > org.spark_project.guava.base.Throwables.propagate(Throwables.java:160) > at > org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:258) > at > org.apache.spark.network.crypto.AuthClientBootstrap.doSparkAuth(AuthClientBootstrap.java:105) > at > org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:79) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:262) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:192) > at > org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141) > ... > Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task. > at > org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276) > at > org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) > at > org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:254) > ... 38 more > 19/03/15 11:21:38 WARN server.TransportChannelHandler: Exception in > connection from vc1033.halxg.cloudera.com/10.17.216.43:7337 > java.lang.IllegalArgumentException: Frame length should be positive: > -3702202170875367528 > at > org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119) > {noformat} > The IllegalArgumentException shouldn't happen, it only happens because the > code is ignoring the time out and retrying, at which point the remote side is > in a different state and thus doesn't expect the message. > The same line that prints that exception can result in a noisy log message > when the remote side (e.g. an old shuffle service) does not understand the > new auth protocol. Since it's a warning it seems like something is wrong, > when it's just doing what's expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
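If that direction is taken, a hypothetical shape for it; doSparkAuth appears in the quoted stack trace, but the bodies and doSaslAuth below are illustrative stand-ins, not the actual transport code:
{code:scala}
import org.slf4j.LoggerFactory

val log = LoggerFactory.getLogger("AuthClientBootstrap")

def doSparkAuth(): Unit = throw new RuntimeException("Timeout waiting for task.")
def doSaslAuth(): Unit = () // stand-in for the legacy SASL handshake

try doSparkAuth() catch {
  case e: Exception =>
    log.warn(s"New auth protocol failed (${e.getMessage}), falling back to SASL.") // one line at WARN
    log.debug("Full stack trace for auth fallback", e)                             // stack only at DEBUG
    doSaslAuth()
}
{code}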
[jira] [Commented] (SPARK-27220) Remove Yarn specific leftover from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797428#comment-16797428 ] Ajith S commented on SPARK-27220: - # About making the *currentExecutorIdCounter* datatype consistent: yes, *currentExecutorIdCounter* is initially an int in *CoarseGrainedSchedulerBackend*, but when it handles *RegisterExecutor* it expects a String, which makes it confusing. ** Also, *CoarseGrainedExecutorBackend* fires *RegisterExecutor* with executorId as a String in the case of YARN and Mesos. # About moving *currentExecutorIdCounter* out of *CoarseGrainedSchedulerBackend*: I am unsure about this, as *CoarseGrainedSchedulerBackend* just offers a mechanism to maintain executor ids, which YARN is reusing (but I see Mesos ignores it completely and instead uses mesosTaskId, so it makes sense to move *currentExecutorIdCounter* out to YARN). cc [~srowen] [~dongjoon] [~hyukjin.kwon], any thoughts? > Remove Yarn specific leftover from CoarseGrainedSchedulerBackend > > > Key: SPARK-27220 > URL: https://issues.apache.org/jira/browse/SPARK-27220 > Project: Spark > Issue Type: Task > Components: Spark Core, YARN >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.0 >Reporter: Jacek Lewandowski >Priority: Minor > > {{CoarseGrainedSchedulerBackend}} has the following field: > {code:scala} > // The num of current max ExecutorId used to re-register appMaster > @volatile protected var currentExecutorIdCounter = 0 > {code} > which is then updated: > {code:scala} > case RegisterExecutor(executorId, executorRef, hostname, cores, > logUrls) => > ... > // This must be synchronized because variables mutated > // in this block are read when requesting executors > CoarseGrainedSchedulerBackend.this.synchronized { > executorDataMap.put(executorId, data) > if (currentExecutorIdCounter < executorId.toInt) { > currentExecutorIdCounter = executorId.toInt > } > ... > {code} > However it is never really used in {{CoarseGrainedSchedulerBackend}}. Its > only usage is in Yarn-specific code. It should be moved to Yarn then because > {{executorId}} is a {{String}} and there are really no guarantees that it is > always an integer. It was introduced in SPARK-12864 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
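A self-contained sketch of the mismatch in point 1, simplified from the code quoted in the issue (the Mesos-style id below is illustrative):
{code:scala}
// currentExecutorIdCounter is an Int, but RegisterExecutor carries a String id;
// nothing guarantees that the String parses as an Int on every cluster manager.
var currentExecutorIdCounter = 0

def onRegisterExecutor(executorId: String): Unit =
  if (currentExecutorIdCounter < executorId.toInt) { // NumberFormatException for non-numeric ids
    currentExecutorIdCounter = executorId.toInt
  }

onRegisterExecutor("42")            // YARN-style numeric id: fine
// onRegisterExecutor("mesos-0001") // a non-numeric id would throw here
{code}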
[jira] [Commented] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files
[ https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797241#comment-16797241 ] Ajith S commented on SPARK-27194: - Hi [~dongjoon], I have some analysis here: [https://github.com/apache/spark/pull/24142#issuecomment-474866759] Please let me know your views. > Job failures when task attempts do not clean up spark-staging parquet files > --- > > Key: SPARK-27194 > URL: https://issues.apache.org/jira/browse/SPARK-27194 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 2.3.1, 2.3.2 >Reporter: Reza Safi >Priority: Major > > When a container fails for some reason (for example when killed by yarn for > exceeding memory limits), the subsequent task attempts for the tasks that > were running on that container all fail with a FileAlreadyExistsException. > The original task attempt does not seem to successfully call abortTask (or at > least its "best effort" delete is unsuccessful) and clean up the parquet file > it was writing to, so when later task attempts try to write to the same > spark-staging directory using the same file name, the job fails. > Here is what transpires in the logs: > The container where task 200.0 is running is killed and the task is lost: > {code} > 19/02/20 09:33:25 ERROR cluster.YarnClusterScheduler: Lost executor y on > t.y.z.com: Container killed by YARN for exceeding memory limits. 8.1 GB of 8 > GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead. > 19/02/20 09:33:25 WARN scheduler.TaskSetManager: Lost task 200.0 in stage > 0.0 (TID xxx, t.y.z.com, executor 93): ExecutorLostFailure (executor 93 > exited caused by one of the running tasks) Reason: Container killed by YARN > for exceeding memory limits. 8.1 GB of 8 GB physical memory used. Consider > boosting spark.yarn.executor.memoryOverhead. > {code} > The task is re-attempted on a different executor and fails because the > part-00200-blah-blah.c000.snappy.parquet file from the first task attempt > already exists: > {code} > 19/02/20 09:35:01 WARN scheduler.TaskSetManager: Lost task 200.1 in stage 0.0 > (TID 594, tn.y.z.com, executor 70): org.apache.spark.SparkException: Task > failed while writing rows. 
> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: > /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet > for client a.b.c.d already exists > {code} > The job fails when the the configured task attempts (spark.task.maxFailures) > have failed with the same error: > {code} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 200 > in stage 0.0 failed 20 times, most recent failure: Lost task 284.19 in stage > 0.0 (TID yyy, tm.y.z.com, executor 16): org.apache.spark.SparkException: Task > failed while writing rows. > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) > ... > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: > /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet > for client i.p.a.d already exists > {code} > SPARK-26682 wasn't the root cause here, since there wasn't any stage > reattempt. > This issue seems to happen when > spark.sql.sources.partitionOverwriteMode=dynamic. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files
[ https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797216#comment-16797216 ] Ajith S edited comment on SPARK-27194 at 3/20/19 2:38 PM: -- [~dongjoon] Yes, I tried with Spark 2.3.3, and the issue persists. Here are the operations I performed {code:java} spark.sql.sources.partitionOverwriteMode=DYNAMIC{code} {code:java} create table t1 (i int, part1 int, part2 int) using parquet partitioned by (part1, part2) insert into t1 partition(part1=1, part2=1) select 1 insert overwrite table t1 partition(part1=1, part2=1) select 2 insert overwrite table t1 partition(part1=2, part2) select 2, 2 // here the executor is killed and the task respawns{code} and here is the full stack trace from 2.3.3 {code:java} 2019-03-20 19:58:06 WARN TaskSetManager:66 - Lost task 0.1 in stage 2.0 (TID 3, QWERTY, executor 2): org.apache.spark.SparkException: Task failed while writing rows. at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: /user/hive/warehouse/t2/.spark-staging-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1/part1=2/part2=2/part-0-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1.c000.snappy.parquet for client 127.0.0.1 already exists at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2578) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2465) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2349) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:398) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at 
java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1653) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:236) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) at org
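For context, a sketch of how the session for the repro above could be set up (assuming a Hive-enabled SparkSession; the app name is arbitrary): the key point is enabling dynamic partition overwrite before running the inserts, since that is the code path where the stale staging file collides.
{code:scala}
import org.apache.spark.sql.SparkSession

// Dynamic mode only overwrites the partitions the query actually writes to,
// which is where a leftover staging file from a failed attempt collides.
val spark = SparkSession.builder()
  .appName("spark-27194-repro")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("create table t1 (i int, part1 int, part2 int) using parquet partitioned by (part1, part2)")
spark.sql("insert overwrite table t1 partition(part1=2, part2) select 2, 2")
{code}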
[jira] [Commented] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files
[ https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797216#comment-16797216 ] Ajith S commented on SPARK-27194: - [~dongjoon] Yes, I tried with Spark 2.3.3, and the issue persists. Here are the operations I performed {code:java} create table t1 (i int, part1 int, part2 int) using parquet partitioned by (part1, part2) insert into t1 partition(part1=1, part2=1) select 1 insert overwrite table t1 partition(part1=1, part2=1) select 2 insert overwrite table t1 partition(part1=2, part2) select 2, 2 // here the executor is killed and the task respawns{code} and here is the full stack trace from 2.3.3 {code:java} 2019-03-20 19:58:06 WARN TaskSetManager:66 - Lost task 0.1 in stage 2.0 (TID 3, QWERTY, executor 2): org.apache.spark.SparkException: Task failed while writing rows. at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: /user/hive/warehouse/t2/.spark-staging-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1/part1=2/part2=2/part-0-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1.c000.snappy.parquet for client 127.0.0.1 already exists at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2578) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2465) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2349) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:398) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1653) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:236) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) a
[jira] [Created] (SPARK-27200) History Environment tab must sort Configurations/Properties by default
Ajith S created SPARK-27200: --- Summary: History Environment tab must sort Configurations/Properties by default Key: SPARK-27200 URL: https://issues.apache.org/jira/browse/SPARK-27200 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.0.0 Reporter: Ajith S The Environment page in the Spark UI has all configurations sorted by key, but this is not the case in the History Server; to keep the UX consistent, we can have it sorted in the History Server too -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
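A tiny illustration (plain Scala, with made-up values) of the requested behavior: render the property table sorted by key, as the live UI's Environment page already does.
{code:scala}
// Illustrative config entries; the History Server would render these
// sorted by key instead of in arrival order.
val props = Seq(
  "spark.executor.memory" -> "4g",
  "spark.app.name" -> "demo",
  "spark.driver.cores" -> "2")

val sortedProps = props.sortBy { case (key, _) => key }
sortedProps.foreach { case (key, value) => println(s"$key=$value") }
{code}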
[jira] [Commented] (SPARK-27198) Heartbeat interval mismatch in driver and executor
[ https://issues.apache.org/jira/browse/SPARK-27198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795666#comment-16795666 ] Ajith S commented on SPARK-27198: - I will be working on this. > Heartbeat interval mismatch in driver and executor > -- > > Key: SPARK-27198 > URL: https://issues.apache.org/jira/browse/SPARK-27198 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.3, 2.4.0 >Reporter: Ajith S >Priority: Major > > When the heartbeat interval is configured via *spark.executor.heartbeatInterval* > without specifying units, the driver and executor interpret the value > inconsistently (the driver treats it as seconds, the executor as milliseconds) > > [https://github.com/apache/spark/blob/v2.4.1-rc8/core/src/main/scala/org/apache/spark/SparkConf.scala#L613] > vs > [https://github.com/apache/spark/blob/v2.4.1-rc8/core/src/main/scala/org/apache/spark/executor/Executor.scala#L858] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27198) Heartbeat interval mismatch in driver and executor
Ajith S created SPARK-27198: --- Summary: Heartbeat interval mismatch in driver and executor Key: SPARK-27198 URL: https://issues.apache.org/jira/browse/SPARK-27198 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0, 2.3.3 Reporter: Ajith S When the heartbeat interval is configured via *spark.executor.heartbeatInterval* without specifying units, the driver and executor interpret the value inconsistently (the driver treats it as seconds, the executor as milliseconds) [https://github.com/apache/spark/blob/v2.4.1-rc8/core/src/main/scala/org/apache/spark/SparkConf.scala#L613] vs [https://github.com/apache/spark/blob/v2.4.1-rc8/core/src/main/scala/org/apache/spark/executor/Executor.scala#L858] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
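A small illustration of why the mismatch matters, under the interpretation described above (driver reads the unit-less value as seconds, executor uses it as milliseconds): a configured "30" ends up off by a factor of 1000.
{code:scala}
// spark.executor.heartbeatInterval set without units
val configured = "30"

// Driver side: value read as seconds, i.e. 30 s = 30000 ms
val driverIntervalMs = configured.toLong * 1000L

// Executor side: the same value used directly as milliseconds, i.e. 30 ms
val executorIntervalMs = configured.toLong

assert(driverIntervalMs == 30000L && executorIntervalMs == 30L)
{code}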
[jira] [Commented] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files
[ https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795575#comment-16795575 ] Ajith S commented on SPARK-27194: - From the logs, it currently looks like task 200.0 and its re-attempt 200.1 both expect the same file name, part-00200-blah-blah.c000.snappy.parquet (refer to org.apache.spark.internal.io.HadoopMapReduceCommitProtocol#getFilename). Maybe we should include taskId_attemptId in the part file name so that re-run tasks do not conflict with older failed tasks; see the sketch after this message. cc [~srowen] [~cloud_fan] [~dongjoon] Any thoughts? > Job failures when task attempts do not clean up spark-staging parquet files > --- > > Key: SPARK-27194 > URL: https://issues.apache.org/jira/browse/SPARK-27194 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 2.3.1, 2.3.2 >Reporter: Reza Safi >Priority: Major > > When a container fails for some reason (for example when killed by yarn for > exceeding memory limits), the subsequent task attempts for the tasks that > were running on that container all fail with a FileAlreadyExistsException. > The original task attempt does not seem to successfully call abortTask (or at > least its "best effort" delete is unsuccessful) and clean up the parquet file > it was writing to, so when later task attempts try to write to the same > spark-staging directory using the same file name, the job fails. > Here is what transpires in the logs: > The container where task 200.0 is running is killed and the task is lost: > 19/02/20 09:33:25 ERROR cluster.YarnClusterScheduler: Lost executor y on > t.y.z.com: Container killed by YARN for exceeding memory limits. 8.1 GB of 8 > GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead. > 19/02/20 09:33:25 WARN scheduler.TaskSetManager: Lost task 200.0 in stage > 0.0 (TID xxx, t.y.z.com, executor 93): ExecutorLostFailure (executor 93 > exited caused by one of the running tasks) Reason: Container killed by YARN > for exceeding memory limits. 8.1 GB of 8 GB physical memory used. Consider > boosting spark.yarn.executor.memoryOverhead. > The task is re-attempted on a different executor and fails because the > part-00200-blah-blah.c000.snappy.parquet file from the first task attempt > already exists: > 19/02/20 09:35:01 WARN scheduler.TaskSetManager: Lost task 200.1 in stage 0.0 > (TID 594, tn.y.z.com, executor 70): org.apache.spark.SparkException: Task > failed while writing rows. 
> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: > /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet > for client 17.161.235.91 already exists > The job fails when the the configured task attempts (spark.task.maxFailures) > have failed with the same error: > org.apache.spark.SparkException: Job aborted due to stage failure: Task 200 > in stage 0.0 failed 20 times, most recent failure: Lost task 284.19 in stage > 0.0 (TID yyy, tm.y.z.com, executor 16): org.apache.spark.SparkException: Task > failed while writing rows. > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285) > ... > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: > /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet > for client i.p.a.d already exists > > SPARK-26682 wasn't the root cause here, since there wasn't any stage > reattempt. > This issue seems to happen when > spark.sql.sources.partitionOverwriteMode=dynamic. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-
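A hypothetical sketch of the naming scheme suggested in the comment above (the function name and exact layout are assumptions, not the actual commit protocol code): embedding the task attempt number means attempt 1 can never collide with a file left behind by a failed attempt 0.
{code:scala}
// Hypothetical: include the attempt number in the staging part-file name.
def partFileName(split: Int, jobUUID: String, attemptNumber: Int, ext: String): String =
  f"part-$split%05d-$jobUUID-attempt$attemptNumber$ext"

// partFileName(200, "1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1", 1, ".c000.snappy.parquet")
// => "part-00200-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1-attempt1.c000.snappy.parquet"
{code}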
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794551#comment-16794551 ] Ajith S commented on SPARK-26961: - [~srowen] ok, will raise a PR for this. Thanks > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. > > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > "ForkJoinPool-1-worker-57": > at java.lang.ClassLoader.loadClass(ClassLoader.java:404) > - waiting to l
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794519#comment-16794519 ] Ajith S commented on SPARK-27142: - [~srowen] Any inputs on the proposal? > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently, for monitoring a Spark application, SQL information is not available > from REST but only via the UI. REST provides only > applications, jobs, stages and environment. This Jira is targeted to provide a REST > API so that SQL level information can be found > > Details: > https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
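Purely illustrative (this is not the actual proposal or an existing Spark API): a sketch of the kind of SQL-level payload such a REST endpoint, say a hypothetical /api/v1/applications/[appId]/sql, could expose alongside the existing applications/jobs/stages resources.
{code:scala}
// Hypothetical response shape for a per-application SQL execution listing.
case class SqlExecutionSummary(
  id: Long,                 // SQL execution id
  description: String,      // the query or action that triggered it
  status: String,           // e.g. RUNNING / COMPLETED / FAILED
  submissionTime: Long,     // epoch millis
  durationMs: Long,
  runningJobIds: Seq[Int],
  successJobIds: Seq[Int],
  failedJobIds: Seq[Int])
{code}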
[jira] [Commented] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793222#comment-16793222 ] Ajith S commented on SPARK-27164: - I will be working on this. > RDD.countApprox on empty RDDs schedules jobs which never complete > -- > > Key: SPARK-27164 > URL: https://issues.apache.org/jira/browse/SPARK-27164 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.3, 2.4.0 > Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 > Also observed on: > macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 >Reporter: Ryan Moore >Priority: Major > Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png > > > When calling `countApprox` on an RDD which has no partitions (such as those > created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 > tasks. That job appears under the "Active Jobs" in the Spark UI until it is > either killed or the Spark context is shut down. > > {code:java} > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) > Type in expressions to have them evaluated. > Type :help for more information. > scala> val ints = sc.makeRDD(Seq(1)) > ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at > :24 > scala> ints.countApprox(1000) > res0: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [1.000, 1.000]) > // PartialResult is returned, Scheduled job completed > scala> ints.filter(_ => false).countApprox(1000) > res1: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job completed > scala> sc.emptyRDD[Int].countApprox(1000) > res5: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) > res16: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
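A hypothetical caller-side workaround sketch (not a fix for the scheduler behavior itself): skip scheduling the approximate-count job when the RDD has no partitions, since that is the case that leaves the job permanently active.
{code:scala}
import org.apache.spark.rdd.RDD

// Guard: an RDD with zero partitions trivially has zero rows, so avoid
// submitting the approximate-count job that never completes.
def safeCountApprox[T](rdd: RDD[T], timeoutMs: Long): Double =
  if (rdd.partitions.isEmpty) 0.0
  else rdd.countApprox(timeoutMs).getFinalValue().mean
{code}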
[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792355#comment-16792355 ] Ajith S commented on SPARK-27122: - ping [~srowen] [~dongjoon] [~Gengliang.Wang] > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:15 AM: -- The problem seems to be the shading of the jetty package. When we run the tests, the classpath seems to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). Check the attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as it's not shaded, it expects {color:#FF}org.eclipse.jetty.servlet.ServletContextHandler{color} {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core and is loaded from spark-core.jar, which is shaded and hence refers to {color:#FF}org.spark_project.jetty.servlet.ServletContextHandler{color}. And here is the javap output which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in the jar and the one in the classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... 
> {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
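A quick diagnostic sketch (standard JVM reflection APIs, nothing Spark-specific) that can confirm the analysis above by printing which artifact a given class was loaded from, e.g. the shaded jar versus the unshaded classes directory.
{code:scala}
// Prints the code source (jar or classes directory) of the shaded class,
// assuming it is present on the classpath; running the same check for
// org.eclipse.jetty.servlet.ServletContextHandler shows which flavor each
// module actually sees during the test run.
val cls = Class.forName("org.spark_project.jetty.servlet.ServletContextHandler")
println(cls.getProtectionDomain.getCodeSource.getLocation)
{code}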
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:14 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:07 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:06 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) Here is test classpath info: !image-2019-03-14-09-34-20-592.png! And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27122: Attachment: image-2019-03-14-09-35-23-046.png > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792318#comment-16792318 ] Ajith S commented on SPARK-27122: - The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) Here is test classpath info: !image-2019-03-14-09-34-20-592.png! And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27122: Attachment: image-2019-03-14-09-34-20-592.png > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792299#comment-16792299 ] Ajith S commented on SPARK-27122: - I can reproduce this issue even in Java8. I would like to work on this. > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791751#comment-16791751 ] Ajith S edited comment on SPARK-26961 at 3/13/19 2:37 PM: -- 1) Yes, registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null, as the instance was already initialized via the superclass constructor before the registration. So *it has no effect for an already created instance* !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable as it does the registration in a static block, which runs before the parent (ClassLoader) constructor is called. Also, as per the javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html] {code:java} Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable. {code} Hence MutableURLClassLoader lost its parallel capability by failing to register, unlike URLClassLoader was (Author: ajithshetty): 1) Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initalized via super class constructor. so it has no effect !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable as it does registration in static block which is before calling parent(ClassLoader) constructor. Also as per javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html] {code:java} Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable. {code} Hence MutableURLClassLoader lost its parallel capability by failing to register unlike URLClassLoader > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. 
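A minimal verification sketch for point 1, assuming a Java 8 JDK: parallelLockMap is a JDK-internal private field of java.lang.ClassLoader, so the field name and its accessibility may differ on newer Java versions, and the class name here is illustrative.
{code:java}
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;

public class ParallelCapableCheck {
    public static void main(String[] args) throws Exception {
        // JDK-internal field holding the per-class-name lock map (Java 8).
        Field lockMap = ClassLoader.class.getDeclaredField("parallelLockMap");
        lockMap.setAccessible(true);

        // URLClassLoader registers as parallel capable in its static block,
        // so the ClassLoader constructor allocates the lock map for it.
        URLClassLoader loader = new URLClassLoader(new URL[0], null);
        System.out.println(lockMap.get(loader)); // prints a non-null map

        // A subclass that calls registerAsParallelCapable only after
        // construction (the MutableURLClassLoader case above) prints null
        // here, and loadClass then synchronizes on the loader instance.
    }
}
{code}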
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Description: Currently, for monitoring a Spark application, SQL information is not available from the REST API but only via the UI. The REST API provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that SQL-level information can be found. Details: https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 was: Currently, for monitoring a Spark application, SQL information is not available from the REST API but only via the UI. The REST API provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that SQL-level information can be found. > Provide REST API for SQL level information > --
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791751#comment-16791751 ] Ajith S edited comment on SPARK-26961 at 3/13/19 2:32 PM: -- 1) Yes, registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null because the instance was already initialized via the superclass constructor, so it has no effect !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable because it does the registration in a static block, which runs before the parent ClassLoader constructor is called. Also, as per the javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable.
{code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability by failing to register itself. was (Author: ajithshetty): Yes, registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null because the instance was already initialized via the superclass constructor, so it has no effect !image-2019-03-13-19-53-52-390.png! URLClassLoader is parallel capable because it does the registration in a static block, which runs before the parent ClassLoader constructor is called. > Found Java-level deadlock in Spark Driver > -
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791751#comment-16791751 ] Ajith S commented on SPARK-26961: - Yes, registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null because the instance was already initialized via the superclass constructor, so it has no effect !image-2019-03-13-19-53-52-390.png! URLClassLoader is parallel capable because it does the registration in a static block, which runs before the parent ClassLoader constructor is called. > Found Java-level deadlock in Spark Driver > -
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: image-2019-03-13-19-53-52-390.png > Found Java-level deadlock in Spark Driver > -
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: (was: image-2019-03-13-19-51-38-708.png) > Found Java-level deadlock in Spark Driver > -
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: image-2019-03-13-19-51-38-708.png > Found Java-level deadlock in Spark Driver > -
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-29-26-896.png > Provide REST API for SQL level information > --
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791728#comment-16791728 ] Ajith S commented on SPARK-27142: - Ok, apologies for being abstract about this requirement. Let me explain. A single SQL query can result in multiple jobs, so for an end user who is using STS or spark-sql, the intended highest level of probe is the SQL they have executed. This information can be seen in the SQL tab. Attaching a sample. !image-2019-03-13-19-29-26-896.png! But the same information cannot be accessed through the REST API exposed by Spark, and users always have to rely on the jobs API, which may be difficult. So I intend to expose the information seen in the SQL tab of the UI via a REST API (a sketch of the response shape follows below). Mainly:
# executionId : long
# status : string - possible values COMPLETED/RUNNING/FAILED
# description : string - the executed SQL string
# submissionTime : formatted time of SQL submission
# duration : string - total run time
# runningJobIds : Seq[Int] - sequence of running job ids
# failedJobIds : Seq[Int] - sequence of failed job ids
# successJobIds : Seq[Int] - sequence of successful job ids
> Provide REST API for SQL level information > --
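A rough sketch of the response entity this could map to, with field names mirroring the list above; the class name and the Java types are illustrative, not a committed API.
{code:java}
import java.util.List;

public class SqlExecutionData {
    public long executionId;
    public String status;          // COMPLETED / RUNNING / FAILED
    public String description;     // the executed SQL string
    public String submissionTime;  // formatted time of SQL submission
    public String duration;        // total run time
    public List<Integer> runningJobIds;
    public List<Integer> failedJobIds;
    public List<Integer> successJobIds;
}
{code}
A GET on something like /api/v1/applications/[app-id]/sql (endpoint path hypothetical) could then return a list of these entries, one per execution.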
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: (was: image-2019-03-13-19-19-27-831.png) > Provide REST API for SQL level information > --
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: (was: image-2019-03-13-19-19-24-951.png) > Provide REST API for SQL level information > --
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-19-24-951.png > Provide REST API for SQL level information > --
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-19-27-831.png > Provide REST API for SQL level information > --
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791712#comment-16791712 ] Ajith S commented on SPARK-26961: - [~srowen] That too will not work. Here is my custom classloader:
{code:java}
class MYClassLoader(urls: Array[URL], parent: ClassLoader)
  extends URLClassLoader(urls, parent) {
  ClassLoader.registerAsParallelCapable()
  override def loadClass(name: String): Class[_] = {
    super.loadClass(name)
  }
}
{code}
If we look at the instance initialization flow, we see that the super constructor is called before the ClassLoader.registerAsParallelCapable() line is hit, hence it does not take effect:
{code:java}
<init>:280, ClassLoader (java.lang)
<init>:316, ClassLoader (java.lang)
<init>:76, SecureClassLoader (java.security)
<init>:100, URLClassLoader (java.net)
<init>:23, MYClassLoader (org.apache.spark.util.ajith)
{code}
As per [https://github.com/scala/bug/issues/11429], Scala 2.x does not have pure static support yet, so moving the classloader to a Java-based implementation may be the only option we have (see the sketch below). > Found Java-level deadlock in Spark Driver > -
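A minimal sketch of that Java-based direction, assuming the loader is ported as-is (the class name is illustrative, not the actual Spark patch): in Java, a static initializer runs at class initialization, before any instance's constructor chain, so the registration is already visible when the ClassLoader constructor checks it.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class ParallelCapableURLClassLoader extends URLClassLoader {
    static {
        // Runs at class initialization, before any constructor of this
        // class executes, so ClassLoader's constructor sees the registration
        // and allocates the per-class-name lock map.
        ClassLoader.registerAsParallelCapable();
    }

    public ParallelCapableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        // Now locks a per-class-name lock instead of the whole loader
        // instance, which breaks the lock cycle seen in the jstack output.
        return super.loadClass(name);
    }
}
{code}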
[jira] [Commented] (SPARK-27143) Provide REST API for JDBC/ODBC level information
[ https://issues.apache.org/jira/browse/SPARK-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791272#comment-16791272 ] Ajith S commented on SPARK-27143: - ping [~srowen] [~cloud_fan] [~dongjoon] Please suggest if this sounds reasonable. > Provide REST API for JDBC/ODBC level information >
[jira] [Commented] (SPARK-27143) Provide REST API for JDBC/ODBC level information
[ https://issues.apache.org/jira/browse/SPARK-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791270#comment-16791270 ] Ajith S commented on SPARK-27143: - I will be working on this > Provide REST API for JDBC/ODBC level information >
[jira] [Created] (SPARK-27143) Provide REST API for JDBC/ODBC level information
Ajith S created SPARK-27143: --- Summary: Provide REST API for JDBC/ODBC level information Key: SPARK-27143 URL: https://issues.apache.org/jira/browse/SPARK-27143 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.0.0 Reporter: Ajith S Currently, for monitoring a Spark application, JDBC/ODBC information is not available from the REST API but only via the UI. The REST API provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that JDBC/ODBC-level information like session statistics and SQL statistics can be provided.
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791269#comment-16791269 ] Ajith S commented on SPARK-27142: - I will be working on this > Provide REST API for SQL level information > --
[jira] [Created] (SPARK-27142) Provide REST API for SQL level information
Ajith S created SPARK-27142: --- Summary: Provide REST API for SQL level information Key: SPARK-27142 URL: https://issues.apache.org/jira/browse/SPARK-27142 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Ajith S Currently, for monitoring a Spark application, SQL information is not available from the REST API but only via the UI. The REST API provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that SQL-level information can be found.
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790208#comment-16790208 ] Ajith S edited comment on SPARK-26961 at 3/12/19 6:45 AM: -- [~srowen] Yes, I too am of the opinion that this should be fixed via registerAsParallelCapable, but it is not possible to do via a companion object. I tried and found an issue. Refer [https://github.com/scala/bug/issues/11429]. Maybe we need to move them from Scala to a Java implementation to achieve this. [~xsapphire] I think these classloaders are child classloaders of LaunchAppClassLoader, which already has the classes for the jars on the classpath, so the overhead may not be of a high magnitude. was (Author: ajithshetty): [~srowen] Yes, I too am of the opinion that this should be fixed via registerAsParallelCapable, but it is not possible to do via a companion object. I tried and found an issue. Refer https://github.com/scala/bug/issues/11429 [~xsapphire] I think these classloaders are child classloaders of LaunchAppClassLoader, which already has the classes for the jars on the classpath, so the overhead may not be of a high magnitude. > Found Java-level deadlock in Spark Driver > -
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790208#comment-16790208 ] Ajith S edited comment on SPARK-26961 at 3/12/19 6:43 AM: -- [~srowen] Yes, I too am of the opinion that this should be fixed via registerAsParallelCapable, but it is not possible to do via a companion object. I tried and found an issue. Refer https://github.com/scala/bug/issues/11429 [~xsapphire] I think these classloaders are child classloaders of LaunchAppClassLoader, which already has the classes for the jars on the classpath, so the overhead may not be of a high magnitude. was (Author: ajithshetty): [~srowen] Yes, I too am of the opinion that this should be fixed via registerAsParallelCapable. Will raise a PR for this. [~xsapphire] I think these classloaders are child classloaders of LaunchAppClassLoader, which already has the classes for the jars on the classpath, so the overhead may not be of a high magnitude. > Found Java-level deadlock in Spark Driver > -
[jira] [Commented] (SPARK-27011) reset command fails after cache table
[ https://issues.apache.org/jira/browse/SPARK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790220#comment-16790220 ] Ajith S commented on SPARK-27011: - [~cloud_fan] As [https://github.com/apache/spark/pull/23918] is merged, can we close this? > reset command fails after cache table > - > > Key: SPARK-27011 > URL: https://issues.apache.org/jira/browse/SPARK-27011 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: Ajith S >Priority: Minor > > > h3. Commands to reproduce > spark-sql> create table abcde ( a int); > spark-sql> reset; // works > spark-sql> cache table abcde; > spark-sql> reset; // fails with exception > h3. Below is the stack > {{org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, > tree:}} > {{ResetCommand$}}{{at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)}} > {{ at > org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:379)}} > {{ at > org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:216)}} > {{ at > org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:211)}} > {{ at > org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:259)}} > {{ at > org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3(CacheManager.scala:236)}} > {{ at > org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3$adapted(CacheManager.scala:236)}} > {{ at scala.collection.Iterator.find(Iterator.scala:993)}} > {{ at scala.collection.Iterator.find$(Iterator.scala:990)}} > {{ at scala.collection.AbstractIterator.find(Iterator.scala:1429)}} > {{ at scala.collection.IterableLike.find(IterableLike.scala:81)}} > {{ at scala.collection.IterableLike.find$(IterableLike.scala:80)}} > {{ at scala.collection.AbstractIterable.find(Iterable.scala:56)}} > {{ at > org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$2(CacheManager.scala:236)}} > {{ at > org.apache.spark.sql.execution.CacheManager.readLock(CacheManager.scala:59)}} > {{ at > org.apache.spark.sql.execution.CacheManager.lookupCachedData(CacheManager.scala:236)}} > {{ at > org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:250)}} > {{ at > org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:241)}} > {{ at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:258)}} > {{ at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)}} > {{ at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)}} > {{ at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)}} > {{ at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)}} > {{ at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)}} > {{ at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)}} > {{ at > org.apache.spark.sql.execution.CacheManager.useCachedData(CacheManager.scala:241)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:68)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:65)}} > {{ at >
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:72)}} > {{ at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:72)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:71)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:139)}} > {{ at > org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:316)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:139)}} > {{ at > org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:146)}} > {{ at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:82)}} > {{ at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)}} > {{ at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)}} > {{ at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3346)}} > {{ at org.apache
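The root cause visible in this trace: {{ResetCommand}} is a Scala case object, and {{TreeNode.makeCopy}} copies a node by reflectively invoking one of its public constructors; an object exposes none, so canonicalization during the cache lookup fails. A minimal self-contained sketch of that failure mode (plain Scala only; {{ResetCommandLike}} and the demo object are illustrative, not Spark source):

{code:scala}
// Sketch of the failure mode, assuming only plain Scala: a case object
// compiles to a class whose sole constructor is private, so a
// "copy via reflective constructor call" strategy finds nothing to invoke.
case object ResetCommandLike

object MakeCopyDemo extends App {
  val publicCtors = ResetCommandLike.getClass.getConstructors
  println(s"public constructors: ${publicCtors.length}") // prints 0

  if (publicCtors.isEmpty) {
    // In Spark this condition surfaces wrapped by attachTree as
    // "TreeNodeException: makeCopy, tree: ResetCommand$".
    println("makeCopy-style reflection has no constructor to call -> error")
  }
}
{code}

Any fix that stops the cache-lookup canonicalization path from reflectively copying parameterless commands avoids this failure.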
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790208#comment-16790208 ] Ajith S commented on SPARK-26961: - [~srowen] Yes, I have the same opinion about fixing it via registerAsParallelCapable and will raise a PR for this. [~xsapphire] I think these class loaders are child class loaders of LaunchAppClassLoader, which already has the classes for the jars on the class path, so the overhead should not be of a high magnitude. > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. > > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at >
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPoo
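For readers following the cycle in the jstack: worker-57 holds the Hadoop {{Configuration}} monitor and waits on the {{MutableURLClassLoader}} monitor, while worker-7 holds the loader and waits on the {{Configuration}}. A hedged sketch reducing that to just the lock shape (plain objects stand in for the two monitors; names are illustrative, not Spark source):

{code:scala}
// Illustrative reduction of the jstack above to its lock-ordering cycle.
// The two plain objects stand in for the org.apache.hadoop.conf.Configuration
// and org.apache.spark.util.MutableURLClassLoader monitors.
object DeadlockShape {
  private val confMonitor   = new Object // Configuration
  private val loaderMonitor = new Object // MutableURLClassLoader

  // Like ForkJoinPool-1-worker-57: holds the Configuration, then class-loads.
  def confThenLoader(): Unit = confMonitor.synchronized {
    loaderMonitor.synchronized { /* loadClass(...) */ }
  }

  // Like ForkJoinPool-1-worker-7: holds the loader, then reads the Configuration.
  def loaderThenConf(): Unit = loaderMonitor.synchronized {
    confMonitor.synchronized { /* Configuration.get(...) */ }
  }
  // Run concurrently, these two orderings can interleave into the reported
  // deadlock. A parallel-capable loader replaces the coarse per-loader
  // monitor with per-class-name locks, which breaks the cycle.
}
{code}

One practical wrinkle with the registerAsParallelCapable direction: {{ClassLoader.registerAsParallelCapable()}} is a protected static Java method that must be invoked from within a subclass, typically in a Java {{static { ... }}} initializer, and calling it directly from Scala is awkward; that is presumably why the registration would live on the loader class itself.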
[jira] [Commented] (SPARK-27114) SQL Tab shows duplicate executions for some commands
[ https://issues.apache.org/jira/browse/SPARK-27114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790136#comment-16790136 ] Ajith S commented on SPARK-27114: - [~srowen] As *LocalRelation* is eagerly evaluated, the second evaluation can be skipped; I suppose it is not actually executed the second time, as it would throw an exception (in this case, that the table already exists). Currently it uses two execution IDs and fires a duplicate *SparkListenerSQLExecutionStart* event. This causes the app store to record a duplicate event, and hence it shows up in the UI twice > SQL Tab shows duplicate executions for some commands > > > Key: SPARK-27114 > URL: https://issues.apache.org/jira/browse/SPARK-27114 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: Screenshot from 2019-03-09 14-04-07.png > > > Run a simple SQL command: > {{create table abc ( a int );}} > Open the SQL tab in the Spark UI; we can see duplicate entries for the execution. > Behaviour tested in both thriftserver and spark-sql > *see attachment* > The problem seems to be due to eager execution of commands at > org.apache.spark.sql.Dataset#logicalPlan > After analysis for spark-sql, the call stacks for the duplicate execution IDs > seem to be > {code:java} > $anonfun$withNewExecutionId$1:78, SQLExecution$ > (org.apache.spark.sql.execution) > apply:-1, 2057192703 > (org.apache.spark.sql.execution.SQLExecution$$$Lambda$1036) > withSQLConfPropagated:147, SQLExecution$ (org.apache.spark.sql.execution) > withNewExecutionId:74, SQLExecution$ (org.apache.spark.sql.execution) > withAction:3346, Dataset (org.apache.spark.sql) > :203, Dataset (org.apache.spark.sql) > ofRows:88, Dataset$ (org.apache.spark.sql) > sql:656, SparkSession (org.apache.spark.sql) > sql:685, SQLContext (org.apache.spark.sql) > run:63, SparkSQLDriver (org.apache.spark.sql.hive.thriftserver) > processCmd:372, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver) > processLine:376, CliDriver (org.apache.hadoop.hive.cli) > main:275, SparkSQLCLIDriver$ (org.apache.spark.sql.hive.thriftserver) > main:-1, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver) > invoke0:-1, NativeMethodAccessorImpl (sun.reflect) > invoke:62, NativeMethodAccessorImpl (sun.reflect) > invoke:43, DelegatingMethodAccessorImpl (sun.reflect) > invoke:498, Method (java.lang.reflect) > start:52, JavaMainApplication (org.apache.spark.deploy) > org$apache$spark$deploy$SparkSubmit$$runMain:855, SparkSubmit > (org.apache.spark.deploy) > doRunMain$1:162, SparkSubmit (org.apache.spark.deploy) > submit:185, SparkSubmit (org.apache.spark.deploy) > doSubmit:87, SparkSubmit (org.apache.spark.deploy) > doSubmit:934, SparkSubmit$$anon$2 (org.apache.spark.deploy) > main:943, SparkSubmit$ (org.apache.spark.deploy) > main:-1, SparkSubmit (org.apache.spark.deploy){code} > {code:java} > $anonfun$withNewExecutionId$1:78, SQLExecution$ > (org.apache.spark.sql.execution) > apply:-1, 2057192703 > (org.apache.spark.sql.execution.SQLExecution$$$Lambda$1036) > withSQLConfPropagated:147, SQLExecution$ (org.apache.spark.sql.execution) > withNewExecutionId:74, SQLExecution$ (org.apache.spark.sql.execution) > run:65, SparkSQLDriver (org.apache.spark.sql.hive.thriftserver) > processCmd:372, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver) > processLine:376, CliDriver (org.apache.hadoop.hive.cli) > main:275, SparkSQLCLIDriver$ (org.apache.spark.sql.hive.thriftserver) > main:-1, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver) >
invoke0:-1, NativeMethodAccessorImpl (sun.reflect) > invoke:62, NativeMethodAccessorImpl (sun.reflect) > invoke:43, DelegatingMethodAccessorImpl (sun.reflect) > invoke:498, Method (java.lang.reflect) > start:52, JavaMainApplication (org.apache.spark.deploy) > org$apache$spark$deploy$SparkSubmit$$runMain:855, SparkSubmit > (org.apache.spark.deploy) > doRunMain$1:162, SparkSubmit (org.apache.spark.deploy) > submit:185, SparkSubmit (org.apache.spark.deploy) > doSubmit:87, SparkSubmit (org.apache.spark.deploy) > doSubmit:934, SparkSubmit$$anon$2 (org.apache.spark.deploy) > main:943, SparkSubmit$ (org.apache.spark.deploy) > main:-1, SparkSubmit (org.apache.spark.deploy){code}
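Both stacks above pass through {{withNewExecutionId:74}}, once from the eager {{Dataset}} action and once from {{SparkSQLDriver.run}}, so one statement allocates two execution IDs. A hedged sketch of one possible guard, not the actual Spark patch: {{withExecutionIdIfAbsent}} is a hypothetical helper, while the {{spark.sql.execution.id}} local-property key does match {{SQLExecution.EXECUTION_ID_KEY}}.

{code:scala}
import org.apache.spark.SparkContext

// Hypothetical guard (not the actual fix): allocate a fresh execution ID only
// when the calling thread is not already inside one, so a single statement
// posts a single SparkListenerSQLExecutionStart event.
object ExecutionIdGuard {
  private val ExecutionIdKey = "spark.sql.execution.id" // SQLExecution.EXECUTION_ID_KEY

  def withExecutionIdIfAbsent[T](sc: SparkContext)(body: => T): T = {
    if (sc.getLocalProperty(ExecutionIdKey) != null) {
      // Already tracked by an outer withNewExecutionId: run the body as-is,
      // avoiding the duplicate entry seen in the SQL tab.
      body
    } else {
      // Outermost entry point: this is where a new ID would be set on the
      // thread's local properties and the start/end listener events posted.
      body
    }
  }
}
{code}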
[jira] [Updated] (SPARK-26152) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26152: Attachment: Screenshot from 2019-03-11 17-03-40.png > Flaky test: BroadcastSuite > -- > > Key: SPARK-26152 > URL: https://issues.apache.org/jira/browse/SPARK-26152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > Attachments: Screenshot from 2019-03-11 17-03-40.png > > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627 > (2018-11-16) > {code} > BroadcastSuite: > - Using TorrentBroadcast locally > - Accessing TorrentBroadcast variables from multiple threads > - Accessing TorrentBroadcast variables in a local cluster (encryption = off) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@59428a1 rejected from > java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = > 1, active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63) > at > scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) > at > scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) > at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106) > at > scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > 
scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from > java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, > active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > sc
[jira] [Comment Edited] (SPARK-26152) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789462#comment-16789462 ] Ajith S edited comment on SPARK-26152 at 3/11/19 12:07 PM: --- I encountered this issue on the latest master branch and see that it is a race between the *org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and *org.apache.spark.deploy.worker.Worker#onStop*. It is possible that while the WorkDirCleanup event is being processed, *org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* has already been shut down; hence any subsequent submission to the ThreadPoolExecutor results in a *java.util.concurrent.RejectedExecutionException*. Attaching a debug snapshot of the same. I would like to work on this; please advise. was (Author: ajithshetty): I encountered this issue and see that its the race between *org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and *org.apache.spark.deploy.worker.Worker#onStop*. Here its possible that while the WorkDirCleanup event is being processed, *org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* was shutdown. hence any submission after ThreadPoolExecutor will result in *java.util.concurrent.RejectedExecutionException* Attaching the debug snapshot of same. I would like to work on this. Please suggest > Flaky test: BroadcastSuite > -- > > Key: SPARK-26152 > URL: https://issues.apache.org/jira/browse/SPARK-26152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627 > (2018-11-16) > {code} > BroadcastSuite: > - Using TorrentBroadcast locally > - Accessing TorrentBroadcast variables from multiple threads > - Accessing TorrentBroadcast variables in a local cluster (encryption = off) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@59428a1 rejected from > java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = > 1, active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63) > at > scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78) > at >
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) > at > scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) > at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106) > at > scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromis
[jira] [Comment Edited] (SPARK-26152) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789462#comment-16789462 ] Ajith S edited comment on SPARK-26152 at 3/11/19 12:07 PM: --- I encountered this issue and see that it is a race between the *org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and *org.apache.spark.deploy.worker.Worker#onStop*. It is possible that while the WorkDirCleanup event is being processed, *org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* has already been shut down; hence any subsequent submission to the ThreadPoolExecutor results in a *java.util.concurrent.RejectedExecutionException*. Attaching a debug snapshot of the same. I would like to work on this; please advise. was (Author: ajithshetty): I encountered this issue and see that its the race between ``org.apache.spark.deploy.DeployMessages.WorkDirCleanup`` event and onStop call of org.apache.spark.deploy.worker.Worker#onStop. Here its possible that while the WorkDirCleanup event is being processed, org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor was shutdown Attaching the debug snapshot of same. I would like to work on this. Please suggest > Flaky test: BroadcastSuite > -- > > Key: SPARK-26152 > URL: https://issues.apache.org/jira/browse/SPARK-26152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627 > (2018-11-16) > {code} > BroadcastSuite: > - Using TorrentBroadcast locally > - Accessing TorrentBroadcast variables from multiple threads > - Accessing TorrentBroadcast variables in a local cluster (encryption = off) > java.util.concurrent.RejectedExecutionException: Task > scala.concurrent.impl.CallbackRunnable@59428a1 rejected from > java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = > 1, active threads = 1, queued tasks = 0, completed tasks = 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > at > scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63) > at > scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) > at >
scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) > at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106) > at > scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > at scala.concurrent.Promise.complete(Promise.scala:49) > at scala.concurrent.Promise.complete$(Promise.scala:48) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183) > at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > at scala
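Per the race described in the edited comment above, a WorkDirCleanup message can still be in flight when onStop shuts the cleanup executor down. A hedged sketch of a tolerant submission path (illustrative names such as {{onWorkDirCleanup}}; not the actual patch):

{code:scala}
import java.util.concurrent.{Executors, RejectedExecutionException}

// Sketch of the fix direction under discussion: tolerate the executor being
// shut down by onStop() while a WorkDirCleanup message is still queued.
object CleanupSubmitDemo {
  private val cleanupThreadExecutor = Executors.newSingleThreadExecutor()

  def onWorkDirCleanup(task: Runnable): Unit =
    try {
      if (!cleanupThreadExecutor.isShutdown) {
        cleanupThreadExecutor.submit(task)
      }
    } catch {
      // The isShutdown check alone is racy (shutdown can land between the
      // check and the submit), so the rejection still has to be caught here.
      case _: RejectedExecutionException =>
        () // log and ignore: the worker is stopping, so cleanup is moot
    }

  def stop(): Unit = cleanupThreadExecutor.shutdownNow()
}
{code}

The isShutdown guard avoids most rejections cheaply, while the catch handles the unavoidable window between the check and the submit.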