[jira] [Updated] (HUDI-4193) Fail to compile in osx aarch_64 environment
[ https://issues.apache.org/jira/browse/HUDI-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated HUDI-4193: -- Affects Version/s: 0.12.0 > Fail to compile in osx aarch_64 environment > --- > > Key: HUDI-4193 > URL: https://issues.apache.org/jira/browse/HUDI-4193 > Project: Apache Hudi > Issue Type: Bug > Components: kafka-connect >Affects Versions: 0.12.0 >Reporter: Saisai Shao >Priority: Minor > > The hudi-kafka-connect module relies on protoc to generate code. The current protoc > version does not support the osx aarch_64 platform, so compilation fails with an > error like the one below: > {code:java} > [ERROR] Failed to execute goal > com.github.os72:protoc-jar-maven-plugin:3.11.4:run (default) on project > hudi-kafka-connect: Error extracting protoc for version 3.11.4: Unsupported > platform: protoc-3.11.4-osx-aarch_64.exe -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hudi-kafka-connect > {code} > This issue proposes upgrading the protoc version to fix the build. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4193) Fail to compile in osx aarch_64 environment
Saisai Shao created HUDI-4193: - Summary: Fail to compile in osx aarch_64 environment Key: HUDI-4193 URL: https://issues.apache.org/jira/browse/HUDI-4193 Project: Apache Hudi Issue Type: Bug Components: kafka-connect Reporter: Saisai Shao The hudi-kafka-connect module relies on protoc to generate code. The current protoc version does not support the osx aarch_64 platform, so compilation fails with an error like the one below: {code:java} [ERROR] Failed to execute goal com.github.os72:protoc-jar-maven-plugin:3.11.4:run (default) on project hudi-kafka-connect: Error extracting protoc for version 3.11.4: Unsupported platform: protoc-3.11.4-osx-aarch_64.exe -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hudi-kafka-connect {code} This issue proposes upgrading the protoc version to fix the build. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Closed] (LIVY-833) Livy allows users to see password in config files (spark.ssl.keyPassword,spark.ssl.keyStorePassword,spark.ssl.trustStorePassword, etc)
[ https://issues.apache.org/jira/browse/LIVY-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao closed LIVY-833. Resolution: Won't Fix > Livy allows users to see password in config files > (spark.ssl.keyPassword,spark.ssl.keyStorePassword,spark.ssl.trustStorePassword, > etc) > -- > > Key: LIVY-833 > URL: https://issues.apache.org/jira/browse/LIVY-833 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.7.0 >Reporter: Kaidi Zhao >Priority: Major > Labels: security > > It looks like a regular user (client) of Livy, can use commands like: > spark.sparkContext.getConf().getAll() > The command will retry all spark configurations including those passwords > (such as spark.ssl.trustStorePassword, spark.ssl.keyPassword). > I would suggest to block / mask these password. > PS, Spark's UI fixed this issue in this > https://issues.apache.org/jira/browse/SPARK-16796 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-833) Livy allows users to see password in config files (spark.ssl.keyPassword,spark.ssl.keyStorePassword,spark.ssl.trustStorePassword, etc)
[ https://issues.apache.org/jira/browse/LIVY-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289457#comment-17289457 ] Saisai Shao commented on LIVY-833: -- This is a problem in Spark, not Livy. Spark stores everything, including passwords, in its configuration, and a user can read that configuration from within the application in many ways. Besides Livy, a user can still obtain the passwords through spark-shell, spark-submit and other entry points. If a user is allowed to submit code to Spark through Livy with Livy security enabled, that user already has permission to execute code, so it is acceptable that they can see the passwords. > Livy allows users to see password in config files > (spark.ssl.keyPassword,spark.ssl.keyStorePassword,spark.ssl.trustStorePassword, > etc) > -- > > Key: LIVY-833 > URL: https://issues.apache.org/jira/browse/LIVY-833 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.7.0 >Reporter: Kaidi Zhao >Priority: Major > Labels: security > > It looks like a regular user (client) of Livy, can use commands like: > spark.sparkContext.getConf().getAll() > The command will retry all spark configurations including those passwords > (such as spark.ssl.trustStorePassword, spark.ssl.keyPassword). > I would suggest to block / mask these password. > PS, Spark's UI fixed this issue in this > https://issues.apache.org/jira/browse/SPARK-16796 -- This message was sent by Atlassian Jira (v8.3.4#803005)
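For context on the masking the reporter suggests: any redaction would have to happen at the point where configuration values are surfaced, since code running inside the session can always read the full SparkConf. Below is a minimal, hypothetical Scala sketch of key-based filtering; the helper name and the regex are assumptions (loosely modelled on the kind of pattern Spark's own UI/log redaction matches), not Livy or Spark code.

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical helper: filter sensitive-looking keys before echoing the
// configuration back to a user. The regex below is an assumption.
object ConfRedaction {
  private val sensitive = "(?i)secret|password|token".r

  def redactedConf(conf: SparkConf): Seq[(String, String)] =
    conf.getAll.toSeq.map { case (key, value) =>
      if (sensitive.findFirstIn(key).isDefined) (key, "*********(redacted)") else (key, value)
    }
}

// Usage sketch: return ConfRedaction.redactedConf(spark.sparkContext.getConf)
// instead of spark.sparkContext.getConf().getAll() when displaying settings.
{code}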
[jira] [Updated] (SPARK-13704) TaskSchedulerImpl.createTaskSetManager can be expensive, and result in lost executors due to blocked heartbeats
[ https://issues.apache.org/jira/browse/SPARK-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-13704: Description: In some cases, TaskSchedulerImpl.createTaskSetManager can be expensive. For example, in a Yarn cluster, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving executors' heartbeats, which may result in lost executors Stacktraces we observed which is related to this issue: {code}https://issues.apache.org/jira/browse/SPARK-13704# "dag-scheduler-event-loop" daemon prio=10 tid=0x7f8392875800 nid=0x26e8 runnable [0x7f83576f4000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:272) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked <0xf551f460> (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) - locked <0xf5529740> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:154) at java.io.BufferedReader.read1(BufferedReader.java:205) at java.io.BufferedReader.read(BufferedReader.java:279) - locked <0xf5529740> (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:728) at org.apache.hadoop.util.Shell.runCommand(Shell.java:524) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251) at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188) at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119) at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101) at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81) at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38) at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:210) at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:189) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:189) at org.apache.spark.scheduler.TaskSetManager$$anonfun$1.apply$mcVI$sp(TaskSetManager.scala:158) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.scheduler.TaskSetManager.(TaskSetManager.scala:157) at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:187) at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:161) - locked <0xea3b8a88> (a org.apache.spark.scheduler.cluster.YarnScheduler) at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:872) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) "sparkDriver-akka.actor.default-dispatcher-15" daemon prio=10 tid=0x7f829c02 nid=0x2737 waiting for monitor entry [0x7f8355ebd000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:362) - waiting to lock <0xea3b8a88> (a org.apache.spark.scheduler.cluster.YarnScheduler) at org.apache.spark.HeartbeatReceiver$$anonfun$receiv
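The two stack traces above show the contention: submitTasks holds the TaskSchedulerImpl monitor while the TaskSetManager is constructed (which shells out to the rack/topology script), and executorHeartbeatReceived blocks waiting for that same monitor. A stripped-down Scala sketch of the pattern, with simplified names rather than Spark's actual code:

{code:scala}
// Stripped-down sketch of the lock contention described above; not Spark code.
class SchedulerSketch {
  // submitTasks holds the scheduler's monitor while the TaskSetManager is built,
  // and building it may call the (slow) topology script.
  def submitTasks(): Unit = synchronized {
    createTaskSetManager()                 // slow when the topology script is slow
  }

  // Heartbeat handling needs the same monitor, so it blocks until the script
  // returns; block long enough and executors are reported lost.
  def executorHeartbeatReceived(): Unit = synchronized {
    // update executor/task metrics ...
  }

  private def createTaskSetManager(): Unit =
    Thread.sleep(30000)                    // stand-in for a slow topology-script call
}
{code}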
[jira] [Closed] (LIVY-593) Adding Scala 2.12 support
[ https://issues.apache.org/jira/browse/LIVY-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao closed LIVY-593. Fix Version/s: 0.8.0 Resolution: Duplicate > Adding Scala 2.12 support > - > > Key: LIVY-593 > URL: https://issues.apache.org/jira/browse/LIVY-593 > Project: Livy > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Pateyron >Priority: Major > Fix For: 0.8.0 > > > Spark 2.4.2 is built with Scala 2.12, but Apache Livy 0.6.0 does not support > Scala 2.12. > I am writing to let you know that Livy 0.6.0 does not work with Spark 2.4.2 on > Scala 2.12. > Could you tell me when Livy will support Scala 2.12? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (LIVY-562) Add support of scala 2.12
[ https://issues.apache.org/jira/browse/LIVY-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao closed LIVY-562. Fix Version/s: 0.8.0 Resolution: Duplicate > Add support of scala 2.12 > - > > Key: LIVY-562 > URL: https://issues.apache.org/jira/browse/LIVY-562 > Project: Livy > Issue Type: Improvement >Reporter: Viacheslav Tradunsky >Priority: Major > Fix For: 0.8.0 > > > I know that adding support of a new scala version is something that can be > time consuming and would require some testing phase, but it would be great at > least to have this support in integration-test module (no scala suffix > supported there, yet). > We can use livy with spark 2.4.0 with scala 2.12 and livy java client, but we > cannot use integration-test module for testing because it fails on scala > compatibility issues, such as > {code:java} > java.lang.NoSuchMethodError: > scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps; > at org.apache.livy.test.framework.MiniCluster.(MiniCluster.scala:211) > at org.apache.livy.test.framework.Cluster$.liftedTree1$1(Cluster.scala:102) > at > org.apache.livy.test.framework.Cluster$.cluster$lzycompute(Cluster.scala:100) > at org.apache.livy.test.framework.Cluster$.cluster(Cluster.scala:98) > at org.apache.livy.test.framework.Cluster$.get(Cluster.scala:125) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-423) Adding Scala 2.12 support
[ https://issues.apache.org/jira/browse/LIVY-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-423. -- Fix Version/s: 0.8.0 Assignee: Thomas Prelle Resolution: Fixed Issue resolved by pull request 300 https://github.com/apache/incubator-livy/pull/300 > Adding Scala 2.12 support > - > > Key: LIVY-423 > URL: https://issues.apache.org/jira/browse/LIVY-423 > Project: Livy > Issue Type: New Feature > Components: Build, Core, REPL >Reporter: Saisai Shao >Assignee: Thomas Prelle >Priority: Major > Fix For: 0.8.0 > > > Spark 2.3 already integrates with Scala 2.12 support, it will possibly > release 2.12 artifacts. So in the Livy side we should support Scala 2.12 > build and interpreter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-756) Add spark 3 support
[ https://issues.apache.org/jira/browse/LIVY-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-756. -- Fix Version/s: 0.8.0 Assignee: Thomas Prelle Resolution: Fixed > Add spark 3 support > --- > > Key: LIVY-756 > URL: https://issues.apache.org/jira/browse/LIVY-756 > Project: Livy > Issue Type: Improvement >Reporter: Thomas Prelle >Assignee: Thomas Prelle >Priority: Major > Fix For: 0.8.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Spark 3 will be release soon. > A support of spark 3 will be nice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-756) Add spark 3 support
[ https://issues.apache.org/jira/browse/LIVY-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150037#comment-17150037 ] Saisai Shao commented on LIVY-756: -- Issue resolved by pull request 300 https://github.com/apache/incubator-livy/pull/300 > Add spark 3 support > --- > > Key: LIVY-756 > URL: https://issues.apache.org/jira/browse/LIVY-756 > Project: Livy > Issue Type: Improvement >Reporter: Thomas Prelle >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Spark 3 will be release soon. > A support of spark 3 will be nice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (LIVY-751) Livy server should allow to customize LIVY_CLASSPATH
[ https://issues.apache.org/jira/browse/LIVY-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-751: Assignee: Shingo Furuyama > Livy server should allow to customize LIVY_CLASSPATH > > > Key: LIVY-751 > URL: https://issues.apache.org/jira/browse/LIVY-751 > Project: Livy > Issue Type: Improvement > Components: Server >Affects Versions: 0.7.0 >Reporter: Shingo Furuyama >Assignee: Shingo Furuyama >Priority: Minor > Fix For: 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > What we want to do is - specify LIVY_CLASSPATH at the time of booting livy > server to use the other version of hadoop, which is not included in livy > artifact. > The background is - we are trying to use livy 0.7.0-incubating and spark > 2.4.5 with YARN HDP2.6.4, but we encountered an error due to the > incompatibility of livy included hadoop and HDP2.6.4 hadoop. We came up with > a workaround that 1. remove hadoop from livy artifact by building with `mvn > -Dhadoop.scope=provided` and 2. add HDP2.6.4 hadoop jars to the classpath for > livy server by `hadoop classpath` command at the time of booting livy server. > However, bin/livy-server is not allowing change LIVY_CLASSPATH. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-751) Livy server should allow to customize LIVY_CLASSPATH
[ https://issues.apache.org/jira/browse/LIVY-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067383#comment-17067383 ] Saisai Shao commented on LIVY-751: -- Issue resolved by pull request 282 https://github.com/apache/incubator-livy/pull/282 > Livy server should allow to customize LIVY_CLASSPATH > > > Key: LIVY-751 > URL: https://issues.apache.org/jira/browse/LIVY-751 > Project: Livy > Issue Type: Improvement > Components: Server >Affects Versions: 0.7.0 >Reporter: Shingo Furuyama >Priority: Minor > Fix For: 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > What we want to do is - specify LIVY_CLASSPATH at the time of booting livy > server to use the other version of hadoop, which is not included in livy > artifact. > The background is - we are trying to use livy 0.7.0-incubating and spark > 2.4.5 with YARN HDP2.6.4, but we encountered an error due to the > incompatibility of livy included hadoop and HDP2.6.4 hadoop. We came up with > a workaround that 1. remove hadoop from livy artifact by building with `mvn > -Dhadoop.scope=provided` and 2. add HDP2.6.4 hadoop jars to the classpath for > livy server by `hadoop classpath` command at the time of booting livy server. > However, bin/livy-server is not allowing change LIVY_CLASSPATH. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-751) Livy server should allow to customize LIVY_CLASSPATH
[ https://issues.apache.org/jira/browse/LIVY-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-751. -- Resolution: Fixed > Livy server should allow to customize LIVY_CLASSPATH > > > Key: LIVY-751 > URL: https://issues.apache.org/jira/browse/LIVY-751 > Project: Livy > Issue Type: Improvement > Components: Server >Affects Versions: 0.7.0 >Reporter: Shingo Furuyama >Priority: Minor > Fix For: 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > What we want to do is - specify LIVY_CLASSPATH at the time of booting livy > server to use the other version of hadoop, which is not included in livy > artifact. > The background is - we are trying to use livy 0.7.0-incubating and spark > 2.4.5 with YARN HDP2.6.4, but we encountered an error due to the > incompatibility of livy included hadoop and HDP2.6.4 hadoop. We came up with > a workaround that 1. remove hadoop from livy artifact by building with `mvn > -Dhadoop.scope=provided` and 2. add HDP2.6.4 hadoop jars to the classpath for > livy server by `hadoop classpath` command at the time of booting livy server. > However, bin/livy-server is not allowing change LIVY_CLASSPATH. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-748) Add support for running Livy Integration tests against secure external clusters
[ https://issues.apache.org/jira/browse/LIVY-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-748. -- Fix Version/s: 0.8.0 Assignee: Roger Liu Resolution: Fixed Issue resolved by pull request 278 https://github.com/apache/incubator-livy/pull/278 > Add support for running Livy Integration tests against secure external > clusters > --- > > Key: LIVY-748 > URL: https://issues.apache.org/jira/browse/LIVY-748 > Project: Livy > Issue Type: Improvement >Reporter: Roger Liu >Assignee: Roger Liu >Priority: Major > Fix For: 0.8.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Add support so Livy integration tests can be run against secure external > clusters. Currently Livy integration tests only test Livy functionality > against a minicluster and does not support running them against an external > livy endpoint -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-423) Adding Scala 2.12 support
[ https://issues.apache.org/jira/browse/LIVY-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045049#comment-17045049 ] Saisai Shao commented on LIVY-423: -- We don't typically assign a JIRA to someone before their work is merged. You can leave a message on this JIRA saying that you're working on it; once the feature is merged, we will assign the issue to you. > Adding Scala 2.12 support > - > > Key: LIVY-423 > URL: https://issues.apache.org/jira/browse/LIVY-423 > Project: Livy > Issue Type: New Feature > Components: Build, Core, REPL >Reporter: Saisai Shao >Priority: Major > > Spark 2.3 already integrates with Scala 2.12 support, it will possibly > release 2.12 artifacts. So in the Livy side we should support Scala 2.12 > build and interpreter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-423) Adding Scala 2.12 support
[ https://issues.apache.org/jira/browse/LIVY-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044012#comment-17044012 ] Saisai Shao commented on LIVY-423: -- No update on it. There's no one working on this feature, you can take it if you're interested. > Adding Scala 2.12 support > - > > Key: LIVY-423 > URL: https://issues.apache.org/jira/browse/LIVY-423 > Project: Livy > Issue Type: New Feature > Components: Build, Core, REPL >Reporter: Saisai Shao >Priority: Major > > Spark 2.3 already integrates with Scala 2.12 support, it will possibly > release 2.12 artifacts. So in the Livy side we should support Scala 2.12 > build and interpreter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)
[ https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036734#comment-17036734 ] Saisai Shao commented on SPARK-30586: - We also met the same issue. Seems like the code doesn't check the nullable of string and directly called String intern, which throws NPE from guava. My first thinking is to add nullable check in {{weakIntern}}. Still investigating how this could be happened, might be due to the lost or out-of-order spark listener event. > NPE in LiveRDDDistribution (AppStatusListener) > -- > > Key: SPARK-30586 > URL: https://issues.apache.org/jira/browse/SPARK-30586 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 > Environment: A Hadoop cluster consisting of Centos 7.4 machines. >Reporter: Jan Van den bosch >Priority: Major > > We've been noticing a great amount of NullPointerExceptions in our > long-running Spark job driver logs: > {noformat} > 20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an > exception > java.lang.NullPointerException > at > org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191) > at > org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507) > at > org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85) > at > org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603) > at > org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486) > at > org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548) > at > org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548) > at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49) > at > org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991) > at > org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997) > at > org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764) > at > org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139) > at > 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788) > at > org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764) > at > org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92) > at > org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92) > at > org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
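The trace shows Guava's weak interner rejecting a null string. A minimal Scala sketch of the null guard proposed in the comment above; this is not the actual Spark patch, and the interner below is a stand-in for the shaded Guava interner used by LiveEntityHelpers.

{code:scala}
import com.google.common.collect.Interners

// Sketch of the proposed fix: guard against null before calling the Guava
// weak interner, which throws NPE on null input (as seen in the trace above).
object LiveEntityHelpersSketch {
  private val interner = Interners.newWeakInterner[String]()

  def weakIntern(s: String): String =
    if (s == null) null else interner.intern(s)
}
{code}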
[jira] [Comment Edited] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)
[ https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036734#comment-17036734 ] Saisai Shao edited comment on SPARK-30586 at 2/14/20 7:13 AM: -- We also met the same issue. Seems like the code doesn't check the nullable of string and directly called String intern, which throws NPE from guava. My first thinking is to add nullable check in {{weakIntern}}. Still investigating how this could be happened, might be due to the lost or out-of-order spark listener event. CC [~vanzin] was (Author: jerryshao): We also met the same issue. Seems like the code doesn't check the nullable of string and directly called String intern, which throws NPE from guava. My first thinking is to add nullable check in {{weakIntern}}. Still investigating how this could be happened, might be due to the lost or out-of-order spark listener event. > NPE in LiveRDDDistribution (AppStatusListener) > -- > > Key: SPARK-30586 > URL: https://issues.apache.org/jira/browse/SPARK-30586 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 > Environment: A Hadoop cluster consisting of Centos 7.4 machines. >Reporter: Jan Van den bosch >Priority: Major > > We've been noticing a great amount of NullPointerExceptions in our > long-running Spark job driver logs: > {noformat} > 20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an > exception > java.lang.NullPointerException > at > org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191) > at > org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507) > at > org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85) > at > org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603) > at > org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486) > at > org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548) > at > org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548) > at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49) > at > org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991) > at > org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997) > at > org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764) > at > org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764) > at > scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139) > at > org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788) > at > org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764) > at > org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91) > at > org.apache.spark.scheduler.As
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014846#comment-17014846 ] Saisai Shao commented on LIVY-718: -- Active-active HA doesn't only address scalability; it also provides high availability. Personally I don't find active-standby HA very useful for Livy. Active-standby is usually chosen when the master node has a large amount of state to maintain, which makes a consistent active-active implementation hard. If that is not the case, then active-active HA is better for both availability and scalability. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013965#comment-17013965 ] Saisai Shao commented on LIVY-718: -- [~bikassaha] The two merged sub-tasks are actually required by both solutions (the one you proposed and the one [~yihengw] proposed). That's why I merged them beforehand; they're not where the two solutions differ. The proposal [~yihengw] made is really a mid-term solution compared to a stateless Livy Server; the key differences are: 1. changing when the RSCDriver and the Livy Server re-establish their connection, and 2. refactoring most of the current code to make the Livy Server stateless. I'm more concerned about the 2nd point, because it involves a lot of work and could easily introduce regressions. So IMHO we could move on with the current mid-term proposal. If someone else wants to pursue a stateless solution, they can simply continue from our current solution, which would take less effort than starting from scratch. Just my two cents. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011485#comment-17011485 ] Saisai Shao commented on LIVY-718: -- Hi [~shanyu] what is the main reason that we should have "active-standby" HA? From my understanding, looks like compared to active-active HA, active-standby HA seems not so useful, and current fail recovery could cover most of the scenarios. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-735) Fix RPC Channel Closed When Multi Clients Connect to One Driver
[ https://issues.apache.org/jira/browse/LIVY-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-735. -- Fix Version/s: 0.8.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 268 https://github.com/apache/incubator-livy/pull/268 > Fix RPC Channel Closed When Multi Clients Connect to One Driver > > > Key: LIVY-735 > URL: https://issues.apache.org/jira/browse/LIVY-735 > Project: Livy > Issue Type: Sub-task >Reporter: Yiheng Wang >Assignee: runzhiwang >Priority: Major > Fix For: 0.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently, the driver tries to support communicating with multi-clients, by > registering each client at > https://github.com/apache/incubator-livy/blob/master/rsc/src/main/java/org/apache/livy/rsc/driver/RSCDriver.java#L220. > But actually, if multi-clients connect to one driver, the rpc channel will > close, the reason are as follows. > 1. In every communication, client sends two packages to driver: header\{type, > id}, and payload at > https://github.com/apache/incubator-livy/blob/master/rsc/src/main/java/org/apache/livy/rsc/rpc/RpcDispatcher.java#L144. > 2. If client1 sends header1, payload1, and client2 sends header2, payload2 at > the same time. > The driver receives the package in the order: header1, header2, payload1, > payload2. > 3. When driver receives header1, driver assigns lastHeader at > https://github.com/apache/incubator-livy/blob/master/rsc/src/main/java/org/apache/livy/rsc/rpc/RpcDispatcher.java#L73. > 4. Then driver receives header2, driver process it as a payload at > https://github.com/apache/incubator-livy/blob/master/rsc/src/main/java/org/apache/livy/rsc/rpc/RpcDispatcher.java#L78 > which cause exception and rpc channel closed. > In the muti-active HA mode, the design doc is at: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing, > the session is allocated among servers by consistent hashing. If a new livy > joins, some session will be migrated from old livy to new livy. If the > session client in new livy connect to driver before stoping session client in > old livy, then two session clients will both connect to driver, and rpc > channel close. In this case, it's hard to ensure only one client connect to > one driver at any time. So it's better to support multi-clients connect to > one driver, which has no side effects. > How to fix: > 1. Move the code of processing client message from `RpcDispatcher` to each > `Rpc`. > 2. Each `Rpc` registers itself to `channelRpc` in RpcDispatcher. > 3. `RpcDispatcher` dispatches each message to `Rpc` according to > `ctx.channel()`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
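The three "how to fix" steps above boil down to keeping the header/payload pairing per Netty channel instead of in one shared lastHeader. Below is a hypothetical Scala sketch of that dispatch-by-channel idea; the real RpcDispatcher and Rpc classes in livy-rsc are Java, and all names here are illustrative.

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import io.netty.channel.Channel

// Per-connection protocol state, so interleaved header/payload frames from two
// clients can no longer corrupt each other (hypothetical simplification).
final class RpcState {
  private var lastHeader: Option[AnyRef] = None

  def handle(msg: AnyRef): Unit = lastHeader match {
    case None =>
      lastHeader = Some(msg)          // first frame of a pair: remember the header
    case Some(header) =>
      lastHeader = None               // second frame: dispatch (header, payload)
      process(header, msg)
  }

  private def process(header: AnyRef, payload: AnyRef): Unit = () // placeholder
}

final class DispatcherSketch {
  private val channelRpc = new ConcurrentHashMap[Channel, RpcState]()

  def register(ch: Channel): Unit = channelRpc.putIfAbsent(ch, new RpcState)

  // Called from channelRead: route the frame by the channel it arrived on.
  def onMessage(ch: Channel, msg: AnyRef): Unit = {
    val state = channelRpc.get(ch)
    if (state != null) state.handle(msg)   // unknown channels are ignored in this sketch
  }

  def unregister(ch: Channel): Unit = channelRpc.remove(ch)
}
{code}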
[jira] [Resolved] (LIVY-732) A Common Zookeeper Wrapper Utility
[ https://issues.apache.org/jira/browse/LIVY-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-732. -- Fix Version/s: 0.8.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 267 https://github.com/apache/incubator-livy/pull/267 > A Common Zookeeper Wrapper Utility > --- > > Key: LIVY-732 > URL: https://issues.apache.org/jira/browse/LIVY-732 > Project: Livy > Issue Type: Sub-task >Reporter: Yiheng Wang >Assignee: runzhiwang >Priority: Major > Fix For: 0.8.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently, the utilities of zookeeper mixed with ZooKeeperStateStore. To use > the utility of zookeeper, the instance of ZooKeeperStateStore has to be > created , which looks weird. > This Jira aims to achieve two targets: > 1. Extract the utilities of zookeeper from ZooKeeperStateStore to support > such as distributed lock, service discovery and so on. > 2. ZooKeeperManager which contains the utilities of zookeeper should be a > single instance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-738) Fix Livy third-party library license generating issue
[ https://issues.apache.org/jira/browse/LIVY-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-738. -- Fix Version/s: 0.8.0 0.7.0 Assignee: Saisai Shao Resolution: Fixed Issue resolved by pull request 272 https://github.com/apache/incubator-livy/pull/272 > Fix Livy third-party library license generating issue > - > > Key: LIVY-738 > URL: https://issues.apache.org/jira/browse/LIVY-738 > Project: Livy > Issue Type: Bug > Components: Build >Affects Versions: 0.8.0 >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Minor > Fix For: 0.7.0, 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Some dependencies' licenses are missing, so it cannot correctly generate the > license information, here propose to fix them manually. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (LIVY-738) Fix Livy third-party library license generating issue
Saisai Shao created LIVY-738: Summary: Fix Livy third-party library license generating issue Key: LIVY-738 URL: https://issues.apache.org/jira/browse/LIVY-738 Project: Livy Issue Type: Bug Components: Build Affects Versions: 0.8.0 Reporter: Saisai Shao Some dependencies' licenses are missing, so it cannot correctly generate the license information, here propose to fix them manually. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005895#comment-17005895 ] Saisai Shao commented on LIVY-718: -- IIUC, the current JDBC part keeps a lot of results/metadata/state on the Livy Server. I think the key point [~bikassaha] raised is to make the Livy Server stateless; that would be the ideal solution, but right now it requires a large amount of work. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (LIVY-727) Session state always be idle though the yarn application has been killed after restart livy.
[ https://issues.apache.org/jira/browse/LIVY-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated LIVY-727: - Affects Version/s: 0.7.0 0.6.0 > Session state always be idle though the yarn application has been killed > after restart livy. > - > > Key: LIVY-727 > URL: https://issues.apache.org/jira/browse/LIVY-727 > Project: Livy > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > # Set livy.server.recovery.mode=recovery, and create a session in yarn-cluster > # Restart livy,then kill the yarn application of the session. > # The session state will always be idle and never change to killed or dead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-727) Session state always be idle though the yarn application has been killed after restart livy.
[ https://issues.apache.org/jira/browse/LIVY-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-727. -- Fix Version/s: 0.8.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 272 https://github.com/apache/incubator-livy/pull/272 > Session state always be idle though the yarn application has been killed > after restart livy. > - > > Key: LIVY-727 > URL: https://issues.apache.org/jira/browse/LIVY-727 > Project: Livy > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > # Set livy.server.recovery.mode=recovery, and create a session in yarn-cluster > # Restart livy,then kill the yarn application of the session. > # The session state will always be idle and never change to killed or dead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-729) Livy should not recover the killed session
[ https://issues.apache.org/jira/browse/LIVY-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-729. -- Fix Version/s: 0.8.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 266 https://github.com/apache/incubator-livy/pull/266 > Livy should not recover the killed session > -- > > Key: LIVY-729 > URL: https://issues.apache.org/jira/browse/LIVY-729 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.8.0 > > Attachments: image-2019-12-18-08-56-46-925.png > > Time Spent: 20m > Remaining Estimate: 0h > > Follows are steps to reproduce the problem: > # Set livy.server.recovery.mode=recovery, and create a session: session0 in > yarn-cluster > # kill the yarn application of the session > # restart livy > # livy try to recover session0, but application has been killed and driver > does not exist, so client can not connect to driver, and exception was thrown > as the image. > # If the ip:port of the driver was reused by session1, client of session0 > will try to connect to driver of session1, then driver will throw exception: > Unexpected client ID. > # Both the exception threw by livy and driver will confused the user, and > recover a lot of killed sessions will delay the recover of alive session. > !image-2019-12-18-08-56-46-925.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-717) Specify explicit ZooKeeper version in maven
[ https://issues.apache.org/jira/browse/LIVY-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-717. -- Fix Version/s: 0.7.0 Assignee: Mate Szalay-Beko Resolution: Fixed Issue resolved by pull request 262 https://github.com/apache/incubator-livy/pull/262 > Specify explicit ZooKeeper version in maven > --- > > Key: LIVY-717 > URL: https://issues.apache.org/jira/browse/LIVY-717 > Project: Livy > Issue Type: Improvement >Reporter: Mate Szalay-Beko >Assignee: Mate Szalay-Beko >Priority: Major > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Hadoop trunk was updated recently to use Curator 4.0.2 and ZooKeeper 3.5.6, > see [HADOOP-16579|https://issues.apache.org/jira/browse/HADOOP-16579]. Now I > want to test Livy in a cluster where we have a new ZooKeeper 3.5 deployed. > When we want to use Livy in a cluster where a newer ZooKeeper server version > is used, we might run into run-time errors if we compile Livy using the > current Curator / Hadoop versions. The Curator version can already explicitly > set with the {{curator.version}} maven property in build time, but we were > still missed the same parameter for ZooKeeper. > In this PR I added a new maven parameter called {{zookeeper.version}} and > after analyzing the maven dependency tree, I made sure that the Curator and > ZooKeeper versions used in compile time are always harmonized and controlled > by the maven parameters. > I set the zookeeper.version in maven to {{3.4.6}} to be backward compatible > with the current Livy dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-714) Cannot remove the app in leakedAppTags when timeout
[ https://issues.apache.org/jira/browse/LIVY-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-714. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 259 https://github.com/apache/incubator-livy/pull/259 > Cannot remove the app in leakedAppTags when timeout > --- > > Key: LIVY-714 > URL: https://issues.apache.org/jira/browse/LIVY-714 > Project: Livy > Issue Type: New Feature >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-11-20-09-50-52-316.png > > Time Spent: 20m > Remaining Estimate: 0h > > # var isRemoved = false should be declared inside while(iter.hasNext). Otherwise, if there > are two apps where the first is removed and the second has timed out, then after removing the > first app isRemoved stays true, so the second app cannot pass the if(!isRemoved) check and > is only deleted in the next loop iteration. > # entry.getValue - now is negative and therefore never greater than > sessionLeakageCheckTimeout; the elapsed time should be computed as now - entry.getValue. > !image-2019-11-20-09-50-52-316.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
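A hedged Scala sketch of the corrected leak-check loop, following the two points in the description above. The map, iterator and timeout are named after the description; isAppCleaned is a hypothetical stand-in for the real lookup-and-kill step, and this is not the actual Livy code.

{code:scala}
import java.util.concurrent.ConcurrentHashMap

// Sketch of the corrected leak-check loop (not the actual Livy code).
// leakedAppTags maps a leaked app tag to the time it was recorded.
object LeakCheckSketch {
  def checkLeakedApps(
      leakedAppTags: ConcurrentHashMap[String, java.lang.Long],
      isAppCleaned: String => Boolean,      // hypothetical: was the leaked app found and killed?
      sessionLeakageCheckTimeout: Long): Unit = {
    val now = System.currentTimeMillis()
    val iter = leakedAppTags.entrySet().iterator()
    while (iter.hasNext) {
      var isRemoved = false                 // point 1: reset for every entry, inside the loop
      val entry = iter.next()
      if (isAppCleaned(entry.getKey)) {
        iter.remove()
        isRemoved = true
      }
      // point 2: elapsed time is now - entry.getValue; comparing entry.getValue - now
      // yields a negative number that never exceeds the timeout.
      if (!isRemoved && now - entry.getValue.longValue() > sessionLeakageCheckTimeout) {
        iter.remove()                       // give up on entries older than the timeout
      }
    }
  }
}
{code}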
[jira] [Resolved] (LIVY-715) The configuration in the livy.conf.template is inconsistent with LivyConf.scala
[ https://issues.apache.org/jira/browse/LIVY-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-715. -- Fix Version/s: 0.7.0 Resolution: Fixed Issue resolved by pull request 261 https://github.com/apache/incubator-livy/pull/261 > The configuration in the livy.conf.template is inconsistent with > LivyConf.scala > --- > > Key: LIVY-715 > URL: https://issues.apache.org/jira/browse/LIVY-715 > Project: Livy > Issue Type: Bug > Components: Docs >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Assignee: mingchao zhao >Priority: Major > Fix For: 0.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When testing Livy impersonation I found that in livy.conf.template the > value of livy.impersonation.enabled is true, so I thought impersonation was > enabled by default. > However, impersonation was not turned on in our tests. I found that the > real default in LivyConf.scala is false. This can mislead users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-707) Add audit log for SqlJobs from ThriftServer
[ https://issues.apache.org/jira/browse/LIVY-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-707. -- Fix Version/s: 0.7.0 Assignee: Hui An Resolution: Fixed Issue resolved by pull request 255 https://github.com/apache/incubator-livy/pull/255 > Add audit log for SqlJobs from ThriftServer > --- > > Key: LIVY-707 > URL: https://issues.apache.org/jira/browse/LIVY-707 > Project: Livy > Issue Type: Improvement > Components: Thriftserver >Reporter: Hui An >Assignee: Hui An >Priority: Minor > Fix For: 0.7.0 > > > The audit Log style is below > {code:java} > 19/11/06 16:38:30 INFO ThriftServerAudit$: user: test ipAddress: 10.25.22.46 > query: select count(*) from test1 beforeExecute: 1573029416951 afterExecute: > 1573029510972 time spent: 94021 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (LIVY-711) Travis fails to build on Ubuntu16.04
[ https://issues.apache.org/jira/browse/LIVY-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated LIVY-711: - Component/s: Build > Travis fails to build on Ubuntu16.04 > > > Key: LIVY-711 > URL: https://issues.apache.org/jira/browse/LIVY-711 > Project: Livy > Issue Type: Bug > Components: Build >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-11-12-10-16-27-108.png, > image-2019-11-12-14-25-37-189.png > > Time Spent: 20m > Remaining Estimate: 0h > > !image-2019-11-12-14-25-37-189.png!!image-2019-11-12-10-16-27-108.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (LIVY-711) Travis fails to build on Ubuntu16.04
[ https://issues.apache.org/jira/browse/LIVY-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated LIVY-711: - Affects Version/s: 0.6.0 > Travis fails to build on Ubuntu16.04 > > > Key: LIVY-711 > URL: https://issues.apache.org/jira/browse/LIVY-711 > Project: Livy > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-11-12-10-16-27-108.png, > image-2019-11-12-14-25-37-189.png > > Time Spent: 20m > Remaining Estimate: 0h > > !image-2019-11-12-14-25-37-189.png!!image-2019-11-12-10-16-27-108.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (LIVY-711) Travis fails to build on Ubuntu16.04
[ https://issues.apache.org/jira/browse/LIVY-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated LIVY-711: - Issue Type: Bug (was: New Feature) > Travis fails to build on Ubuntu16.04 > > > Key: LIVY-711 > URL: https://issues.apache.org/jira/browse/LIVY-711 > Project: Livy > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-11-12-10-16-27-108.png, > image-2019-11-12-14-25-37-189.png > > Time Spent: 20m > Remaining Estimate: 0h > > !image-2019-11-12-14-25-37-189.png!!image-2019-11-12-10-16-27-108.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-711) Travis fails to build on Ubuntu16.04
[ https://issues.apache.org/jira/browse/LIVY-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-711. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 257 https://github.com/apache/incubator-livy/pull/257 > Travis fails to build on Ubuntu16.04 > > > Key: LIVY-711 > URL: https://issues.apache.org/jira/browse/LIVY-711 > Project: Livy > Issue Type: New Feature >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-11-12-10-16-27-108.png, > image-2019-11-12-14-25-37-189.png > > Time Spent: 20m > Remaining Estimate: 0h > > !image-2019-11-12-14-25-37-189.png!!image-2019-11-12-10-16-27-108.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-708) The version of curator jars are not aligned
[ https://issues.apache.org/jira/browse/LIVY-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-708. -- Fix Version/s: 0.7.0 Assignee: Yiheng Wang Resolution: Fixed Issue resolved by pull request 256 https://github.com/apache/incubator-livy/pull/256 > The version of curator jars are not aligned > --- > > Key: LIVY-708 > URL: https://issues.apache.org/jira/browse/LIVY-708 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Livy server has dependency of Apache Curator through hadoop client. However, > the versions of the curator jars are not aligned. Here're the curator jars > after build > * curator-client-2.7.1.jar > * curator-framework-2.7.1.jar > * curator-recipes-2.6.0.jar > This will cause Method not found issue in some case: > Exception in thread "main" java.lang.NoSuchMethodError: > {code:bash} > org.apache.curator.utils.PathUtils.validatePath(Ljava/lang/String;)V > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-678) Livy Thrift-server Ordinary ldap authentication, based on ldap.url, basedn, domain
[ https://issues.apache.org/jira/browse/LIVY-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-678. -- Fix Version/s: 0.7.0 Assignee: mingchao zhao Resolution: Fixed Issue resolved by pull request 236 https://github.com/apache/incubator-livy/pull/236 > Livy Thrift-server Ordinary ldap authentication, based on ldap.url, basedn, > domain > -- > > Key: LIVY-678 > URL: https://issues.apache.org/jira/browse/LIVY-678 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: mingchao zhao >Assignee: mingchao zhao >Priority: Major > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Add provider to search for the user in the LDAP directory within the baseDN > tree. Access is granted if user has been found, and denied otherwise. -- This message was sent by Atlassian Jira (v8.3.4#803005)
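As background on the mechanism described above (this is not the code merged in PR 236), a minimal Scala sketch of searching for a user entry under a configured baseDN with plain JNDI; `ldapUrl`, `baseDn`, and the `uid` attribute are illustrative assumptions:

{code:scala}
import java.util.Hashtable
import javax.naming.Context
import javax.naming.directory.{InitialDirContext, SearchControls}

object LdapUserSearch {
  // Returns true if at least one entry for the user exists under baseDn.
  def userExists(ldapUrl: String, baseDn: String, user: String): Boolean = {
    val env = new Hashtable[String, String]()
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory")
    env.put(Context.PROVIDER_URL, ldapUrl)
    val ctx = new InitialDirContext(env)
    try {
      val controls = new SearchControls()
      controls.setSearchScope(SearchControls.SUBTREE_SCOPE)
      // The filter argument is substituted (and escaped) by JNDI.
      val results = ctx.search(baseDn, "(uid={0})", Array[AnyRef](user), controls)
      results.hasMore
    } finally ctx.close()
  }
}
{code}

A non-anonymous deployment would additionally set Context.SECURITY_PRINCIPAL and Context.SECURITY_CREDENTIALS for the bind, in line with the ldap.url/basedn/domain configuration the issue title refers to.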
[jira] [Resolved] (LIVY-697) Rsc client cannot resolve the hostname of driver in yarn-cluster mode
[ https://issues.apache.org/jira/browse/LIVY-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-697. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 246 https://github.com/apache/incubator-livy/pull/246 > Rsc client cannot resolve the hostname of driver in yarn-cluster mode > - > > Key: LIVY-697 > URL: https://issues.apache.org/jira/browse/LIVY-697 > Project: Livy > Issue Type: Bug > Components: RSC >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Attachments: image-2019-10-13-12-44-41-861.png > > Time Spent: 40m > Remaining Estimate: 0h > > !image-2019-10-13-12-44-41-861.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (LIVY-690) Exclude curator in thrift server pom to avoid conflict jars
[ https://issues.apache.org/jira/browse/LIVY-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-690: Assignee: Yiheng Wang > Exclude curator in thrift server pom to avoid conflict jars > --- > > Key: LIVY-690 > URL: https://issues.apache.org/jira/browse/LIVY-690 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, thrift server has a dependency of curator-client:2.12.0 through > the hive service. After the build, a curator-client-2.12.0.jar file will be > generated in the jars folder. It is conflicted with the > curator-client-2.7.1.jar file, which is used by livy server. > We observed that in some JDK, the curator-client-2.12.0.jar is loaded before > the curator-client-2.7.1.jar, and will crash the recovery enabled livy server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-690) Exclude curator in thrift server pom to avoid conflict jars
[ https://issues.apache.org/jira/browse/LIVY-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-690. -- Resolution: Fixed Issue resolved by pull request 239 [https://github.com/apache/incubator-livy/pull/239] > Exclude curator in thrift server pom to avoid conflict jars > --- > > Key: LIVY-690 > URL: https://issues.apache.org/jira/browse/LIVY-690 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: Yiheng Wang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, thrift server has a dependency of curator-client:2.12.0 through > the hive service. After the build, a curator-client-2.12.0.jar file will be > generated in the jars folder. It is conflicted with the > curator-client-2.7.1.jar file, which is used by livy server. > We observed that in some JDK, the curator-client-2.12.0.jar is loaded before > the curator-client-2.7.1.jar, and will crash the recovery enabled livy server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
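A quick way to confirm the kind of duplicate described above is to ask the classloader for every classpath entry that provides the same Curator class; the first URL printed is the one that wins, which is exactly what varies between JDKs. This is only a diagnostic sketch, not the fix from PR 239:

{code:scala}
object FindDuplicateCuratorJars {
  def main(args: Array[String]): Unit = {
    // Any Curator class will do; this one exists in both curator-client versions.
    val resource = "org/apache/curator/utils/PathUtils.class"
    val urls = Thread.currentThread().getContextClassLoader.getResources(resource)
    println(s"Classpath entries providing $resource (first one wins):")
    while (urls.hasMoreElements) println("  " + urls.nextElement())
  }
}
{code}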
[jira] [Assigned] (LIVY-688) Error message of LivyClient only keep outer stackTrace but discard cause's stackTrace
[ https://issues.apache.org/jira/browse/LIVY-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-688: Assignee: weiwenda > Error message of LivyClient only keep outer stackTrace but discard cause's > stackTrace > - > > Key: LIVY-688 > URL: https://issues.apache.org/jira/browse/LIVY-688 > Project: Livy > Issue Type: Improvement > Components: RSC >Affects Versions: 0.6.0 >Reporter: weiwenda >Assignee: weiwenda >Priority: Trivial > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > 1. SparkSession may fail while initializing the ExternalCatalog. Because > SparkSession calls _SharedState.reflect_ to instantiate the ExternalCatalog, any > exception thrown during this process is wrapped in an > InvocationTargetException. > IllegalArgumentException > └──InvocationTargetException > └──the actual exception > 2. org.apache.livy.rsc.Utils.stackTraceAsString only keeps the > IllegalArgumentException's stackTrace but discards the actual exception's > stackTrace and message, which makes the final > java.util.concurrent.ExecutionException's message ambiguous. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-688) Error message of LivyClient only keep outer stackTrace but discard cause's stackTrace
[ https://issues.apache.org/jira/browse/LIVY-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-688. -- Fix Version/s: 0.7.0 Resolution: Fixed Issue resolved by pull request 237 [https://github.com/apache/incubator-livy/pull/237] > Error message of LivyClient only keep outer stackTrace but discard cause's > stackTrace > - > > Key: LIVY-688 > URL: https://issues.apache.org/jira/browse/LIVY-688 > Project: Livy > Issue Type: Improvement > Components: RSC >Affects Versions: 0.6.0 >Reporter: weiwenda >Priority: Trivial > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > 1. SparkSession may fail while initializing the ExternalCatalog. Because > SparkSession calls _SharedState.reflect_ to instantiate the ExternalCatalog, any > exception thrown during this process is wrapped in an > InvocationTargetException. > IllegalArgumentException > └──InvocationTargetException > └──the actual exception > 2. org.apache.livy.rsc.Utils.stackTraceAsString only keeps the > IllegalArgumentException's stackTrace but discards the actual exception's > stackTrace and message, which makes the final > java.util.concurrent.ExecutionException's message ambiguous. -- This message was sent by Atlassian Jira (v8.3.4#803005)
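For context on the improvement above, a minimal sketch of how a stack-trace-to-string helper can keep the full cause chain; this relies on generic JDK behaviour (Throwable.printStackTrace already walks the `Caused by:` chain) and is not necessarily what PR 237 does:

{code:scala}
import java.io.{PrintWriter, StringWriter}

object StackTraces {
  // Unlike formatting only t.getStackTrace, printStackTrace includes every
  // "Caused by:" section, so the innermost exception's message survives.
  def stackTraceWithCauses(t: Throwable): String = {
    val sw = new StringWriter()
    t.printStackTrace(new PrintWriter(sw, true))
    sw.toString
  }

  def main(args: Array[String]): Unit = {
    // Mimic the nesting described in the issue: IllegalArgumentException
    // wrapping an InvocationTargetException wrapping the actual failure.
    val root    = new IllegalStateException("the actual failure")
    val wrapped = new IllegalArgumentException("reflective construction failed",
      new java.lang.reflect.InvocationTargetException(root))
    println(stackTraceWithCauses(wrapped))
  }
}
{code}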
[jira] [Resolved] (LIVY-658) RSCDriver should catch exception if cancel job failed during shutdown
[ https://issues.apache.org/jira/browse/LIVY-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-658. -- Fix Version/s: 0.7.0 Assignee: Jeffrey(Xilang) Yan Resolution: Fixed Issue resolved by pull request 223 https://github.com/apache/incubator-livy/pull/223 > RSCDriver should catch exception if cancel job failed during shutdown > - > > Key: LIVY-658 > URL: https://issues.apache.org/jira/browse/LIVY-658 > Project: Livy > Issue Type: Bug > Components: RSC >Reporter: Jeffrey(Xilang) Yan >Assignee: Jeffrey(Xilang) Yan >Priority: Major > Fix For: 0.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, if startup hits an exception, that exception triggers Spark to > shut down, which in turn triggers job cancellation; but cancelling the job throws another > exception because Spark is not initialized, and the new exception swallows the > old one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
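A small sketch of the general pattern the description calls for: if cleanup during shutdown fails, attach that failure to the original exception instead of letting it replace it. The names here are placeholders, not Livy's RSCDriver API:

{code:scala}
object ShutdownWithOriginalError {
  // Re-throws the startup failure; a failure inside cancelJobs is recorded as a
  // suppressed exception (Java 7+) instead of swallowing the original error.
  def failShutdown(startupError: Throwable)(cancelJobs: () => Unit): Nothing = {
    try cancelJobs()
    catch { case cancelError: Throwable => startupError.addSuppressed(cancelError) }
    throw startupError
  }

  def main(args: Array[String]): Unit = {
    val boom = new RuntimeException("startup failed")
    try failShutdown(boom)(() => throw new IllegalStateException("Spark not initialized"))
    catch { case e: Throwable => e.printStackTrace() } // prints both; the second appears as "Suppressed:"
  }
}
{code}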
[jira] [Resolved] (LIVY-644) Flaky test: Failed to execute goal org.jacoco:jacoco-maven-plugin:0.8.2:report-aggregate (jacoco-report) on project livy-coverage-report
[ https://issues.apache.org/jira/browse/LIVY-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-644. -- Fix Version/s: 0.7.0 Assignee: Yiheng Wang Resolution: Fixed Issue resolved by pull request 229 https://github.com/apache/incubator-livy/pull/229 > Flaky test: Failed to execute goal > org.jacoco:jacoco-maven-plugin:0.8.2:report-aggregate (jacoco-report) on > project livy-coverage-report > > > Key: LIVY-644 > URL: https://issues.apache.org/jira/browse/LIVY-644 > Project: Livy > Issue Type: Bug > Components: Tests >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Recently a lot of Travis job failed when generating coverage report: > [https://travis-ci.org/apache/incubator-livy/jobs/575142847] > [https://travis-ci.org/apache/incubator-livy/jobs/561700903] > [https://travis-ci.org/apache/incubator-livy/jobs/508574433] > [https://travis-ci.org/apache/incubator-livy/jobs/508574435] > [https://travis-ci.org/apache/incubator-livy/jobs/508066760] > [https://travis-ci.org/apache/incubator-livy/jobs/507989073] > [https://travis-ci.org/apache/incubator-livy/jobs/574702251] > [https://travis-ci.org/apache/incubator-livy/jobs/574686891] > [https://travis-ci.org/apache/incubator-livy/jobs/574363881] > [https://travis-ci.org/apache/incubator-livy/jobs/574215174] > [https://travis-ci.org/apache/incubator-livy/jobs/573689926] > > Here is the error stack: > > [ERROR] Failed to execute goal > org.jacoco:jacoco-maven-plugin:0.8.2:report-aggregate (jacoco-report) on > project livy-coverage-report: An error has occurred in JaCoCo Aggregate > report generation. Error while creating report: null: EOFException -> [Help 1] > 2988org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.jacoco:jacoco-maven-plugin:0.8.2:report-aggregate (jacoco-report) on > project livy-coverage-report: An error has occurred in JaCoCo Aggregate > report generation. 
> 2989at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:213) > 2990at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:154) > 2991at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:146) > 2992at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:117) > 2993at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:81) > 2994at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:51) > 2995at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:128) > 2996at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309) > 2997at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194) > 2998at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107) > 2999at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955) > 3000at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290) > 3001at org.apache.maven.cli.MavenCli.main (MavenCli.java:194) > 3002at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > 3003at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > 3004at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > 3005at java.lang.reflect.Method.invoke (Method.java:498) > 3006at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:289) > 3007at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:229) > 3008at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:415) > 3009at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:356) > 3010Caused by: org.apache.maven.plugin.MojoExecutionException: An error has > occurred in JaCoCo Aggregate report generation. > 3011at org.jacoco.maven.AbstractReportMojo.execute > (AbstractReportMojo.java:167) > 3012at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:134) > 3013at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:208) > 3014at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:154) > 3015at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:146) > 3016at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:117) > 3017at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.bu
[jira] [Resolved] (SPARK-29112) Expose more details when ApplicationMaster reporter faces a fatal exception
[ https://issues.apache.org/jira/browse/SPARK-29112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved SPARK-29112. - Fix Version/s: 3.0.0 Assignee: Lantao Jin Resolution: Fixed Issue resolved by pull request 25810 https://github.com/apache/spark/pull/25810 > Expose more details when ApplicationMaster reporter faces a fatal exception > --- > > Key: SPARK-29112 > URL: https://issues.apache.org/jira/browse/SPARK-29112 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.4.4, 3.0.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Minor > Fix For: 3.0.0 > > > In {{ApplicationMaster.Reporter}} thread, fatal exception information is > swallowed. It's better to expose. > A thrift server was shutdown due to some fatal exception but no useful > information from log. > {code} > 19/09/16 06:59:54,498 INFO [Reporter] yarn.ApplicationMaster:54 : Final app > status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from > Reporter thread.) > 19/09/16 06:59:54,500 ERROR [Driver] thriftserver.HiveThriftServer2:91 : > Error starting HiveThriftServer2 > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:160) > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:708) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
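To illustrate the kind of change being asked for (the actual change is in PR 25810), a hedged sketch of a reporter loop that puts the fatal exception's details into the final-status reason rather than only a failure count; `pollOnce` and `finish` are placeholder hooks, not Spark's ApplicationMaster API:

{code:scala}
import java.io.{PrintWriter, StringWriter}

object ReporterSketch {
  private def render(t: Throwable): String = {
    val sw = new StringWriter(); t.printStackTrace(new PrintWriter(sw, true)); sw.toString
  }

  def reporterLoop(pollOnce: () => Unit, intervalMs: Long, finish: String => Unit): Unit =
    try {
      while (true) { pollOnce(); Thread.sleep(intervalMs) }
    } catch {
      case _: InterruptedException =>
        finish("Reporter thread interrupted")
      case e: Throwable =>
        // Surface the exception itself so the log says more than
        // "Exception was thrown 1 time(s) from Reporter thread."
        finish(s"Fatal exception in Reporter thread: ${render(e)}")
    }
}
{code}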
[jira] [Resolved] (LIVY-657) Travis failed on should not create sessions with duplicate names
[ https://issues.apache.org/jira/browse/LIVY-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-657. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 225 https://github.com/apache/incubator-livy/pull/225 > Travis failed on should not create sessions with duplicate names > > > Key: LIVY-657 > URL: https://issues.apache.org/jira/browse/LIVY-657 > Project: Livy > Issue Type: Bug > Components: Tests >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > should not create sessions with duplicate names *** FAILED *** (17 > milliseconds) > session2.stopped was false (SessionManagerSpec.scala:96) > > please reference to https://travis-ci.org/apache/incubator-livy/jobs/579604782 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-666) Support named interpreter groups
[ https://issues.apache.org/jira/browse/LIVY-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931976#comment-16931976 ] Saisai Shao commented on LIVY-666: -- I think it looks good to me. With the above, I think the REST APIs should also be updated, especially statements related ones, would you please propose the changes here, mainly about the compatibility. > Support named interpreter groups > > > Key: LIVY-666 > URL: https://issues.apache.org/jira/browse/LIVY-666 > Project: Livy > Issue Type: New Feature >Reporter: Naman Mishra >Priority: Major > Attachments: multiple_interpreter_groups.png > > > Currently, a session can contain only one interpreter group. In order to > support use case of multiple repls with the same spark application multiple > interpreters with different variable scoping (something similar to scoped > interpreter mode in Zeppelin: > [https://zeppelin.apache.org/docs/0.8.0/usage/interpreter/interpreter_binding_mode.html#scoped-mode > > |https://zeppelin.apache.org/docs/0.8.0/usage/interpreter/interpreter_binding_mode.html#scoped-mode]), > I propose to have "named interpreter groups", i.e., multiple interpreter > groups in a session all sharing a spark context. The interpreter group can be > specified on which the execution is supposed to happen in the execution API. > Similar ask has been put in LIVY-325 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LIVY-633) session should not be gc-ed for long running queries
[ https://issues.apache.org/jira/browse/LIVY-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-633. -- Fix Version/s: 0.7.0 Assignee: Yiheng Wang Resolution: Fixed Issue resolved by pull request 224 https://github.com/apache/incubator-livy/pull/224 > session should not be gc-ed for long running queries > > > Key: LIVY-633 > URL: https://issues.apache.org/jira/browse/LIVY-633 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: Liju >Assignee: Yiheng Wang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > If you have set a relatively small session timeout, e.g. 15 minutes, and query > execution takes longer than 15 minutes, the session gets gc-ed, which is incorrect > from a user-experience standpoint: the user was still active on the session and waiting for the > result -- This message was sent by Atlassian Jira (v8.3.2#803003)
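As a way to picture the fix being described (the field and state names below are hypothetical, not Livy's actual session model), the idle-timeout check has to treat a session that is still executing a statement as active:

{code:scala}
object SessionGcSketch {
  sealed trait SessionState
  case object Idle extends SessionState
  case object Busy extends SessionState // a statement is still running

  final case class TrackedSession(state: SessionState, lastActivityNanos: Long)

  // A busy session is never eligible for GC, no matter how long ago the last
  // client interaction happened; only idle sessions age out.
  def eligibleForGc(s: TrackedSession, nowNanos: Long, idleTimeoutNanos: Long): Boolean =
    s.state == Idle && (nowNanos - s.lastActivityNanos) > idleTimeoutNanos
}
{code}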
[jira] [Updated] (SPARK-29114) Dataset.coalesce(10) throw ChunkFetchFailureException when original Dataset size is big
[ https://issues.apache.org/jira/browse/SPARK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-29114: Priority: Major (was: Blocker) > Dataset.coalesce(10) throw ChunkFetchFailureException when original > Dataset size is big > > > Key: SPARK-29114 > URL: https://issues.apache.org/jira/browse/SPARK-29114 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 2.3.0 >Reporter: ZhanxiongWang >Priority: Major > > I create a Dataset df with 200 partitions. I applied for 100 executors > for my task. Each executor with 1 core, and driver memory is 8G executor is > 16G. I use df.cache() before df.coalesce(10). When{color:#de350b} > Dataset{color} {color:#de350b}size is small{color}, the program works > well. But when I {color:#de350b}increase{color} the size of the Dataset, > the function {color:#de350b}df.coalesce(10){color} will throw > ChunkFetchFailureException. > 19/09/17 08:26:44 INFO CoarseGrainedExecutorBackend: Got assigned task 210 > 19/09/17 08:26:44 INFO Executor: Running task 0.0 in stage 3.0 (TID 210) > 19/09/17 08:26:44 INFO MapOutputTrackerWorker: Updating epoch to 1 and > clearing cache > 19/09/17 08:26:44 INFO TorrentBroadcast: Started reading broadcast variable > 1003 > 19/09/17 08:26:44 INFO MemoryStore: Block broadcast_1003_piece0 stored as > bytes in memory (estimated size 49.4 KB, free 3.8 GB) > 19/09/17 08:26:44 INFO TorrentBroadcast: Reading broadcast variable 1003 took > 7 ms > 19/09/17 08:26:44 INFO MemoryStore: Block broadcast_1003 stored as values in > memory (estimated size 154.5 KB, free 3.8 GB) > 19/09/17 08:26:44 INFO BlockManager: Found block rdd_1005_0 locally > 19/09/17 08:26:44 INFO BlockManager: Found block rdd_1005_1 locally > 19/09/17 08:26:44 INFO TransportClientFactory: Successfully created > connection to /100.76.29.130:54238 after 1 ms (0 ms spent in bootstraps) > 19/09/17 08:26:46 ERROR RetryingBlockFetcher: Failed to fetch block > rdd_1005_18, and will not retry (0 retries) > org.apache.spark.network.client.ChunkFetchFailureException: Failure while > fetching StreamChunkId\{streamId=69368607002, chunkIndex=0}: readerIndex: 0, > writerIndex: -2137154997 (expected: 0 <= readerIndex <= writerIndex <= > capacity(-2137154997)) > at > org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:182) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:962) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:485) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:399) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:371) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > at java.lang.Thread.run(Thread.java:745) > 19/09/17 08:26:46 WARN BlockManager: Failed to fetch block after 1 fetch > failures. Most recent failure cause: > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
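The negative writerIndex in the fetch failure above (-2137154997) is consistent with an int overflow, i.e. a single cached block that has grown past roughly 2 GB after 200 cached partitions were merged into 10. A small sketch of the difference between coalesce and repartition in that situation (generic Spark API, not a claim about the reporter's exact job):

{code:scala}
import org.apache.spark.sql.SparkSession

object CoalesceVsRepartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()

    // Stand-in for the reporter's 200-partition cached Dataset.
    val df = spark.range(0L, 10000000L, 1L, 200).toDF("id").cache()

    // coalesce(10) merges existing partitions without a shuffle, so each of the
    // 10 outputs holds ~20 of the original partitions; with a large cached input
    // a single block can exceed ~2 GB and break remote block fetches.
    val merged = df.coalesce(10)

    // repartition(10) performs a full shuffle and spreads rows evenly; keeping a
    // larger partition count is another way to keep individual blocks small.
    val reshuffled = df.repartition(10)

    println((merged.rdd.getNumPartitions, reshuffled.rdd.getNumPartitions))
    spark.stop()
  }
}
{code}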
[jira] [Resolved] (LIVY-647) Travis failed on "batch session should not be gc-ed until application is finished"
[ https://issues.apache.org/jira/browse/LIVY-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-647. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 222 https://github.com/apache/incubator-livy/pull/222 > Travis failed on "batch session should not be gc-ed until application is > finished" > -- > > Key: LIVY-647 > URL: https://issues.apache.org/jira/browse/LIVY-647 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 3h > Remaining Estimate: 0h > > - batch session should not be gc-ed until application is finished *** FAILED > *** (50 milliseconds) > org.mockito.exceptions.misusing.UnfinishedStubbingException: Unfinished > stubbing detected here: > -> at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1.org$apache$livy$sessions$SessionManagerSpec$$anonfun$$changeStateAndCheck$1(SessionManagerSpec.scala:129) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. you are trying to stub a final method, you naughty developer! > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1.org$apache$livy$sessions$SessionManagerSpec$$anonfun$$changeStateAndCheck$1(SessionManagerSpec.scala:129) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1$$anonfun$org$apache$livy$sessions$SessionManagerSpec$$anonfun$$testSessionGC$1$1.apply(SessionManagerSpec.scala:141) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1$$anonfun$org$apache$livy$sessions$SessionManagerSpec$$anonfun$$testSessionGC$1$1.apply(SessionManagerSpec.scala:135) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1.org$apache$livy$sessions$SessionManagerSpec$$anonfun$$testSessionGC$1(SessionManagerSpec.scala:135) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1$$anonfun$apply$mcV$sp$8.apply$mcV$sp(SessionManagerSpec.scala:108) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1$$anonfun$apply$mcV$sp$8.apply(SessionManagerSpec.scala:98) > at > org.apache.livy.sessions.SessionManagerSpec$$anonfun$1$$anonfun$apply$mcV$sp$8.apply(SessionManagerSpec.scala:98) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > ... > > please reference to > https://travis-ci.org/runzhiwang/incubator-livy/builds/575627338 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (LIVY-325) Refactoring Livy Session and Interpreter
[ https://issues.apache.org/jira/browse/LIVY-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928479#comment-16928479 ] Saisai Shao commented on LIVY-325: -- I don't have a plan on it, please go ahead if you would like to do it. But since this is a large work, please do have a good plan and design doc about it, also separate down the work to small ones. > Refactoring Livy Session and Interpreter > > > Key: LIVY-325 > URL: https://issues.apache.org/jira/browse/LIVY-325 > Project: Livy > Issue Type: New Feature > Components: REPL >Affects Versions: 0.4.0 >Reporter: Saisai Shao >Priority: Major > > Currently in Livy master code, Livy interpreter is bound with Livy session, > when we created a session, we should specify which interpreter we want, and > this interpreter will be created implicitly. This potentially has several > limitations: > 1. We cannot create a share session, when we choose one language, we have to > create a new session to use. But some notebooks like Zeppelin could use > python, scala, R to manipulate data under the same SparkContext. So in Livy > we should decouple interpreter with SC and support shared context between > different interpreters. > 2. Furthermore, we cannot create multiple same interpreters in one session. > For example in Zeppelin scope mode, it could create multiple scala > interpreters to share with one context, but unfortunately in current Livy we > could not support this. > So based on the problems we mentioned above, we mainly have three things: > 1. Decouple interpreters from Spark context, so that we could create multiple > interpreters under one context. > 2. Make sure multiple interpreters could be worked together. > 3. Change REST APIs to support multiple interpreters per session. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927265#comment-16927265 ] Saisai Shao commented on SPARK-29038: - [~cltlfcjin] I think we need a SPIP review and vote on the dev mail list before starting the works. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMS to cache data to > accelerate queries. By creating a materialized view through SQL, the data > that can be cached is very flexible, and needs to be configured arbitrarily > according to specific usage scenarios. The Materialization Manager > automatically updates the cache data according to changes in detail source > tables, simplifying user work. When user submit query, Spark optimizer > rewrites the execution plan based on the available materialized view to > determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927263#comment-16927263 ] Saisai Shao commented on SPARK-29038: - IIUC, I think the key difference between MV and Spark's built-in {{CACHE}} support is: 1. MV needs update when source table is updated, which I think current Spark's {{CACHE}} cannot support; 2. classical MV requires writing of source query based on the existing MV, which I think current Spark doesn't have. Please correct me if I'm wrong. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMS to cache data to > accelerate queries. By creating a materialized view through SQL, the data > that can be cached is very flexible, and needs to be configured arbitrarily > according to specific usage scenarios. The Materialization Manager > automatically updates the cache data according to changes in detail source > tables, simplifying user work. When user submit query, Spark optimizer > rewrites the execution plan based on the available materialized view to > determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
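For readers comparing the two mechanisms discussed in this comment, the sketch below uses only today's Spark SQL `CACHE TABLE ... AS SELECT`, which materializes a result once but, unlike the proposed materialized views, is neither refreshed when the source table changes nor used to rewrite other queries automatically:

{code:scala}
import org.apache.spark.sql.SparkSession

object CacheVsMaterializedView {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cache-vs-mv").getOrCreate()
    spark.range(1000).toDF("id").createOrReplaceTempView("src")

    // Today's Spark: CACHE TABLE materializes the query result once, in memory.
    spark.sql("CACHE TABLE agg AS SELECT id % 10 AS bucket, COUNT(*) AS cnt FROM src GROUP BY id % 10")
    spark.sql("SELECT * FROM agg WHERE bucket = 3").show()

    // Unlike a materialized view, `agg` is not refreshed if `src` changes, and a
    // query written directly against `src` is not rewritten to read from `agg`.
    spark.stop()
  }
}
{code}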
[jira] [Resolved] (LIVY-659) Travis failed on "can kill spark-submit while it's running"
[ https://issues.apache.org/jira/browse/LIVY-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-659. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 226 https://github.com/apache/incubator-livy/pull/226 > Travis failed on "can kill spark-submit while it's running" > --- > > Key: LIVY-659 > URL: https://issues.apache.org/jira/browse/LIVY-659 > Project: Livy > Issue Type: Bug > Components: Tests >Affects Versions: 0.6.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > * can kill spark-submit while it's running *** FAILED *** (41 milliseconds) > org.mockito.exceptions.verification.WantedButNotInvoked: Wanted but not > invoked: > lineBufferedProcess.destroy(); > -> at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226) > Actually, there were zero interactions with this mock. > at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226) > at > org.apache.livy.utils.SparkYarnAppSpec.org$apache$livy$utils$SparkYarnAppSpec$$cleanupThread(SparkYarnAppSpec.scala:43) > at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15.apply$mcV$sp(SparkYarnAppSpec.scala:224) > at org.apache.livy.utils.Clock$.withSleepMethod(Clock.scala:31) > at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply$mcV$sp(SparkYarnAppSpec.scala:201) > at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201) > at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > please reference to: > https://travis-ci.org/captainzmc/incubator-livy/jobs/580596561 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-645) Add Name, Owner, Proxy User to web UI
[ https://issues.apache.org/jira/browse/LIVY-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-645. -- Fix Version/s: 0.7.0 Assignee: Jeffrey(Xilang) Yan Resolution: Fixed Issue resolved by pull request 207 https://github.com/apache/incubator-livy/pull/207 > Add Name, Owner, Proxy User to web UI > - > > Key: LIVY-645 > URL: https://issues.apache.org/jira/browse/LIVY-645 > Project: Livy > Issue Type: Improvement > Components: Server >Affects Versions: 0.6.0 >Reporter: Jeffrey(Xilang) Yan >Assignee: Jeffrey(Xilang) Yan >Priority: Major > Fix For: 0.7.0 > > Time Spent: 50m > Remaining Estimate: 0h > > In current web UI, Interactive Sessions list has no Name, Batch Sessions list > has no Name, Owner and Proxy User. Should add those information so user can > find their session quickly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Reopened] (SPARK-19147) netty throw NPE
[ https://issues.apache.org/jira/browse/SPARK-19147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reopened SPARK-19147: - > netty throw NPE > --- > > Key: SPARK-19147 > URL: https://issues.apache.org/jira/browse/SPARK-19147 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: cen yuhai >Priority: Major > Labels: bulk-closed > > {code} > 17/01/10 19:17:20 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) > from bigdata-hdp-apache1828.xg01.diditaxi.com:7337 > java.lang.NullPointerException: group > at io.netty.bootstrap.AbstractBootstrap.group(AbstractBootstrap.java:80) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:203) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:181) > at > org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) > at > org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:169) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:354) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:396) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:138) > at > org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948) 
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.s
[jira] [Commented] (SPARK-19147) netty throw NPE
[ https://issues.apache.org/jira/browse/SPARK-19147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923014#comment-16923014 ] Saisai Shao commented on SPARK-19147: - Yes, we also met the similar issue when executor is stopping, floods of netty NPE appears. I'm going to reopen this issue, at least we should improve the exception message. > netty throw NPE > --- > > Key: SPARK-19147 > URL: https://issues.apache.org/jira/browse/SPARK-19147 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: cen yuhai >Priority: Major > Labels: bulk-closed > > {code} > 17/01/10 19:17:20 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) > from bigdata-hdp-apache1828.xg01.diditaxi.com:7337 > java.lang.NullPointerException: group > at io.netty.bootstrap.AbstractBootstrap.group(AbstractBootstrap.java:80) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:203) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:181) > at > org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140) > at > org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120) > at > org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:169) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:354) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:396) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:138) > at > 
org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948) > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > or
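The NPE above comes from using a transport client factory whose event-loop group has already been torn down; the sketch below is a generic illustration of the "improve the exception message" idea mentioned in the comment, with made-up names rather than Spark or Netty internals:

{code:scala}
object ClosedFactorySketch {
  final class ClientFactory {
    @volatile private var workerGroup: AnyRef = new Object // stands in for Netty's EventLoopGroup

    def close(): Unit = { workerGroup = null }

    def createClient(host: String, port: Int): Unit = {
      val group = workerGroup
      if (group == null)
        // A targeted message beats "java.lang.NullPointerException: group".
        throw new IllegalStateException(s"Factory already closed; cannot connect to $host:$port")
      // ... would bootstrap a connection with `group` here ...
    }
  }

  def main(args: Array[String]): Unit = {
    val factory = new ClientFactory
    factory.close()
    try factory.createClient("example.com", 7337)
    catch { case e: IllegalStateException => println(e.getMessage) }
  }
}
{code}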
[jira] [Resolved] (LIVY-652) Thriftserver doesn't set session name correctly
[ https://issues.apache.org/jira/browse/LIVY-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-652. -- Fix Version/s: 0.7.0 Assignee: Jeffrey(Xilang) Yan Resolution: Fixed Issue resolved by pull request 218 https://github.com/apache/incubator-livy/pull/218 > Thriftserver doesn't set session name correctly > -- > > Key: LIVY-652 > URL: https://issues.apache.org/jira/browse/LIVY-652 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: Jeffrey(Xilang) Yan >Assignee: Jeffrey(Xilang) Yan >Priority: Major > Fix For: 0.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The Thrift server doesn't set the session name correctly, so the session > name cannot be viewed in the Livy Web UI -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-519) Flaky test: SparkYarnApp "should kill yarn app "
[ https://issues.apache.org/jira/browse/LIVY-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-519. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 221 https://github.com/apache/incubator-livy/pull/221 > Flaky test: SparkYarnApp "should kill yarn app " > > > Key: LIVY-519 > URL: https://issues.apache.org/jira/browse/LIVY-519 > Project: Livy > Issue Type: Bug > Components: Tests >Affects Versions: 0.6.0 >Reporter: Marcelo Vanzin >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 1h > Remaining Estimate: 0h > > From a travis run: > {noformat} > [32mSparkYarnApp[0m > [32m- should poll YARN state and terminate (116 milliseconds)[0m > [31m- should kill yarn app *** FAILED *** (83 milliseconds)[0m > [31m org.mockito.exceptions.verification.WantedButNotInvoked: Wanted but > not invoked:[0m > [31myarnClient.killApplication([0m > [31mapplication_1467912463905_0021[0m > [31m);[0m > [31m-> at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$8.apply$mcV$sp(SparkYarnAppSpec.scala:156)[0m > [31m[0m > [31mHowever, there were other interactions with this mock:[0m > [31m-> at > org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:261)[0m > [31m-> at > org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:270)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$8.apply$mcV$sp(SparkYarnAppSpec.scala:156)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec.org$apache$livy$utils$SparkYarnAppSpec$$cleanupThread(SparkYarnAppSpec.scala:43)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5$$anonfun$apply$mcV$sp$7.apply$mcV$sp(SparkYarnAppSpec.scala:148)[0m > [31m at org.apache.livy.utils.Clock$.withSleepMethod(Clock.scala:31)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5.apply$mcV$sp(SparkYarnAppSpec.scala:126)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(SparkYarnAppSpec.scala:126)[0m > [31m at > org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(SparkYarnAppSpec.scala:126)[0m > [31m at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)[0m > [31m at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)[0m > [31m at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)[0m > [31m ...[0m > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-586) When a batch fails on startup, Livy continues to report the batch as "starting", even though it has failed
[ https://issues.apache.org/jira/browse/LIVY-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-586. -- Fix Version/s: 0.7.0 Assignee: runzhiwang Resolution: Fixed Issue resolved by pull request 215 https://github.com/apache/incubator-livy/pull/215 > When a batch fails on startup, Livy continues to report the batch as > "starting", even though it has failed > -- > > Key: LIVY-586 > URL: https://issues.apache.org/jira/browse/LIVY-586 > Project: Livy > Issue Type: Bug > Components: Batch >Affects Versions: 0.5.0 > Environment: AWS EMR, Livy submits batches to YARN in cluster mode >Reporter: Sam Brougher >Assignee: runzhiwang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When starting a Livy batch, I accidentally provided it a jar location in S3 > that did not exist. Livy then continued to report that the job was > "starting", even though it had clearly failed. > stdout: > {code:java} > 2019-04-05 11:24:18,149 [main] WARN org.apache.hadoop.util.NativeCodeLoader > [appName=] [jobId=] [clusterId=] - Unable to load native-hadoop library for > your platform... using builtin-java classes where applicable > Warning: Skip remote jar > s3://dev-dp-local/jars/develop-fix/ap5-app-transform-0.2-thread-pool-SNAPSHOT.jar. > 2019-04-05 11:24:19,152 [main] INFO org.apache.hadoop.yarn.client.RMProxy > [appName=] [jobId=] [clusterId=] - Connecting to ResourceManager at > ip-10-25-30-127.dev.cainc.internal/10.25.30.127:8032 > 2019-04-05 11:24:19,453 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Requesting a new application from cluster > with 6 NodeManagers > 2019-04-05 11:24:19,532 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Verifying our application has not > requested more than the maximum memory capability of the cluster (54272 MB > per container) > 2019-04-05 11:24:19,533 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Will allocate AM container, with 9011 MB > memory including 819 MB overhead > 2019-04-05 11:24:19,534 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Setting up container launch context for > our AM > 2019-04-05 11:24:19,537 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Setting up the launch environment for our > AM container > 2019-04-05 11:24:19,549 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Preparing resources for our AM container > 2019-04-05 11:24:21,059 [main] WARN org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Neither spark.yarn.jars nor > spark.yarn.archive is set, falling back to uploading libraries under > SPARK_HOME. 
> 2019-04-05 11:24:23,790 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Uploading resource > file:/mnt/tmp/spark-b4e4a760-77a3-4554-a3f3-c3f82675d865/__spark_libs__3639879082942366045.zip > -> > hdfs://ip-10-25-30-127.dev.cainc.internal:8020/user/livy/.sparkStaging/application_1554234858331_0222/__spark_libs__3639879082942366045.zip > 2019-04-05 11:24:26,817 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Uploading resource > s3://dev-dp-local/jars/develop-fix/ap5-app-transform-0.2-thread-pool-SNAPSHOT.jar > -> > hdfs://ip-10-25-30-127.dev.cainc.internal:8020/user/livy/.sparkStaging/application_1554234858331_0222/ap5-app-transform-0.2-thread-pool-SNAPSHOT.jar > 2019-04-05 11:24:26,940 [main] INFO org.apache.spark.deploy.yarn.Client > [appName=] [jobId=] [clusterId=] - Deleted staging directory > hdfs://ip-10-25-30-127.dev.cainc.internal:8020/user/livy/.sparkStaging/application_1554234858331_0222 > Exception in thread "main" java.io.FileNotFoundException: No such file or > directory > 's3://dev-dp-local/jars/develop-fix/ap5-app-transform-0.2-thread-pool-SNAPSHOT.jar' > at > com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:805) > at > com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:536) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292) > at > org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:356) > at > org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478) > at > org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:577) > at > org.apache.spark.deploy.yarn.Clie
[jira] [Resolved] (LIVY-642) A rare status happened in yarn cause SparkApp change into error state
[ https://issues.apache.org/jira/browse/LIVY-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-642. -- Fix Version/s: 0.7.0 Assignee: Zhefeng Wang Resolution: Fixed Issue resolved by pull request 204 https://github.com/apache/incubator-livy/pull/204 > A rare status happened in yarn cause SparkApp change into error state > - > > Key: LIVY-642 > URL: https://issues.apache.org/jira/browse/LIVY-642 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: Zhefeng Wang >Assignee: Zhefeng Wang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Some batch sessions execute successfully but return an error state. In the Livy > log, we find: > {quote}{{2019-08-08 20:11:37,678 ERROR [Logging.scala:56] - Unknown YARN > state RUNNING for app application_1559632632227_39801506 with final status > SUCCEEDED.}} > {quote} > and this situation, with YARN state RUNNING and final status SUCCEEDED, is a > *correct* state in YARN, so it should not be mapped to SparkApp.State.FAILED -- This message was sent by Atlassian Jira (v8.3.2#803003)
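A sketch of the state mapping the description argues for, using the real YARN enums (assuming hadoop-yarn-api on the classpath) but a simplified target state model rather than Livy's actual SparkApp states; the point is only that RUNNING plus final status SUCCEEDED is a legitimate combination and should not land in a failure branch:

{code:scala}
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}

object YarnStateMapping {
  sealed trait AppState
  case object Running  extends AppState
  case object Finished extends AppState
  case object Failed   extends AppState

  def toAppState(state: YarnApplicationState, finalStatus: FinalApplicationStatus): AppState =
    (state, finalStatus) match {
      case (YarnApplicationState.RUNNING, FinalApplicationStatus.UNDEFINED)  => Running
      // Rare but valid: YARN can report RUNNING while the final status is already
      // SUCCEEDED; treat it as still running rather than as an unknown/failed state.
      case (YarnApplicationState.RUNNING, FinalApplicationStatus.SUCCEEDED)  => Running
      case (YarnApplicationState.FINISHED, FinalApplicationStatus.SUCCEEDED) => Finished
      case (YarnApplicationState.FAILED | YarnApplicationState.KILLED, _)    => Failed
      case _                                                                 => Running // NEW/SUBMITTED/ACCEPTED are still in flight
    }
}
{code}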
[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException
[ https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918386#comment-16918386 ] Saisai Shao commented on SPARK-28340: - My simple concern is that there may be other places which will potentially throw this "ClosedByInterruptException" during task killing, it seems hard to figure out all of them. > Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught > exception while reverting partial writes to file: > java.nio.channels.ClosedByInterruptException" > > > Key: SPARK-28340 > URL: https://issues.apache.org/jira/browse/SPARK-28340 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Priority: Minor > > If a Spark task is killed while writing blocks to disk (due to intentional > job kills, automated killing of redundant speculative tasks, etc) then Spark > may log exceptions like > {code:java} > 19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception > while reverting partial writes to file / > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372) > at > org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369) > at > org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} > If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled > task can result in hundreds of these stacktraces being logged. > Here are some StackOverflow questions asking about this: > * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash] > * > [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple] > * > [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark] > * > [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills] > > Can we prevent this exception from occurring? If not, can we treat this > "expected exception" in a special manner to avoid log spam? My concern is > that the presence of large numbers of spurious exceptions is confusing to > users when they are inspecting Spark logs to diagnose other issues. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"
[ https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918306#comment-16918306 ] Saisai Shao commented on SPARK-28340: - We also saw a bunch of exceptions in our production environment. Looks like it is hard to prevent unless we change to not use `interrupt`, maybe we can just ignore logging such exceptions. > Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught > exception while reverting partial writes to file: > java.nio.channels.ClosedByInterruptException" > > > Key: SPARK-28340 > URL: https://issues.apache.org/jira/browse/SPARK-28340 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Priority: Minor > > If a Spark task is killed while writing blocks to disk (due to intentional > job kills, automated killing of redundant speculative tasks, etc) then Spark > may log exceptions like > {code:java} > 19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception > while reverting partial writes to file / > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372) > at > org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369) > at > org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} > If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled > task can result in hundreds of these stacktraces being logged. > Here are some StackOverflow questions asking about this: > * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash] > * > [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple] > * > [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark] > * > [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills] > > Can we prevent this exception from occurring? If not, can we treat this > "expected exception" in a special manner to avoid log spam? My concern is > that the presence of large numbers of spurious exceptions is confusing to > users when they are inspecting Spark logs to diagnose other issues. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
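The two comments above point at either preventing the ClosedByInterruptException or suppressing its logging while a task is being killed. A minimal sketch of the logging-suppression idea, assuming a simplified revert wrapper rather than Spark's actual DiskBlockObjectWriter internals:
{code:scala}
import java.nio.channels.ClosedByInterruptException
import org.slf4j.LoggerFactory

object RevertHelper {
  private val log = LoggerFactory.getLogger(getClass)

  // Run the revert logic, treating ClosedByInterruptException as an expected,
  // low-severity event when the task thread has been interrupted (task kill).
  def revertQuietly(revert: () => Unit): Unit = {
    try {
      revert()
    } catch {
      case e: ClosedByInterruptException if Thread.currentThread().isInterrupted =>
        // Expected when a task is killed mid-write; keep it out of the ERROR log.
        log.debug("Ignoring ClosedByInterruptException while reverting partial writes", e)
      case e: Exception =>
        log.error("Uncaught exception while reverting partial writes to file", e)
    }
  }
}
{code}
Downgrading the expected case to DEBUG keeps the event visible for troubleshooting while avoiding hundreds of ERROR stack traces per cancelled task.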
[jira] [Assigned] (LIVY-616) Livy Server discovery
[ https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-616: Assignee: Saisai Shao > Livy Server discovery > - > > Key: LIVY-616 > URL: https://issues.apache.org/jira/browse/LIVY-616 > Project: Livy > Issue Type: Improvement > Components: Server >Reporter: Oleksandr Shevchenko >Assignee: Saisai Shao >Priority: Major > Attachments: Livy Server discovery.pdf, > image-2019-08-28-17-08-21-590.png > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there isn't a way to get Livy Server URI by the client without > setting Livy address explicitly to livy.conf. A client should set > "livy.server.host" variable and then get it via LivyConf. The same behavior > if you want to use Livy with Zeppelin, we need to set "zeppelin.livy.url". It > very inconvenient when we install Livy packages on couple nodes and don't > know where exactly Livy Server will be started e.g. by Ambari or Cloudera > Manager. Also, in this case, we need to have Livy configuration files on a > node where we want to get Livy address. > It will be very helpful if we will add Livy Server address to Zookeeper and > expose API for clients to get Livy URL to use it in client code for REST > calls. > Livy already supports state saving in Zookeeper but I don't see that we store > Livy server address somewhere. Before starting investigating and > implementation I want to ask here about this. > Please, correct me if I missed something. > Any comments will be highly appreciated! -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (LIVY-616) Livy Server discovery
[ https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917755#comment-16917755 ] Saisai Shao commented on LIVY-616: -- [~oshevchenko] we don't typically assign the jira ourselves, committers will assign to whom the PR is finally merged. > Livy Server discovery > - > > Key: LIVY-616 > URL: https://issues.apache.org/jira/browse/LIVY-616 > Project: Livy > Issue Type: Improvement > Components: Server >Reporter: Oleksandr Shevchenko >Priority: Major > Attachments: Livy Server discovery.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there isn't a way to get Livy Server URI by the client without > setting Livy address explicitly to livy.conf. A client should set > "livy.server.host" variable and then get it via LivyConf. The same behavior > if you want to use Livy with Zeppelin, we need to set "zeppelin.livy.url". It > very inconvenient when we install Livy packages on couple nodes and don't > know where exactly Livy Server will be started e.g. by Ambari or Cloudera > Manager. Also, in this case, we need to have Livy configuration files on a > node where we want to get Livy address. > It will be very helpful if we will add Livy Server address to Zookeeper and > expose API for clients to get Livy URL to use it in client code for REST > calls. > Livy already supports state saving in Zookeeper but I don't see that we store > Livy server address somewhere. Before starting investigating and > implementation I want to ask here about this. > Please, correct me if I missed something. > Any comments will be highly appreciated! -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (LIVY-616) Livy Server discovery
[ https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-616: Assignee: (was: Oleksandr Shevchenko) > Livy Server discovery > - > > Key: LIVY-616 > URL: https://issues.apache.org/jira/browse/LIVY-616 > Project: Livy > Issue Type: Improvement > Components: Server >Reporter: Oleksandr Shevchenko >Priority: Major > Attachments: Livy Server discovery.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there isn't a way to get Livy Server URI by the client without > setting Livy address explicitly to livy.conf. A client should set > "livy.server.host" variable and then get it via LivyConf. The same behavior > if you want to use Livy with Zeppelin, we need to set "zeppelin.livy.url". It > very inconvenient when we install Livy packages on couple nodes and don't > know where exactly Livy Server will be started e.g. by Ambari or Cloudera > Manager. Also, in this case, we need to have Livy configuration files on a > node where we want to get Livy address. > It will be very helpful if we will add Livy Server address to Zookeeper and > expose API for clients to get Livy URL to use it in client code for REST > calls. > Livy already supports state saving in Zookeeper but I don't see that we store > Livy server address somewhere. Before starting investigating and > implementation I want to ask here about this. > Please, correct me if I missed something. > Any comments will be highly appreciated! -- This message was sent by Atlassian Jira (v8.3.2#803003)
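The proposal is for the Livy server to publish its address in ZooKeeper so clients no longer need livy.conf or zeppelin.livy.url on hand. A minimal client-side sketch using Apache Curator; the znode path /livy/server.uri and the plain-string payload are assumptions for illustration only, not the format of any eventual implementation:
{code:scala}
import java.nio.charset.StandardCharsets

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry

object LivyDiscoveryClient {
  // Hypothetical znode where the Livy server would publish e.g. "http://host:8998".
  private val LivyUriPath = "/livy/server.uri"

  def lookupLivyUrl(zkConnect: String): String = {
    val client = CuratorFrameworkFactory.newClient(
      zkConnect, new ExponentialBackoffRetry(1000, 3))
    client.start()
    try {
      // Read the published address and hand it to the REST client / Zeppelin config.
      val data = client.getData.forPath(LivyUriPath)
      new String(data, StandardCharsets.UTF_8)
    } finally {
      client.close()
    }
  }
}
{code}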
[jira] [Resolved] (LIVY-617) Livy session leak on Yarn when creating session duplicated names
[ https://issues.apache.org/jira/browse/LIVY-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-617. -- Fix Version/s: 0.7.0 Assignee: shanyu zhao Resolution: Fixed Issue resolved by pull request 187 https://github.com/apache/incubator-livy/pull/187 > Livy session leak on Yarn when creating session duplicated names > > > Key: LIVY-617 > URL: https://issues.apache.org/jira/browse/LIVY-617 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0 >Reporter: shanyu zhao >Assignee: shanyu zhao >Priority: Major > Fix For: 0.7.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When running Livy on Yarn and try to create session with duplicated names, > Livy server sends response to client "Duplicate session name: xxx" but it > doesn't stop the session. The session creation failed, however, the Yarn > application got started and keeps running forever. > This is because during livy session register method, exception > "IllegalArgumentException" is thrown without stopping the session: > {code:java} > def register(session: S): S = { > info(s"Registering new session ${session.id}") > synchronized { > session.name.foreach { sessionName => > if (sessionsByName.contains(sessionName)) { > throw new IllegalArgumentException(s"Duplicate session name: > ${session.name}") > } else { > sessionsByName.put(sessionName, session) > } > } > sessions.put(session.id, session) > session.start() > } > session > }{code} > > Reproduction scripts: > curl -s -k -u username:password -X POST --data '\{"name": "duplicatedname", > "kind": "pyspark"}' -H "Content-Type: application/json" > 'https://myserver/livy/v1/sessions' -- This message was sent by Atlassian Jira (v8.3.2#803003)
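The leak described above comes from register throwing IllegalArgumentException for a duplicate name after the session (and its Yarn application) already exists, without stopping it. One possible shape of the fix is to stop the session before propagating the error; this is a sketch based on the quoted snippet, assuming session.stop() tears down the Yarn application, and is not necessarily the exact code merged in pull request 187:
{code:scala}
def register(session: S): S = {
  info(s"Registering new session ${session.id}")
  synchronized {
    session.name.foreach { sessionName =>
      if (sessionsByName.contains(sessionName)) {
        // Stop the session (and its Yarn application) before failing the request,
        // otherwise the application keeps running with no owner.
        try {
          session.stop()
        } catch {
          case e: Exception => warn(s"Failed to stop duplicate session ${session.id}", e)
        }
        throw new IllegalArgumentException(s"Duplicate session name: ${session.name}")
      } else {
        sessionsByName.put(sessionName, session)
      }
    }
    sessions.put(session.id, session)
    session.start()
  }
  session
}
{code}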
[jira] [Comment Edited] (LIVY-641) Travis failed on "should end with status dead when batch session exits with no 0 return code"
[ https://issues.apache.org/jira/browse/LIVY-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916318#comment-16916318 ] Saisai Shao edited comment on LIVY-641 at 8/27/19 3:58 AM: --- Issue resolved by pull request 214 [https://github.com/apache/incubator-livy/pull/214|https://github.com/apache/incubator-livy/pull/214] was (Author: jerryshao): Issue resolved by pull request 214 [https://github.com/apache/incubator-livy/pull/214|https://github.com/apache/incubator-livy/pull/184] > Travis failed on "should end with status dead when batch session exits with > no 0 return code" > - > > Key: LIVY-641 > URL: https://issues.apache.org/jira/browse/LIVY-641 > Project: Livy > Issue Type: Bug > Components: Tests >Affects Versions: 0.6.0 >Reporter: jiewang >Assignee: jiewang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 10m > Remaining Estimate: 0h > > BatchSessionSpec: > 2463A Batch process > 2464- should create a process (2 seconds, 103 milliseconds) > 2465- should update appId and appInfo (30 milliseconds) > {color:#FF}2466- should end with status dead when batch session exits > with no 0 return code *** FAILED *** (121 milliseconds){color} > {color:#FF} 2467 false was not true (BatchSessionSpec.scala:138){color} > 2468- should recover session (name = None) (17 milliseconds) > 2469- should recover session (name = Some(Test Batch Session)) (4 > milliseconds) > 2470- should recover session (name = null) (15 milliseconds) > please reference to > [https://travis-ci.org/apache/incubator-livy/builds/572562376?utm_source=github_status&utm_medium=notification] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-641) Travis failed on "should end with status dead when batch session exits with no 0 return code"
[ https://issues.apache.org/jira/browse/LIVY-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-641. -- Fix Version/s: 0.7.0 Assignee: jiewang Resolution: Fixed Issue resolved by pull request 214 [https://github.com/apache/incubator-livy/pull/214|https://github.com/apache/incubator-livy/pull/184] > Travis failed on "should end with status dead when batch session exits with > no 0 return code" > - > > Key: LIVY-641 > URL: https://issues.apache.org/jira/browse/LIVY-641 > Project: Livy > Issue Type: Bug > Components: Tests >Affects Versions: 0.6.0 >Reporter: jiewang >Assignee: jiewang >Priority: Major > Fix For: 0.7.0 > > Time Spent: 10m > Remaining Estimate: 0h > > BatchSessionSpec: > 2463A Batch process > 2464- should create a process (2 seconds, 103 milliseconds) > 2465- should update appId and appInfo (30 milliseconds) > {color:#FF}2466- should end with status dead when batch session exits > with no 0 return code *** FAILED *** (121 milliseconds){color} > {color:#FF} 2467 false was not true (BatchSessionSpec.scala:138){color} > 2468- should recover session (name = None) (17 milliseconds) > 2469- should recover session (name = Some(Test Batch Session)) (4 > milliseconds) > 2470- should recover session (name = null) (15 milliseconds) > please reference to > [https://travis-ci.org/apache/incubator-livy/builds/572562376?utm_source=github_status&utm_medium=notification] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (LIVY-639) Is it possible to add start time and completion time and duration to the statements web ui interface?
[ https://issues.apache.org/jira/browse/LIVY-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916306#comment-16916306 ] Saisai Shao commented on LIVY-639: -- Issue resolved by pull request 201 https://github.com/apache/incubator-livy/pull/201 > Is it possible to add start time and completion time and duration to the > statements web ui interface? > - > > Key: LIVY-639 > URL: https://issues.apache.org/jira/browse/LIVY-639 > Project: Livy > Issue Type: Improvement > Components: REPL, RSC, Server >Affects Versions: 0.6.0 >Reporter: zhang peng >Assignee: zhang peng >Priority: Major > Fix For: 0.7.0 > > Attachments: DeepinScreenshot_select-area_20190815153353.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Many times I need to know the execution time and start time of the statement, > but now the livy web ui does not support this. I have improved livy and have > written the code, I hope to get community support and submit PR. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (LIVY-639) Is it possible to add start time and completion time and duration to the statements web ui interface?
[ https://issues.apache.org/jira/browse/LIVY-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-639: Assignee: zhang peng > Is it possible to add start time and completion time and duration to the > statements web ui interface? > - > > Key: LIVY-639 > URL: https://issues.apache.org/jira/browse/LIVY-639 > Project: Livy > Issue Type: Improvement > Components: REPL, RSC, Server >Affects Versions: 0.6.0 >Reporter: zhang peng >Assignee: zhang peng >Priority: Major > Fix For: 0.7.0 > > Attachments: DeepinScreenshot_select-area_20190815153353.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Many times I need to know the execution time and start time of the statement, > but now the livy web ui does not support this. I have improved livy and have > written the code, I hope to get community support and submit PR. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-639) Is it possible to add start time and completion time and duration to the statements web ui interface?
[ https://issues.apache.org/jira/browse/LIVY-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-639. -- Fix Version/s: 0.7.0 Resolution: Fixed > Is it possible to add start time and completion time and duration to the > statements web ui interface? > - > > Key: LIVY-639 > URL: https://issues.apache.org/jira/browse/LIVY-639 > Project: Livy > Issue Type: Improvement > Components: REPL, RSC, Server >Affects Versions: 0.6.0 >Reporter: zhang peng >Priority: Major > Fix For: 0.7.0 > > Attachments: DeepinScreenshot_select-area_20190815153353.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Many times I need to know the execution time and start time of the statement, > but now the livy web ui does not support this. I have improved livy and have > written the code, I hope to get community support and submit PR. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
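The change adds per-statement timing to the web UI. As an illustration of the data involved, a minimal sketch with hypothetical field names; the merged change may represent these differently:
{code:scala}
// Hypothetical timing fields for a statement record shown in the UI.
case class StatementTiming(started: Long, completed: Option[Long]) {
  // Duration in milliseconds; falls back to "now" while the statement is still running.
  def durationMs: Long =
    completed.getOrElse(System.currentTimeMillis()) - started
}
{code}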
[jira] [Assigned] (LIVY-648) Wrong return message in cancel statement documentation
[ https://issues.apache.org/jira/browse/LIVY-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-648: Assignee: Oleksandr Shevchenko > Wrong return message in cancel statement documentation > -- > > Key: LIVY-648 > URL: https://issues.apache.org/jira/browse/LIVY-648 > Project: Livy > Issue Type: Documentation >Reporter: Oleksandr Shevchenko >Assignee: Oleksandr Shevchenko >Priority: Trivial > Fix For: 0.7.0 > > Attachments: image-2019-08-24-00-54-29-000.png, > image-2019-08-24-00-58-11-847.png > > Time Spent: 20m > Remaining Estimate: 0h > > A trivial mistake in the documentation. "Canceled" vs "cancelled" (two Ls). > [https://github.com/apache/incubator-livy/blob/80daadef02ae57b2a5487c6f92e0f7df558d4864/docs/rest-api.md#L308] > !image-2019-08-24-00-54-29-000.png|width=1529,height=278! > !image-2019-08-24-00-58-11-847.png! > We can just fix documentation or unify all names across the project. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-648) Wrong return message in cancel statement documentation
[ https://issues.apache.org/jira/browse/LIVY-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-648. -- Fix Version/s: 0.7.0 Resolution: Fixed Issue resolved by pull request 210 [https://github.com/apache/incubator-livy/pull/210] > Wrong return message in cancel statement documentation > -- > > Key: LIVY-648 > URL: https://issues.apache.org/jira/browse/LIVY-648 > Project: Livy > Issue Type: Documentation >Reporter: Oleksandr Shevchenko >Priority: Trivial > Fix For: 0.7.0 > > Attachments: image-2019-08-24-00-54-29-000.png, > image-2019-08-24-00-58-11-847.png > > Time Spent: 20m > Remaining Estimate: 0h > > A trivial mistake in the documentation. "Canceled" vs "cancelled" (two Ls). > [https://github.com/apache/incubator-livy/blob/80daadef02ae57b2a5487c6f92e0f7df558d4864/docs/rest-api.md#L308] > !image-2019-08-24-00-54-29-000.png|width=1529,height=278! > !image-2019-08-24-00-58-11-847.png! > We can just fix documentation or unify all names across the project. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until it is killed manually. Here's the log you can see, there's no any log after spilling the shuffle data to disk, but the executor is still alive. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system call, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until it is killed manually. Here's the log you can see, there's no any log after spilling the shuffle data to disk, but the executor is still alive. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until it is killed manually. Here's > the log you can see, there's no any log after spilling the shuffle data to > disk, but the executor is still alive. > !95330.png! > And here is the thread dump, we could see that it always calls native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system call, we found that this thread is > always calling {{fstat}}, and the system usage is pretty high, here is the > screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until it is killed manually. Here's the log you can see, there's no any log after spilling the shuffle data to disk, but the executor is still alive. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until it is killed manually. Here's the log you can see, there's no any log after spill the shuffle data to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until it is killed manually. Here's > the log you can see, there's no any log after spilling the shuffle data to > disk, but the executor is still alive. > !95330.png! > And here is the thread dump, we could see that it always calls native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, and the system usage is pretty high, here is the > screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until it is killed manually. Here's the log you can see, there's no any log after spill the shuffle data to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until it is killed manually. Here's > the log you can see, there's no any log after spill the shuffle data to disk. > !95330.png! > And here is the thread dump, we could see that it always calls native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, and the system usage is pretty high, here is the > screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it is calling native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until killed manually. Here's the > log you can see, there's no any log after spill the shuffle files to disk. > !95330.png! > And here is the thread dump, we could see that it always calls native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, here is the screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, and the system usage is pretty high, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it always calls native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until killed manually. Here's the > log you can see, there's no any log after spill the shuffle files to disk. > !95330.png! > And here is the thread dump, we could see that it always calls native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, and the system usage is pretty high, here is the > screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk. !95330.png! And here is the thread dump, we could see that it is calling native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk for several hours. !95330.png! And here is the thread dump, we could see that it is calling native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until killed manually. Here's the > log you can see, there's no any log after spill the shuffle files to disk. > !95330.png! > And here is the thread dump, we could see that it is calling native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, here is the screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Attachment: D18F4.png 95330.png 91ADA.png > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until killed manually. Here's the > log you can see, there's no any log after spill the shuffle files to disk for > several hours. > And here is the thread dump, we could see that it is calling native method > {{size0}}. > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, here is the screenshot. > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
[ https://issues.apache.org/jira/browse/SPARK-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-28849: Description: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk for several hours. !95330.png! And here is the thread dump, we could see that it is calling native method {{size0}}. !91ADA.png! And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. !D18F4.png! We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. was: Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk for several hours. And here is the thread dump, we could see that it is calling native method {{size0}}. And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. > Spark's UnsafeShuffleWriter may run into infinite loop in transferTo > occasionally > - > > Key: SPARK-28849 > URL: https://issues.apache.org/jira/browse/SPARK-28849 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Saisai Shao >Priority: Major > Attachments: 91ADA.png, 95330.png, D18F4.png > > > Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling > {{transferTo}} occasionally. What we saw is that when merging shuffle temp > file, the task is hung for several hours until killed manually. Here's the > log you can see, there's no any log after spill the shuffle files to disk for > several hours. > !95330.png! > And here is the thread dump, we could see that it is calling native method > {{size0}}. > !91ADA.png! > And we use strace to trace the system, we found that this thread is always > calling {{fstat}}, here is the screenshot. > !D18F4.png! > We didn't find the root cause here, I guess it might be related to FS or disk > issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28849) Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
Saisai Shao created SPARK-28849: --- Summary: Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally Key: SPARK-28849 URL: https://issues.apache.org/jira/browse/SPARK-28849 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Saisai Shao Spark's {{UnsafeShuffleWriter}} may run into infinite loop when calling {{transferTo}} occasionally. What we saw is that when merging shuffle temp file, the task is hung for several hours until killed manually. Here's the log you can see, there's no any log after spill the shuffle files to disk for several hours. And here is the thread dump, we could see that it is calling native method {{size0}}. And we use strace to trace the system, we found that this thread is always calling {{fstat}}, here is the screenshot. We didn't find the root cause here, I guess it might be related to FS or disk issue. Anyway we should figure out a way to fail fast in a such scenario. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
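The updates above describe transferTo spinning in native size0/fstat calls while merging shuffle spill files, and suggest failing fast instead of hanging for hours. One way to fail fast at the caller level is to bound the copy loop when transferTo repeatedly makes no progress; whether that catches this particular hang depends on where the loop actually spins, so the following is only an illustrative guard, not Spark's actual file-merging code:
{code:scala}
import java.io.IOException
import java.nio.channels.{FileChannel, WritableByteChannel}

object BoundedTransfer {
  // Copy `count` bytes starting at `position`, but bail out if transferTo
  // makes no progress for `maxStalledAttempts` consecutive calls instead
  // of looping forever.
  def transferWithFailFast(
      src: FileChannel,
      dst: WritableByteChannel,
      position: Long,
      count: Long,
      maxStalledAttempts: Int = 1000): Unit = {
    var transferred = 0L
    var stalled = 0
    while (transferred < count) {
      val n = src.transferTo(position + transferred, count - transferred, dst)
      if (n > 0) {
        transferred += n
        stalled = 0
      } else {
        stalled += 1
        if (stalled >= maxStalledAttempts) {
          throw new IOException(
            s"transferTo made no progress after $stalled attempts; " +
            s"transferred $transferred of $count bytes")
        }
      }
    }
  }
}
{code}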
[jira] [Resolved] (LIVY-637) get NullPointerException when create database using thriftserver
[ https://issues.apache.org/jira/browse/LIVY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-637. -- Fix Version/s: 0.7.0 Resolution: Fixed > get NullPointerException when create database using thriftserver > > > Key: LIVY-637 > URL: https://issues.apache.org/jira/browse/LIVY-637 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Assignee: mingchao zhao >Priority: Major > Fix For: 0.7.0 > > Attachments: create.png, drop.png, use.png > > Time Spent: 4h > Remaining Estimate: 0h > > When I connected thriftserver with spark beeline. NullPointerException occurs > when execute the following SQL. This exception does not affect the final > execution result. > create database test; > use test; > drop database test; > 0: jdbc:hive2://localhost:10090> create database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> use test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> drop database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) 
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (LIVY-637) get NullPointerException when create database using thriftserver
[ https://issues.apache.org/jira/browse/LIVY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913001#comment-16913001 ] Saisai Shao commented on LIVY-637: -- Issue resolved by pull request 200 https://github.com/apache/incubator-livy/pull/200 > get NullPointerException when create database using thriftserver > > > Key: LIVY-637 > URL: https://issues.apache.org/jira/browse/LIVY-637 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Assignee: mingchao zhao >Priority: Major > Fix For: 0.7.0 > > Attachments: create.png, drop.png, use.png > > Time Spent: 4h > Remaining Estimate: 0h > > When I connected thriftserver with spark beeline. NullPointerException occurs > when execute the following SQL. This exception does not affect the final > execution result. > create database test; > use test; > drop database test; > 0: jdbc:hive2://localhost:10090> create database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> use test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> drop database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at 
org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (LIVY-637) get NullPointerException when create database using thriftserver
[ https://issues.apache.org/jira/browse/LIVY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-637: Assignee: mingchao zhao > get NullPointerException when create database using thriftserver > > > Key: LIVY-637 > URL: https://issues.apache.org/jira/browse/LIVY-637 > Project: Livy > Issue Type: Bug > Components: Thriftserver >Affects Versions: 0.6.0 >Reporter: mingchao zhao >Assignee: mingchao zhao >Priority: Major > Attachments: create.png, drop.png, use.png > > Time Spent: 4h > Remaining Estimate: 0h > > When I connected thriftserver with spark beeline. NullPointerException occurs > when execute the following SQL. This exception does not affect the final > execution result. > create database test; > use test; > drop database test; > 0: jdbc:hive2://localhost:10090> create database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> use test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > 0: jdbc:hive2://localhost:10090> drop database test; > java.lang.NullPointerException > at org.apache.hive.service.cli.ColumnBasedSet.(ColumnBasedSet.java:50) > at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37) > at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368) > at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:42) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794) > at org.apache.hive.beeline.Commands.execute(Commands.java:860) > at org.apache.hive.beeline.Commands.sql(Commands.java:713) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484) > at 
org.apache.hive.beeline.BeeLine.main(BeeLine.java:467) > Error: Error retrieving next row (state=,code=0) > -- This message was sent by Atlassian Jira (v8.3.2#803003)
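The beeline-side NullPointerException comes from building a row set out of a response whose column list is null; DDL statements such as CREATE/USE/DROP DATABASE return no rows. An illustrative guard on the result-building side, using simplified stand-in types rather than the actual Hive thrift classes or the code merged in pull request 200:
{code:scala}
// Simplified stand-ins for the thrift result types, used only to illustrate
// the guard; the real fix operates on Hive's TRowSet/TColumn structures.
case class Column(values: Seq[String])
case class RowSet(columns: Seq[Column])

def toRowSet(maybeColumns: Option[Seq[Column]]): RowSet = {
  // Return an empty-but-non-null column list for statements that produce no
  // result rows, so clients that iterate the columns do not hit an NPE.
  RowSet(maybeColumns.getOrElse(Seq.empty))
}
{code}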
[jira] [Closed] (LIVY-591) ACLs enforcement should occur on both session owner and proxy user
[ https://issues.apache.org/jira/browse/LIVY-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao closed LIVY-591. Resolution: Duplicate > ACLs enforcement should occur on both session owner and proxy user > -- > > Key: LIVY-591 > URL: https://issues.apache.org/jira/browse/LIVY-591 > Project: Livy > Issue Type: Improvement > Components: Server >Affects Versions: 0.6.0 >Reporter: Ankur Gupta >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently ACLs enforcement occurs only on session owner. So, a request is > authorized if the request user is same as session owner or has correct ACLs > configured. > Eg: > https://github.com/apache/incubator-livy/blob/master/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSessionServlet.scala#L70 > In case of impersonation, proxy user is checked against session owner, > instead he should be checked against session proxy. Otherwise, a proxy user > who created the session will not be able to submit statements against it, if > ACLs are not configured correctly. > Additionally, it seems there is no auth-check right now while creating a > session. We should add that check as well (against modify-session acls). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (LIVY-592) Proxy user cannot view its session log
[ https://issues.apache.org/jira/browse/LIVY-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-592. -- Fix Version/s: 0.7.0 Assignee: Yiheng Wang Resolution: Fixed > Proxy user cannot view its session log > -- > > Key: LIVY-592 > URL: https://issues.apache.org/jira/browse/LIVY-592 > Project: Livy > Issue Type: Bug > Components: Server > Environment: Docker running on Kubernetes >Reporter: Zikun Xu >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is how to reproduce the issue. > > root@storage-0-0:~# kinit admin > Password for [admin@AZDATA.LOCAL|mailto:admin@AZDATA.LOCAL]: > Warning: Your password will expire in 41 days on Tue Jun 11 08:35:19 2019 > root@storage-0-0:~# > root@storage-0-0:~# curl -k -X POST --negotiate -u : --data '\{"kind": > "pyspark", "proxyUser": "admin"}' -H "Content-Type: application/json" > 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions' > {"id":0,"name":null,"appId":null,"owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":\{"driverLogUrl":null,"sparkUiUrl":null},"log":[]} > > root@storage-0-0:~# curl -k --negotiate -u : > 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions' > {"from":0,"total":2,"sessions":[{"id":0,"name":null,"appId":"application_1556613676830_0001","owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":"[http://storage-0-0.storage-0-svc.test.svc.cluster.local:8042/node/containerlogs/container_1556613676830_0001_01_01/admin]","sparkUiUrl":"[http://master-0.azdata.local:8088/proxy/application_1556613676830_0001/]"},"log":[]},\{"id":1,"name":null,"appId":null,"owner":"knox","proxyUser":"bob","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}]} > > From the result, you can see that the user admin can not view the log of its > own session. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (LIVY-592) Proxy user cannot view its session log
[ https://issues.apache.org/jira/browse/LIVY-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912875#comment-16912875 ] Saisai Shao commented on LIVY-592: -- Issue resolved by pull request 202 https://github.com/apache/incubator-livy/pull/202 > Proxy user cannot view its session log > -- > > Key: LIVY-592 > URL: https://issues.apache.org/jira/browse/LIVY-592 > Project: Livy > Issue Type: Bug > Components: Server > Environment: Docker running on Kubernetes >Reporter: Zikun Xu >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Here is how to reproduce the issue. > > root@storage-0-0:~# kinit admin > Password for admin@AZDATA.LOCAL: > Warning: Your password will expire in 41 days on Tue Jun 11 08:35:19 2019 > root@storage-0-0:~# > root@storage-0-0:~# curl -k -X POST --negotiate -u : --data '{"kind": > "pyspark", "proxyUser": "admin"}' -H "Content-Type: application/json" > 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions' > {"id":0,"name":null,"appId":null,"owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]} > > root@storage-0-0:~# curl -k --negotiate -u : > 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions' > {"from":0,"total":2,"sessions":[{"id":0,"name":null,"appId":"application_1556613676830_0001","owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":"http://storage-0-0.storage-0-svc.test.svc.cluster.local:8042/node/containerlogs/container_1556613676830_0001_01_01/admin","sparkUiUrl":"http://master-0.azdata.local:8088/proxy/application_1556613676830_0001/"},"log":[]},{"id":1,"name":null,"appId":null,"owner":"knox","proxyUser":"bob","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}]} > > From the result, you can see that the user admin cannot view the log of its > own session. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (LIVY-623) Implement GetTables metadata operation
[ https://issues.apache.org/jira/browse/LIVY-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-623: Assignee: Yiheng Wang > Implement GetTables metadata operation > -- > > Key: LIVY-623 > URL: https://issues.apache.org/jira/browse/LIVY-623 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetTables metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (LIVY-625) Implement GetFunctions metadata operation
[ https://issues.apache.org/jira/browse/LIVY-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-625: Assignee: Yiheng Wang > Implement GetFunctions metadata operation > - > > Key: LIVY-625 > URL: https://issues.apache.org/jira/browse/LIVY-625 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetFunctions metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (LIVY-624) Implement GetColumns metadata operation
[ https://issues.apache.org/jira/browse/LIVY-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-624: Assignee: Yiheng Wang > Implement GetColumns metadata operation > --- > > Key: LIVY-624 > URL: https://issues.apache.org/jira/browse/LIVY-624 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetColumns metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (LIVY-575) Implement missing metadata operations
[ https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated LIVY-575: - Priority: Major (was: Minor) > Implement missing metadata operations > - > > Key: LIVY-575 > URL: https://issues.apache.org/jira/browse/LIVY-575 > Project: Livy > Issue Type: Improvement > Components: Thriftserver >Reporter: Marco Gaido >Priority: Major > > Many metadata operations (e.g. table list retrieval, schema retrieval, ...) > are currently not implemented. We should implement them. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
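The GetSchemas/GetTables/GetColumns/GetFunctions sub-tasks tracked under LIVY-575 all come down to exposing catalog metadata over the thrift protocol. As a rough illustration of where that metadata can come from, the sketch below uses plain Spark catalog calls; it is not the Livy thrift server implementation, and the local session and commented-out table name are assumptions.
{code:scala}
// Rough illustration only: Spark's catalog API holds the metadata that
// GetSchemas/GetTables/GetColumns/GetFunctions-style operations return.
import org.apache.spark.sql.SparkSession

object MetadataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("metadata-sketch")
      .master("local[*]") // assumption: a throwaway local session for the sketch
      .getOrCreate()

    spark.catalog.listDatabases().show(false)        // roughly GetSchemas
    spark.catalog.listTables("default").show(false)  // roughly GetTables
    spark.catalog.listFunctions().show(false)        // roughly GetFunctions
    // Roughly GetColumns, for a table assumed to exist:
    // spark.catalog.listColumns("default", "some_table").show(false)

    spark.stop()
  }
}
{code}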
[jira] [Assigned] (LIVY-622) Implement GetSchemas metadata operation
[ https://issues.apache.org/jira/browse/LIVY-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned LIVY-622: Assignee: Yiheng Wang > Implement GetSchemas metadata operation > --- > > Key: LIVY-622 > URL: https://issues.apache.org/jira/browse/LIVY-622 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Assignee: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We should support GetSchemas metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (LIVY-625) Implement GetFunctions metadata operation
[ https://issues.apache.org/jira/browse/LIVY-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-625. -- Resolution: Fixed Fix Version/s: 0.7.0 Issue resolved by pull request 194 [https://github.com/apache/incubator-livy/pull/194] > Implement GetFunctions metadata operation > - > > Key: LIVY-625 > URL: https://issues.apache.org/jira/browse/LIVY-625 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetFunctions metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (LIVY-624) Implement GetColumns metadata operation
[ https://issues.apache.org/jira/browse/LIVY-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-624. -- Resolution: Fixed Fix Version/s: 0.7.0 Issue resolved by pull request 194 [https://github.com/apache/incubator-livy/pull/194] > Implement GetColumns metadata operation > --- > > Key: LIVY-624 > URL: https://issues.apache.org/jira/browse/LIVY-624 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetColumns metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (LIVY-623) Implement GetTables metadata operation
[ https://issues.apache.org/jira/browse/LIVY-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-623. -- Resolution: Fixed Fix Version/s: 0.7.0 Issue resolved by pull request 194 [https://github.com/apache/incubator-livy/pull/194] > Implement GetTables metadata operation > -- > > Key: LIVY-623 > URL: https://issues.apache.org/jira/browse/LIVY-623 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > > We should support GetTables metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (LIVY-622) Implement GetSchemas metadata operation
[ https://issues.apache.org/jira/browse/LIVY-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved LIVY-622. -- Resolution: Fixed Fix Version/s: 0.7.0 Issue resolved by pull request 194 [https://github.com/apache/incubator-livy/pull/194] > Implement GetSchemas metadata operation > --- > > Key: LIVY-622 > URL: https://issues.apache.org/jira/browse/LIVY-622 > Project: Livy > Issue Type: Sub-task > Components: Thriftserver >Reporter: Yiheng Wang >Priority: Minor > Fix For: 0.7.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We should support GetSchemas metadata operation in Livy thrift server. -- This message was sent by Atlassian JIRA (v7.6.14#76016)