[jira] [Updated] (SPARK-23788) Race condition in StreamingQuerySuite
[ https://issues.apache.org/jira/browse/SPARK-23788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-23788:
---------------------------------
    Fix Version/s: 2.2.2

> Race condition in StreamingQuerySuite
> -------------------------------------
>
>                 Key: SPARK-23788
>                 URL: https://issues.apache.org/jira/browse/SPARK-23788
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Assignee: Jose Torres
>            Priority: Minor
>             Fix For: 2.2.2, 2.3.1, 2.4.0
>
> The serializability test uses the same MemoryStream instance for 3 different
> queries. If any of those queries asks it to commit before the others have run,
> the rest will see empty dataframes. This can fail the test if q3 is affected.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
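The shared-stream race described in the ticket above can be sketched as follows. This is a minimal hypothetical illustration, not the actual StreamingQuerySuite code; the object and variable names are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical sketch of the pattern the ticket describes.
object SharedMemoryStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("sketch").getOrCreate()
    import spark.implicits._
    implicit val sqlContext: org.apache.spark.sql.SQLContext = spark.sqlContext

    // Problematic: one MemoryStream feeding three queries. Per the ticket,
    // once any query commits a batch, the committed data is gone from the
    // stream, so a query that runs later sees an empty dataframe (q3).
    val shared = MemoryStream[Int]
    shared.addData(1, 2, 3)
    // val q1 = shared.toDS().writeStream...; val q2 = ...; val q3 = ...

    // Fix direction: give each query its own MemoryStream instance, so no
    // query can consume another query's input.
    val in1 = MemoryStream[Int]; in1.addData(1, 2, 3)
    val in2 = MemoryStream[Int]; in2.addData(1, 2, 3)
    val in3 = MemoryStream[Int]; in3.addData(1, 2, 3)

    spark.stop()
  }
}
```

The per-query-stream shape avoids the ordering dependency entirely, which is why it is a natural fix for a flaky test.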
[jira] [Resolved] (SPARK-23788) Race condition in StreamingQuerySuite
[ https://issues.apache.org/jira/browse/SPARK-23788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-23788.
----------------------------------
       Resolution: Fixed
         Assignee: Jose Torres
    Fix Version/s: 2.4.0
                   2.3.1

> Race condition in StreamingQuerySuite
> -------------------------------------
>
>                 Key: SPARK-23788
>                 URL: https://issues.apache.org/jira/browse/SPARK-23788
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Assignee: Jose Torres
>            Priority: Minor
>             Fix For: 2.3.1, 2.4.0
>
> The serializability test uses the same MemoryStream instance for 3 different
> queries. If any of those queries asks it to commit before the others have run,
> the rest will see empty dataframes. This can fail the test if q3 is affected.
[jira] [Updated] (SPARK-23727) Support DATE predicate push down in parquet
[ https://issues.apache.org/jira/browse/SPARK-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-23727:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Support DATE predicate push down in parquet
> -------------------------------------------
>
>                 Key: SPARK-23727
>                 URL: https://issues.apache.org/jira/browse/SPARK-23727
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: yucai
>            Priority: Major
>
> DATE predicate push down is missing; it should be supported.
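The kind of query affected can be sketched as below. The path and column name are hypothetical, invented for illustration only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical example of a query that would benefit from DATE predicate
// push down; "/tmp/events" and "event_date" are made-up names.
object DatePushdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("date-pushdown").getOrCreate()

    val events = spark.read.parquet("/tmp/events") // assumes a DATE column "event_date"
    val oneDay = events.filter(col("event_date") === lit(java.sql.Date.valueOf("2018-03-25")))

    // With predicate push down, Parquet can use row-group min/max statistics
    // to skip data for other dates; without it, the date filter only runs in
    // Spark after the rows have already been read. The physical plan's scan
    // node shows which filters were actually pushed.
    oneDay.explain(true)
    spark.stop()
  }
}
```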
[jira] [Commented] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412835#comment-16412835 ]

Dongjoon Hyun commented on SPARK-23598:
---------------------------------------
Hi, [~hvanhovell] and [~kiszk]. Although this test case sometimes fails in `branch-2.3`, I added `2.3.1` to `Fixed Versions` because the patch landed on `branch-2.3`.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/lastCompletedBuild/testReport/org.apache.spark.sql.execution/WholeStageCodegenSuite/SPARK_23598__Codegen_working_for_lots_of_aggregation_operations_without_runtime_errors/

> WholeStageCodegen can lead to IllegalAccessError calling append for
> HashAggregateExec
> --------------------------------------------------------------------
>
>                 Key: SPARK-23598
>                 URL: https://issues.apache.org/jira/browse/SPARK-23598
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: David Vogelbacher
>            Assignee: Kazuaki Ishizaki
>            Priority: Major
>             Fix For: 2.3.1, 2.4.0
>
> Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
> {noformat}
> java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> {noformat}
> After disabling codegen, everything works.
>
> The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner class of a subclass that is loaded by a different class loader (after codegen compilation).
> [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled:
> # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself.
> # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D.
> 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package), and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_.
>
> Looking at the code path of _WholeStageCodeGen_, the following happens:
> # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan.
> # In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_).
> # We add this method to the compiled code by invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]. In the generated function body we call the _append_ method.
>
> Now, the _addNewFunction_ method states that:
> {noformat}
> If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class
> {noformat}
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-23598:
----------------------------------
    Fix Version/s: 2.3.1

> WholeStageCodegen can lead to IllegalAccessError calling append for
> HashAggregateExec
> --------------------------------------------------------------------
>
>                 Key: SPARK-23598
>                 URL: https://issues.apache.org/jira/browse/SPARK-23598
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: David Vogelbacher
>            Assignee: Kazuaki Ishizaki
>            Priority: Major
>             Fix For: 2.3.1, 2.4.0
>
> Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
> {noformat}
> java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> {noformat}
> After disabling codegen, everything works.
>
> The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner class of a subclass that is loaded by a different class loader (after codegen compilation).
> [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled:
> # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself.
> # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D.
> 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package), and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_.
>
> Looking at the code path of _WholeStageCodeGen_, the following happens:
> # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan.
> # In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_).
> # We add this method to the compiled code by invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]. In the generated function body we call the _append_ method.
>
> Now, the _addNewFunction_ method states that:
> {noformat}
> If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class
> {noformat}
> This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the _IllegalAccessError_.
>
> Possible fixes:
> * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_
> *
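The JVMS 5.4.4 rule quoted in the description can be observed with a small self-contained analogue (not Spark code): asking the JVM to bind a protected method of a class in another package, from a caller that is neither in that package nor a subclass, fails the same access check that the generated inner class trips at runtime. `JComponent.paintComponent` is used here only as a convenient well-known protected method:

```scala
import java.lang.invoke.{MethodHandles, MethodType}

// Analogue of the JVMS 5.4.4 access check, not Spark code.
// javax.swing.JComponent.paintComponent(Graphics) is protected, and this
// object is neither in javax.swing nor a subclass of JComponent, so the
// JVM refuses to bind the method -- the same rule the codegen'd inner
// class violates when it calls BufferedRowIterator.append directly.
object ProtectedAccessSketch {
  def main(args: Array[String]): Unit = {
    val mt = MethodType.methodType(java.lang.Void.TYPE, classOf[java.awt.Graphics])
    try {
      MethodHandles.lookup().findVirtual(classOf[javax.swing.JComponent], "paintComponent", mt)
      println("unexpectedly allowed")
    } catch {
      case e: IllegalAccessException => println(s"access rejected: ${e.getMessage}")
    }
  }
}
```

In the bug itself the access is attempted from compiled bytecode rather than a method-handle lookup, so it surfaces as an `IllegalAccessError` at call time instead of an `IllegalAccessException`, but the underlying rule is the same.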
[jira] [Commented] (SPARK-23782) SHS should not show applications to user without read permission
[ https://issues.apache.org/jira/browse/SPARK-23782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412820#comment-16412820 ]

Marcelo Vanzin commented on SPARK-23782:
----------------------------------------
bq. The users can see which applications have been run by each users...

Sorry, but I don't consider any of the things you mentioned sensitive. They basically boil down to: there are other users in the system, and they can run applications. The consequences of this feature for the usability of the SHS (different users see different things) are a lot worse. I'm still against it unless you can make a very good case for it, which I haven't seen yet.

> SHS should not show applications to user without read permission
> ----------------------------------------------------------------
>
>                 Key: SPARK-23782
>                 URL: https://issues.apache.org/jira/browse/SPARK-23782
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.0
>            Reporter: Marco Gaido
>            Priority: Major
>
> The History Server shows all the applications to all the users, even though
> they have no permission to read them. They cannot read the details of the
> applications they cannot access, but still anybody can list all the
> applications submitted by all users.
> For instance, if we have an admin user {{admin}} and two normal users {{u1}}
> and {{u2}}, and each of them submitted one application, all of them can see
> in the main page of the SHS:
> ||App ID||App Name|| ... ||Spark User|| ... ||
> |app-123456789|The Admin App| ... |admin| ... |
> |app-123456790|u1 secret app| ... |u1| ... |
> |app-123456791|u2 secret app| ... |u2| ... |
> When clicking on each application, the proper permissions are applied and
> each user can see only the applications he has read permission for.
> Instead, each user should see only the applications he has permission to
> read, and should not be able to see the applications he lacks permission for.
[jira] [Commented] (SPARK-23791) Sub-optimal generated code for sum aggregating
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412799#comment-16412799 ]

Valentin Nikotin commented on SPARK-23791:
------------------------------------------
When testing aggregation with different numbers of columns (v2.3.0), I found that with 100 columns both cases take approximately the same time. With 90 columns the Spark job failed with:

{noformat}
18/03/25 00:11:33 ERROR Executor: Exception in task 117.0 in stage 1.0 (TID 4)
java.lang.ClassFormatError: Too many arguments in method signature in class file org/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIteratorForCodegenStage2
    at java.lang.ClassLoader.defineClass1(Native Method)
{noformat}

> Sub-optimal generated code for sum aggregating
> ----------------------------------------------
>
>                 Key: SPARK-23791
>                 URL: https://issues.apache.org/jira/browse/SPARK-23791
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Valentin Nikotin
>            Priority: Major
>              Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It appears that with wholeStage codegen enabled, a simple Spark job
> performing sum aggregation of 50 columns runs ~4 times slower than without
> wholeStage codegen.
> Please check the test case code. Please note that the udf is only there to
> prevent elimination optimizations that could be applied to literals.
> {code:scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object SPARK_23791 {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
>       .appName("test")
>       .getOrCreate()
>
>     def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
>       (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))
>
>     val dummy = udf(() => Option.empty[Int])
>
>     def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
>       val t0 = System.nanoTime()
>       spark.range(rows).toDF()
>         .withColumn("grp", col("id").mod(grps))
>         .transform(addConstColumns("null_", cnt, dummy()))
>         .groupBy("grp")
>         .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
>         .collect()
>       val t1 = System.nanoTime()
>       (t1 - t0) / 1e9
>     }
>
>     val timings = for (i <- 1 to 3) yield {
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
>       val with_wholestage = test()
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
>       val without_wholestage = test()
>       (with_wholestage, without_wholestage)
>     }
>
>     timings.foreach(println)
>     println("Press enter ...")
>     System.in.read()
>   }
> }
> {code}
[jira] [Updated] (SPARK-23791) Sub-optimal generated code for sum aggregating
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentin Nikotin updated SPARK-23791:
-------------------------------------
    Description:

It appears that with wholeStage codegen enabled, a simple Spark job performing sum aggregation of 50 columns runs ~4 times slower than without wholeStage codegen.

Please check the test case code. Please note that the udf is only there to prevent elimination optimizations that could be applied to literals.

{code:scala}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED

object SPARK_23791 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[4]")
      .appName("test")
      .getOrCreate()

    def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
      (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))

    val dummy = udf(() => Option.empty[Int])

    def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
      val t0 = System.nanoTime()
      spark.range(rows).toDF()
        .withColumn("grp", col("id").mod(grps))
        .transform(addConstColumns("null_", cnt, dummy()))
        .groupBy("grp")
        .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
        .collect()
      val t1 = System.nanoTime()
      (t1 - t0) / 1e9
    }

    val timings = for (i <- 1 to 3) yield {
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
      val with_wholestage = test()
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
      val without_wholestage = test()
      (with_wholestage, without_wholestage)
    }

    timings.foreach(println)
    println("Press enter ...")
    System.in.read()
  }
}
{code}

  was:

It appears that with wholeStage codegen enabled, a simple Spark job performing sum aggregation of 50 nullable columns runs ~4 times slower than without wholeStage codegen.

Please check the test case code. Please note that the udf is only there to prevent elimination optimizations that could be applied to literals.

{code:scala}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED

object SPARK_23791 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[4]")
      .appName("test")
      .getOrCreate()

    def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
      (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))

    val dummy = udf(() => Option.empty[Int])

    def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
      val t0 = System.nanoTime()
      spark.range(rows).toDF()
        .withColumn("grp", col("id").mod(grps))
        .transform(addConstColumns("null_", cnt, dummy()))
        .groupBy("grp")
        .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
        .collect()
      val t1 = System.nanoTime()
      (t1 - t0) / 1e9
    }

    val timings = for (i <- 1 to 3) yield {
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
      val with_wholestage = test()
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
      val without_wholestage = test()
      (with_wholestage, without_wholestage)
    }

    timings.foreach(println)
    println("Press enter ...")
    System.in.read()
  }
}
{code}

> Sub-optimal generated code for sum aggregating
> ----------------------------------------------
>
>                 Key: SPARK-23791
>                 URL: https://issues.apache.org/jira/browse/SPARK-23791
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Valentin Nikotin
>            Priority: Major
>              Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It appears that with wholeStage codegen enabled, a simple Spark job
> performing sum aggregation of 50 columns runs ~4 times slower than without
> wholeStage codegen.
> Please check the test case code. Please note that the udf is only there to
> prevent elimination optimizations that could be applied to literals.
> {code:scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object SPARK_23791 {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
>       .appName("test")
>       .getOrCreate()
>
>     def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF:
[jira] [Updated] (SPARK-23791) Sub-optimal generated code for sum aggregating
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentin Nikotin updated SPARK-23791:
-------------------------------------
    Summary: Sub-optimal generated code for sum aggregating  (was: Sub-optimal generated code when aggregating nullable columns)

> Sub-optimal generated code for sum aggregating
> ----------------------------------------------
>
>                 Key: SPARK-23791
>                 URL: https://issues.apache.org/jira/browse/SPARK-23791
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Valentin Nikotin
>            Priority: Major
>              Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It appears that with wholeStage codegen enabled, a simple Spark job
> performing sum aggregation of 50 nullable columns runs ~4 times slower than
> without wholeStage codegen.
> Please check the test case code. Please note that the udf is only there to
> prevent elimination optimizations that could be applied to literals.
> {code:scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object SPARK_23791 {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
>       .appName("test")
>       .getOrCreate()
>
>     def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
>       (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))
>
>     val dummy = udf(() => Option.empty[Int])
>
>     def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
>       val t0 = System.nanoTime()
>       spark.range(rows).toDF()
>         .withColumn("grp", col("id").mod(grps))
>         .transform(addConstColumns("null_", cnt, dummy()))
>         .groupBy("grp")
>         .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
>         .collect()
>       val t1 = System.nanoTime()
>       (t1 - t0) / 1e9
>     }
>
>     val timings = for (i <- 1 to 3) yield {
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
>       val with_wholestage = test()
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
>       val without_wholestage = test()
>       (with_wholestage, without_wholestage)
>     }
>
>     timings.foreach(println)
>     println("Press enter ...")
>     System.in.read()
>   }
> }
> {code}
[jira] [Updated] (SPARK-23791) Sub-optimal generated code when aggregating nullable columns
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentin Nikotin updated SPARK-23791:
-------------------------------------
    Environment:     (was:
{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED

object TestCase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[4]")
      .appName("test")
      .getOrCreate()

    def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
      (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))

    val dummy = udf(() => Option.empty[Int])

    def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
      val t0 = System.nanoTime()
      spark.range(rows).toDF()
        .withColumn("grp", col("id").mod(grps))
        .transform(addConstColumns("null_", cnt, dummy()))
        .groupBy("grp")
        .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
        .collect()
      val t1 = System.nanoTime()
      (t1 - t0) / 1e9
    }

    val timings = for (i <- 1 to 3) yield {
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
      val with_wholestage = test()
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
      val without_wholestage = test()
      (with_wholestage, without_wholestage)
    }

    timings.foreach(println)
    println("Press enter ...")
    System.in.read()
  }
}
{code})

> Sub-optimal generated code when aggregating nullable columns
> ------------------------------------------------------------
>
>                 Key: SPARK-23791
>                 URL: https://issues.apache.org/jira/browse/SPARK-23791
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Valentin Nikotin
>            Priority: Major
>              Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It appears that with wholeStage codegen enabled, a simple Spark job
> performing sum aggregation of 50 nullable columns runs ~4 times slower than
> without wholeStage codegen.
> Please check the test case code. Please note that the udf is only there to
> prevent elimination optimizations that could be applied to literals.
> {code:scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object SPARK_23791 {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
>       .appName("test")
>       .getOrCreate()
>
>     def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
>       (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))
>
>     val dummy = udf(() => Option.empty[Int])
>
>     def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
>       val t0 = System.nanoTime()
>       spark.range(rows).toDF()
>         .withColumn("grp", col("id").mod(grps))
>         .transform(addConstColumns("null_", cnt, dummy()))
>         .groupBy("grp")
>         .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
>         .collect()
>       val t1 = System.nanoTime()
>       (t1 - t0) / 1e9
>     }
>
>     val timings = for (i <- 1 to 3) yield {
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
>       val with_wholestage = test()
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
>       val without_wholestage = test()
>       (with_wholestage, without_wholestage)
>     }
>
>     timings.foreach(println)
>     println("Press enter ...")
>     System.in.read()
>   }
> }
> {code}
[jira] [Updated] (SPARK-23791) Sub-optimal generated code when aggregating nullable columns
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentin Nikotin updated SPARK-23791:
-------------------------------------
    Description:

It appears that with wholeStage codegen enabled, a simple Spark job performing sum aggregation of 50 nullable columns runs ~4 times slower than without wholeStage codegen.

Please check the test case code. Please note that the udf is only there to prevent elimination optimizations that could be applied to literals.

{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED

object SPARK_23791 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[4]")
      .appName("test")
      .getOrCreate()

    def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
      (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))

    val dummy = udf(() => Option.empty[Int])

    def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
      val t0 = System.nanoTime()
      spark.range(rows).toDF()
        .withColumn("grp", col("id").mod(grps))
        .transform(addConstColumns("null_", cnt, dummy()))
        .groupBy("grp")
        .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
        .collect()
      val t1 = System.nanoTime()
      (t1 - t0) / 1e9
    }

    val timings = for (i <- 1 to 3) yield {
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
      val with_wholestage = test()
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
      val without_wholestage = test()
      (with_wholestage, without_wholestage)
    }

    timings.foreach(println)
    println("Press enter ...")
    System.in.read()
  }
}
{code}

  was:

It appears that with wholeStage codegen enabled, a simple Spark job performing sum aggregation of 50 nullable columns runs ~4 times slower than without wholeStage codegen.

Please check the test case code. Please note that the udf is only there to prevent elimination optimizations that could be applied to literals.

> Sub-optimal generated code when aggregating nullable columns
> ------------------------------------------------------------
>
>                 Key: SPARK-23791
>                 URL: https://issues.apache.org/jira/browse/SPARK-23791
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: {code:java}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object TestCase {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
>       .appName("test")
>       .getOrCreate()
>
>     def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
>       (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))
>
>     val dummy = udf(() => Option.empty[Int])
>
>     def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
>       val t0 = System.nanoTime()
>       spark.range(rows).toDF()
>         .withColumn("grp", col("id").mod(grps))
>         .transform(addConstColumns("null_", cnt, dummy()))
>         .groupBy("grp")
>         .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
>         .collect()
>       val t1 = System.nanoTime()
>       (t1 - t0) / 1e9
>     }
>
>     val timings = for (i <- 1 to 3) yield {
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
>       val with_wholestage = test()
>       spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
>       val without_wholestage = test()
>       (with_wholestage, without_wholestage)
>     }
>
>     timings.foreach(println)
>     println("Press enter ...")
>     System.in.read()
>   }
> }
> {code}
>            Reporter: Valentin Nikotin
>            Priority: Major
>              Labels: performance
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It appears that with wholeStage codegen enabled, a simple Spark job
> performing sum aggregation of 50 nullable columns runs ~4 times slower than
> without wholeStage codegen.
> Please check the test case code. Please note that the udf is only there to
> prevent elimination optimizations that could be applied to literals.
> {code:java}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.{Column, DataFrame, SparkSession}
> import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED
>
> object SPARK_23791 {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[4]")
[jira] [Updated] (SPARK-23791) Sub-optimal generated code when aggregating nullable columns
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentin Nikotin updated SPARK-23791: - Description: It appears to be that with wholeStage codegen enabled simple spark job performing sum aggregation of 50 nullable columns runs ~4 timer slower than without wholeStage codegen. Please check test case code. Please note that udf is only to prevent elimination optimizations that could be applied to literals. {code:scala} import org.apache.spark.sql.functions._ import org.apache.spark.sql.{Column, DataFrame, SparkSession} import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED object SPARK_23791 { def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local[4]") .appName("test") .getOrCreate() def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) = (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value)) val dummy = udf(() => Option.empty[Int]) def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = { val t0 = System.nanoTime() spark.range(rows).toDF() .withColumn("grp", col("id").mod(grps)) .transform(addConstColumns("null_", cnt, dummy())) .groupBy("grp") .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*) .collect() val t1 = System.nanoTime() (t1 - t0) / 1e9 } val timings = for (i <- 1 to 3) yield { spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true) val with_wholestage = test() spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false) val without_wholestage = test() (with_wholestage, without_wholestage) } timings.foreach(println) println("Press enter ...") System.in.read() } } {code} was: It appears to be that with wholeStage codegen enabled simple spark job performing sum aggregation of 50 nullable columns runs ~4 timer slower than without wholeStage codegen. Please check test case code. 
Please note that udf is only to prevent elimination optimizations that could be applied to literals. {code:java} import org.apache.spark.sql.functions._ import org.apache.spark.sql.{Column, DataFrame, SparkSession} import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED object SPARK_23791 { def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local[4]") .appName("test") .getOrCreate() def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) = (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value)) val dummy = udf(() => Option.empty[Int]) def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = { val t0 = System.nanoTime() spark.range(rows).toDF() .withColumn("grp", col("id").mod(grps)) .transform(addConstColumns("null_", cnt, dummy())) .groupBy("grp") .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*) .collect() val t1 = System.nanoTime() (t1 - t0) / 1e9 } val timings = for (i <- 1 to 3) yield { spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true) val with_wholestage = test() spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false) val without_wholestage = test() (with_wholestage, without_wholestage) } timings.foreach(println) println("Press enter ...") System.in.read() } } {code} > Sub-optimal generated code when aggregating nullable columns > > > Key: SPARK-23791 > URL: https://issues.apache.org/jira/browse/SPARK-23791 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.2.0, 2.3.0 > Environment: {code:java} > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.{Column, DataFrame, SparkSession} > import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED > object TestCase { > def main(args: Array[String]): Unit = { > val spark = SparkSession > .builder() > .master("local[4]") > .appName("test") > .getOrCreate() > def addConstColumns(prefix: String, cnt: 
Int, value: Column)(inputDF: > DataFrame) = > (0 until cnt).foldLeft(inputDF)((df, idx) => > df.withColumn(s"$prefix$idx", value)) > val dummy = udf(() => Option.empty[Int]) > def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = { > val t0 = System.nanoTime() > spark.range(rows).toDF() > .withColumn("grp", col("id").mod(grps)) > .transform(addConstColumns("null_", cnt, dummy())) >
[jira] [Updated] (SPARK-23791) Sub-optimal generated code when aggregating nullable columns
[ https://issues.apache.org/jira/browse/SPARK-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentin Nikotin updated SPARK-23791: - Description: It appears to be that with wholeStage codegen enabled simple spark job performing sum aggregation of 50 nullable columns runs ~4 timer slower than without wholeStage codegen. Please check test case code. Please note that udf is only to prevent elimination optimizations that could be applied to literals. was: It appears to be that with wholeStage codegen enabled simple spark job performing sum aggregation of 50 nullable columns runs ~4 timer slower than without wholeStage codegen. Please check test case code. Please note that udf is only to prevent elimination optimizations that could be applied to literals. > Sub-optimal generated code when aggregating nullable columns > > > Key: SPARK-23791 > URL: https://issues.apache.org/jira/browse/SPARK-23791 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.2.0, 2.3.0 > Environment: {code:java} > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.{Column, DataFrame, SparkSession} > import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED > object TestCase { > def main(args: Array[String]): Unit = { > val spark = SparkSession > .builder() > .master("local[4]") > .appName("test") > .getOrCreate() > def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: > DataFrame) = > (0 until cnt).foldLeft(inputDF)((df, idx) => > df.withColumn(s"$prefix$idx", value)) > val dummy = udf(() => Option.empty[Int]) > def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = { > val t0 = System.nanoTime() > spark.range(rows).toDF() > .withColumn("grp", col("id").mod(grps)) > .transform(addConstColumns("null_", cnt, dummy())) > .groupBy("grp") > .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*) > .collect() > val t1 = System.nanoTime() > (t1 - t0) / 1e9 > } > val timings = 
for (i <- 1 to 3) yield { > spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true) > val with_wholestage = test() > spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false) > val without_wholestage = test() > (with_wholestage, without_wholestage) > } > timings.foreach(println) > println("Press enter ...") > System.in.read() > } > } > {code} >Reporter: Valentin Nikotin >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > It appears to be that with wholeStage codegen enabled simple spark job > performing sum aggregation of 50 nullable columns runs ~4 timer slower than > without wholeStage codegen. > Please check test case code. Please note that udf is only to prevent > elimination optimizations that could be applied to literals. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23791) Sub-optimal generated code when aggregating nullable columns
Valentin Nikotin created SPARK-23791:
-

Summary: Sub-optimal generated code when aggregating nullable columns
Key: SPARK-23791
URL: https://issues.apache.org/jira/browse/SPARK-23791
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 2.3.0, 2.2.0
Environment:
{code:java}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.internal.SQLConf.WHOLESTAGE_CODEGEN_ENABLED

object TestCase {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[4]")
      .appName("test")
      .getOrCreate()

    def addConstColumns(prefix: String, cnt: Int, value: Column)(inputDF: DataFrame) =
      (0 until cnt).foldLeft(inputDF)((df, idx) => df.withColumn(s"$prefix$idx", value))

    val dummy = udf(() => Option.empty[Int])

    def test(cnt: Int = 50, rows: Int = 500, grps: Int = 1000): Double = {
      val t0 = System.nanoTime()
      spark.range(rows).toDF()
        .withColumn("grp", col("id").mod(grps))
        .transform(addConstColumns("null_", cnt, dummy()))
        .groupBy("grp")
        .agg(sum("null_0"), (1 until cnt).map(idx => sum(s"null_$idx")): _*)
        .collect()
      val t1 = System.nanoTime()
      (t1 - t0) / 1e9
    }

    val timings = for (i <- 1 to 3) yield {
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, true)
      val with_wholestage = test()
      spark.sessionState.conf.setConf(WHOLESTAGE_CODEGEN_ENABLED, false)
      val without_wholestage = test()
      (with_wholestage, without_wholestage)
    }

    timings.foreach(println)

    println("Press enter ...")
    System.in.read()
  }
}
{code}
Reporter: Valentin Nikotin

It appears that with whole-stage codegen enabled, a simple Spark job performing a sum aggregation of 50 nullable columns runs ~4 times slower than without whole-stage codegen. Please see the test case code. Note that the UDF is used only to prevent elimination optimizations that could be applied to literals.
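As context for the slowdown reported above: under SQL NULL semantics, a sum over a nullable column cannot simply accumulate; the generated code must branch on null for every row and track whether any non-null value was seen at all. The following is a minimal plain-Python sketch of that bookkeeping, an illustration only and not Spark's actual generated code:

```python
# Illustration only: the null bookkeeping that SQL-style sum() over a
# nullable column requires, mimicked in plain Python. With 50 nullable
# aggregate columns, the generated code carries one such branch and one
# such flag per column per row, which is the overhead the benchmark
# above exercises.

def nullable_sum(values):
    """SQL semantics: NULL (None) inputs are skipped; the result is
    NULL (None) if every input is NULL."""
    acc = 0
    seen_non_null = False      # extra per-aggregate state
    for v in values:
        if v is None:          # per-row null branch
            continue
        acc += v
        seen_non_null = True
    return acc if seen_non_null else None

print(nullable_sum([1, None, 2]))   # 3
print(nullable_sum([None, None]))   # None
```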
[jira] [Commented] (SPARK-23645) pandas_udf can not be called with keyword arguments
[ https://issues.apache.org/jira/browse/SPARK-23645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412711#comment-16412711 ] Apache Spark commented on SPARK-23645: -- User 'mstewart141' has created a pull request for this issue: https://github.com/apache/spark/pull/20900 > pandas_udf can not be called with keyword arguments > --- > > Key: SPARK-23645 > URL: https://issues.apache.org/jira/browse/SPARK-23645 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.3.0 > Environment: python 3.6 | pyspark 2.3.0 | Using Scala version 2.11.8, > OpenJDK 64-Bit Server VM, 1.8.0_141 >Reporter: Stu (Michael Stewart) >Priority: Minor > > pandas_udf (all python udfs(?)) do not accept keyword arguments because > `pyspark/sql/udf.py` class `UserDefinedFunction` has __call__, and also > wrapper utility methods, that only accept args and not kwargs: > @ line 168: > {code:java} > ... > def __call__(self, *cols): > judf = self._judf > sc = SparkContext._active_spark_context > return Column(judf.apply(_to_seq(sc, cols, _to_java_column))) > # This function is for improving the online help system in the interactive > interpreter. > # For example, the built-in help / pydoc.help. It wraps the UDF with the > docstring and > # argument annotation. (See: SPARK-19161) > def _wrapped(self): > """ > Wrap this udf with a function and attach docstring from func > """ > # It is possible for a callable instance without __name__ attribute or/and > # __module__ attribute to be wrapped here. For example, > functools.partial. In this case, > # we should avoid wrapping the attributes from the wrapped function to > the wrapper > # function. So, we take out these attribute names from the default names > to set and > # then manually assign it after being wrapped. 
> assignments = tuple( > a for a in functools.WRAPPER_ASSIGNMENTS if a != '__name__' and a != > '__module__') > @functools.wraps(self.func, assigned=assignments) > def wrapper(*args): > return self(*args) > ...{code} > as seen in: > {code:java} > from pyspark.sql import SparkSession > from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit > spark = SparkSession.builder.getOrCreate() > df = spark.range(12).withColumn('b', col('id') * 2) > def ok(a,b): return a*b > df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')).show() > # no problems > df.withColumn('ok', pandas_udf(f=ok, > returnType='bigint')(a='id',b='b')).show() # fail with ~no stacktrace thanks > to wrapper helper > --- > TypeError Traceback (most recent call last) > in () > > 1 df.withColumn('ok', pandas_udf(f=ok, > returnType='bigint')(a='id',b='b')).show() > TypeError: wrapper() got an unexpected keyword argument 'a'{code} > > > *discourse*: it isn't difficult to swap back in the kwargs, allowing the UDF > to be called as such, but the cols tuple that gets passed in the call method: > {code:java} > _to_seq(sc, cols, _to_java_column{code} > has to be in the right order based on the functions defined argument inputs, > or the function will return incorrect results. so, the challenge here is to: > (a) make sure to reconstruct the proper order of the full args/kwargs > --> args first, and then kwargs (not in the order passed but in the order > requested by the fn) > (b) handle python2 and python3 `inspect` module inconsistencies -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
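To make the reordering challenge in (a) concrete, here is a hedged sketch. `reorder_args` is a hypothetical helper, not the actual change to pyspark/sql/udf.py, and it assumes Python 3's `inspect.signature`, side-stepping the Python 2/3 `inspect` inconsistencies noted in (b):

```python
import inspect

def reorder_args(func, args, kwargs):
    """Rebuild a purely positional argument tuple: positional args first,
    then each keyword argument slotted into the position its parameter
    name occupies in func's declared signature. (A real implementation
    would also reject missing or unexpected keyword names.)"""
    params = list(inspect.signature(func).parameters)
    positional = list(args)
    for name in params[len(args):]:
        if name in kwargs:
            positional.append(kwargs[name])
    return tuple(positional)

def ok(a, b):
    return a * b

# Keywords passed out of order come back in declared (a, b) order.
print(reorder_args(ok, (), {"b": 3, "a": 2}))       # (2, 3)
print(ok(*reorder_args(ok, (), {"b": 3, "a": 2})))  # 6
```

A wrapper written as `def wrapper(*args, **kwargs): return self(*reorder_args(self.func, args, kwargs))` would then hand `_to_seq(sc, cols, _to_java_column)` the columns in the order the function declares them, which is requirement (a) above.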
[jira] [Updated] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-23790: Description: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333] and the problem was reported first [here|https://issues.apache.org/jira/browse/SPARK-19995] for yarn. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this issue the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. was: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. 
Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876] and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. > proxy-user failed connecting to a kerberos configured metastore > --- > > Key: SPARK-23790 > URL: https://issues.apache.org/jira/browse/SPARK-23790 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Stavros Kontopoulos >Priority: Major > > This appeared at a customer trying to integrate with a kerberized hdfs > cluster. > This can be easily fixed with the proposed fix > [here|https://github.com/apache/spark/pull/17333] and the problem was > reported first [here|https://issues.apache.org/jira/browse/SPARK-19995] for > yarn. > The other option is to add the delegation tokens to the current user's UGI as > in [here|https://github.com/apache/spark/pull/17335] . The last fixes the > problem but leads to a failure when someones uses a HadoopRDD because the > latter, uses FileInputFormat to get the splits which calls the local ticket > cache by using TokenCache.obtainTokensForNamenodes. 
Eventually this will fail > with: > {quote}Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token > can be issued only with kerberos or web authenticationat > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) > {quote} > This implies that security mode is SIMPLE and hadoop libs there are not aware > of kerberos. > This is related to this issue the workaround decided was to > [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] > hadoop. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412644#comment-16412644 ] Stavros Kontopoulos commented on SPARK-23790:
-

[~susanxhuynh] FYI. [~vanzin], [~jerryshao], do you think we should revert to the other solution with doAsRealUser(SessionState.start(state))? I don't think there is much progress [here|https://issues.apache.org/jira/browse/MAPREDUCE-6876].

> proxy-user failed connecting to a kerberos configured metastore
> ---
>
> Key: SPARK-23790
> URL: https://issues.apache.org/jira/browse/SPARK-23790
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 2.3.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> This appeared at a customer trying to integrate with a kerberized HDFS cluster.
> This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333].
> The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335]. The latter fixes the problem but leads to a failure when someone uses a HadoopRDD, because HadoopRDD uses FileInputFormat to get the splits, which consults the local ticket cache via TokenCache.obtainTokensForNamenodes. Eventually this will fail with:
> {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896)
> {quote}
> This implies that the security mode is SIMPLE and the Hadoop libs there are not aware of Kerberos.
> This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876] and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop.
[jira] [Updated] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-23790: Description: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876] and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. was: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. 
Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this issue and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. > proxy-user failed connecting to a kerberos configured metastore > --- > > Key: SPARK-23790 > URL: https://issues.apache.org/jira/browse/SPARK-23790 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Stavros Kontopoulos >Priority: Major > > This appeared at a customer trying to integrate with a kerberized hdfs > cluster. > This can be easily fixed with the proposed fix > [here|https://github.com/apache/spark/pull/17333]. > The other option is to add the delegation tokens to the current user's UGI as > in [here|https://github.com/apache/spark/pull/17335] . The last fixes the > problem but leads to a failure when someones uses a HadoopRDD because the > latter, uses FileInputFormat to get the splits which calls the local ticket > cache by using TokenCache.obtainTokensForNamenodes. Eventually this will fail > with: > {quote}Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token > can be issued only with kerberos or web authenticationat > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) > {quote} > This implies that security mode is SIMPLE and hadoop libs there are not aware > of kerberos. 
> This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876] and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop.
[jira] [Updated] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-23790: Description: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this issue where we had some issues in the past and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. was: This appeared at a customer trying to integrate with a kerberized hdfs cluster. This can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI as in [here|https://github.com/apache/spark/pull/17335] . The last fixes the problem but leads to a failure when someones uses a HadoopRDD because the latter, uses FileInputFormat to get the splits which calls the local ticket cache by using TokenCache.obtainTokensForNamenodes. 
Eventually this will fail with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authenticationat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that security mode is SIMPLE and hadoop libs there are not aware of kerberos. This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876] where we had some issues in the past and the workaround decided is to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop. > proxy-user failed connecting to a kerberos configured metastore > --- > > Key: SPARK-23790 > URL: https://issues.apache.org/jira/browse/SPARK-23790 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Stavros Kontopoulos >Priority: Major > > This appeared at a customer trying to integrate with a kerberized hdfs > cluster. > This can be easily fixed with the proposed fix > [here|https://github.com/apache/spark/pull/17333]. > The other option is to add the delegation tokens to the current user's UGI as > in [here|https://github.com/apache/spark/pull/17335] . The last fixes the > problem but leads to a failure when someones uses a HadoopRDD because the > latter, uses FileInputFormat to get the splits which calls the local ticket > cache by using TokenCache.obtainTokensForNamenodes. Eventually this will fail > with: > {quote}Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token > can be issued only with kerberos or web authenticationat > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) > {quote} > This implies that security mode is SIMPLE and hadoop libs there are not aware > of kerberos. 
> This is related to this issue where we had some issues in the past and the workaround decided was to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] hadoop.
[jira] [Updated] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-23790: Description: This appeared at a customer trying to integrate with a kerberized HDFS cluster. It can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI, as in [here|https://github.com/apache/spark/pull/17335]. The latter fixes the problem but leads to a failure when someone uses a HadoopRDD, because HadoopRDD uses FileInputFormat to compute the splits, which reads the local ticket cache via TokenCache.obtainTokensForNamenodes. Eventually this fails with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that the security mode is SIMPLE and the Hadoop libraries there are not aware of Kerberos. This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876], where we ran into problems in the past, and the agreed workaround is to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] Hadoop. was: This appeared at a customer trying to integrate with a kerberized HDFS cluster. It can be easily fixed with the proposed fix [here|https://github.com/apache/spark/pull/17333]. The other option is to add the delegation tokens to the current user's UGI, as in [here|https://github.com/apache/spark/pull/17335]. The latter fixes the problem but leads to a failure when someone uses a HadoopRDD, because HadoopRDD uses FileInputFormat to compute the splits, which reads the local ticket cache via TokenCache.obtainTokensForNamenodes.
Eventually this fails with: {quote}Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) {quote} This implies that the security mode is SIMPLE and the Hadoop libraries there are not aware of Kerberos. This is related to this [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876], where we ran into problems in the past, and the agreed workaround is to [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] Hadoop. > proxy-user failed connecting to a kerberos configured metastore > --- > > Key: SPARK-23790 > URL: https://issues.apache.org/jira/browse/SPARK-23790 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Stavros Kontopoulos >Priority: Major > > This appeared at a customer trying to integrate with a kerberized HDFS > cluster. > It can be easily fixed with the proposed fix > [here|https://github.com/apache/spark/pull/17333]. > The other option is to add the delegation tokens to the current user's UGI, > as in [here|https://github.com/apache/spark/pull/17335]. The latter fixes the > problem but leads to a failure when someone uses a HadoopRDD, because > HadoopRDD uses FileInputFormat to compute the splits, which reads the local > ticket cache via TokenCache.obtainTokensForNamenodes. Eventually this fails > with: > {quote}Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token > can be issued only with kerberos or web authentication at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) > {quote} > This implies that the security mode is SIMPLE and the Hadoop libraries there > are not aware of Kerberos.
> This is related to this > [issue|https://issues.apache.org/jira/browse/MAPREDUCE-6876], where we ran > into problems in the past, and the agreed workaround is to > [trick|https://github.com/apache/spark/blob/a33655348c4066d9c1d8ad2055aadfbc892ba7fd/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L795-L804] > Hadoop. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
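[Editorial note] The second option mentioned in the description above, adding freshly obtained delegation tokens to the current user's UGI, can be sketched with the public Hadoop security API. This is a minimal illustration of the idea and not the actual code of the referenced pull request; the renewer name is an assumption, and running it requires a kerberized HDFS cluster:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object AddDelegationTokensSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Ask the (kerberized) NameNode for delegation tokens on behalf of the
    // current principal; the renewer ("yarn" here is an assumption) is the
    // service allowed to renew them later.
    val creds = new Credentials()
    fs.addDelegationTokens("yarn", creds)

    // Merge the tokens into the current user's UGI so that subsequent Hadoop
    // calls (e.g. the metastore client) find them in the security context.
    UserGroupInformation.getCurrentUser.addCredentials(creds)
  }
}
```

Note that this is exactly where the HadoopRDD failure described above bites: FileInputFormat looks for tokens via TokenCache.obtainTokensForNamenodes, and if the local Hadoop configuration reports SIMPLE security, the token path is never taken.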
[jira] [Created] (SPARK-23790) proxy-user failed connecting to a kerberos configured metastore
Stavros Kontopoulos created SPARK-23790: --- Summary: proxy-user failed connecting to a kerberos configured metastore Key: SPARK-23790 URL: https://issues.apache.org/jira/browse/SPARK-23790 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 2.3.0 Reporter: Stavros Kontopoulos This appeared at a customer trying to integrate with a kerberized HDFS cluster. It is easily fixed with the proposed fix here: [https://github.com/apache/spark/pull/17333] The other option is to add the delegation tokens to the current user's UGI, as in here: [https://github.com/apache/spark/pull/17335]. The latter fixes the problem but leads to a failure when someone uses a HadoopRDD, because HadoopRDD uses FileInputFormat to compute the splits, which reads the local ticket cache via TokenCache.obtainTokensForNamenodes and will fail with: Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:5896) This implies that the security mode is SIMPLE and the Hadoop libraries there are not aware of Kerberos. This is related to this issue: https://issues.apache.org/jira/browse/MAPREDUCE-6876
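[Editorial note] The "trick Hadoop" workaround referenced against MAPREDUCE-6876 amounts to forcing the Hadoop security layer into Kerberos mode before UserGroupInformation is initialized, so that token-based logins are honored even when the deployed hadoop configuration says SIMPLE. The following is a hedged sketch of that idea, not the exact code at the linked SparkSubmit lines:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Override the authentication mode before anything touches UGI. After this,
// UserGroupInformation.isSecurityEnabled returns true, so Hadoop components
// such as TokenCache will look for delegation tokens instead of refusing
// them with "Delegation Token can be issued only with kerberos or web
// authentication".
val conf = new Configuration()
conf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(conf)
```

The downside, as the MAPREDUCE-6876 discussion notes, is that this lies to every Hadoop library in the JVM about the cluster's security mode, so it must be applied before any of them caches the configuration.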
[jira] [Commented] (SPARK-23782) SHS should not show applications to user without read permission
[ https://issues.apache.org/jira/browse/SPARK-23782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412510#comment-16412510 ] Marco Gaido commented on SPARK-23782: - [~vanzin] thanks for the link. I see that in the discussion there were doubts about this, so the PR dropped this part to focus on the other aspects, but there was no strong opinion against it. bq. What sensitive information is being exposed to users that should not see it? Users can see which applications have been run by each user, when, how long they lasted, their names, and which applications other users are currently running (for instance, whether they are connected through a spark-shell), and so on. This is information that should not be shared with non-authorized people, and if the application names are meaningful a user can easily guess what others are doing on the cluster. Moreover, if you compare how other systems work, they of course do not show non-admin users what others are doing. Our current situation is as if, in Oracle or Postgres, you were able to list the queries run by other users: of course each user can list only his or her own queries. bq. Won't you get that same info if you go to the resource manager's page and look at what applications have run? I am not sure how the RM UI works. If it lists all the applications to all users, even those who do not have the rights to see them, that is a big security hole, since there you can also retrieve the logs. I hope the RM has better security than this, but I am not an expert on it; if it doesn't, I do believe it should be fixed. Moreover, I think we should not focus Spark on a specific resource manager (YARN), since Spark can run in many other modes. 
> SHS should not show applications to user without read permission > > > Key: SPARK-23782 > URL: https://issues.apache.org/jira/browse/SPARK-23782 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Priority: Major > > The History Server shows all the applications to all the users, even though > they have no permission to read them. They cannot read the details of the > applications they cannot access, but still anybody can list all the > applications submitted by all users. > For instance, if we have an admin user {{admin}} and two normal users {{u1}} > and {{u2}}, and each of them submitted one application, all of them can see > in the main page of SHS: > ||App ID||App Name|| ... ||Spark User|| ... || > |app-123456789|The Admin App| .. |admin| ... | > |app-123456790|u1 secret app| .. |u1| ... | > |app-123456791|u2 secret app| .. |u2| ... | > When clicking on an application, the proper permissions are applied, so each > user can open only the applications he/she has read permission for. > Instead, each user should be able to list only the applications he/she has > permission to read, and should not see the other applications at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
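The behavior requested above, applying to the listing the same read-permission check that opening an application already applies, can be sketched as a simple filter. This is a hypothetical model with invented names (AppInfo, visibleApps), not the actual SHS code, which would consult its security manager's view ACLs:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of filtering the History Server's main-page listing by
// read permission. All class and method names are invented for illustration.
public class ShsListingFilter {
    public static final class AppInfo {
        public final String id, name, sparkUser;
        public final Set<String> viewAcls;
        public AppInfo(String id, String name, String sparkUser, Set<String> viewAcls) {
            this.id = id; this.name = name; this.sparkUser = sparkUser; this.viewAcls = viewAcls;
        }
    }

    // Admins see everything; other users see only apps they own or appear in the view ACLs of.
    public static List<AppInfo> visibleApps(List<AppInfo> apps, String user, Set<String> admins) {
        if (admins.contains(user)) {
            return apps;
        }
        return apps.stream()
                .filter(a -> a.sparkUser.equals(user) || a.viewAcls.contains(user))
                .collect(Collectors.toList());
    }
}
```

With the example table above, {{u1}} would list only {{app-123456790}}, while {{admin}} would still list all three.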
[jira] [Assigned] (SPARK-23789) Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider
[ https://issues.apache.org/jira/browse/SPARK-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23789: Assignee: Apache Spark > Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider > - > > Key: SPARK-23789 > URL: https://issues.apache.org/jira/browse/SPARK-23789 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {noformat} > 18/03/23 23:33:35 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.local does > not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.attempts does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.interval does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.server2.enable.impersonation does not exist > 18/03/23 23:33:35 INFO metastore: Trying to connect to metastore with URI > thrift://metastore.com:9083 > 18/03/23 23:33:35 ERROR TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:124) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879) > at >
[jira] [Commented] (SPARK-23789) Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider
[ https://issues.apache.org/jira/browse/SPARK-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412470#comment-16412470 ] Apache Spark commented on SPARK-23789: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/20898 > Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider > - > > Key: SPARK-23789 > URL: https://issues.apache.org/jira/browse/SPARK-23789 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > 18/03/23 23:33:35 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.local does > not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.attempts does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.interval does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.server2.enable.impersonation does not exist > 18/03/23 23:33:35 INFO metastore: Trying to connect to metastore with URI > thrift://metastore.com:9083 > 18/03/23 23:33:35 ERROR TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:124) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at >
[jira] [Assigned] (SPARK-23789) Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider
[ https://issues.apache.org/jira/browse/SPARK-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23789: Assignee: (was: Apache Spark) > Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider > - > > Key: SPARK-23789 > URL: https://issues.apache.org/jira/browse/SPARK-23789 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > 18/03/23 23:33:35 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.local does > not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.attempts does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.metastore.ds.retry.interval does not exist > 18/03/23 23:33:35 WARN HiveConf: HiveConf of name > hive.server2.enable.impersonation does not exist > 18/03/23 23:33:35 INFO metastore: Trying to connect to metastore with URI > thrift://metastore.com:9083 > 18/03/23 23:33:35 ERROR TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native 
Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:124) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879) > at >
[jira] [Created] (SPARK-23789) Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider
Yuming Wang created SPARK-23789: --- Summary: Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider Key: SPARK-23789 URL: https://issues.apache.org/jira/browse/SPARK-23789 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0, 2.4.0 Reporter: Yuming Wang {noformat} 18/03/23 23:33:35 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.local does not exist 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.attempts does not exist 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.interval does not exist 18/03/23 23:33:35 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist 18/03/23 23:33:35 INFO metastore: Trying to connect to metastore with URI thrift://metastore.com:9083 18/03/23 23:33:35 ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:124) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at
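One general way to avoid the class of problem named in the summary, mutating a shared config key such as hive.metastore.uris before a credential provider runs, is to scope the override and restore the original value afterwards. The sketch below is hypothetical (ConfScope and withTemporaryValue are invented names, and this is not Spark's actual fix), showing only the save/invoke/restore pattern:

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical save/invoke/restore helper: temporarily override a config key,
// run an action against the config, then restore the previous value so later
// callers (e.g. a delegation-token provider) see the original setting.
public class ConfScope {
    public static void withTemporaryValue(Map<String, String> conf, String key,
                                          String value, Consumer<Map<String, String>> action) {
        String old = conf.get(key);
        conf.put(key, value);
        try {
            action.accept(conf);
        } finally {
            if (old == null) {
                conf.remove(key); // key was absent before; remove it again
            } else {
                conf.put(key, old);
            }
        }
    }
}
```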
[jira] [Comment Edited] (SPARK-23780) Failed to use googleVis library with new SparkR
[ https://issues.apache.org/jira/browse/SPARK-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412458#comment-16412458 ] Felix Cheung edited comment on SPARK-23780 at 3/24/18 6:53 AM: --- here [https://github.com/mages/googleVis/blob/master/R/zzz.R#L39] or here [https://github.com/jeroen/jsonlite/blob/master/R/toJSON.R#L2] was (Author: felixcheung): here [https://github.com/mages/googleVis/blob/master/R/zzz.R#L39] > Failed to use googleVis library with new SparkR > --- > > Key: SPARK-23780 > URL: https://issues.apache.org/jira/browse/SPARK-23780 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.1 >Reporter: Ivan Dzikovsky >Priority: Major > > I've tried to use the googleVis library with Spark 2.2.1 and ran into a > problem. Steps to reproduce: > # Install R with the googleVis library. > # Run SparkR: > {code} > sparkR --master yarn --deploy-mode client > {code} > # Run code that uses googleVis: > {code} > library(googleVis) > df=data.frame(country=c("US", "GB", "BR"), > val1=c(10,13,14), > val2=c(23,12,32)) > Bar <- gvisBarChart(df) > cat("%html ", Bar$html$chart) > {code} > Then I got the following error message: > {code} > Error : .onLoad failed in loadNamespace() for 'googleVis', details: > call: rematchDefinition(definition, fdef, mnames, fnames, signature) > error: methods can add arguments to the generic 'toJSON' only if '...' is > an argument to the generic > Error : package or namespace load failed for 'googleVis' > {code} > But the expected result is some HTML output, as it was with Spark > 2.1.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23780) Failed to use googleVis library with new SparkR
[ https://issues.apache.org/jira/browse/SPARK-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412458#comment-16412458 ] Felix Cheung commented on SPARK-23780: -- here [https://github.com/mages/googleVis/blob/master/R/zzz.R#L39] > Failed to use googleVis library with new SparkR > --- > > Key: SPARK-23780 > URL: https://issues.apache.org/jira/browse/SPARK-23780 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.1 >Reporter: Ivan Dzikovsky >Priority: Major > > I've tried to use the googleVis library with Spark 2.2.1 and ran into a > problem. Steps to reproduce: > # Install R with the googleVis library. > # Run SparkR: > {code} > sparkR --master yarn --deploy-mode client > {code} > # Run code that uses googleVis: > {code} > library(googleVis) > df=data.frame(country=c("US", "GB", "BR"), > val1=c(10,13,14), > val2=c(23,12,32)) > Bar <- gvisBarChart(df) > cat("%html ", Bar$html$chart) > {code} > Then I got the following error message: > {code} > Error : .onLoad failed in loadNamespace() for 'googleVis', details: > call: rematchDefinition(definition, fdef, mnames, fnames, signature) > error: methods can add arguments to the generic 'toJSON' only if '...' is > an argument to the generic > Error : package or namespace load failed for 'googleVis' > {code} > But the expected result is some HTML output, as it was with Spark > 2.1.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23780) Failed to use googleVis library with new SparkR
[ https://issues.apache.org/jira/browse/SPARK-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412457#comment-16412457 ] Felix Cheung commented on SPARK-23780: -- hmm, I think the cause of this is the incompatibility of the method signature of toJSON > Failed to use googleVis library with new SparkR > --- > > Key: SPARK-23780 > URL: https://issues.apache.org/jira/browse/SPARK-23780 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.1 >Reporter: Ivan Dzikovsky >Priority: Major > > I've tried to use the googleVis library with Spark 2.2.1 and ran into a > problem. Steps to reproduce: > # Install R with the googleVis library. > # Run SparkR: > {code} > sparkR --master yarn --deploy-mode client > {code} > # Run code that uses googleVis: > {code} > library(googleVis) > df=data.frame(country=c("US", "GB", "BR"), > val1=c(10,13,14), > val2=c(23,12,32)) > Bar <- gvisBarChart(df) > cat("%html ", Bar$html$chart) > {code} > Then I got the following error message: > {code} > Error : .onLoad failed in loadNamespace() for 'googleVis', details: > call: rematchDefinition(definition, fdef, mnames, fnames, signature) > error: methods can add arguments to the generic 'toJSON' only if '...' is > an argument to the generic > Error : package or namespace load failed for 'googleVis' > {code} > But the expected result is some HTML output, as it was with Spark > 2.1.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org