[jira] [Created] (SPARK-29354) Spark has direct dependency on jline, but binaries for 'without hadoop' don't have a jline jar file.
Sungpeo Kook created SPARK-29354: Summary: Spark has direct dependency on jline, but binaries for 'without hadoop' don't have a jline jar file. Key: SPARK-29354 URL: https://issues.apache.org/jira/browse/SPARK-29354 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4, 2.3.4 Environment: From spark 2.3.x, spark 2.4.x Reporter: Sungpeo Kook Spark has a direct dependency on jline, declared in the root pom.xml, but the binaries for 'without hadoop' do not include a jline jar file. The spark 2.2.x binaries did include it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944194#comment-16944194 ] L. C. Hsieh commented on SPARK-29302: - If this is not an issue, we should close it. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > > Now, for a dynamic partition overwrite operation, the filename of a task's > output is deterministic. So, if speculation is enabled, could a task conflict > with its corresponding speculative task? Would the two tasks concurrently > write the same file?
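The concern in the report above can be sketched in a few lines (a hypothetical naming scheme for illustration, not Spark's actual code): if an output filename is derived only from the partition values and the task's partition id, a task and its speculative copy (which differ only in attempt id) compute the same path, so both attempts target the same file.

```python
# Hypothetical sketch: deterministic filenames collide across task attempts.
def output_filename(partition_values: dict, partition_id: int, attempt_id: int,
                    deterministic: bool = True) -> str:
    """Build an output path for a task. Under the deterministic scheme the
    name depends only on partition values and partition id, so the attempt
    id (the one thing distinguishing a speculative copy) is ignored."""
    part_dir = "/".join(f"{k}={v}" for k, v in sorted(partition_values.items()))
    if deterministic:
        return f"{part_dir}/part-{partition_id:05d}"
    return f"{part_dir}/part-{partition_id:05d}-attempt-{attempt_id}"

# A task (attempt 0) and its speculative copy (attempt 1) target the same file:
original = output_filename({"dt": "2019-10-04"}, 3, attempt_id=0)
speculative = output_filename({"dt": "2019-10-04"}, 3, attempt_id=1)
assert original == speculative  # both write dt=2019-10-04/part-00003
```

Including the attempt id (the `deterministic=False` branch) is one way such a collision is normally avoided, at the cost of needing a commit protocol to pick a winner.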
[jira] [Assigned] (SPARK-29351) Avoid full synchronization in ShuffleMapStage
[ https://issues.apache.org/jira/browse/SPARK-29351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29351: --- Assignee: DB Tsai > Avoid full synchronization in ShuffleMapStage > - > > Key: SPARK-29351 > URL: https://issues.apache.org/jira/browse/SPARK-29351 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > In one of our production streaming jobs, which has more than 1k executors, > each with 20 cores, Spark spends a significant portion of time (30s) sending > out the `ShuffleStatus`. We found there are two issues. > # In the driver's message loop, it calls `serializedMapStatus`, which is in a > sync block. When the job scales really big, this can cause contention. > # When the job is big, the `MapStatus` is huge as well, and the serialization > and compression times are slow. > This work aims to address the first problem.
[jira] [Resolved] (SPARK-29351) Avoid full synchronization in ShuffleMapStage
[ https://issues.apache.org/jira/browse/SPARK-29351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-29351. - Resolution: Resolved > Avoid full synchronization in ShuffleMapStage > - > > Key: SPARK-29351 > URL: https://issues.apache.org/jira/browse/SPARK-29351 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > In one of our production streaming jobs, which has more than 1k executors, > each with 20 cores, Spark spends a significant portion of time (30s) sending > out the `ShuffleStatus`. We found there are two issues. > # In the driver's message loop, it calls `serializedMapStatus`, which is in a > sync block. When the job scales really big, this can cause contention. > # When the job is big, the `MapStatus` is huge as well, and the serialization > and compression times are slow. > This work aims to address the first problem.
[jira] [Created] (SPARK-29353) AlterTableAlterColumnStatement should fallback to v1 AlterTableChangeColumnCommand
Wenchen Fan created SPARK-29353: --- Summary: AlterTableAlterColumnStatement should fallback to v1 AlterTableChangeColumnCommand Key: SPARK-29353 URL: https://issues.apache.org/jira/browse/SPARK-29353 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan
[jira] [Commented] (SPARK-29337) How to Cache Table and Pin it in Memory and should not Spill to Disk on Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-29337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944168#comment-16944168 ] Yuming Wang commented on SPARK-29337: - Could you try to cache the table with options? {code:sql} CACHE TABLE tableName OPTIONS('storageLevel' 'MEMORY_ONLY'); {code} https://github.com/apache/spark/pull/22263 > How to Cache Table and Pin it in Memory and should not Spill to Disk on > Thrift Server > -- > > Key: SPARK-29337 > URL: https://issues.apache.org/jira/browse/SPARK-29337 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 2.3.0 >Reporter: Srini E >Priority: Major > Attachments: Cache+Image.png > > > Hi Team, > How can we pin a table in cache so it is not swapped out of memory? > Situation: We are using Microstrategy BI reporting, and a semantic layer is built. > We wanted to cache highly used tables using Spark SQL CACHE Table > ; we did cache them for the SPARK context (Thrift server). Please see the > attached snapshot of the cached table, which went to disk over time. Initially it was all > in cache; now some is in cache and some on disk. That disk may be local disk, > which is relatively more expensive to read from than s3. Queries may take longer and > inconsistent times from a user-experience perspective. If more queries run > using the cached tables, copies of the cached table images are made, and the copies > do not stay in memory, causing reports to run longer. So how can we pin the > table so it does not swap to disk? Spark memory management uses dynamic > allocation, so how can we pin those few tables in memory?
[jira] [Updated] (SPARK-29323) Add tooltip for The Executors Tab's column names in the Spark history server Page
[ https://issues.apache.org/jira/browse/SPARK-29323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liucht-inspur updated SPARK-29323: -- Attachment: image-2019-10-04-09-42-14-174.png > Add tooltip for The Executors Tab's column names in the Spark history server > Page > - > > Key: SPARK-29323 > URL: https://issues.apache.org/jira/browse/SPARK-29323 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 2.4.4 >Reporter: liucht-inspur >Priority: Major > Fix For: 2.4.4 > > Attachments: image-2019-10-04-09-42-14-174.png > > > On the Executors tab of the Spark history server page, the Summary section > shows a row of column names, but their formatting is inconsistent. > Some column names have a tooltip, such as Storage Memory, Task Time (GC Time), > Input, Shuffle Read, Shuffle Write and Blacklisted, but there are still some > column names that have no tooltip: RDD Blocks, Disk Used, Cores, > Active Tasks, Failed Tasks, Complete Tasks and Total Tasks. Oddly, in the > Executors section below, all of the column names above do have tooltips.
[jira] [Created] (SPARK-29352) Move active streaming query state to the SharedState
Burak Yavuz created SPARK-29352: --- Summary: Move active streaming query state to the SharedState Key: SPARK-29352 URL: https://issues.apache.org/jira/browse/SPARK-29352 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.4.4, 3.0.0 Reporter: Burak Yavuz We have checks to prevent the restarting of the same stream on the same Spark session, but we can make that better in multi-tenant environments by putting that state in the SharedState instead of the SessionState. This would allow a more comprehensive check for multi-tenant clusters.
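The scoping difference described above can be sketched with a toy registry (all names here are hypothetical, not Spark's classes): a per-session set of active queries cannot see a restart of the same checkpoint from another session, while a set held in shared state can.

```python
# Toy sketch of session-scoped vs shared-scoped active-query registries.
class SharedState:
    def __init__(self):
        self.active = set()  # checkpoint locations of running queries, cluster-wide

class Session:
    def __init__(self, shared: SharedState):
        self.shared = shared
        self.session_active = set()  # visible to this session only

    def start_query_session_scoped(self, checkpoint: str):
        if checkpoint in self.session_active:
            raise RuntimeError("query already active in this session")
        self.session_active.add(checkpoint)

    def start_query_shared_scoped(self, checkpoint: str):
        if checkpoint in self.shared.active:
            raise RuntimeError("query already active on this cluster")
        self.shared.active.add(checkpoint)

shared = SharedState()
s1, s2 = Session(shared), Session(shared)
s1.start_query_session_scoped("/ckpt/q1")
s2.start_query_session_scoped("/ckpt/q1")     # conflict slips through silently
s1.start_query_shared_scoped("/ckpt/q2")
try:
    s2.start_query_shared_scoped("/ckpt/q2")  # detected across sessions
except RuntimeError:
    pass
```

The session-scoped check happily starts the same checkpoint twice from two sessions; the shared-scoped check rejects the second start, which is the multi-tenant behavior the ticket is after.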
[jira] [Resolved] (SPARK-29339) Support Arrow 0.14 in vectorized dapply and gapply (test it in AppVeyor build)
[ https://issues.apache.org/jira/browse/SPARK-29339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29339. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25993 [https://github.com/apache/spark/pull/25993] > Support Arrow 0.14 in vectorized dapply and gapply (test it in AppVeyor build) > - > > Key: SPARK-29339 > URL: https://issues.apache.org/jira/browse/SPARK-29339 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > dapply and gapply with Arrow optimization and Arrow 0.14 seem to be failing: > {code} > > collect(dapply(df, function(rdf) { data.frame(rdf$gear + 1) }, > > structType("gear double"))) > Error in readBin(con, raw(), as.integer(dataLen), endian = "big") : > invalid 'n' argument > {code} > We should fix it and also test it in AppVeyor.
[jira] [Assigned] (SPARK-29350) Fix BroadcastExchange reuse in Dynamic Partition Pruning
[ https://issues.apache.org/jira/browse/SPARK-29350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-29350: --- Assignee: Wei Xue > Fix BroadcastExchange reuse in Dynamic Partition Pruning > > > Key: SPARK-29350 > URL: https://issues.apache.org/jira/browse/SPARK-29350 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wei Xue >Assignee: Wei Xue >Priority: Major > > Dynamic partition pruning filters are added as an in-subquery containing a > {{BroadcastExchange}} in a broadcast hash join. To ensure this new > {{BroadcastExchange}} can be reused, we need to make the {{ReuseExchange}} > rule visit in-subquery nodes.
[jira] [Resolved] (SPARK-29350) Fix BroadcastExchange reuse in Dynamic Partition Pruning
[ https://issues.apache.org/jira/browse/SPARK-29350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-29350. - Fix Version/s: 3.0.0 Resolution: Fixed > Fix BroadcastExchange reuse in Dynamic Partition Pruning > > > Key: SPARK-29350 > URL: https://issues.apache.org/jira/browse/SPARK-29350 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wei Xue >Assignee: Wei Xue >Priority: Major > Fix For: 3.0.0 > > > Dynamic partition pruning filters are added as an in-subquery containing a > {{BroadcastExchange}} in a broadcast hash join. To ensure this new > {{BroadcastExchange}} can be reused, we need to make the {{ReuseExchange}} > rule visit in-subquery nodes.
[jira] [Assigned] (SPARK-28583) Subqueries should not call `onUpdatePlan` in Adaptive Query Execution
[ https://issues.apache.org/jira/browse/SPARK-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingbo Jiang reassigned SPARK-28583: Assignee: Wei Xue (was: Xingbo Jiang) > Subqueries should not call `onUpdatePlan` in Adaptive Query Execution > - > > Key: SPARK-28583 > URL: https://issues.apache.org/jira/browse/SPARK-28583 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wei Xue >Assignee: Wei Xue >Priority: Major > Fix For: 3.0.0 > > > Subqueries do not have their own execution id, thus when calling > {{AdaptiveSparkPlanExec.onUpdatePlan}}, it will actually get the > {{QueryExecution}} instance of the main query, which is wasteful and > problematic. It could cause issues like stack overflows or deadlocks in some > circumstances.
[jira] [Assigned] (SPARK-28583) Subqueries should not call `onUpdatePlan` in Adaptive Query Execution
[ https://issues.apache.org/jira/browse/SPARK-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingbo Jiang reassigned SPARK-28583: Assignee: Xingbo Jiang (was: Wei Xue) > Subqueries should not call `onUpdatePlan` in Adaptive Query Execution > - > > Key: SPARK-28583 > URL: https://issues.apache.org/jira/browse/SPARK-28583 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wei Xue >Assignee: Xingbo Jiang >Priority: Major > Fix For: 3.0.0 > > > Subqueries do not have their own execution id, thus when calling > {{AdaptiveSparkPlanExec.onUpdatePlan}}, it will actually get the > {{QueryExecution}} instance of the main query, which is wasteful and > problematic. It could cause issues like stack overflows or deadlocks in some > circumstances.
[jira] [Commented] (SPARK-29336) The implementation of QuantileSummaries.merge does not guarantee that the relativeError will be respected
[ https://issues.apache.org/jira/browse/SPARK-29336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944046#comment-16944046 ] Guilherme Souza commented on SPARK-29336: - I've added a new test case that reproduces the problem here: https://github.com/sitegui/spark/commit/fa123cf289c47ceeb6f84278ae028e5a46a85bf0 The problem is triggered especially when the merging summaries have seen an uneven number of samples. I've managed to reproduce it for exact splits as well; however, that requires a larger number of samples. I'm currently working on a forked branch and will try to create a PR that fixes the issue in the following days. > The implementation of QuantileSummaries.merge does not guarantee that the > relativeError will be respected > --- > > Key: SPARK-29336 > URL: https://issues.apache.org/jira/browse/SPARK-29336 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Guilherme Souza >Priority: Minor > > Hello Spark maintainers, > I was experimenting with my own implementation of the [space-efficient > quantile > algorithm|http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf] > in another language, and I was using Spark's as a reference. > In my analysis, I believe I have found an issue with the {{merge()}} logic. 
> Here is some simple Scala code that reproduces the issue I've found: > > {code:java} > val values = (1 to 100).toArray > val all_quantiles = values.indices.map(i => (i+1).toDouble / > values.length).toArray > for (n <- 0 until 5) { > val df = spark.sparkContext.makeRDD(values).toDF("value").repartition(5) > val all_answers = df.stat.approxQuantile("value", all_quantiles, 0.1) > val all_answered_ranks = all_answers.map(ans => values.indexOf(ans)).toArray > val error = all_answered_ranks.zipWithIndex.map({ case (answer, expected) > => Math.abs(expected - answer) }).toArray > val max_error = error.max > print(max_error + "\n") > } > {code} > I query for all possible quantiles in a 100-element array with a desired 10% > max error. In this scenario, one would expect to observe a maximum error of > 10 ranks or less (10% of 100). However, the output I observe is: > > {noformat} > 16 > 12 > 10 > 11 > 17{noformat} > The variance is probably due to non-deterministic operations behind the > scenes, but that is irrelevant to the core cause. (And sorry for my Scala, I'm not > used to it.) > Interestingly enough, if I change from five partitions to one, the code works > as expected and gives 10 every time. This seems to point to some problem in the > [merge > logic|https://github.com/apache/spark/blob/51d6ba7490eaac32fc33b8996fdf06b747884a54/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala#L153-L171] > The original authors ([~clockfly] and [~cloud_fan], from what I could dig from > the history) suggest the published paper is not clear on how that should be > done, and honestly, I was not confident in the current approach either. > I've found SPARK-21184, which reports the same problem, but it was > unfortunately closed with no fix applied. > In my external implementation I believe I have found a sound way to > implement the merge method. 
[Here is my take in Rust, if > relevant|https://github.com/sitegui/space-efficient-quantile/blob/188c74638c9840e5f47d6c6326b2886d47b149bc/src/modified_gk/summary.rs#L162-L218] > I'd be really glad to add unit tests and contribute my implementation adapted > to Scala. > I'd love to hear your opinion on the matter. > Best regards
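The rank-error measurement in the reporter's Scala snippet can be mirrored in plain Python. This is only the checking logic, not the GK summary itself: given the exact sorted values and an approximate answer for each quantile, the observed error is the distance between the answered rank and the expected rank, which a summary with 10% relative error on 100 elements should keep at or below 10.

```python
def max_rank_error(values, answers):
    """values: sorted exact data; answers[i] is an approximate answer for
    the (i+1)/len(values) quantile. Returns the worst rank distance."""
    return max(abs(values.index(ans) - i) for i, ans in enumerate(answers))

values = list(range(1, 101))
# A perfect summary answers every quantile exactly: rank error 0.
assert max_rank_error(values, values) == 0
# An answer shifted by 15 ranks at one quantile exceeds a 10-rank (10%) target,
# matching the out-of-bound values (16, 12, 11, 17) the reporter observed.
shifted = values.copy()
shifted[50] = values[65]
assert max_rank_error(values, shifted) == 15
```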
[jira] [Commented] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions
[ https://issues.apache.org/jira/browse/SPARK-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944041#comment-16944041 ] Jeff Evans commented on SPARK-27903: I've been spending some time playing around with the grammar, and I'm not sure this is possible in the general case. It should be easy enough to handle the case outlined in this Jira (I have a working change for that), but an "extra" right parenthesis is much more challenging due to the way ANTLR works, and the way the grammar is written. > Improve parser error message for mismatched parentheses in expressions > -- > > Key: SPARK-27903 > URL: https://issues.apache.org/jira/browse/SPARK-27903 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > When parentheses are mismatched in expressions in queries, the error message > is confusing. This is especially true for large queries, where mismatched > parens are tedious for humans to figure out. > For example, the error message for > {code:sql} > SELECT ((x + y) * z FROM t; > {code} > is > {code:java} > mismatched input 'FROM' expecting ','(line 1, pos 20) > {code} > One possible way to fix this is to explicitly capture this kind of mismatched > parens in a grammar rule and print a user-friendly error message such as > {code:java} > mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, > pos 20) > {code}
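Outside of ANTLR, the balance check itself is simple, which is why the error position is recoverable in principle: a counter/stack scan reports where the unmatched parenthesis sits. The hard part the comment describes, recovering gracefully from an extra right parenthesis inside a grammar, has no such shortcut. A sketch of the standalone check (illustration only, not how a parser would do it):

```python
def find_unmatched_paren(sql: str):
    """Return the index of the first unmatched '(' or ')', or None if
    the string is balanced. Ignores quoting/comments for brevity."""
    stack = []
    for pos, ch in enumerate(sql):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                return pos                  # extra right paren
            stack.pop()
    return stack[0] if stack else None      # leftover left paren, or balanced

# The example from the ticket: the outer '(' at index 7 is never closed.
assert find_unmatched_paren("SELECT ((x + y) * z FROM t") == 7
assert find_unmatched_paren("SELECT (x + y)) FROM t") == 14
assert find_unmatched_paren("SELECT (x + y) FROM t") is None
```

A real SQL scanner would additionally have to skip string literals and comments; inside ANTLR the equivalent information has to come from the rule context at the point of failure.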
[jira] [Created] (SPARK-29351) Avoid full synchronization in ShuffleMapStage
DB Tsai created SPARK-29351: --- Summary: Avoid full synchronization in ShuffleMapStage Key: SPARK-29351 URL: https://issues.apache.org/jira/browse/SPARK-29351 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 2.4.4 Reporter: DB Tsai Fix For: 3.0.0 In one of our production streaming jobs, which has more than 1k executors, each with 20 cores, Spark spends a significant portion of time (30s) sending out the `ShuffleStatus`. We found there are two issues. # In the driver's message loop, it calls `serializedMapStatus`, which is in a sync block. When the job scales really big, this can cause contention. # When the job is big, the `MapStatus` is huge as well, and the serialization and compression times are slow. This work aims to address the first problem.
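One generic way to shrink time spent inside such a lock (a sketch of the caching idea only, not the actual Spark patch) is to cache the serialized bytes and take the lock only on the first request, so concurrent callers in the driver's message loop mostly hit the fast path without contending:

```python
import threading

class MapStatusHolder:
    """Sketch: cache the serialized shuffle status so only the first caller
    pays the serialization cost inside the lock (double-checked caching)."""
    def __init__(self, statuses):
        self._statuses = statuses
        self._cached = None
        self._lock = threading.Lock()
        self.serialize_calls = 0

    def _serialize(self):
        self.serialize_calls += 1
        return repr(self._statuses).encode()  # stand-in for real serialization

    def serialized_map_status(self):
        if self._cached is not None:          # fast path: no lock taken
            return self._cached
        with self._lock:                      # slow path: first caller only
            if self._cached is None:
                self._cached = self._serialize()
            return self._cached

holder = MapStatusHolder([("map-0", "host-a"), ("map-1", "host-b")])
threads = [threading.Thread(target=holder.serialized_map_status) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert holder.serialize_calls == 1  # serialized once despite 8 concurrent callers
```

The second problem in the report (the serialization and compression themselves being slow for huge `MapStatus` objects) is orthogonal and not touched by this pattern.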
[jira] [Commented] (SPARK-29329) maven incremental builds not working
[ https://issues.apache.org/jira/browse/SPARK-29329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943839#comment-16943839 ] Thomas Graves commented on SPARK-29329: --- Filed an issue with scala-maven-plugin; we will see what they say: https://github.com/davidB/scala-maven-plugin/issues/364 > maven incremental builds not working > > > Key: SPARK-29329 > URL: https://issues.apache.org/jira/browse/SPARK-29329 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > It looks like since we upgraded scala-maven-plugin to 4.2.0 in > https://issues.apache.org/jira/browse/SPARK-28759, Spark incremental builds > stopped working. Every time you build, it recompiles all files, which takes > forever. > It would be nice to fix this. > > To reproduce, just build Spark once (I happened to be using the command > below): > build/mvn -Phadoop-3.2 -Phive-thriftserver -Phive -Pyarn -Pkinesis-asl > -Pkubernetes -Pmesos -Phadoop-cloud -Pspark-ganglia-lgpl package -DskipTests > Then build it again and you will see that it compiles all the files and takes > 15-30 minutes. With incremental builds it skips all unnecessary files and takes > closer to 5 minutes.
[jira] [Resolved] (SPARK-29054) Invalidate Kafka consumer when new delegation token available
[ https://issues.apache.org/jira/browse/SPARK-29054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin resolved SPARK-29054. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25760 [https://github.com/apache/spark/pull/25760] > Invalidate Kafka consumer when new delegation token available > - > > Key: SPARK-29054 > URL: https://issues.apache.org/jira/browse/SPARK-29054 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > Kafka consumers are cached. If a delegation token is used and the token has > expired, an exception is thrown. In such a case a new consumer is created in > a task retry with the latest delegation token. This can be enhanced by > detecting the existence of a new delegation token.
[jira] [Assigned] (SPARK-29054) Invalidate Kafka consumer when new delegation token available
[ https://issues.apache.org/jira/browse/SPARK-29054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin reassigned SPARK-29054: -- Assignee: Gabor Somogyi > Invalidate Kafka consumer when new delegation token available > - > > Key: SPARK-29054 > URL: https://issues.apache.org/jira/browse/SPARK-29054 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > > Kafka consumers are cached. If a delegation token is used and the token has > expired, an exception is thrown. In such a case a new consumer is created in > a task retry with the latest delegation token. This can be enhanced by > detecting the existence of a new delegation token.
[jira] [Created] (SPARK-29350) Fix BroadcastExchange reuse in Dynamic Partition Pruning
Wei Xue created SPARK-29350: --- Summary: Fix BroadcastExchange reuse in Dynamic Partition Pruning Key: SPARK-29350 URL: https://issues.apache.org/jira/browse/SPARK-29350 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wei Xue Dynamic partition pruning filters are added as an in-subquery containing a {{BroadcastExchange}} in a broadcast hash join. To ensure this new {{BroadcastExchange}} can be reused, we need to make the {{ReuseExchange}} rule visit in-subquery nodes.
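The reuse idea can be sketched generically (a toy plan tree for illustration, not Catalyst): deduplicate exchange nodes by a canonical key, and make sure the rule also descends into subquery expressions. Otherwise the exchange inside the in-subquery is never registered against the reuse map, which is exactly the gap the ticket describes.

```python
# Toy plan tree: nodes have ordinary children plus subquery-expression children.
class Node:
    def __init__(self, kind, children=(), subqueries=()):
        self.kind, self.children, self.subqueries = kind, list(children), list(subqueries)
        self.reused = False

    def key(self):
        """Canonical form used to decide whether two subtrees are identical."""
        return (self.kind, tuple(c.key() for c in self.children))

def reuse_exchanges(plan, seen=None, visit_subqueries=True):
    seen = {} if seen is None else seen
    if plan.kind == "BroadcastExchange":
        k = plan.key()
        if k in seen:
            plan.reused = True   # stand-in for replacing with a ReusedExchange
        else:
            seen[k] = plan
    for c in plan.children:
        reuse_exchanges(c, seen, visit_subqueries)
    if visit_subqueries:         # the crucial extra traversal
        for s in plan.subqueries:
            reuse_exchanges(s, seen, visit_subqueries)
    return plan

main_exchange = Node("BroadcastExchange", [Node("Scan")])
dpp_exchange = Node("BroadcastExchange", [Node("Scan")])   # identical subtree
plan = Node("Join", [main_exchange], subqueries=[dpp_exchange])
reuse_exchanges(plan)
assert dpp_exchange.reused  # found only because subqueries are visited
```

With `visit_subqueries=False` the duplicate exchange inside the subquery is never compared against the one in the main plan, so no reuse happens.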
[jira] [Assigned] (SPARK-29320) Compare `sql/core` module in JDK8/11 (Part 1)
[ https://issues.apache.org/jira/browse/SPARK-29320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29320: - Assignee: Dongjoon Hyun > Compare `sql/core` module in JDK8/11 (Part 1) > - > > Key: SPARK-29320 > URL: https://issues.apache.org/jira/browse/SPARK-29320 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0
[jira] [Resolved] (SPARK-29320) Compare `sql/core` module in JDK8/11 (Part 1)
[ https://issues.apache.org/jira/browse/SPARK-29320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29320. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26003 [https://github.com/apache/spark/pull/26003] > Compare `sql/core` module in JDK8/11 (Part 1) > - > > Key: SPARK-29320 > URL: https://issues.apache.org/jira/browse/SPARK-29320 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0
[jira] [Created] (SPARK-29349) Support FETCH_PRIOR in Thriftserver query results fetching
Juliusz Sompolski created SPARK-29349: - Summary: Support FETCH_PRIOR in Thriftserver query results fetching Key: SPARK-29349 URL: https://issues.apache.org/jira/browse/SPARK-29349 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski Support FETCH_PRIOR fetching in the Thriftserver.
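A common way to support FETCH_PRIOR over a forward-only result stream (a generic sketch, not the Spark patch) is to buffer rows already served, so the cursor can step backwards through the buffer and replay rows on the next forward fetch:

```python
class BufferedCursor:
    """Sketch: a forward-only iterator wrapped with a buffer so FETCH_PRIOR
    can move the cursor backwards over rows that were already fetched."""
    def __init__(self, rows):
        self._it = iter(rows)
        self._buf = []     # all rows fetched from the source so far
        self._pos = 0      # index of the next row to serve

    def fetch_next(self, n):
        # Pull more rows from the source only when the buffer runs out.
        while len(self._buf) < self._pos + n:
            try:
                self._buf.append(next(self._it))
            except StopIteration:
                break
        out = self._buf[self._pos:self._pos + n]
        self._pos += len(out)
        return out

    def fetch_prior(self, n):
        # Step back up to n rows and return them; no source access needed.
        start = max(0, self._pos - n)
        out = self._buf[start:self._pos]
        self._pos = start
        return out

cur = BufferedCursor(range(5))
assert cur.fetch_next(3) == [0, 1, 2]
assert cur.fetch_prior(2) == [1, 2]
assert cur.fetch_next(2) == [1, 2]   # cursor moved back, replays buffered rows
```

The trade-off is memory: everything served so far stays buffered, so a real server would bound the buffer or spill it.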
[jira] [Resolved] (SPARK-29296) Use scala-parallel-collections library in 2.13
[ https://issues.apache.org/jira/browse/SPARK-29296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29296. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25980 [https://github.com/apache/spark/pull/25980] > Use scala-parallel-collections library in 2.13 > -- > > Key: SPARK-29296 > URL: https://issues.apache.org/jira/browse/SPARK-29296 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 3.0.0 > > > Classes like ForkJoinTaskSupport and .par moved to scala-parallel-collections > in 2.13. This needs to be included as a dependency only in 2.13 via a > profile. However, we'll also have to rewrite uses of .par to get this to work > in 2.12 and 2.13 simultaneously: > https://github.com/scala/scala-parallel-collections/issues/22
[jira] [Assigned] (SPARK-29296) Use scala-parallel-collections library in 2.13
[ https://issues.apache.org/jira/browse/SPARK-29296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29296: Assignee: Sean R. Owen > Use scala-parallel-collections library in 2.13 > -- > > Key: SPARK-29296 > URL: https://issues.apache.org/jira/browse/SPARK-29296 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > > > Classes like ForkJoinTaskSupport and .par moved to scala-parallel-collections > in 2.13. This needs to be included as a dependency only in 2.13 via a > profile. However, we'll also have to rewrite uses of .par to get this to work > in 2.12 and 2.13 simultaneously: > https://github.com/scala/scala-parallel-collections/issues/22
[jira] [Created] (SPARK-29348) Add observable metrics
Herman van Hövell created SPARK-29348: - Summary: Add observable metrics Key: SPARK-29348 URL: https://issues.apache.org/jira/browse/SPARK-29348 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Created] (SPARK-29347) External Row should be JSON serializable
Herman van Hövell created SPARK-29347: - Summary: External Row should be JSON serializable Key: SPARK-29347 URL: https://issues.apache.org/jira/browse/SPARK-29347 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Herman van Hövell Assignee: Herman van Hövell External Row should be exportable to JSON. This is needed for observable metrics because we want to include these metrics in the streaming query progress (which is JSON serializable).
[jira] [Updated] (SPARK-29345) Add an API that allows a user to define and observe arbitrary metrics on streaming queries
[ https://issues.apache.org/jira/browse/SPARK-29345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell updated SPARK-29345: -- Summary: Add an API that allows a user to define and observe arbitrary metrics on streaming queries (was: Add an API that allows a user to define and obser arbitrary metrics on streaming queries) > Add an API that allows a user to define and observe arbitrary metrics on > streaming queries > -- > > Key: SPARK-29345 > URL: https://issues.apache.org/jira/browse/SPARK-29345 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 3.0.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major
[jira] [Created] (SPARK-29346) Create Aggregating Accumulator
Herman van Hövell created SPARK-29346: - Summary: Create Aggregating Accumulator Key: SPARK-29346 URL: https://issues.apache.org/jira/browse/SPARK-29346 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Herman van Hövell Assignee: Herman van Hövell Create an accumulator that can compute a global aggregate over an arbitrary number of expressions. We will use this to implement observable metrics.
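As a concept sketch of what such an accumulator does (not Spark's actual implementation; the class and field names below are hypothetical), it maintains several aggregates at once while rows are processed and merges per-task partial results at the end, mirroring the add/merge/value contract of Spark's AccumulatorV2:

```python
# Concept sketch only: an accumulator that computes several aggregates
# (count, sum, max) in one pass and supports merging partial results.
class AggregatingAccumulator:
    def __init__(self):
        self.count = 0
        self.total = 0
        self.maximum = None

    def add(self, value):
        # Called once per input row on an executor.
        self.count += 1
        self.total += value
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def merge(self, other):
        # Called on the driver to combine partial results from tasks.
        self.count += other.count
        self.total += other.total
        if other.maximum is not None:
            self.maximum = (other.maximum if self.maximum is None
                            else max(self.maximum, other.maximum))

    def value(self):
        return {"count": self.count, "sum": self.total, "max": self.maximum}

# Two "tasks" process disjoint partitions, then the driver merges them.
a, b = AggregatingAccumulator(), AggregatingAccumulator()
for v in [1, 2, 3]:
    a.add(v)
for v in [10, 20]:
    b.add(v)
a.merge(b)
print(a.value())  # {'count': 5, 'sum': 36, 'max': 20}
```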
[jira] [Created] (SPARK-29345) Add an API that allows a user to define and obser arbitrary metrics on streaming queries
Herman van Hövell created SPARK-29345: - Summary: Add an API that allows a user to define and obser arbitrary metrics on streaming queries Key: SPARK-29345 URL: https://issues.apache.org/jira/browse/SPARK-29345 Project: Spark Issue Type: Epic Components: SQL Affects Versions: 3.0.0 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Updated] (SPARK-29344) Spark application hang
[ https://issues.apache.org/jira/browse/SPARK-29344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kitti updated SPARK-29344: -- Attachment: stderr > Spark application hang > -- > > Key: SPARK-29344 > URL: https://issues.apache.org/jira/browse/SPARK-29344 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1 >Reporter: Kitti >Priority: Major > Attachments: stderr > > > We found that the Spark application sometimes hangs and stops working > without any log in the Spark driver until we kill the application. > > 19/10/03 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 117 19/10/03 > 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 80 19/10/03 06:07:03 > INFO spark.ContextCleaner: Cleaned accumulator 105 19/10/03 06:07:03 INFO > spark.ContextCleaner: Cleaned accumulator 88 19/10/03 10:36:59 ERROR > yarn.ApplicationMaster: RECEIVED SIGNAL TERM
[jira] [Created] (SPARK-29344) Spark application hang
Kitti created SPARK-29344: - Summary: Spark application hang Key: SPARK-29344 URL: https://issues.apache.org/jira/browse/SPARK-29344 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.1 Reporter: Kitti We found that the Spark application sometimes hangs and stops working without any log in the Spark driver until we kill the application. 19/10/03 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 117 19/10/03 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 80 19/10/03 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 105 19/10/03 06:07:03 INFO spark.ContextCleaner: Cleaned accumulator 88 19/10/03 10:36:59 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
[jira] [Resolved] (SPARK-29341) Upgrade cloudpickle to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-29341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29341. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26009 [https://github.com/apache/spark/pull/26009] > Upgrade cloudpickle to 1.0.0 > > > Key: SPARK-29341 > URL: https://issues.apache.org/jira/browse/SPARK-29341 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > Cloudpickle 1.0.0 includes two bug fixes. It would be better to upgrade to > include them.
[jira] [Assigned] (SPARK-29341) Upgrade cloudpickle to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-29341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-29341: Assignee: L. C. Hsieh > Upgrade cloudpickle to 1.0.0 > > > Key: SPARK-29341 > URL: https://issues.apache.org/jira/browse/SPARK-29341 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Cloudpickle 1.0.0 includes two bug fixes. It would be better to upgrade to > include them.
[jira] [Assigned] (SPARK-29142) Pyspark clustering models support column setters/getters/predict
[ https://issues.apache.org/jira/browse/SPARK-29142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29142: - Assignee: Huaxin Gao > Pyspark clustering models support column setters/getters/predict > > > Key: SPARK-29142 > URL: https://issues.apache.org/jira/browse/SPARK-29142 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > Unlike the reg/clf models, clustering models do not have a common base class, > so we need to add them one by one.
[jira] [Created] (SPARK-29343) Eliminate sorts without limit in the subquery of Join/Aggregation
EdisonWang created SPARK-29343: -- Summary: Eliminate sorts without limit in the subquery of Join/Aggregation Key: SPARK-29343 URL: https://issues.apache.org/jira/browse/SPARK-29343 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: EdisonWang A {{Sort}} without a {{Limit}} in a {{Join}}/{{GroupBy}} subquery is useless. For example, {{select count(1) from (select a from test1 order by a)}} is equal to {{select count(1) from (select a from test1)}}, and {{select * from (select a from test1 order by a) t1 join (select b from test2) t2 on t1.a = t2.b}} is equal to {{select * from (select a from test1) t1 join (select b from test2) t2 on t1.a = t2.b}}. Removing the useless {{Sort}} operator can improve performance.
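The rewrite described above can be illustrated with a toy plan. The node classes here are hypothetical stand-ins, not Spark's Catalyst operators: a {{Sort}} feeding an order-insensitive {{Aggregate}} is dropped, while a {{Sort}} under a {{Limit}} (a top-k) must be kept.

```python
from dataclasses import dataclass

# Toy plan nodes for illustration only (not Spark's Catalyst API).
@dataclass
class Scan:
    table: str

@dataclass
class Sort:
    child: object

@dataclass
class Limit:
    n: int
    child: object

@dataclass
class Aggregate:
    child: object

def strip_redundant_sort(plan):
    # Rewrite rule: an Aggregate's output does not depend on its input
    # order, so a Sort feeding it directly can be dropped. A Sort under a
    # Limit is a top-k and is left untouched.
    if isinstance(plan, Aggregate) and isinstance(plan.child, Sort):
        return Aggregate(child=plan.child.child)
    return plan

# select count(1) from (select a from test1 order by a)
print(strip_redundant_sort(Aggregate(Sort(Scan("test1")))))
# Aggregate(child=Scan(table='test1'))

# A Limit on the subquery keeps the plan unchanged.
print(strip_redundant_sort(Aggregate(Limit(10, Sort(Scan("test1"))))))
```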
[jira] [Resolved] (SPARK-28084) LOAD DATA command resolving the partition column name considering case sensitive manner
[ https://issues.apache.org/jira/browse/SPARK-28084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28084. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 24903 [https://github.com/apache/spark/pull/24903] > LOAD DATA command resolving the partition column name considering case > sensitive manner > --- > > Key: SPARK-28084 > URL: https://issues.apache.org/jira/browse/SPARK-28084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Sujith Chacko >Assignee: Sujith Chacko >Priority: Major > Fix For: 3.0.0 > > Attachments: parition_casesensitive.PNG > > > The LOAD DATA command resolves the partition column name in a case-sensitive > manner, whereas the INSERT command resolves it case-insensitively. > Refer to the snapshot for more details. > !image-2019-06-18-00-04-22-475.png!
[jira] [Assigned] (SPARK-28084) LOAD DATA command resolving the partition column name considering case sensitive manner
[ https://issues.apache.org/jira/browse/SPARK-28084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28084: - Assignee: Sujith Chacko > LOAD DATA command resolving the partition column name considering case > sensitive manner > --- > > Key: SPARK-28084 > URL: https://issues.apache.org/jira/browse/SPARK-28084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Sujith Chacko >Assignee: Sujith Chacko >Priority: Major > Attachments: parition_casesensitive.PNG > > > The LOAD DATA command resolves the partition column name in a case-sensitive > manner, whereas the INSERT command resolves it case-insensitively. > Refer to the snapshot for more details. > !image-2019-06-18-00-04-22-475.png!
[jira] [Created] (SPARK-29342) Make casting strings to intervals case insensitive
Maxim Gekk created SPARK-29342: -- Summary: Make casting strings to intervals case insensitive Key: SPARK-29342 URL: https://issues.apache.org/jira/browse/SPARK-29342 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk PostgreSQL is not sensitive to the case of interval string values: {code} maxim=# select cast('10 Days' as INTERVAL); interval -- 10 days (1 row) {code} but Spark is case-sensitive: {code} spark-sql> select cast('INTERVAL 10 DAYS' as INTERVAL); NULL spark-sql> select cast('interval 10 days' as INTERVAL); interval 1 weeks 3 days {code}
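A sketch of the proposed behavior (a hypothetical, simplified parser, not Spark's actual interval parser): normalize the string before matching the optional {{interval}} prefix and the unit keywords, so '10 Days', '10 DAYS' and '10 days' all parse the same way.

```python
import re
from datetime import timedelta

# Hypothetical simplified parser, illustrating case-insensitive matching
# of "<n> <unit>" with an optional "interval" prefix.
_UNITS = {"day": "days", "days": "days", "hour": "hours", "hours": "hours"}

def parse_interval(s):
    # re.IGNORECASE makes both the prefix and the unit case-insensitive.
    m = re.fullmatch(r"(?:interval\s+)?(\d+)\s+([a-z]+)", s.strip(),
                     flags=re.IGNORECASE)
    if not m:
        return None
    unit = _UNITS.get(m.group(2).lower())
    if unit is None:
        return None
    return timedelta(**{unit: int(m.group(1))})

print(parse_interval("10 Days"))           # 10 days, 0:00:00
print(parse_interval("INTERVAL 10 DAYS"))  # 10 days, 0:00:00
```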
[jira] [Created] (SPARK-29341) Upgrade cloudpickle to 1.0.0
L. C. Hsieh created SPARK-29341: --- Summary: Upgrade cloudpickle to 1.0.0 Key: SPARK-29341 URL: https://issues.apache.org/jira/browse/SPARK-29341 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: L. C. Hsieh Cloudpickle 1.0.0 includes two bug fixes. It would be better to upgrade to include them.
[jira] [Resolved] (SPARK-29317) Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan
[ https://issues.apache.org/jira/browse/SPARK-29317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29317. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25989 [https://github.com/apache/spark/pull/25989] > Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan > --- > > Key: SPARK-29317 > URL: https://issues.apache.org/jira/browse/SPARK-29317 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > At SPARK-27463, some refactoring was made. Two common abstract base > classes were introduced: > 1. {{BaseArrowPythonRunner}} > Before: > {code} > └── BasePythonRunner > ├── ArrowPythonRunner > ├── CoGroupedArrowPythonRunner > ├── PythonRunner > └── PythonUDFRunner > {code} > After: > {code} > BasePythonRunner > ├── BaseArrowPythonRunner > │ ├── ArrowPythonRunner > │ └── CoGroupedArrowPythonRunner > ├── PythonRunner > └── PythonUDFRunner > {code} > The problem is that the R code path mirrors the old Python side: > {code} > └── BaseRRunner > ├── ArrowRRunner > └── RRunner > {code} > I would like to match the hierarchy and decouple other stuff for now. Ideally > we should deduplicate both code paths. The internal implementation is also > intentionally similar. > 2. {{BasePandasGroupExec}} > Before: > {code} > ├── FlatMapGroupsInPandasExec > └── FlatMapCoGroupsInPandasExec > {code} > After: > {code} > └── BasePandasGroupExec > ├── FlatMapGroupsInPandasExec > └── FlatMapCoGroupsInPandasExec > {code} > The problem is that R (with Arrow optimization in particular) has some > code duplicated with Pandas UDFs. > {{FlatMapGroupsInRWithArrowExec}} <> {{FlatMapGroupsInPandasExec}} > {{MapPartitionsInRWithArrowExec}} <> {{ArrowEvalPythonExec}} > In order to prepare for deduplication here as well, it might be better to avoid > changing the hierarchy on the Python side alone, and rather decouple it. 
[jira] [Created] (SPARK-29340) Spark Sql executions do not use thread local jobgroup
Navdeep Poonia created SPARK-29340: -- Summary: Spark Sql executions do not use thread local jobgroup Key: SPARK-29340 URL: https://issues.apache.org/jira/browse/SPARK-29340 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4 Reporter: Navdeep Poonia val sparkThreadLocal: SparkSession = DataCurator.spark.newSession() sparkThreadLocal.sparkContext.setJobGroup("", "") OR sparkThreadLocal.sparkContext.setLocalProperty("spark.job.description", "") sparkThreadLocal.sparkContext.setLocalProperty("spark.jobGroup.id", "") The job group property works fine for Spark jobs/stages created by DataFrame operations, but in the case of Spark SQL the job group is randomly assigned to stages or is sometimes null.
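The mechanism behind this can be illustrated without Spark: the job group and description are stored as thread-local properties, so any work handed off to a different thread (as SQL execution may do) does not see the values set by the caller. A minimal Python sketch, with hypothetical helper names:

```python
import threading

# Minimal illustration (not Spark code): a property set through a
# thread-local is invisible to code running on another thread.
local_props = threading.local()

def set_job_group(group):
    local_props.job_group = group

def current_job_group():
    return getattr(local_props, "job_group", None)

set_job_group("etl-job-1")
print(current_job_group())  # prints etl-job-1 on the calling thread

seen = {}
def worker():
    # A different thread sees no job group at all.
    seen["group"] = current_job_group()

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["group"])  # prints None
```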
[jira] [Comment Edited] (SPARK-29078) Spark shell fails if read permission is not granted to hive warehouse directory
[ https://issues.apache.org/jira/browse/SPARK-29078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943127#comment-16943127 ] Peter Toth edited comment on SPARK-29078 at 10/3/19 7:04 AM: - I don't think there should be other databases under {{/apps/hive/warehouse}} directory if that is a concern. The {{default}} database points to {{/apps/hive/warehouse}}, and new databases are created under that directory as well by default, but you have the option to create a new database pointing to a very different directory. I mean that way we could avoid this issue. Anyways, this doesn't seem to me a Spark related issue. was (Author: petertoth): I don't think there should be other databases under {{/apps/hive/warehouse}} directory if that is a concern. The {{default}} database points to {{/apps/hive/warehouse}}, and new databases are created under that directory as well by default, but you have the option to create a new database pointing to a very different directory. I mean that way we could avoid this issue. > Spark shell fails if read permission is not granted to hive warehouse > directory > --- > > Key: SPARK-29078 > URL: https://issues.apache.org/jira/browse/SPARK-29078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mihaly Toth >Priority: Major > > Similarly to SPARK-20256, in {{SharedSessionState}} when > {{GlobalTempViewManager}} is created, it is checked that there is no database > exists that has the same name as of the global temp database (name is > configurable with {{spark.sql.globalTempDatabase}}) , because that is a > special database, which should not exist in the metastore. For this, a read > permission is required on the warehouse directory at the moment, which on the > other hand would allow listing all the databases of all users. > When such a read access is not granted for security reasons, an access > violation exception should be ignored upon such initial validation. 
[jira] [Comment Edited] (SPARK-29078) Spark shell fails if read permission is not granted to hive warehouse directory
[ https://issues.apache.org/jira/browse/SPARK-29078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943127#comment-16943127 ] Peter Toth edited comment on SPARK-29078 at 10/3/19 7:01 AM: - I don't think there should be other databases under {{/apps/hive/warehouse}} directory if that is a concern. The {{default}} database points to {{/apps/hive/warehouse}}, and new databases are created under that directory as well by default, but you have the option to create a new database pointing to a very different directory. I mean that way we could avoid this issue. was (Author: petertoth): I don't think there should be other databases under {{/apps/hive/warehouse}} directory if that is a concern. The {{default}} database points to {{/apps/hive/warehouse}}, but you have the option to create a new database pointing to a very different directory. I mean that way we could avoid this issue. > Spark shell fails if read permission is not granted to hive warehouse > directory > --- > > Key: SPARK-29078 > URL: https://issues.apache.org/jira/browse/SPARK-29078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mihaly Toth >Priority: Major > > Similarly to SPARK-20256, in {{SharedSessionState}} when > {{GlobalTempViewManager}} is created, it is checked that there is no database > exists that has the same name as of the global temp database (name is > configurable with {{spark.sql.globalTempDatabase}}) , because that is a > special database, which should not exist in the metastore. For this, a read > permission is required on the warehouse directory at the moment, which on the > other hand would allow listing all the databases of all users. > When such a read access is not granted for security reasons, an access > violation exception should be ignored upon such initial validation. 
[jira] [Comment Edited] (SPARK-29078) Spark shell fails if read permission is not granted to hive warehouse directory
[ https://issues.apache.org/jira/browse/SPARK-29078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943127#comment-16943127 ] Peter Toth edited comment on SPARK-29078 at 10/3/19 7:00 AM: - I don't think there should be other databases under {{/apps/hive/warehouse}} directory if that is a concern. The {{default}} database points to {{/apps/hive/warehouse}}, but you have the option to create a new database pointing to a very different directory. I mean that way we could avoid this issue. was (Author: petertoth): I don't think there should be other databases under {{/apps/hive/warehouse}} directory if the {{default}} database points to {{/apps/hive/warehouse}}. I mean that way we could avoid this issue. > Spark shell fails if read permission is not granted to hive warehouse > directory > --- > > Key: SPARK-29078 > URL: https://issues.apache.org/jira/browse/SPARK-29078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mihaly Toth >Priority: Major > > Similarly to SPARK-20256, in {{SharedSessionState}} when > {{GlobalTempViewManager}} is created, it is checked that there is no database > exists that has the same name as of the global temp database (name is > configurable with {{spark.sql.globalTempDatabase}}) , because that is a > special database, which should not exist in the metastore. For this, a read > permission is required on the warehouse directory at the moment, which on the > other hand would allow listing all the databases of all users. > When such a read access is not granted for security reasons, an access > violation exception should be ignored upon such initial validation.