[jira] [Created] (SPARK-40475) Allow job status tracking with jobGroupId
Anurag Mantripragada created SPARK-40475:
---------------------------------------------

             Summary: Allow job status tracking with jobGroupId
                 Key: SPARK-40475
                 URL: https://issues.apache.org/jira/browse/SPARK-40475
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, Web UI
    Affects Versions: 3.3.0
            Reporter: Anurag Mantripragada

Spark lets us group jobs together by setting a job group id, which is useful for checking the job group in the web UI. For example:

{{spark.sparkContext().setJobGroup("mygroup_id")}}

We have a use case where we would like to have a long-running Spark application and have jobs submitted to it. We would like to programmatically check the status of the jobs created under this group id. For example, SQLStatusStore has `executionList()`, which returns a map of jobs to their statuses. There is no way to filter this based on jobGroupId. This Jira is to add the ability to get fine-grained job statuses by jobGroupId.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
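A minimal, pure-Python sketch of the filtering this ticket asks for. The names here (`JobInfo`, `filter_statuses_by_group`) are hypothetical, not Spark APIs; Spark's `SparkStatusTracker` does expose `getJobIdsForGroup`, but the SQL-side `executionList()` described above has no group filter, which is the gap being requested.

```python
# Hypothetical model of filtering an execution list by job group id.
# JobInfo and filter_statuses_by_group are illustrative names only.
from dataclasses import dataclass

@dataclass
class JobInfo:
    job_id: int
    group_id: str
    status: str  # e.g. "RUNNING", "SUCCEEDED", "FAILED"

def filter_statuses_by_group(jobs, group_id):
    """Return {job_id: status} for jobs belonging to the given job group."""
    return {j.job_id: j.status for j in jobs if j.group_id == group_id}

jobs = [
    JobInfo(0, "mygroup_id", "SUCCEEDED"),
    JobInfo(1, "mygroup_id", "RUNNING"),
    JobInfo(2, "other_group", "FAILED"),
]
print(filter_statuses_by_group(jobs, "mygroup_id"))  # {0: 'SUCCEEDED', 1: 'RUNNING'}
```

With such a filter, a long-running application could poll only the jobs it submitted under its own group id instead of scanning the whole execution list.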
[jira] [Commented] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get
[ https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214822#comment-17214822 ]

Anurag Mantripragada commented on SPARK-30894:
----------------------------------------------

Hi [~dongjoon], here's the backport PR: [https://github.com/apache/spark/pull/30058]

> The nullability of Size function should not depend on SQLConf.get
> -----------------------------------------------------------------
>
>                 Key: SPARK-30894
>                 URL: https://issues.apache.org/jira/browse/SPARK-30894
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Maxim Gekk
>            Priority: Blocker
>             Fix For: 3.0.0
[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created
[ https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212749#comment-17212749 ]

Anurag Mantripragada commented on SPARK-30893:
----------------------------------------------

[~maropu], [~dongjoon] - I went through the PRs for the individual issues in this umbrella and looked at the code changes. Only [SPARK-30894|https://issues.apache.org/jira/browse/SPARK-30894] seems to affect branch-2.4. I've commented on that Jira separately asking the original author if we can backport this to branch-2.4.

> Expressions should not change its data type/nullability after it's created
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30893
>                 URL: https://issues.apache.org/jira/browse/SPARK-30893
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Critical
>             Fix For: 3.0.0
>
> This is a problem because the configuration can change between different
> phases of planning, and this can silently break a query plan which can lead
> to crashes or data corruption, if data type/nullability gets changed.
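A pure-Python model of the hazard this umbrella describes (hypothetical names, not Spark code): an expression whose nullability is re-read from a mutable global config at access time can report different values in different planning phases, while capturing the config at construction keeps it stable.

```python
# Illustrative model only: Conf stands in for SQLConf, SizeExpr for an
# expression like Size whose nullability once depended on a config flag.
class Conf:
    size_of_null = True  # mutable global config, like SQLConf.get

class SizeExpr:
    @property
    def nullable(self):
        # BAD: re-reads the config on every access, so the answer can
        # change between planning phases
        return Conf.size_of_null

class FixedSizeExpr:
    def __init__(self):
        # GOOD: capture the config once, when the expression is created
        self._nullable = Conf.size_of_null
    @property
    def nullable(self):
        return self._nullable

bad, good = SizeExpr(), FixedSizeExpr()
Conf.size_of_null = False            # config changes mid-planning
print(bad.nullable, good.nullable)   # False True
```

The fixed variant is the behavior the umbrella asks for: the plan sees one consistent nullability no matter when it asks.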
[jira] [Commented] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get
[ https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212748#comment-17212748 ]

Anurag Mantripragada commented on SPARK-30894:
----------------------------------------------

[~maxgekk] - Looking at the code in branch-2.4, it looks like this could be an issue there as well:
[https://github.com/apache/spark/blob/652e5746019b95b78af4d36c23ec5155bb22325b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L94]

Should we backport this to branch-2.4 since it is LTS?

> The nullability of Size function should not depend on SQLConf.get
> -----------------------------------------------------------------
>
>                 Key: SPARK-30894
>                 URL: https://issues.apache.org/jira/browse/SPARK-30894
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Maxim Gekk
>            Priority: Blocker
>             Fix For: 3.0.0
[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created
[ https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212717#comment-17212717 ]

Anurag Mantripragada commented on SPARK-30893:
----------------------------------------------

Sorry for not being clear. I was referring to this comment:

??For data type and nullability, I think we should fix before 3.0, as they can lead to data corruption.??

??For other behaviors, we can have more discussion and wait for 3.1??

> Expressions should not change its data type/nullability after it's created
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30893
>                 URL: https://issues.apache.org/jira/browse/SPARK-30893
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Critical
>             Fix For: 3.0.0
>
> This is a problem because the configuration can change between different
> phases of planning, and this can silently break a query plan which can lead
> to crashes or data corruption, if data type/nullability gets changed.
[jira] [Comment Edited] (SPARK-30893) Expressions should not change its data type/nullability after it's created
[ https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212635#comment-17212635 ]

Anurag Mantripragada edited comment on SPARK-30893 at 10/12/20, 7:56 PM:
-------------------------------------------------------------------------

As mentioned above in [~cloud_fan]'s comment, should we backport the nullability and datatype issues from this umbrella to branch-2.4 as they may cause corruption?

CC: [~viirya], [~dongjoon]

was (Author: anuragmantri):
As mentioned here [https://issues.apache.org/jira/browse/SPARK-30893?focusedCommentId=17041618&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17041618], should we backport the nullability and datatype issues from this umbrella to branch-2.4 as they may cause corruption?

CC: [~viirya], [~dongjoon]

> Expressions should not change its data type/nullability after it's created
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30893
>                 URL: https://issues.apache.org/jira/browse/SPARK-30893
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Critical
>             Fix For: 3.0.0
>
> This is a problem because the configuration can change between different
> phases of planning, and this can silently break a query plan which can lead
> to crashes or data corruption, if data type/nullability gets changed.
[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created
[ https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212635#comment-17212635 ]

Anurag Mantripragada commented on SPARK-30893:
----------------------------------------------

As mentioned here [https://issues.apache.org/jira/browse/SPARK-30893?focusedCommentId=17041618&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17041618], should we backport the nullability and datatype issues from this umbrella to branch-2.4 as they may cause corruption?

CC: [~viirya], [~dongjoon]

> Expressions should not change its data type/nullability after it's created
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30893
>                 URL: https://issues.apache.org/jira/browse/SPARK-30893
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Critical
>             Fix For: 3.0.0
>
> This is a problem because the configuration can change between different
> phases of planning, and this can silently break a query plan which can lead
> to crashes or data corruption, if data type/nullability gets changed.
[jira] [Commented] (SPARK-28067) Incorrect results in decimal aggregation with whole-stage code gen enabled
[ https://issues.apache.org/jira/browse/SPARK-28067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209923#comment-17209923 ]

Anurag Mantripragada commented on SPARK-28067:
----------------------------------------------

I just checked that the issue exists in branch-2.4. Since this is a `correctness` issue, should we backport it to branch-2.4?

cc: [~cloud_fan], [~dongjoon]

> Incorrect results in decimal aggregation with whole-stage code gen enabled
> --------------------------------------------------------------------------
>
>                 Key: SPARK-28067
>                 URL: https://issues.apache.org/jira/browse/SPARK-28067
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>            Reporter: Mark Sirek
>            Assignee: Sunitha Kambhampati
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.1.0
>
> The following test case involving a join followed by a sum aggregation
> returns the wrong answer for the sum:
>
> {code:java}
> val df = Seq(
>   (BigDecimal("1000"), 1),
>   (BigDecimal("1000"), 1),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2),
>   (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(sum("decNum"))
>
> scala> df2.show(40, false)
> +-----------+
> |sum(decNum)|
> +-----------+
> |4000.00    |
> +-----------+
> {code}
>
> The result should be 104000..
> It appears a partial sum is computed for each join key, as the result
> returned would be the answer for all rows matching intNum === 1.
> If only the rows with intNum === 2 are included, the answer given is null:
>
> {code:java}
> scala> val df3 = df.filter($"intNum" === lit(2))
> df3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [decNum: decimal(38,18), intNum: int]
>
> scala> val df4 = df3.withColumnRenamed("decNum", "decNum2").join(df3, "intNum").agg(sum("decNum"))
> df4: org.apache.spark.sql.DataFrame = [sum(decNum): decimal(38,18)]
>
> scala> df4.show(40, false)
> +-----------+
> |sum(decNum)|
> +-----------+
> |null       |
> +-----------+
> {code}
>
> The correct answer, 10., doesn't fit in
> the DataType picked for the result, decimal(38,18), so an overflow occurs,
> which Spark then converts to null.
> The first example, which doesn't filter out the intNum === 1 values, should
> also return null, indicating overflow, but it doesn't. This may mislead the
> user into thinking a valid sum was computed.
> If whole-stage code gen is turned off:
> spark.conf.set("spark.sql.codegen.wholeStage", false)
> ...incorrect results are not returned, because the overflow is caught as an
> exception:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 39
> exceeds max precision 38
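The overflow-to-null behavior described above can be sketched with Python's stdlib `decimal` module standing in for Spark's decimal(38,18). The helper `to_spark_decimal` is hypothetical, not a Spark API: it returns `None` (Spark's null) when a value needs more than 38 digits of precision, which is what a correct Spark plan should do for the unfiltered sum too.

```python
# Model of Spark decimal(38,18) overflow semantics: values needing more
# than 38 significant digits overflow and become null (None here).
from decimal import Decimal

MAX_PRECISION = 38  # Spark's maximum decimal precision

def to_spark_decimal(value: Decimal):
    """Return the value, or None if it exceeds 38 digits of precision."""
    digits = len(value.as_tuple().digits)
    return None if digits > MAX_PRECISION else value

fits = Decimal("1" + "0" * 20)     # 21 digits: representable
too_big = Decimal("1" + "0" * 38)  # 39 digits: overflows, like the sum above
print(to_spark_decimal(fits))      # 100000000000000000000
print(to_spark_decimal(too_big))   # None
```

This is the same 39-vs-38 precision check that surfaces as the `IllegalArgumentException` quoted above when whole-stage codegen is disabled; the bug is that with codegen enabled, neither the null nor the exception appears and a partial sum escapes instead.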
[jira] [Commented] (SPARK-30201) HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
[ https://issues.apache.org/jira/browse/SPARK-30201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208389#comment-17208389 ]

Anurag Mantripragada commented on SPARK-30201:
----------------------------------------------

[~cloud_fan], [~ulysses], [~dongjoon] - I verified this issue is present in branch-2.4. Test failure below:

{{[info] == Results ==}}
{{[info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==}}
{{[info] !struct<>                   struct}}
{{[info] ![AABBCC]                   [EFBFBDEFBFBDEFBFBD] (QueryTest.scala:163)}}
{{[info] org.scalatest.exceptions.TestFailedException:}}

I created a PR to backport it to branch-2.4. It was a clean cherry-pick; could you please take a look? Thanks!
[https://github.com/apache/spark/pull/29948]

> HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
> ------------------------------------------------------------------------
>
>                 Key: SPARK-30201
>                 URL: https://issues.apache.org/jira/browse/SPARK-30201
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: ulysses you
>            Assignee: ulysses you
>            Priority: Critical
>             Fix For: 3.0.0
>
> Now Spark uses `ObjectInspectorCopyOption.JAVA` as the OI option, which will
> convert any string to a UTF-8 string. When writing non-UTF-8 data,
> `EFBFBD` will appear.
> We should use `ObjectInspectorCopyOption.DEFAULT` to pass the bytes through.
> Here is the way to reproduce:
> 1. make a file containing the hex bytes 'AABBCC', which is not valid UTF-8.
> 2. create table test1 (c string) location '$file_path';
> 3. select hex(c) from test1; // AABBCC
> 4. create table test2 (c string) as select c from test1;
> 5. select hex(c) from test2; // EFBFBDEFBFBDEFBFBD
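The `EFBFBD` bytes in the failing test above are the UTF-8 encoding of U+FFFD, the Unicode replacement character. A short stdlib-only Python sketch reproduces exactly the corruption the quoted issue describes: decoding the non-UTF-8 bytes AABBCC with replacement and re-encoding yields three replacement characters.

```python
# AABBCC is not valid UTF-8: 0xAA and 0xBB are stray continuation bytes,
# and 0xCC is a 2-byte lead with no follow-up byte. A forced UTF-8 round
# trip (what ObjectInspectorCopyOption.JAVA effectively does) replaces
# each with U+FFFD, whose UTF-8 encoding is EF BF BD.
raw = bytes.fromhex("AABBCC")
mangled = raw.decode("utf-8", errors="replace").encode("utf-8")
print(mangled.hex().upper())  # EFBFBDEFBFBDEFBFBD
```

Passing the raw bytes through untouched, as `ObjectInspectorCopyOption.DEFAULT` does per the issue description, avoids the lossy round trip entirely.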