[jira] [Created] (SPARK-40475) Allow job status tracking with jobGroupId

2022-09-16 Thread Anurag Mantripragada (Jira)
Anurag Mantripragada created SPARK-40475:


 Summary: Allow job status tracking with jobGroupId
 Key: SPARK-40475
 URL: https://issues.apache.org/jira/browse/SPARK-40475
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Affects Versions: 3.3.0
Reporter: Anurag Mantripragada


Spark lets us group jobs together by setting a job group id, which is useful for 
checking the job group in the web UI. For example:

{{spark.sparkContext().setJobGroup("mygroup_id", "my group description")}}

We have a use case where we run a long-running Spark application and have jobs 
submitted to it, and we would like to programmatically check the status of the 
jobs created under a given group id. For example, 
[SQLStatusStore|#L41] has `executionList()`, which returns a map of jobs to 
their statuses, but there is no way to filter this by jobGroupId. 

This Jira is to add the ability to get fine-grained job statuses by jobGroupId.
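For context, `SparkStatusTracker` already exposes `getJobIdsForGroup(jobGroup)` for job ids; the gap is a group-filtered view over the status/execution stores. A minimal, Spark-free Python sketch of the proposed filter (all names here are illustrative, not an existing Spark API):

```python
from dataclasses import dataclass


@dataclass
class JobStatus:
    job_id: int
    group_id: str
    state: str  # e.g. "RUNNING", "SUCCEEDED", "FAILED"


class ToyStatusStore:
    """Toy stand-in for a status store, with the proposed jobGroupId filter."""

    def __init__(self):
        self._jobs = []

    def record(self, job):
        self._jobs.append(job)

    def execution_list(self, group_id=None):
        # With no group_id this mirrors today's behavior: return everything.
        return {j.job_id: j.state
                for j in self._jobs
                if group_id is None or j.group_id == group_id}


store = ToyStatusStore()
store.record(JobStatus(1, "mygroup_id", "SUCCEEDED"))
store.record(JobStatus(2, "mygroup_id", "RUNNING"))
store.record(JobStatus(3, "other", "FAILED"))
print(store.execution_list("mygroup_id"))  # {1: 'SUCCEEDED', 2: 'RUNNING'}
```

The design question for the actual change is only where this filter lives (status store vs. UI layer), not the filtering itself.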



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get

2020-10-15 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214822#comment-17214822
 ] 

Anurag Mantripragada commented on SPARK-30894:
--

Hi [~dongjoon], here's the backport PR: 
[https://github.com/apache/spark/pull/30058]

 

> The nullability of Size function should not depend on SQLConf.get
> -
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Maxim Gekk
>Priority: Blocker
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created

2020-10-12 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212749#comment-17212749
 ] 

Anurag Mantripragada commented on SPARK-30893:
--

[~maropu], [~dongjoon] - I went through the PRs for individual issues in this 
Umbrella and looked at the code changes. Only 
[SPARK-30894|https://issues.apache.org/jira/browse/SPARK-30894] seems to affect 
branch-2.4. I've commented on that Jira separately asking the original author 
if we can backport this to branch-2.4. 

> Expressions should not change its data type/nullability after it's created
> --
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
> Fix For: 3.0.0
>
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption, if data type/nullability gets changed.
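A Spark-free sketch of the failure mode described above (names are illustrative; in Spark the real flag behind SPARK-30894 is `spark.sql.legacy.sizeOfNull`): an expression whose `nullable` re-reads a mutable conf can flip between planning phases, while one that captures the value at construction stays stable:

```python
CONF = {"legacy_size_of_null": True}  # stand-in for the global SQLConf


class SizeUnstable:
    """Buggy pattern: nullability is re-read from the mutable conf on every access."""
    @property
    def nullable(self):
        return not CONF["legacy_size_of_null"]


class SizeFixed:
    """Fixed pattern: the conf value is captured once, when the expression is created."""
    def __init__(self):
        self._legacy = CONF["legacy_size_of_null"]

    @property
    def nullable(self):
        return not self._legacy


unstable, fixed = SizeUnstable(), SizeFixed()
before = (unstable.nullable, fixed.nullable)  # (False, False)
CONF["legacy_size_of_null"] = False           # conf changes between planning phases
after = (unstable.nullable, fixed.nullable)   # (True, False): only the buggy one flips
```

The silent flip in `SizeUnstable` is exactly what can invalidate an already-optimized plan.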






[jira] [Commented] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get

2020-10-12 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212748#comment-17212748
 ] 

Anurag Mantripragada commented on SPARK-30894:
--

[~maxgekk] - Looking at the code in branch-2.4, it looks like this could be an 
issue there as well - 
[https://github.com/apache/spark/blob/652e5746019b95b78af4d36c23ec5155bb22325b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L94]

Should we backport this to branch-2.4 since it is an LTS branch?

> The nullability of Size function should not depend on SQLConf.get
> -
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Maxim Gekk
>Priority: Blocker
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created

2020-10-12 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212717#comment-17212717
 ] 

Anurag Mantripragada commented on SPARK-30893:
--

Sorry for not being clear. I was referring to this comment.

 

??For data type and nullability, I think we should fix before 3.0, as they can 
lead to data corruption.??

??For other behaviors, we can have more discussion and wait for 3.1??

> Expressions should not change its data type/nullability after it's created
> --
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
> Fix For: 3.0.0
>
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption, if data type/nullability gets changed.






[jira] [Comment Edited] (SPARK-30893) Expressions should not change its data type/nullability after it's created

2020-10-12 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212635#comment-17212635
 ] 

Anurag Mantripragada edited comment on SPARK-30893 at 10/12/20, 7:56 PM:
-

As mentioned above in [~cloud_fan]'s comment, should we backport the 
nullability and datatype issues from this umbrella to branch-2.4 as they may 
cause corruption? 

CC: [~viirya], [~dongjoon]


was (Author: anuragmantri):
As mentioned here 
[https://issues.apache.org/jira/browse/SPARK-30893?focusedCommentId=17041618&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17041618],
 should we backport the nullability and datatype issues from this umbrella to 
branch-2.4 as they may cause corruption? 

CC: [~viirya], [~dongjoon]

> Expressions should not change its data type/nullability after it's created
> --
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
> Fix For: 3.0.0
>
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption, if data type/nullability gets changed.






[jira] [Commented] (SPARK-30893) Expressions should not change its data type/nullability after it's created

2020-10-12 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212635#comment-17212635
 ] 

Anurag Mantripragada commented on SPARK-30893:
--

As mentioned here 
[https://issues.apache.org/jira/browse/SPARK-30893?focusedCommentId=17041618&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17041618],
 should we backport the nullability and datatype issues from this umbrella to 
branch-2.4 as they may cause corruption? 

CC: [~viirya], [~dongjoon]

> Expressions should not change its data type/nullability after it's created
> --
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
> Fix For: 3.0.0
>
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption, if data type/nullability gets changed.






[jira] [Commented] (SPARK-28067) Incorrect results in decimal aggregation with whole-stage code gen enabled

2020-10-07 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209923#comment-17209923
 ] 

Anurag Mantripragada commented on SPARK-28067:
--

I just checked, and the issue exists in branch-2.4. Since this is a 
`correctness` issue, should we backport it to branch-2.4? 
cc: [~cloud_fan], [~dongjoon]

> Incorrect results in decimal aggregation with whole-stage code gen enabled
> --
>
> Key: SPARK-28067
> URL: https://issues.apache.org/jira/browse/SPARK-28067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Mark Sirek
>Assignee: Sunitha Kambhampati
>Priority: Critical
>  Labels: correctness
> Fix For: 3.1.0
>
>
> The following test case involving a join followed by a sum aggregation 
> returns the wrong answer for the sum:
>  
> {code:java}
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(sum("decNum"))
> scala> df2.show(40,false)
> +-----------+
> |sum(decNum)|
> +-----------+
> |    4000.00|
> +-----------+
>  
> {code}
>  
> The result should be 104000.
> It appears a partial sum is computed for each join key, as the result 
> returned would be the answer for all rows matching intNum === 1.
> If only the rows with intNum === 2 are included, the answer given is null:
>  
> {code:java}
> scala> val df3 = df.filter($"intNum" === lit(2))
>  df3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [decNum: 
> decimal(38,18), intNum: int]
> scala> val df4 = df3.withColumnRenamed("decNum", "decNum2").join(df3, 
> "intNum").agg(sum("decNum"))
>  df4: org.apache.spark.sql.DataFrame = [sum(decNum): decimal(38,18)]
> scala> df4.show(40,false)
> +-----------+
> |sum(decNum)|
> +-----------+
> |       null|
> +-----------+
>  
> {code}
>  
> The correct answer, 100000.000000000000000000000000000000000000, doesn't fit 
> in the DataType picked for the result, decimal(38,18), so an overflow occurs, 
> which Spark then converts to null.
> The first example, which doesn't filter out the intNum === 1 values should 
> also return null, indicating overflow, but it doesn't.  This may mislead the 
> user to think a valid sum was computed.
> If whole-stage code gen is turned off:
> spark.conf.set("spark.sql.codegen.wholeStage", false)
> ... incorrect results are not returned because the overflow is caught as an 
> exception:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 39 
> exceeds max precision 38
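The exception quoted above comes from Spark's precision guard on its `Decimal` type. A loose, Spark-free sketch of that check (the function name here is illustrative, not Spark's actual method):

```python
from decimal import Decimal

MAX_PRECISION = 38  # Spark SQL's DecimalType upper bound


def check_precision(value: Decimal, max_precision: int = MAX_PRECISION) -> Decimal:
    # Count the significant digits a fixed-precision decimal must store.
    needed = len(value.as_tuple().digits)
    if needed > max_precision:
        raise ValueError(
            f"requirement failed: Decimal precision {needed} "
            f"exceeds max precision {max_precision}")
    return value


check_precision(Decimal("104000"))   # fine: 6 significant digits
# A 39-significant-digit intermediate value trips the guard, as in the report:
# check_precision(Decimal("1" * 39)) # raises ValueError
```

With whole-stage codegen on, the bug is that this guard is bypassed in the generated aggregation path, so the overflow surfaces as a wrong value instead of an exception.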






[jira] [Commented] (SPARK-30201) HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT

2020-10-05 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208389#comment-17208389
 ] 

Anurag Mantripragada commented on SPARK-30201:
--

[~cloud_fan], [~ulysses], [~dongjoon] - I verified this issue is present in 
branch-2.4. Test failure below:



{{[info] == Results ==}}
{{[info] !== Correct Answer - 1 == == Spark Answer - 1 ==}}
{{[info] !struct<> struct}}
{{[info] ![AABBCC] [EFBFBDEFBFBDEFBFBD] (QueryTest.scala:163)}}
{{[info] org.scalatest.exceptions.TestFailedException:}}

 

I created a PR to backport it to branch-2.4. It was a clean cherry-pick - could 
you please take a look? Thanks!

[https://github.com/apache/spark/pull/29948]

> HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
> 
>
> Key: SPARK-30201
> URL: https://issues.apache.org/jira/browse/SPARK-30201
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Critical
> Fix For: 3.0.0
>
>
> Now Spark uses `ObjectInspectorCopyOption.JAVA` as the OI option, which 
> converts any string to a UTF-8 string. When writing data that is not valid 
> UTF-8, `EFBFBD` will appear.
> We should use `ObjectInspectorCopyOption.DEFAULT` to pass the bytes through 
> unchanged.
> Here is the way to reproduce:
> 1. make a file containing the hex bytes 'AABBCC', which are not valid UTF-8.
> 2. create table test1 (c string) location '$file_path';
> 3. select hex(c) from test1; // AABBCC
> 4. create table test2 (c string) as select c from test1;
> 5. select hex(c) from test2; // EFBFBDEFBFBDEFBFBD
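The `EFBFBD` bytes are simply the UTF-8 encoding of U+FFFD, the replacement character substituted for each invalid byte. A quick Python check of the round trip (decode-with-replacement is what the JAVA copy option effectively does):

```python
raw = bytes.fromhex("AABBCC")  # three bytes, none of them valid UTF-8
# Decoding with replacement turns each bad byte into U+FFFD; re-encoding
# then produces EFBFBD for each replacement character:
mangled = raw.decode("utf-8", errors="replace").encode("utf-8")
print(mangled.hex().upper())  # EFBFBDEFBFBDEFBFBD
```

This is why the fix is to copy the raw bytes (`DEFAULT`) rather than round-trip through a Java String (`JAVA`).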


