[jira] [Created] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is not activated explicitly
Kousuke Saruta created SPARK-36068: -- Summary: No tests in hadoop-cloud run unless hadoop-3.2 profile is not activated explicitly Key: SPARK-36068 URL: https://issues.apache.org/jira/browse/SPARK-36068 Project: Spark Issue Type: Bug Components: Build, Tests Affects Versions: 3.2.0, 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is activated explicitly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
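If the diagnosis above holds, the tests can be compiled and run by activating both profiles explicitly. A hedged sketch of the command (profile and module names are assumed from the report and the usual Spark build layout; verify against your checkout):

```
# Activate both profiles so the hadoop-cloud module's tests compile and run
./build/mvn -Phadoop-cloud -Phadoop-3.2 -pl hadoop-cloud test
```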
[jira] [Updated] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is not activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36068: --- Description: No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is activated explicitly. This issue is similar to SPARK-36067. was:No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is activated explicitly. > No tests in hadoop-cloud run unless hadoop-3.2 profile is not activated > explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36068: --- Summary: No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly (was: No tests in hadoop-cloud run unless hadoop-3.2 profile is not activated explicitly) > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36068: Assignee: Apache Spark (was: Kousuke Saruta) > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377888#comment-17377888 ] Apache Spark commented on SPARK-36068: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33277 > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36068: Assignee: Kousuke Saruta (was: Apache Spark) > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377890#comment-17377890 ] Apache Spark commented on SPARK-36068: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33277 > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377893#comment-17377893 ] Klaus Ma commented on SPARK-36058: -- In Volcano, we have the volcano job to run mpi/tensorflow jobs (the pods are created by the controller/operator); but for a Spark job, the executor pods are created by the driver pod, which is different. If the Spark pods (both driver and executor) can be created by an operator, we can use a volcano job to make it simple :) > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Holden Karau >Priority: Major > > Volcano & Yunikorn both support scheduling individual pods, but they also > support higher level abstractions similar to the vanilla Kube replicasets > which we can use to improve scheduling performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377895#comment-17377895 ] Klaus Ma commented on SPARK-36058: -- xref for volcano job example: [https://github.com/volcano-sh/volcano/blob/master/example/integrations/mpi/mpi-example.yaml] > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Holden Karau >Priority: Major > > Volcano & Yunikorn both support scheduling individual pods, but they also > support higher level abstractions similar to the vanilla Kube replicasets > which we can use to improve scheduling performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
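For readers without the linked example handy, a minimal Volcano Job manifest looks roughly like the sketch below (field names follow the volcano-sh examples; the task names and images are hypothetical). The point of the earlier comment is that a Spark driver creates its own executor pods at runtime, so today they cannot simply be declared as Job tasks like this:

```yaml
# Hedged sketch of a gang-scheduled Volcano Job, for illustration only.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: spark-like-job                       # hypothetical name
spec:
  schedulerName: volcano
  minAvailable: 3                            # gang-schedule: all 3 pods or none
  tasks:
    - name: driver
      replicas: 1
      template:
        spec:
          containers:
            - name: driver
              image: example/spark-driver    # hypothetical image
    - name: executor
      replicas: 2
      template:
        spec:
          containers:
            - name: executor
              image: example/spark-executor  # hypothetical image
```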
[jira] [Updated] (SPARK-35571) tag v3.0.0 org.apache.spark.sql.catalyst.parser.AstBuilder import error
[ https://issues.apache.org/jira/browse/SPARK-35571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] geekyouth updated SPARK-35571: -- Attachment: screenshot-1.png > tag v3.0.0 org.apache.spark.sql.catalyst.parser.AstBuilder import error > --- > > Key: SPARK-35571 > URL: https://issues.apache.org/jira/browse/SPARK-35571 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: geekyouth >Priority: Major > Attachments: screenshot-1.png > > > org.apache.spark.sql.catalyst.parser.AstBuilder: > https://github.com/apache/spark/blob/v3.0.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala > line 36: > `import org.apache.spark.sql.catalyst.parser.SqlBaseParser._` > SqlBaseParser does not exist in the package `org.apache.spark.sql.catalyst.parser` > https://github.com/apache/spark/tree/v3.0.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser > also line 54: > SqlBaseBaseVisitor is not imported, so the file cannot compile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35571) tag v3.0.0 org.apache.spark.sql.catalyst.parser.AstBuilder import error
[ https://issues.apache.org/jira/browse/SPARK-35571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377901#comment-17377901 ] geekyouth commented on SPARK-35571: --- !screenshot-1.png! That works for me (y) > tag v3.0.0 org.apache.spark.sql.catalyst.parser.AstBuilder import error > --- > > Key: SPARK-35571 > URL: https://issues.apache.org/jira/browse/SPARK-35571 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: geekyouth >Priority: Major > Attachments: screenshot-1.png > > > org.apache.spark.sql.catalyst.parser.AstBuilder: > https://github.com/apache/spark/blob/v3.0.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala > line 36: > `import org.apache.spark.sql.catalyst.parser.SqlBaseParser._` > SqlBaseParser does not exist in the package `org.apache.spark.sql.catalyst.parser` > https://github.com/apache/spark/tree/v3.0.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser > also line 54: > SqlBaseBaseVisitor is not imported, so the file cannot compile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
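A likely explanation, and an assumption on my part based on how the Spark build works: `SqlBaseParser` and `SqlBaseBaseVisitor` are not checked into the repository at all. They are generated from the ANTLR grammar (`SqlBase.g4`) by the antlr4-maven-plugin during the build, so the package compiles only after code generation has run, e.g. with something along these lines (goal name assumed from the plugin; verify against your build):

```
# Generate the ANTLR parser sources for the catalyst module before compiling
./build/mvn -pl sql/catalyst antlr4:antlr4
```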
[jira] [Assigned] (SPARK-36044) Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp
[ https://issues.apache.org/jira/browse/SPARK-36044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36044: Assignee: Apache Spark > Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp > - > > Key: SPARK-36044 > URL: https://issues.apache.org/jira/browse/SPARK-36044 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > The functions unix_timestamp/to_unix_timestamp should be able to accept input > of TimestampNTZ type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36044) Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp
[ https://issues.apache.org/jira/browse/SPARK-36044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36044: Assignee: (was: Apache Spark) > Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp > - > > Key: SPARK-36044 > URL: https://issues.apache.org/jira/browse/SPARK-36044 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > The functions unix_timestamp/to_unix_timestamp should be able to accept input > of TimestampNTZ type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36044) Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp
[ https://issues.apache.org/jira/browse/SPARK-36044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377907#comment-17377907 ] Apache Spark commented on SPARK-36044: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/33278 > Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp > - > > Key: SPARK-36044 > URL: https://issues.apache.org/jira/browse/SPARK-36044 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > The functions unix_timestamp/to_unix_timestamp should be able to accept input > of TimestampNTZ type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36044) Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp
[ https://issues.apache.org/jira/browse/SPARK-36044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36044: --- Comment: was deleted (was: I'm working on this.) > Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp > - > > Key: SPARK-36044 > URL: https://issues.apache.org/jira/browse/SPARK-36044 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > The functions unix_timestamp/to_unix_timestamp should be able to accept input > of TimestampNTZ type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36069) spark function from_json output field name, field type and field value when FAILFAST mode throw exception
geekyouth created SPARK-36069: - Summary: spark function from_json output field name, field type and field value when FAILFAST mode throw exception Key: SPARK-36069 URL: https://issues.apache.org/jira/browse/SPARK-36069 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: geekyouth spark function from_json outputs an error message when FAILFAST mode throws an exception. But the message does not contain important info, for example: field name, field value, field type... This information is very important for developers to find where the erroneous input data is located. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36069) spark function from_json should output field name, field type and field value when FAILFAST mode throw exception
[ https://issues.apache.org/jira/browse/SPARK-36069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] geekyouth updated SPARK-36069: -- Summary: spark function from_json should output field name, field type and field value when FAILFAST mode throw exception (was: spark function from_json output field name, field type and field value when FAILFAST mode throw exception) > spark function from_json should output field name, field type and field value > when FAILFAST mode throw exception > > > Key: SPARK-36069 > URL: https://issues.apache.org/jira/browse/SPARK-36069 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: geekyouth >Priority: Major > > spark function from_json outputs an error message when FAILFAST mode throws an > exception. > > But the message does not contain important info, for example: field name, field > value, field type... > > This information is very important for developers to find where the erroneous > input data is located. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36035) Adjust `test_astype`, `test_neg` for old pandas versions
[ https://issues.apache.org/jira/browse/SPARK-36035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36035: Assignee: Xinrong Meng > Adjust `test_astype`, `test_neg` for old pandas versions > > > Key: SPARK-36035 > URL: https://issues.apache.org/jira/browse/SPARK-36035 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > * test_astype > For pandas < 1.1.0, declaring or converting to StringDtype was in general > only possible if the data was already only str or nan-like (GH31204). > In pandas 1.1.0, the problem is adjusted by > [https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.1.0.html#all-dtypes-can-now-be-converted-to-stringdtype]. > That should be considered in `test_astype`, otherwise, current tests will > fail with pandas < 1.1.0. > * test_neg > {code:java} > dtypes = [ > "Int8", > "Int16", > "Int32", > "Int64", > ] > psers = [] > for dtype in dtypes: > psers.append(pd.Series([1, 2, 3, None], dtype=dtype)) > > for pser in psers: > print((-pser).dtype){code} > ~ 1.0.5, object dtype > 1.1.0~1.1.2, TypeError: bad operand type for unary -: 'IntegerArray' > 1.1.3, correct respective dtype -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36035) Adjust `test_astype`, `test_neg` for old pandas versions
[ https://issues.apache.org/jira/browse/SPARK-36035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36035. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33250 [https://github.com/apache/spark/pull/33250] > Adjust `test_astype`, `test_neg` for old pandas versions > > > Key: SPARK-36035 > URL: https://issues.apache.org/jira/browse/SPARK-36035 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > * test_astype > For pandas < 1.1.0, declaring or converting to StringDtype was in general > only possible if the data was already only str or nan-like (GH31204). > In pandas 1.1.0, the problem is adjusted by > [https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.1.0.html#all-dtypes-can-now-be-converted-to-stringdtype]. > That should be considered in `test_astype`, otherwise, current tests will > fail with pandas < 1.1.0. > * test_neg > {code:java} > dtypes = [ > "Int8", > "Int16", > "Int32", > "Int64", > ] > psers = [] > for dtype in dtypes: > psers.append(pd.Series([1, 2, 3, None], dtype=dtype)) > > for pser in psers: > print((-pser).dtype){code} > ~ 1.0.5, object dtype > 1.1.0~1.1.2, TypeError: bad operand type for unary -: 'IntegerArray' > 1.1.3, correct respective dtype -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
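The version matrix reported above can be encoded as a small helper, a hedged sketch (the function names are illustrative, not part of the Spark test suite) showing how a test could branch on the installed pandas version instead of failing on older releases:

```python
def parse_version(version: str) -> tuple:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def expected_neg_behavior(pandas_version: str) -> str:
    """Expected outcome of negating a nullable Int series, per the report above."""
    v = parse_version(pandas_version)
    if v < (1, 1, 0):
        return "object dtype"   # ~1.0.5: result falls back to object
    if v < (1, 1, 3):
        return "TypeError"      # 1.1.0-1.1.2: unary minus unsupported on IntegerArray
    return "original dtype"     # 1.1.3+: the respective nullable dtype is preserved
```

A test can then assert the right branch for the environment it runs in, rather than assuming the newest pandas behavior everywhere.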
[jira] [Resolved] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36068. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33277 [https://github.com/apache/spark/pull/33277] > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.2.0 > > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36066) UTF8String trimAll() can only trim spaces, not all characters ({@literal <=} ASCII 32)
[ https://issues.apache.org/jira/browse/SPARK-36066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liukai updated SPARK-36066: --- Description: In this method, Character.isWhitespace() is used to decide which characters to trim. But Character.isWhitespace() does not match the method's documented definition. > UTF8String trimAll() can only trim spaces, not all characters ({@literal <=} ASCII 32) > - > > Key: SPARK-36066 > URL: https://issues.apache.org/jira/browse/SPARK-36066 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1, 3.1.2 >Reporter: liukai >Priority: Major > > In this method, Character.isWhitespace() is used to decide which characters to trim. But > Character.isWhitespace() does not match the method's documented definition. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
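To make the mismatch concrete, here is a Python model of the two behaviors (a sketch, not Spark code). `JAVA_WS_ASCII` lists the ASCII code points for which Java's `Character.isWhitespace()` returns true per the JDK documentation, while the `trimAll()` contract says every character `<=` ASCII 32 should be trimmed:

```python
# ASCII code points Java's Character.isWhitespace() accepts (per JDK docs):
# tab..carriage-return (9-13), file/group/record/unit separators (28-31), space (32).
JAVA_WS_ASCII = {9, 10, 11, 12, 13, 28, 29, 30, 31, 32}

def trim_via_is_whitespace(s: str) -> str:
    # Models the reported trimAll() behavior: only isWhitespace chars are stripped.
    return s.strip("".join(chr(c) for c in JAVA_WS_ASCII))

def trim_per_contract(s: str) -> str:
    # Models the documented contract: strip every character <= ASCII 32.
    return s.strip("".join(chr(c) for c in range(33)))

sample = "\x00\x07 value \x08"
# trim_via_is_whitespace(sample) leaves the leading NUL/BEL and trailing BS in
# place (they are <= 32 but not whitespace to Java), while
# trim_per_contract(sample) returns "value".
```

Control characters such as NUL (0), BEL (7), and BS (8) are the gap: they satisfy `<= 32` but are not whitespace under Java's definition, so the isWhitespace-based trim skips them.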
[jira] [Created] (SPARK-36070) Add time cost info for writing rows out and committing the task.
Kent Yao created SPARK-36070: Summary: Add time cost info for writing rows out and committing the task. Key: SPARK-36070 URL: https://issues.apache.org/jira/browse/SPARK-36070 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Reporter: Kent Yao We have a job that has a stage that contains about 8k tasks. Most tasks take about 1~10min to finish but 3 of them run extremely slowly. The root cause is most likely the delay of the storage system. On the Spark side, we can record the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-36070: - Description: We have a job that has a stage that contains about 8k tasks. Most tasks take about 1~10min to finish but 3 of them run extremely slowly. They take about 1 hour each to finish, and their speculative attempts do as well. The root cause is most likely the delay of the storage system. On the Spark side, we can record the time cost in logs for better bug hunting or performance tuning. (was: We have a job that has a stage that contains about 8k tasks. Most tasks take about 1~10min to finish but 3 of them tasks run extremely slow. The root cause is most likely the delay of the storage system. On the spark side, we can record the time cost in logs for better bug hunting or performance tuning.) > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > We have a job that has a stage that contains about 8k tasks. Most tasks take > about 1~10min to finish but 3 of them run extremely slowly. They take > about 1 hour each to finish, and their speculative attempts do as well. The root cause is > most likely the delay of the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
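As a plain-Python illustration of the proposal (not the actual Spark patch; the names here are invented for the sketch), a write or commit step can be wrapped so that its wall-clock cost lands in the logs:

```python
import logging
import time

logger = logging.getLogger("write_path")

def timed(label, step, *args, **kwargs):
    """Run one step (e.g. 'write rows out' or 'commit task') and log its cost."""
    start = time.monotonic()
    result = step(*args, **kwargs)
    logger.info("%s took %.3f s", label, time.monotonic() - start)
    return result

# Usage sketch (committer/taskContext are hypothetical):
#   timed("commit task", committer.commitTask, taskContext)
```

With such a wrapper around both the row-writing loop and the task commit, an abnormally slow task shows exactly which phase ate the hour.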
[jira] [Assigned] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36070: Assignee: Apache Spark > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > We have a job that has a stage that contains about 8k tasks. Most tasks take > about 1~10min to finish but 3 of them run extremely slowly. They take > about 1 hour each to finish, and their speculative attempts do as well. The root cause is > most likely the delay of the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377937#comment-17377937 ] Apache Spark commented on SPARK-36070: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33279 > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > We have a job that has a stage that contains about 8k tasks. Most tasks take > about 1~10min to finish but 3 of them run extremely slowly. They take > about 1 hour each to finish, and their speculative attempts do as well. The root cause is > most likely the delay of the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36070: Assignee: (was: Apache Spark) > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > We have a job that has a stage that contains about 8k tasks. Most tasks take > about 1~10min to finish but 3 of them run extremely slowly. They take > about 1 hour each to finish, and their speculative attempts do as well. The root cause is > most likely the delay of the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377938#comment-17377938 ] Apache Spark commented on SPARK-36070: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33279 > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > We have a job that has a stage that contains about 8k tasks. Most tasks take > about 1~10min to finish but 3 of them run extremely slowly. They take > about 1 hour each to finish, and their speculative attempts do as well. The root cause is > most likely the delay of the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting or performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36068: - Fix Version/s: (was: 3.2.0) > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36068: Assignee: (was: Apache Spark) > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-36068: -- Assignee: (was: Kousuke Saruta) Reverted at https://github.com/apache/spark/commit/951e84f1b91fc2ac09b3afbe51bdd68af62d26fb > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36068) No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly
[ https://issues.apache.org/jira/browse/SPARK-36068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36068: Assignee: Apache Spark > No tests in hadoop-cloud run unless hadoop-3.2 profile is activated explicitly > -- > > Key: SPARK-36068 > URL: https://issues.apache.org/jira/browse/SPARK-36068 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > No tests in hadoop-cloud are compiled and run unless hadoop-3.2 profile is > activated explicitly. > This issue is similar to SPARK-36067. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36069) spark function from_json should output field name, field type and field value when FAILFAST mode throws an exception
[ https://issues.apache.org/jira/browse/SPARK-36069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377968#comment-17377968 ] geekyouth commented on SPARK-36069: --- here is my unit test output: org.apache.spark.SparkException: Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'. at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:70) at org.apache.spark.sql.catalyst.expressions.JsonToStructs.nullSafeEval(jsonExpressions.scala:597) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:461) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.subExpr_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:341) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444) at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: java.lang.RuntimeException: Cannot parse 0.31 as double. at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:478) at org.apache.spark.sql.catalyst.expressions.JsonToStructs.$anonfun$parser$3(jsonExpressions.scala:585) at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:60) ... 20 more > spark function from_json should output field name, field type and field value > when FAILFAST mode throws an exception > > > Key: SPARK-36069 > URL: https://issues.apache.org/jira/browse/SPARK-36069 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: geekyouth >Priority: Major > > The spark function from_json outputs an error message when FAILFAST mode throws > an exception. > > But the message does not contain important info, for example the field name, > field value, and field type. > > This information is very important for developers to find where the erroneous > input data is located. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
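The improvement the reporter asks for can be illustrated with a small sketch (hypothetical types, not Spark's actual `BadRecordException` hierarchy): carry the field name, expected type, and offending value inside the exception, so a FAILFAST failure is immediately actionable.

```scala
// Hypothetical sketch: a parse error that carries the field name, the
// expected type, and the offending value, as the report requests.
// FieldParseException and parseDouble are illustrative, not Spark APIs.
final case class FieldParseException(field: String, expectedType: String, value: String)
    extends RuntimeException(
      s"Cannot parse field '$field' as $expectedType from value '$value'")

def parseDouble(field: String, value: String): Double =
  try value.toDouble
  catch {
    case _: NumberFormatException =>
      throw FieldParseException(field, "double", value)
  }
```

With this shape, the log line above would read something like "Cannot parse field 'price' as double from value '...'" instead of only "Cannot parse 0.31 as double", which leaves the field unnamed.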
[jira] [Commented] (SPARK-12837) Spark driver requires large memory space for serialized results even there are no data collected to the driver
[ https://issues.apache.org/jira/browse/SPARK-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377990#comment-17377990 ] shashank commented on SPARK-12837: -- Seeing the same issue on 2.4.3 {code:java} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 104904 tasks (8.0 GB) is bigger than spark.driver.maxResultSize (8.0 GB) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at 
org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78){code} ... 54 more > Spark driver requires large memory space for serialized results even there > are no data collected to the driver > -- > > Key: SPARK-12837 > URL: https://issues.apache.org/jira/browse/SPARK-12837 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 1.5.2, 1.6.0 >Reporter: Tien-Dung LE >Assignee: Wenchen Fan >Priority: Critical > Fix For: 2.2.0 > > > Executing a sql statement with a large number of partitions requires a high > memory space for the driver even there are no requests to collect data back > to the driver. > Here are steps to re-produce the issue. > 1. Start spark shell with a spark.driver.maxResultSize setting > {code:java} > bin/spark-shell --driver-memory=1g --conf spark.driver.maxResultSize=1m > {code} > 2. Execute the code > {code:java} > case class Toto( a: Int, b: Int) > val df = sc.parallelize( 1 to 1e6.toInt).map( i => Toto( i, i)).toDF > sqlContext.setConf( "spark.sql.shuffle.partitions", "200" ) > df.groupBy("a").count().saveAsParquetFile( "toto1" ) // OK > sqlContext.setConf( "spark.sql.shuffle.partitions", 1e3.toInt.toString ) > df.repartition(1e3.toInt).groupBy("a").count().repartition(1e3.toInt).saveAsParquetFile( > "toto2" ) // ERROR > {code} > The error message is > {code:java} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Total size of serialized results of 393 tasks (1025.9 KB) is bigger than > spark.driver.maxResultSize (1024.0 KB) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36071) Spark driver requires large memory space for serialized results even there are no data collected to the driver
shashank created SPARK-36071: Summary: Spark driver requires large memory space for serialized results even there are no data collected to the driver Key: SPARK-36071 URL: https://issues.apache.org/jira/browse/SPARK-36071 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Reporter: shashank Executing a job with a large number of partitions causes the data transferred to the driver to exceed spark.driver.maxResultSize, even when no data is explicitly collected by the driver. It looks like Spark is sending metadata back, and that is what exceeds the limit. {code:java} spark.driver.maxResultSize=8g{code} {code:java} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 104904 tasks (8.0 GB) is bigger than spark.driver.maxResultSize (8.0 GB) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78) ... 54 more{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
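The limit described above is an aggregate one: every finished task contributes the size of its serialized result (even when that result is only metadata such as accumulator updates), and the job aborts once the running total crosses spark.driver.maxResultSize. A minimal sketch of that accounting (hypothetical, not Spark's actual implementation):

```scala
// Hypothetical sketch of the aggregate check behind
// spark.driver.maxResultSize: each task result adds to a running total,
// and the job is aborted as soon as the total exceeds the limit.
def totalResultSize(taskResultBytes: Seq[Long], maxResultSize: Long): Either[String, Long] = {
  val total = taskResultBytes.sum
  if (total > maxResultSize)
    Left(s"Total size of serialized results ($total bytes) is bigger than maxResultSize ($maxResultSize bytes)")
  else
    Right(total)
}
```

With roughly 105k tasks as in the report, even a few tens of kilobytes of per-task metadata is enough to cross an 8 GB limit, which matches the symptom of failing without any explicit collect.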
[jira] [Updated] (SPARK-36071) Spark driver requires large memory space for serialized results even there are no data collected to the driver
[ https://issues.apache.org/jira/browse/SPARK-36071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shashank updated SPARK-36071: - Priority: Critical (was: Major) > Spark driver requires large memory space for serialized results even there > are no data collected to the driver > -- > > Key: SPARK-36071 > URL: https://issues.apache.org/jira/browse/SPARK-36071 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: shashank >Priority: Critical > > Executing a job with a large number of partitions causes the data transferred > to the driver to exceed spark.driver.maxResultSize, even when no data is > explicitly collected by the driver. It looks like Spark is sending metadata > back, and that is what exceeds the limit. > {code:java} > spark.driver.maxResultSize=8g{code} > > {code:java} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Total size of serialized results of 104904 tasks (8.0 GB) is bigger than > spark.driver.maxResultSize (8.0 GB) at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966) > at scala.Option.foreach(Option.scala:257) at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at > org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78) > ... 54 more{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36072) TO_TIMESTAMP: return different results based on the default timestamp type
Gengliang Wang created SPARK-36072: -- Summary: TO_TIMESTAMP: return different results based on the default timestamp type Key: SPARK-36072 URL: https://issues.apache.org/jira/browse/SPARK-36072 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang The SQL function TO_TIMESTAMP should return different results based on the default timestamp type -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36072) TO_TIMESTAMP: return different results based on the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-36072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378027#comment-17378027 ] Apache Spark commented on SPARK-36072: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33280 > TO_TIMESTAMP: return different results based on the default timestamp type > -- > > Key: SPARK-36072 > URL: https://issues.apache.org/jira/browse/SPARK-36072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > The SQL function TO_TIMESTAMP should return different results based on the > default timestamp type -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36072) TO_TIMESTAMP: return different results based on the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-36072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36072: Assignee: Apache Spark (was: Gengliang Wang) > TO_TIMESTAMP: return different results based on the default timestamp type > -- > > Key: SPARK-36072 > URL: https://issues.apache.org/jira/browse/SPARK-36072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > The SQL function TO_TIMESTAMP should return different results based on the > default timestamp type -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36072) TO_TIMESTAMP: return different results based on the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-36072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36072: Assignee: Gengliang Wang (was: Apache Spark) > TO_TIMESTAMP: return different results based on the default timestamp type > -- > > Key: SPARK-36072 > URL: https://issues.apache.org/jira/browse/SPARK-36072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > The SQL function TO_TIMESTAMP should return different results based on the > default timestamp type -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36073) SubExpr elimination should include common child exprs of conditional expressions
Peter Toth created SPARK-36073: -- Summary: SubExpr elimination should include common child exprs of conditional expressions Key: SPARK-36073 URL: https://issues.apache.org/jira/browse/SPARK-36073 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Peter Toth SPARK-35410 (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112) filters out all child expressions, but in some cases that is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
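The proposal can be illustrated outside Spark with a toy evaluator (a hypothetical example, not the SQL code generator): when an expensive child expression appears in every branch of a conditional, it can be treated as a common subexpression and evaluated once, without changing semantics.

```scala
// Toy illustration of subexpression elimination across conditional
// branches. expensive(x) is used both as a plain output column and
// inside EVERY branch of a conditional column, so it is safe to
// evaluate it once and reuse the result.
var evaluations = 0
def expensive(x: Int): Int = { evaluations += 1; x * x }

// Naive: expensive(x) is evaluated for the first column and again in
// whichever conditional branch runs (2 evaluations per row).
def projectNaive(x: Int): (Int, Int) =
  (expensive(x), if (x > 0) expensive(x) + 1 else expensive(x) - 1)

// With elimination: evaluated once, reused by both columns (1 evaluation).
def projectReused(x: Int): (Int, Int) = {
  val common = expensive(x)
  (common, if (x > 0) common + 1 else common - 1)
}
```

The hoisting is only safe because expensive(x) occurs in all branches; an expression appearing in just some branches must stay conditional, which is why SPARK-35410's blanket filter was conservative rather than wrong.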
[jira] [Commented] (SPARK-36073) SubExpr elimination should include common child exprs of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378035#comment-17378035 ] Apache Spark commented on SPARK-36073: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/33281 > SubExpr elimination should include common child exprs of conditional > expressions > > > Key: SPARK-36073 > URL: https://issues.apache.org/jira/browse/SPARK-36073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > SPARK-35410 > (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112) > filters out all child expressions, but in some cases that is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36073) SubExpr elimination should include common child exprs of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36073: Assignee: Apache Spark > SubExpr elimination should include common child exprs of conditional > expressions > > > Key: SPARK-36073 > URL: https://issues.apache.org/jira/browse/SPARK-36073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Minor > > SPARK-35410 > (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112) > filters out all child expressions, but in some cases that is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36073) SubExpr elimination should include common child exprs of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36073: Assignee: (was: Apache Spark) > SubExpr elimination should include common child exprs of conditional > expressions > > > Key: SPARK-36073 > URL: https://issues.apache.org/jira/browse/SPARK-36073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > SPARK-35410 > (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112) > filters out all child expressions, but in some cases that is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36073) SubExpr elimination should include common child exprs of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378036#comment-17378036 ] Apache Spark commented on SPARK-36073: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/33281 > SubExpr elimination should include common child exprs of conditional > expressions > > > Key: SPARK-36073 > URL: https://issues.apache.org/jira/browse/SPARK-36073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > SPARK-35410 > (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112) > filters out all child expressions, but in some cases that is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32333) Drop references to Master
[ https://issues.apache.org/jira/browse/SPARK-32333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378093#comment-17378093 ] Thomas Graves commented on SPARK-32333: --- Getting back to this now that the spark 3.2 branch is cut, perhaps we can target this for 3.3. From the discussion thread on the spark-dev mailing list, "Leader" was mentioned the most, with "Scheduler" second. One argument against "controller", "coordinator", "application manager", and "primary" is that they imply the component is required, whereas if the standalone master goes down, running apps are unaffected. Based on that feedback, and because it is short, I propose "Leader". > Drop references to Master > - > > Key: SPARK-32333 > URL: https://issues.apache.org/jira/browse/SPARK-32333 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > We have a lot of references to "master" in the code base. It will be > beneficial to remove references to problematic language that can alienate > potential community members. > SPARK-32004 removed references to slave. > > Here is an IETF draft to fix up some of the most egregious examples > (master/slave, whitelist/blacklist) with proposed alternatives. > https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36074) add error class for StructType.findNestedField
Wenchen Fan created SPARK-36074: --- Summary: add error class for StructType.findNestedField Key: SPARK-36074 URL: https://issues.apache.org/jira/browse/SPARK-36074 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36074) add error class for StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-36074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36074: Assignee: Wenchen Fan (was: Apache Spark) > add error class for StructType.findNestedField > -- > > Key: SPARK-36074 > URL: https://issues.apache.org/jira/browse/SPARK-36074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36074) add error class for StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-36074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378136#comment-17378136 ] Apache Spark commented on SPARK-36074: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33282 > add error class for StructType.findNestedField > -- > > Key: SPARK-36074 > URL: https://issues.apache.org/jira/browse/SPARK-36074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36074) add error class for StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-36074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36074: Assignee: Apache Spark (was: Wenchen Fan) > add error class for StructType.findNestedField > -- > > Key: SPARK-36074 > URL: https://issues.apache.org/jira/browse/SPARK-36074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36074) add error class for StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-36074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378137#comment-17378137 ] Apache Spark commented on SPARK-36074: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33282 > add error class for StructType.findNestedField > -- > > Key: SPARK-36074 > URL: https://issues.apache.org/jira/browse/SPARK-36074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36018) Some Improvement for Spark Core
[ https://issues.apache.org/jira/browse/SPARK-36018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-36018. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33216 [https://github.com/apache/spark/pull/33216] > Some Improvement for Spark Core > --- > > Key: SPARK-36018 > URL: https://issues.apache.org/jira/browse/SPARK-36018 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Trivial > Fix For: 3.3.0 > > > I found some code that needs improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36018) Some Improvement for Spark Core
[ https://issues.apache.org/jira/browse/SPARK-36018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-36018: Assignee: jiaan.geng > Some Improvement for Spark Core > --- > > Key: SPARK-36018 > URL: https://issues.apache.org/jira/browse/SPARK-36018 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Trivial > > I found some code that needs improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36075) Support for specifying executor/driver node selector
Yikun Jiang created SPARK-36075: --- Summary: Support for specifying executor/driver node selector Key: SPARK-36075 URL: https://issues.apache.org/jira/browse/SPARK-36075 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 3.3.0 Reporter: Yikun Jiang Currently we can only use "spark.kubernetes.node.selector" to set node-selector labels for both the executor and the driver. Sometimes we need to set different selectors for driver and executor pods separately. We can add the configurations below to support specifying the executor/driver node selector: - spark.kubernetes.driver.node.selector. - spark.kubernetes.executor.node.selector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
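If implemented as described, usage might look like the fragment below. Note this is a sketch of the proposal only: the description gives just the two key prefixes (ending in a dot), so the `<labelKey>` suffixes and values here are hypothetical placeholders, not confirmed configuration names.

```
# Hypothetical usage of the proposed per-role node-selector prefixes;
# <labelKey> and the values are placeholders, not confirmed key names.
spark.kubernetes.driver.node.selector.<labelKey>=<value>
spark.kubernetes.executor.node.selector.<labelKey>=<value>
```

A prefix-based scheme like this would mirror the existing spark.kubernetes.node.selector behavior while letting driver and executor pods target different node labels.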
[jira] [Assigned] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-36070: Assignee: Kent Yao > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > We have a job with a stage that contains about 8k tasks. Most tasks take > about 1-10 minutes to finish, but 3 of them run extremely slowly: they take > about 1 hour each to finish, and so do their speculative copies. The root cause is > most likely delay in the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting and performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36070) Add time cost info for writing rows out and committing the task.
[ https://issues.apache.org/jira/browse/SPARK-36070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-36070. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33279 [https://github.com/apache/spark/pull/33279] > Add time cost info for writing rows out and committing the task. > > > Key: SPARK-36070 > URL: https://issues.apache.org/jira/browse/SPARK-36070 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.3.0 > > > We have a job with a stage that contains about 8k tasks. Most tasks take > about 1-10 minutes to finish, but 3 of them run extremely slowly: they take > about 1 hour each to finish, and so do their speculative copies. The root cause is > most likely delay in the storage system. On the Spark side, we can record > the time cost in logs for better bug hunting and performance tuning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
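The resolution adds elapsed-time logging around the row-writing and task-commit phases so that storage-system delay shows up in the logs. A generic sketch of that kind of instrumentation in Python (names, phases, and log format are illustrative, not Spark's actual code):

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("write-task")

@contextmanager
def timed(phase: str):
    """Log how long a phase took; a slow storage system becomes visible."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.3f s", phase, elapsed)

# Stand-ins for the two phases the ticket wants instrumented.
with timed("writing rows out"):
    time.sleep(0.01)  # placeholder for the actual row-writing work
with timed("committing the task"):
    time.sleep(0.01)  # placeholder for the task commit
```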
[jira] [Commented] (SPARK-36075) Support for specifying executor/driver node selector
[ https://issues.apache.org/jira/browse/SPARK-36075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378177#comment-17378177 ] Apache Spark commented on SPARK-36075: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/33283 > Support for specifying executor/driver node selector > - > > Key: SPARK-36075 > URL: https://issues.apache.org/jira/browse/SPARK-36075 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > Currently we can only use "spark.kubernetes.node.selector" to set node selector > labels for both executor and driver. Sometimes we need to set different node > selectors for executor and driver pods separately. > We can add the configurations below to support specifying > executor/driver node selectors: > - spark.kubernetes.driver.node.selector. > - spark.kubernetes.executor.node.selector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36075) Support for specifying executor/driver node selector
[ https://issues.apache.org/jira/browse/SPARK-36075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36075: Assignee: (was: Apache Spark) > Support for specifying executor/driver node selector > - > > Key: SPARK-36075 > URL: https://issues.apache.org/jira/browse/SPARK-36075 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > Currently we can only use "spark.kubernetes.node.selector" to set node selector > labels for both executor and driver. Sometimes we need to set different node > selectors for executor and driver pods separately. > We can add the configurations below to support specifying > executor/driver node selectors: > - spark.kubernetes.driver.node.selector. > - spark.kubernetes.executor.node.selector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36075) Support for specifying executor/driver node selector
[ https://issues.apache.org/jira/browse/SPARK-36075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36075: Assignee: Apache Spark > Support for specifying executor/driver node selector > - > > Key: SPARK-36075 > URL: https://issues.apache.org/jira/browse/SPARK-36075 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > > Currently we can only use "spark.kubernetes.node.selector" to set node selector > labels for both executor and driver. Sometimes we need to set different node > selectors for executor and driver pods separately. > We can add the configurations below to support specifying > executor/driver node selectors: > - spark.kubernetes.driver.node.selector. > - spark.kubernetes.executor.node.selector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date
Andy Grove created SPARK-36076: -- Summary: [SQL] ArrayIndexOutOfBounds in CAST string to date Key: SPARK-36076 URL: https://issues.apache.org/jira/browse/SPARK-36076 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Andy Grove {code:java} __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) Type in expressions to have them evaluated. Type :help for more information.scala> spark.conf.set("spark.rapids.sql.enabled", "false")scala> val df = Seq(":8:434421+ 98:38").toDF("c0") df: org.apache.spark.sql.DataFrame = [c0: string]scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) :25: error: not found: value DataTypes val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) ^scala> import org.spark.sql.types.DataTypes :23: error: object spark is not a member of package org import org.spark.sql.types.DataTypes ^scala> import org.apache.spark.sql.types.DataTypes import org.apache.spark.sql.types.DataTypesscala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]scala> df2.show java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date
[ https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated SPARK-36076: --- Description: I discovered this bug during some fuzz testing. {code:java} __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) Type in expressions to have them evaluated. Type :help for more information.scala> scala> import org.apache.spark.sql.types.DataTypes scala> val df = Seq(":8:434421+ 98:38").toDF("c0") df: org.apache.spark.sql.DataFrame = [c0: string] scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] scala> df2.show java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) {code} was: {code:java} __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) Type in expressions to have them evaluated. 
Type :help for more information.scala> scala> import org.apache.spark.sql.types.DataTypes scala> val df = Seq(":8:434421+ 98:38").toDF("c0") df: org.apache.spark.sql.DataFrame = [c0: string] scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] scala> df2.show java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) {code} > [SQL] ArrayIndexOutOfBounds in CAST string to date > -- > > Key: SPARK-36076 > URL: https://issues.apache.org/jira/browse/SPARK-36076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Andy Grove >Priority: Major > > I discovered this bug during some fuzz testing. > {code:java} > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.1 > /_/ > > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) > Type in expressions to have them evaluated. 
> Type :help for more information.scala> > scala> import org.apache.spark.sql.types.DataTypes > scala> val df = Seq(":8:434421+ 98:38").toDF("c0") > df: org.apache.spark.sql.DataFrame = [c0: string] > scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) > df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] > scala> df2.show > java.lang.ArrayIndexOutOfBoundsException: 9 > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) > at > org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) > at > org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date
[ https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated SPARK-36076: --- Description: {code:java} __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) Type in expressions to have them evaluated. Type :help for more information.scala> scala> import org.apache.spark.sql.types.DataTypes scala> val df = Seq(":8:434421+ 98:38").toDF("c0") df: org.apache.spark.sql.DataFrame = [c0: string] scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] scala> df2.show java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) {code} was: {code:java} __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) Type in expressions to have them evaluated. 
Type :help for more information.scala> spark.conf.set("spark.rapids.sql.enabled", "false")scala> val df = Seq(":8:434421+ 98:38").toDF("c0") df: org.apache.spark.sql.DataFrame = [c0: string]scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) :25: error: not found: value DataTypes val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) ^scala> import org.spark.sql.types.DataTypes :23: error: object spark is not a member of package org import org.spark.sql.types.DataTypes ^scala> import org.apache.spark.sql.types.DataTypes import org.apache.spark.sql.types.DataTypesscala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]scala> df2.show java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) {code} > [SQL] ArrayIndexOutOfBounds in CAST string to date > -- > > Key: SPARK-36076 > URL: https://issues.apache.org/jira/browse/SPARK-36076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Andy Grove >Priority: Major > > {code:java} > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.1 > /_/ > > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) > Type in expressions to have them evaluated. 
> Type :help for more information.scala> > scala> import org.apache.spark.sql.types.DataTypes > scala> val df = Seq(":8:434421+ 98:38").toDF("c0") > df: org.apache.spark.sql.DataFrame = [c0: string] > scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) > df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] > scala> df2.show > java.lang.ArrayIndexOutOfBoundsException: 9 > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) > at > org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) > at > org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) -
[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to timestamp
[ https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated SPARK-36076: --- Summary: [SQL] ArrayIndexOutOfBounds in CAST string to timestamp (was: [SQL] ArrayIndexOutOfBounds in CAST string to date) > [SQL] ArrayIndexOutOfBounds in CAST string to timestamp > --- > > Key: SPARK-36076 > URL: https://issues.apache.org/jira/browse/SPARK-36076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Andy Grove >Priority: Major > > I discovered this bug during some fuzz testing. > {code:java} > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.1 > /_/ > > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282) > Type in expressions to have them evaluated. > Type :help for more information.scala> > scala> import org.apache.spark.sql.types.DataTypes > scala> val df = Seq(":8:434421+ 98:38").toDF("c0") > df: org.apache.spark.sql.DataFrame = [c0: string] > scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType)) > df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp] > scala> df2.show > java.lang.ArrayIndexOutOfBoundsException: 9 > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455) > at > org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295) > at > org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451) > at > org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
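The stack trace above points at `stringToTimestamp`, which writes parsed fields into a fixed-size segments array; the malformed input produces more delimited fields than the array has slots. A simplified pure-Python illustration of the failure mode (this mirrors only the shape of the bug; Spark's parser keeps nine segments and is considerably more involved, so the five-slot array here is an assumption for brevity):

```python
def parse_segments_unsafe(s: str) -> list:
    """Simplified timestamp-field parser illustrating the failure mode:
    each non-digit character starts a new field, and the field index is
    used to write into a fixed-size array without a bounds check."""
    segments = [0] * 5  # Spark's real parser uses 9 slots; 5 shows the pattern
    i, current = 0, 0
    for ch in s:
        if ch.isdigit():
            current = current * 10 + int(ch)
        else:
            segments[i] = current  # no check that i < len(segments)
            current, i = 0, i + 1
    segments[i] = current  # malformed input can push i past the end
    return segments

try:
    parse_segments_unsafe(":8:434421+ 98:38")  # the string from the report
except IndexError as exc:
    print("out of bounds, as in the Spark stack trace:", exc)
```

A bounds check before each write (returning `None` for unparseable input) is the obvious shape of the fix.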
[jira] [Created] (SPARK-36077) Support numpy literals as input for pandas-on-Spark APIs
Xinrong Meng created SPARK-36077: Summary: Support numpy literals as input for pandas-on-Spark APIs Key: SPARK-36077 URL: https://issues.apache.org/jira/browse/SPARK-36077 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng Some pandas-on-Spark APIs use PySpark column-related APIs internally, and these column-related APIs don't support numpy literals, so numpy literals are disallowed as input (e.g. the {{to_replace}} parameter of the {{replace}} API). The `isin` method has been adjusted in [https://github.com/apache/spark/pull/32955]. We ought to adjust the other APIs to support numpy literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
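One workaround pattern for the limitation described above is to unwrap numpy scalars into plain Python scalars before passing them to column APIs, since numpy scalars expose `.item()` for exactly this. A sketch under that assumption (the `as_spark_literal` helper is hypothetical, and a stand-in class replaces `np.int64` so the example needs no numpy):

```python
def as_spark_literal(value):
    """Unwrap numpy-style scalars (anything exposing .item(), such as
    np.int64) into plain Python scalars that column APIs accept; pass
    everything else through unchanged. Hypothetical helper, not Spark API."""
    if hasattr(value, "item") and callable(value.item):
        return value.item()
    return value

class FakeNumpyInt:
    """Stand-in for np.int64 so the sketch runs without numpy installed."""
    def __init__(self, v):
        self._v = v
    def item(self):
        return int(self._v)

print(as_spark_literal(FakeNumpyInt(7)))  # plain int 7
print(as_spark_literal("abc"))            # passed through unchanged
```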
[jira] [Commented] (SPARK-36063) Optimize OneRowRelation subqueries
[ https://issues.apache.org/jira/browse/SPARK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378227#comment-17378227 ] Apache Spark commented on SPARK-36063: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/33284 > Optimize OneRowRelation subqueries > -- > > Key: SPARK-36063 > URL: https://issues.apache.org/jira/browse/SPARK-36063 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Inline subqueries with OneRowRelation as leaf nodes instead of decorrelating > and rewriting them as left outer joins. > Scalar subquery: > ``` > SELECT (SELECT c1) FROM t1 -> SELECT c1 FROM t1 > ``` > Lateral subquery: > ``` > SELECT * FROM t1, LATERAL (SELECT c1, c2) -> SELECT c1, c2 , c1, c2 FROM t1 > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36078) Complete mappings between numpy literals and Spark data types
Xinrong Meng created SPARK-36078: Summary: Complete mappings between numpy literals and Spark data types Key: SPARK-36078 URL: https://issues.apache.org/jira/browse/SPARK-36078 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng In [https://github.com/apache/spark/pull/32955], the {{lit}} function defined in {{pyspark.pandas.spark.functions}} has been adjusted to support numpy literal input. However, the mappings between numpy literals and Spark data types are not complete. We ought to fill the gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36063) Optimize OneRowRelation subqueries
[ https://issues.apache.org/jira/browse/SPARK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36063: Assignee: (was: Apache Spark) > Optimize OneRowRelation subqueries > -- > > Key: SPARK-36063 > URL: https://issues.apache.org/jira/browse/SPARK-36063 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Inline subqueries with OneRowRelation as leaf nodes instead of decorrelating > and rewriting them as left outer joins. > Scalar subquery: > ``` > SELECT (SELECT c1) FROM t1 -> SELECT c1 FROM t1 > ``` > Lateral subquery: > ``` > SELECT * FROM t1, LATERAL (SELECT c1, c2) -> SELECT c1, c2 , c1, c2 FROM t1 > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36063) Optimize OneRowRelation subqueries
[ https://issues.apache.org/jira/browse/SPARK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36063: Assignee: Apache Spark > Optimize OneRowRelation subqueries > -- > > Key: SPARK-36063 > URL: https://issues.apache.org/jira/browse/SPARK-36063 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > Inline subqueries with OneRowRelation as leaf nodes instead of decorrelating > and rewriting them as left outer joins. > Scalar subquery: > ``` > SELECT (SELECT c1) FROM t1 -> SELECT c1 FROM t1 > ``` > Lateral subquery: > ``` > SELECT * FROM t1, LATERAL (SELECT c1, c2) -> SELECT c1, c2 , c1, c2 FROM t1 > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35866) Improve error message quality
[ https://issues.apache.org/jira/browse/SPARK-35866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Feng updated SPARK-35866: --- Description: In the SPIP: Standardize Exception Messages in Spark, there are three major improvements proposed: # Group error messages in dedicated files: SPARK-33539 # Establish an error message guideline for developers SPARK-35140 # Improve error message quality Based on the guideline, we can start improving the error messages in the dedicated files. To make auditing easy, we should use the [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java] framework; then, the error messages can be centralized in a [single JSON file|https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json]. was: In the SPIP: Standardize Exception Messages in Spark, there are three major improvements proposed: # Group error messages in dedicated files: [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539] # Establish an error message guideline for developers [SPARK-35140|https://issues.apache.org/jira/browse/SPARK-35140] # Improve error message quality Based on the guideline, we can start improving the error messages in the dedicated files. > Improve error message quality > - > > Key: SPARK-35866 > URL: https://issues.apache.org/jira/browse/SPARK-35866 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > In the SPIP: Standardize Exception Messages in Spark, there are three major > improvements proposed: > # Group error messages in dedicated files: SPARK-33539 > # Establish an error message guideline for developers SPARK-35140 > # Improve error message quality > Based on the guideline, we can start improving the error messages in the > dedicated files. 
To make auditing easy, we should use the > [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java] > framework; then, the error messages can be centralized in a [single JSON > file|https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36003) Implement unary operator `invert` of integral ps.Series/Index
[ https://issues.apache.org/jira/browse/SPARK-36003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36003: - Summary: Implement unary operator `invert` of integral ps.Series/Index (was: Implement unary operator `invert` of numeric ps.Series/Index) > Implement unary operator `invert` of integral ps.Series/Index > - > > Key: SPARK-36003 > URL: https://issues.apache.org/jira/browse/SPARK-36003 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > > {code:java} > >>> ~ps.Series([1, 2, 3]) > Traceback (most recent call last): > ... > pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data > type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint > type.; > 'Project [unresolvedalias(NOT 0#1L, > Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))] > +- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS > __natural_order__#4L] > +- LogicalRDD [__index_level_0__#0L, 0#1L], false > {code} > > Currently, unary operator `invert` of numeric ps.Series/Index is not > supported. We ought to implement that following pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36003) Implement unary operator `invert` of integral ps.Series/Index
[ https://issues.apache.org/jira/browse/SPARK-36003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36003: - Description: {code:java} >>> ~ps.Series([1, 2, 3]) Traceback (most recent call last): ... pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint type.; 'Project [unresolvedalias(NOT 0#1L, Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))] +- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS __natural_order__#4L] +- LogicalRDD [__index_level_0__#0L, 0#1L], false {code} Currently, unary operator `invert` of integral ps.Series/Index is not supported. We ought to implement that following pandas' behaviors. was: {code:java} >>> ~ps.Series([1, 2, 3]) Traceback (most recent call last): ... pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint type.; 'Project [unresolvedalias(NOT 0#1L, Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))] +- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS __natural_order__#4L] +- LogicalRDD [__index_level_0__#0L, 0#1L], false {code} Currently, unary operator `invert` of numeric ps.Series/Index is not supported. We ought to implement that following pandas' behaviors. > Implement unary operator `invert` of integral ps.Series/Index > - > > Key: SPARK-36003 > URL: https://issues.apache.org/jira/browse/SPARK-36003 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > > {code:java} > >>> ~ps.Series([1, 2, 3]) > Traceback (most recent call last): > ... 
> pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data > type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint > type.; > 'Project [unresolvedalias(NOT 0#1L, > Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))] > +- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS > __natural_order__#4L] > +- LogicalRDD [__index_level_0__#0L, 0#1L], false > {code} > > Currently, unary operator `invert` of integral ps.Series/Index is not > supported. We ought to implement that following pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
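For reference, pandas defines `~` on an integral Series as elementwise bitwise NOT, so `~n` equals `-n - 1` in two's complement; that is the behavior the ticket aims to mirror (rather than the boolean NOT that Spark's analyzer currently demands). A plain-Python check of the identity (no pandas required):

```python
def invert(values):
    """Elementwise bitwise NOT, matching pandas' ~ on an integral Series:
    in two's complement, ~n is -(n + 1)."""
    return [~v for v in values]

print(invert([1, 2, 3]))  # [-2, -3, -4]
```

So `~ps.Series([1, 2, 3])` would be expected to yield `-2, -3, -4` once implemented.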
[jira] [Commented] (SPARK-36003) Implement unary operator `invert` of integral ps.Series/Index
[ https://issues.apache.org/jira/browse/SPARK-36003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378343#comment-17378343 ] Apache Spark commented on SPARK-36003: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/33285
[jira] [Assigned] (SPARK-36003) Implement unary operator `invert` of integral ps.Series/Index
[ https://issues.apache.org/jira/browse/SPARK-36003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36003: Assignee: Apache Spark
[jira] [Assigned] (SPARK-36003) Implement unary operator `invert` of integral ps.Series/Index
[ https://issues.apache.org/jira/browse/SPARK-36003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36003: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-36079) Filter estimate should always be non-negative
Karen Feng created SPARK-36079: -- Summary: Filter estimate should always be non-negative Key: SPARK-36079 URL: https://issues.apache.org/jira/browse/SPARK-36079 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Karen Feng It's possible for a column's statistics to have a higher `nullCount` than the table's `rowCount`. In this case, the filter estimates come back outside of the reasonable range (between 0 and 1).
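The clamping the issue calls for can be sketched as follows; the function name and the IS NOT NULL estimation formula are illustrative assumptions, not Spark's actual implementation:

```python
def null_filter_selectivity(row_count, null_count):
    # Estimated fraction of rows passing an IS NOT NULL filter.
    # Clamp to [0, 1] so inconsistent statistics (null_count > row_count)
    # cannot produce a negative selectivity.
    if row_count <= 0:
        return 1.0
    raw = 1.0 - null_count / row_count
    return max(0.0, min(1.0, raw))

print(null_filter_selectivity(100, 150))  # 0.0 rather than -0.5
print(null_filter_selectivity(100, 25))   # 0.75
```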
[jira] [Assigned] (SPARK-36079) Filter estimate should always be non-negative
[ https://issues.apache.org/jira/browse/SPARK-36079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36079: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-36079) Filter estimate should always be non-negative
[ https://issues.apache.org/jira/browse/SPARK-36079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378345#comment-17378345 ] Apache Spark commented on SPARK-36079: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/33286
[jira] [Assigned] (SPARK-36079) Filter estimate should always be non-negative
[ https://issues.apache.org/jira/browse/SPARK-36079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36079: Assignee: Apache Spark
[jira] [Commented] (SPARK-36079) Filter estimate should always be non-negative
[ https://issues.apache.org/jira/browse/SPARK-36079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378346#comment-17378346 ] Apache Spark commented on SPARK-36079: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/33286
[jira] [Updated] (SPARK-36079) Null-based filter estimates should always be non-negative
[ https://issues.apache.org/jira/browse/SPARK-36079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Feng updated SPARK-36079: --- Summary: Null-based filter estimates should always be non-negative (was: Filter estimate should always be non-negative)
[jira] [Created] (SPARK-36080) Broadcast join outer join stream side
Yuming Wang created SPARK-36080: --- Summary: Broadcast join outer join stream side Key: SPARK-36080 URL: https://issues.apache.org/jira/browse/SPARK-36080 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang
[jira] [Assigned] (SPARK-36066) UTF8String trimAll() only can trim space but not ({@literal <=} ASCII 32)
[ https://issues.apache.org/jira/browse/SPARK-36066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36066: Assignee: (was: Apache Spark)

> UTF8String trimAll() only can trim space but not ({@literal <=} ASCII 32)
> Key: SPARK-36066
> URL: https://issues.apache.org/jira/browse/SPARK-36066
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.1, 3.1.2
> Reporter: liukai
> Priority: Major
>
> In this method, Character.isWhitespace() is used for the check, but Character.isWhitespace() does not match the method's documented behavior of trimming all characters {@literal <=} ASCII 32.
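A sketch of trimming by code point, as the trimAll() contract describes, rather than by Character.isWhitespace() (which, for example, does not treat control characters below U+0009 as whitespace). This Python helper is illustrative only, not the actual UTF8String implementation:

```python
def trim_all(s):
    # Strip leading and trailing characters whose code point is <= 32
    # (ASCII control characters and the space character).
    start, end = 0, len(s)
    while start < end and ord(s[start]) <= 32:
        start += 1
    while end > start and ord(s[end - 1]) <= 32:
        end -= 1
    return s[start:end]

print(trim_all("\x00\x01 hello \x1f"))  # hello
```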
[jira] [Assigned] (SPARK-36066) UTF8String trimAll() only can trim space but not ({@literal <=} ASCII 32)
[ https://issues.apache.org/jira/browse/SPARK-36066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36066: Assignee: Apache Spark
[jira] [Commented] (SPARK-36066) UTF8String trimAll() only can trim space but not ({@literal <=} ASCII 32)
[ https://issues.apache.org/jira/browse/SPARK-36066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378405#comment-17378405 ] Apache Spark commented on SPARK-36066: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33287