[jira] [Created] (SPARK-35839) New SQL function: to_timestamp_ntz
Gengliang Wang created SPARK-35839:
--
Summary: New SQL function: to_timestamp_ntz
Key: SPARK-35839
URL: https://issues.apache.org/jira/browse/SPARK-35839
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang

Implement the new SQL function to_timestamp_ntz. It is similar to the built-in function to_timestamp, except that the result type is TimestampWithoutTZType. The naming follows Snowflake: https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
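The intended distinction can be sketched outside Spark with plain Python datetimes (an analogy only, not Spark's implementation; the function names here merely mirror the proposal):

```python
from datetime import datetime, timezone

# Hypothetical illustration of the intended semantics: to_timestamp_ntz
# parses the string as-is and carries no time zone (a "naive" timestamp),
# while to_timestamp anchors the same wall-clock value to the session zone.
def to_timestamp_ntz(s: str) -> datetime:
    # No tzinfo attached: the wall-clock value is preserved verbatim.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

def to_timestamp(s: str, session_tz=timezone.utc) -> datetime:
    # Same wall-clock value, but interpreted in the session time zone.
    return to_timestamp_ntz(s).replace(tzinfo=session_tz)

ntz = to_timestamp_ntz("2021-06-21 12:00:00")
tz = to_timestamp("2021-06-21 12:00:00")
print(ntz.tzinfo)  # None: no time zone carried
print(tz.tzinfo)   # UTC
```

The session time zone default above is an assumption for the sketch; in Spark it would come from spark.sql.session.timeZone.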
[jira] [Commented] (SPARK-35839) New SQL function: to_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366427#comment-17366427 ] Apache Spark commented on SPARK-35839:
--
User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32995
[jira] [Assigned] (SPARK-35839) New SQL function: to_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35839:
Assignee: Gengliang Wang (was: Apache Spark)
[jira] [Assigned] (SPARK-35839) New SQL function: to_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35839:
Assignee: Apache Spark (was: Gengliang Wang)
[jira] [Assigned] (SPARK-35611) Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source
[ https://issues.apache.org/jira/browse/SPARK-35611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-35611:
---
Assignee: Jungtaek Lim

> Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source
> Key: SPARK-35611
> URL: https://issues.apache.org/jira/browse/SPARK-35611
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.0.2, 3.1.1
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
>
> 1. Rationale
> We encountered a real-world case where Spark fails the query if some of the partitions don't have a matching offset for the timestamp.
> This is intended behavior, to avoid producing unintended output in cases like:
> * timestamp 2 is specified as the timestamp offset, but some of the partitions don't have a matching record yet
> * a record with timestamp 1 arrives "later", in the following micro-batch
> which is possible since Kafka allows specifying the timestamp in the record. The unintended output here is the risk of reading the record with timestamp 1 in the next micro-batch despite the option specifying timestamp 2.
> But in many cases end users simply assume the timestamp increases monotonically, and the current behavior blocks these cases from making progress.
> 2. Proposal
> For cases where the timestamp is supposed to increase monotonically, it is safe to consider the offset to be the latest (technically, the offset of the latest record + 1) if there is no record matching the timestamp.
> This would be particularly helpful when there is a skew between partitions and some partitions have only older records.
> * AS-IS: Spark simply fails the query, and end users have to deal with workarounds requiring manual steps.
> * TO-BE: Spark will assign the latest offset to these partitions, so that Spark can read newer records from them in further micro-batches.
> To retain the existing behavior and also support the proposed "TO-BE" behavior, we'd like to introduce a strategy for mismatched offsets on the start offset timestamp.
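The proposed fallback can be sketched in plain Python (a simulation of the strategy only, not Spark's actual Kafka source code; all names here are hypothetical):

```python
# Hypothetical simulation of the proposed "latest" strategy: for each
# partition, use the offset matched by timestamp when one exists; otherwise
# fall back to the end offset (offset of the latest record + 1), so the
# partition simply starts from new records in later micro-batches.
def resolve_start_offsets(offsets_by_timestamp, end_offsets):
    resolved = {}
    for partition, end_offset in end_offsets.items():
        matched = offsets_by_timestamp.get(partition)
        # Kafka returns no offset when no record has a timestamp >= the
        # requested one; that is the mismatch case this strategy handles.
        resolved[partition] = matched if matched is not None else end_offset
    return resolved

# Partition 0 has a record matching the timestamp; partition 1 does not.
start = resolve_start_offsets({0: 42, 1: None}, {0: 100, 1: 57})
print(start)  # {0: 42, 1: 57}
```

The AS-IS behavior would instead raise an error in the `matched is None` branch; the strategy option chooses between the two.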
[jira] [Resolved] (SPARK-35611) Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source
[ https://issues.apache.org/jira/browse/SPARK-35611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-35611.
-
Fix Version/s: 3.2.0
Resolution: Fixed

Issue resolved by pull request 32747 [https://github.com/apache/spark/pull/32747]
[jira] [Assigned] (SPARK-35835) Select filter query on table with struct complex type fails
[ https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35835:
Assignee: Apache Spark

> Select filter query on table with struct complex type fails
> Key: SPARK-35835
> URL: https://issues.apache.org/jira/browse/SPARK-35835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1
> Environment: Spark 3.1.1
> Reporter: Chetan Bhat
> Assignee: Apache Spark
> Priority: Minor
>
> [Steps]:
> From Spark beeline, create a Parquet or ORC table having complex type data. Load data into the table and execute a select filter query.
>
> 0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_INT_DOUBLE_STRING_DATE struct, CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) stored as parquet;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.161 seconds)
> 0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.09 seconds)
> 0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
>
> [Actual Issue]: Select filter query on table with struct complex type fails
> 0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
> +- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
> +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
> +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
> +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
> +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
> +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct G_DATE:struct>
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftser
[jira] [Assigned] (SPARK-35835) Select filter query on table with struct complex type fails
[ https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35835:
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-35835) Select filter query on table with struct complex type fails
[ https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366435#comment-17366435 ] Apache Spark commented on SPARK-35835:
--
User 'PavithraRamachandran' has created a pull request for this issue: https://github.com/apache/spark/pull/32996
[jira] [Commented] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366445#comment-17366445 ] Masayoshi Tsuzuki commented on SPARK-35821:
---
Yes, our project ran into problems like this several years ago, such as the DAG area not being shown on the History page in IE11. We assumed they were simply due to IE's incomplete HTML5 compatibility, so we worked around them by using Firefox instead and didn't investigate the cause.

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
> Issue Type: New Feature
> Components: Web UI
> Affects Versions: 3.1.1
> Reporter: jobit mathew
> Priority: Minor
[jira] [Created] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
Max Gekk created SPARK-35840:
-
Summary: Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
Key: SPARK-35840
URL: https://issues.apache.org/jira/browse/SPARK-35840
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk

Add 2 methods:
{code:scala}
def apply(field: Byte): YearMonthIntervalType = YearMonthIntervalType(field, field)
def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, field)
{code}
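The convenience being added is a single-field shorthand for a type that spans a start and end field. As an analogy only (not Spark code; the class and method names below just mirror the Scala sketch above), the pattern looks like:

```python
from dataclasses import dataclass

# Analogy (not Spark code): an interval type spanning
# [start_field, end_field], with a single-field shorthand mirroring the
# proposed apply(field), which sets both bounds to the same field.
@dataclass(frozen=True)
class YearMonthIntervalType:
    start_field: int
    end_field: int

    @classmethod
    def of(cls, field: int) -> "YearMonthIntervalType":
        # of(field) is shorthand for YearMonthIntervalType(field, field).
        return cls(field, field)

print(YearMonthIntervalType.of(1))  # YearMonthIntervalType(start_field=1, end_field=1)
```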
[jira] [Commented] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
[ https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366466#comment-17366466 ] Apache Spark commented on SPARK-35840:
--
User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/32997
[jira] [Commented] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
[ https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366465#comment-17366465 ] Apache Spark commented on SPARK-35840:
--
User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/32997
[jira] [Assigned] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
[ https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35840:
Assignee: Apache Spark (was: Max Gekk)
[jira] [Assigned] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
[ https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35840:
Assignee: Max Gekk (was: Apache Spark)
[jira] [Created] (SPARK-35841) Casting string to decimal type doesn't work if the sum of the digits is greater than 38
Roberto Gelsi created SPARK-35841:
-
Summary: Casting string to decimal type doesn't work if the sum of the digits is greater than 38
Key: SPARK-35841
URL: https://issues.apache.org/jira/browse/SPARK-35841
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.2, 3.1.1
Environment: Tested in a Kubernetes cluster with Spark 3.1.1 and Spark 3.1.2 images (Hadoop 3.2.1, Python 3.9, Scala 2.12.13)
Reporter: Roberto Gelsi

Since Spark 3.1.1, NULL is returned when casting a string with many decimal places to a decimal type. If the sum of the digits before and after the decimal point is less than 39, a value is returned. From 39 digits onward, however, NULL is returned. This worked up to Spark 3.0.x.

Code to reproduce:

* A string with 2 decimal places in front of the decimal point and 37 decimal places after the decimal point returns null:
{code:python}
from pyspark.sql.functions import col
from pyspark.sql.types import StringType

data = ['28.92599983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}
+-----+
|value|
+-----+
|null |
+-----+

* A string with 2 decimal places in front of the decimal point and 36 decimal places after the decimal point returns the number as decimal:
{code:python}
data = ['28.9259998379962562466971576213']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}
+--------+
|value   |
+--------+
|28.92600|
+--------+

* A string with 1 decimal place in front of the decimal point and 37 decimal places after the decimal point returns the number as decimal:
{code:python}
data = ['2.92599983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}
+-------+
|value  |
+-------+
|2.92600|
+-------+
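Independently of Spark, the expected result of the cast can be checked with Python's decimal module (a plain-Python illustration of what decimal(10, 5) should yield for these inputs; this is not Spark's cast implementation, and the bound check is a simplified stand-in for precision/scale enforcement):

```python
from decimal import Decimal, ROUND_HALF_UP

# Plain-Python illustration (not Spark's implementation): a cast to
# decimal(10, 5) should round the value to 5 fractional digits regardless
# of how many digits the input string carries.
def cast_to_decimal_10_5(s: str):
    value = Decimal(s).quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)
    # decimal(10, 5) allows at most 10 - 5 = 5 integer digits.
    return value if abs(value) < Decimal("100000") else None

print(cast_to_decimal_10_5("28.92599983799625624669715762138"))  # 28.92600
print(cast_to_decimal_10_5("2.92599983799625624669715762138"))   # 2.92600
```

Both inputs round cleanly to five fractional digits, which matches the values Spark 3.0.x returned and underlines that the NULL in the first reproduction above is a regression rather than an overflow.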
[jira] [Created] (SPARK-35842) Ignore all ".idea" directory in submodules
Gengliang Wang created SPARK-35842:
--
Summary: Ignore all ".idea" directory in submodules
Key: SPARK-35842
URL: https://issues.apache.org/jira/browse/SPARK-35842
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang

After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in submodules are treated as a git difference again. For example, when I open the project `resource-managers/yarn/` with IntelliJ, the git status becomes:
{code:java}
Untracked files:
(use "git add ..." to include in what will be committed)
resource-managers/yarn/.idea/
{code}
The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. We should ignore all ".idea" directories instead of only the one under the root path.
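The fix amounts to the difference between an anchored and an unanchored .gitignore pattern (a sketch of the likely change; the actual patch is in the linked PRs):

{code:java}
# Anchored: matches only the .idea directory at the repository root.
/.idea/

# Unanchored: matches any .idea directory at any depth, e.g.
# resource-managers/yarn/.idea/ and sql/hive-thriftserver/.idea/
.idea/
{code}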
[jira] [Updated] (SPARK-35842) Ignore all ".idea" directory in submodules
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35842: --- Priority: Minor (was: Major) > Ignore all ".idea" directory in submodules > -- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35842: --- Summary: Ignore all ".idea" directories (was: Ignore all ".idea" directory in submodules) > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35820) Support cast between different DayTimeIntervalType
[ https://issues.apache.org/jira/browse/SPARK-35820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35820: Assignee: angerszhu > Support cast between different DayTimeIntervalType > -- > > Key: SPARK-35820 > URL: https://issues.apache.org/jira/browse/SPARK-35820 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Support cast between different DayTimeIntervalType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35820) Support cast between different DayTimeIntervalType
[ https://issues.apache.org/jira/browse/SPARK-35820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35820. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32975 [https://github.com/apache/spark/pull/32975] > Support cast between different DayTimeIntervalType > -- > > Key: SPARK-35820 > URL: https://issues.apache.org/jira/browse/SPARK-35820 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Support cast between different DayTimeIntervalType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366483#comment-17366483 ] Apache Spark commented on SPARK-35842: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32998 > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35842: Assignee: Apache Spark (was: Gengliang Wang) > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35842: Assignee: Gengliang Wang (was: Apache Spark) > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366485#comment-17366485 ] Apache Spark commented on SPARK-35842: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32998 > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35727) Return INTERVAL DAY from dates subtraction
[ https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35727: Assignee: (was: Apache Spark) > Return INTERVAL DAY from dates subtraction > -- > > Key: SPARK-35727 > URL: https://issues.apache.org/jira/browse/SPARK-35727 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, > DAY)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35727) Return INTERVAL DAY from dates subtraction
[ https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366500#comment-17366500 ] Apache Spark commented on SPARK-35727: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/32999 > Return INTERVAL DAY from dates subtraction > -- > > Key: SPARK-35727 > URL: https://issues.apache.org/jira/browse/SPARK-35727 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, > DAY)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35727) Return INTERVAL DAY from dates subtraction
[ https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366502#comment-17366502 ] Apache Spark commented on SPARK-35727: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/32999 > Return INTERVAL DAY from dates subtraction > -- > > Key: SPARK-35727 > URL: https://issues.apache.org/jira/browse/SPARK-35727 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, > DAY)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35727) Return INTERVAL DAY from dates subtraction
[ https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35727: Assignee: Apache Spark > Return INTERVAL DAY from dates subtraction > -- > > Key: SPARK-35727 > URL: https://issues.apache.org/jira/browse/SPARK-35727 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, > DAY)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366507#comment-17366507 ] Apache Spark commented on SPARK-35778: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33000 > Check multiply/divide of year-month intervals of any fields by numeric > -- > > Key: SPARK-35778 > URL: https://issues.apache.org/jira/browse/SPARK-35778 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Write tests that checks multiply/divide of the following intervals by numeric: > # INTERVAL YEAR > # INTERVAL YEAR TO MONTH > # INTERVAL MONTH -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35778: Assignee: (was: Apache Spark) > Check multiply/divide of year-month intervals of any fields by numeric > -- > > Key: SPARK-35778 > URL: https://issues.apache.org/jira/browse/SPARK-35778 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Write tests that checks multiply/divide of the following intervals by numeric: > # INTERVAL YEAR > # INTERVAL YEAR TO MONTH > # INTERVAL MONTH -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35778: Assignee: Apache Spark > Check multiply/divide of year-month intervals of any fields by numeric > -- > > Key: SPARK-35778 > URL: https://issues.apache.org/jira/browse/SPARK-35778 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Write tests that checks multiply/divide of the following intervals by numeric: > # INTERVAL YEAR > # INTERVAL YEAR TO MONTH > # INTERVAL MONTH -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366508#comment-17366508 ] Apache Spark commented on SPARK-35778: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33000 > Check multiply/divide of year-month intervals of any fields by numeric > -- > > Key: SPARK-35778 > URL: https://issues.apache.org/jira/browse/SPARK-35778 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Write tests that checks multiply/divide of the following intervals by numeric: > # INTERVAL YEAR > # INTERVAL YEAR TO MONTH > # INTERVAL MONTH -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Attachment: dag_chrome.png > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: dag_IE.PNG, dag_chrome.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Attachment: dag_IE.PNG > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Attachment: Executortab_IE.PNG > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Attachment: Executortab_Chrome.png > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Description: Spark UI-Executor tab is empty in IE11 Spark UI-Stages DAG visualization is empty in IE11 other tabs looks Ok Attaching some scrreshots > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > other tabs looks Ok > Attaching some scrreshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Description: Spark UI-Executor tab is empty in IE11 Spark UI-Stages DAG visualization is empty in IE11 other tabs looks Ok Attaching some screenshots was: Spark UI-Executor tab is empty in IE11 Spark UI-Stages DAG visualization is empty in IE11 other tabs looks Ok Attaching some scrreshots > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > other tabs looks Ok > Attaching some screenshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366531#comment-17366531 ] jobit mathew commented on SPARK-35821: -- [~hyukjin.kwon] I attached some screenshots. Could you please have a look? > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > other tabs looks Ok > Attaching some screenshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues
[ https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-35821: - Description: Spark UI-Executor tab is empty in IE11 Spark UI-Stages DAG visualization is empty in IE11 other tabs looks Ok. Spark job history shows completed and incomplete applications list .But when we go inside each application same issue may be there. Attaching some screenshots was: Spark UI-Executor tab is empty in IE11 Spark UI-Stages DAG visualization is empty in IE11 other tabs looks Ok Attaching some screenshots > Spark 3.1.1 Internet Explorer 11 compatibility issues > - > > Key: SPARK-35821 > URL: https://issues.apache.org/jira/browse/SPARK-35821 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.1.1 >Reporter: jobit mathew >Priority: Minor > Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, > dag_chrome.png > > > Spark UI-Executor tab is empty in IE11 > Spark UI-Stages DAG visualization is empty in IE11 > other tabs looks Ok. > Spark job history shows completed and incomplete applications list .But when > we go inside each application same issue may be there. > Attaching some screenshots -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
[ https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35840. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32997 [https://github.com/apache/spark/pull/32997] > Add `apply()` for a single field to `YearMonthIntervalType` and > `DayTimeIntervalType` > - > > Key: SPARK-35840 > URL: https://issues.apache.org/jira/browse/SPARK-35840 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Add 2 methods: > {code:scala} > def apply(field: Byte): YearMonthIntervalType = > YearMonthIntervalType(field, field) > def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, > field) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving
[ https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366543#comment-17366543 ] Yik San Chan commented on SPARK-26247: -- [~aholler] Hi Anne, I wonder what prevented the proposal from being approved? I have no access to the Google Doc, so I am not sure what happened there. Thanks! > SPIP - ML Model Extension for no-Spark MLLib Online Serving > --- > > Key: SPARK-26247 > URL: https://issues.apache.org/jira/browse/SPARK-26247 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.1.0 >Reporter: Anne Holler >Priority: Major > Labels: SPIP, bulk-closed > Attachments: SPIPMlModelExtensionForOnlineServing.pdf, diff.out, > diff.reduceLoadLatency, diff.scoreInstance > > > This ticket tracks an SPIP to improve model load time and model serving > interfaces for online serving of Spark MLlib models. The SPIP is here > [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub] > > The improvement opportunity exists in all versions of spark. We developed > our set of changes wrt version 2.1.0 and can port them forward to other > versions (e.g., we have ported them forward to 2.3.2). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366555#comment-17366555 ] Dipanjan Kailthya commented on SPARK-35623: --- Hi [~pingsutw], thank you for expressing your interest! We are in the process of publishing a first draft. In the meantime, how can we contact you, to maybe give you a more detailed overview? Do you have a preferred email address? > Volcano resource manager for Spark on Kubernetes > > > Key: SPARK-35623 > URL: https://issues.apache.org/jira/browse/SPARK-35623 > Project: Spark > Issue Type: Brainstorming > Components: Kubernetes >Affects Versions: 3.1.1, 3.1.2 >Reporter: Dipanjan Kailthya >Priority: Minor > Labels: kubernetes, resourcemanager > > Dear Spark Developers, > > Hello from the Netherlands! Posting this here as I still haven't gotten > accepted to post in the spark dev mailing list. > > My team is planning to use spark with Kubernetes support on our shared > (multi-tenant) on premise Kubernetes cluster. However we would like to have > certain scheduling features like fair-share and preemption which as we > understand are not built into the current spark-kubernetes resource manager > yet. We have been working on and are close to a first successful prototype > integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means > a new resource manager component with lots in common with existing > spark-kubernetes resource manager, but instead of pods it launches Volcano > jobs which delegate the driver and executor pod creation and lifecycle > management to Volcano. We are interested in contributing this to open source, > either directly in spark or as a separate project. > > So, two questions: > > 1. Do the spark maintainers see this as a valuable contribution to the > mainline spark codebase? If so, can we have some guidance on how to publish > the changes? > > 2. 
Are any other developers / organizations interested to contribute to this > effort? If so, please get in touch. > > Best, > Dipanjan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35700: Assignee: Apache Spark > spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with > varchar data type > --- > > Key: SPARK-35700 > URL: https://issues.apache.org/jira/browse/SPARK-35700 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark, Spark Core >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 on K8S >Reporter: Arghya Saha >Assignee: Apache Spark >Priority: Major > > We are not able to upgrade from Spark 2.4.x to Spark 3.1.1 because a join on > a varchar column fails, which is unexpected, as it works on Spark 3.0.0. We > are trying to run it on Spark 3.1.1 (MR 3.2) on K8s. > Below is my use case: > The tables are external Hive tables and the files are stored as ORC. We have > a varchar column, and when we try to perform a join on it we get the > exception. > As I understand, Spark 3.1.1 introduced the varchar data type, but it seems > it is not well tested with ORC and does not have backward compatibility. 
I have even tried with the below config, without luck: > *spark.sql.legacy.charVarcharAsString: "true"* > We are not getting the error when *spark.sql.orc.filterPushdown=false* > Below is the code (here col1 is of type varchar(32) in Hive): > {code:java} > df = spark.sql("select col1, col2 from table1 a inner join table2 b on > (a.col1 = b.col1 and a.col2 > b.col2)") > df.write.format("orc").option("compression", > "zlib").mode("Append").save("") > {code} > Below is the error: > > {code:java} > Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most > recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor > 5): java.lang.UnsupportedOperationException: DataType: varchar(32) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135) > at > scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245) > at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242) > at scala.collection.immutable.List.flatMap(List.scala:355) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189) > at > 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177) > at > org.apache.sp
[jira] [Commented] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366564#comment-17366564 ] Apache Spark commented on SPARK-35700: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33001 > spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with > varchar data type > --- > > Key: SPARK-35700 > URL: https://issues.apache.org/jira/browse/SPARK-35700 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark, Spark Core >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 on K8S >Reporter: Arghya Saha >Priority: Major > > We are not able to upgrade from Spark 2.4.x to Spark 3.1.1, as a join on a > varchar column fails; this is unexpected, and the same join works on Spark 3.0.0. We > are trying to run it on Spark 3.1.1 (MR 3.2) on K8s. > Below is my use case: > The tables are external Hive tables and the files are stored as ORC. We have > varchar columns, and when we perform a join on a varchar column we get the > exception below. > As I understand it, Spark 3.1.1 introduced the varchar data type, but it seems > not well tested with ORC and is not backward compatible. 
I have > even tried the config below, without luck: > *spark.sql.legacy.charVarcharAsString: "true"* > We do not get the error when *spark.sql.orc.filterPushdown=false*. > Below is the code (col1 is of type varchar(32) in Hive): > {code:java} > df = spark.sql("select col1, col2 from table1 a inner join table2 b on > (a.col1=b.col1 and a.col2 > b.col2)") > df.write.format("orc").option("compression", > "zlib").mode("Append").save("") > {code} > Below is the error: > > {code:java} > Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most > recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor > 5): java.lang.UnsupportedOperationException: DataType: varchar(32) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135) > at > scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245) > at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242) > at scala.collection.immutable.List.flatMap(List.scala:355) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189) > at > 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) > at > org.apache.spark.shuffl
[jira] [Assigned] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35700: Assignee: (was: Apache Spark) > spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with > varchar data type > --- > > Key: SPARK-35700 > URL: https://issues.apache.org/jira/browse/SPARK-35700 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark, Spark Core >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 on K8S >Reporter: Arghya Saha >Priority: Major > > We are not able to upgrade from Spark 2.4.x to Spark 3.1.1, as a join on a > varchar column fails; this is unexpected, and the same join works on Spark 3.0.0. We > are trying to run it on Spark 3.1.1 (MR 3.2) on K8s. > Below is my use case: > The tables are external Hive tables and the files are stored as ORC. We have > varchar columns, and when we perform a join on a varchar column we get the > exception below. > As I understand it, Spark 3.1.1 introduced the varchar data type, but it seems > not well tested with ORC and is not backward compatible. 
I have > even tried the config below, without luck: > *spark.sql.legacy.charVarcharAsString: "true"* > We do not get the error when *spark.sql.orc.filterPushdown=false*. > Below is the code (col1 is of type varchar(32) in Hive): > {code:java} > df = spark.sql("select col1, col2 from table1 a inner join table2 b on > (a.col1=b.col1 and a.col2 > b.col2)") > df.write.format("orc").option("compression", > "zlib").mode("Append").save("") > {code} > Below is the error: > > {code:java} > Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most > recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor > 5): java.lang.UnsupportedOperationException: DataType: varchar(32) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135) > at > scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245) > at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242) > at scala.collection.immutable.List.flatMap(List.scala:355) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134) > at > org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189) > at > 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177) > at > org.apache.spark.shuffle.ShuffleWriteP
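The stack trace above shows OrcFilters.createFilter failing on the varchar type while converting predicates for pushdown. As a hedged sketch (not from the ticket; the table names and the cast-to-string idea are illustrative assumptions, and only the first mitigation is confirmed by the report), the workarounds can be expressed as:

```scala
import org.apache.spark.sql.functions.col

// Mitigation 1 (confirmed in the report): disable ORC predicate pushdown,
// which avoids the UnsupportedOperationException at the cost of scanning
// more data.
spark.conf.set("spark.sql.orc.filterPushdown", "false")

// Mitigation 2 (illustrative assumption, untested): cast the varchar join
// key to string up front, so the filters Spark tries to push down no
// longer reference a varchar-typed column.
val a = spark.table("table1").withColumn("col1", col("col1").cast("string"))
val b = spark.table("table2").withColumn("col1", col("col1").cast("string"))
val joined = a.as("a").join(b.as("b"),
  col("a.col1") === col("b.col1") && col("a.col2") > col("b.col2"))
```

Both sketches assume a running SparkSession bound to `spark`.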
[jira] [Resolved] (SPARK-34565) Collapse Window nodes with Project between them
[ https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-34565. -- Fix Version/s: 3.2.0 Assignee: Tanel Kiis Resolution: Fixed Resolved by https://github.com/apache/spark/pull/31677 > Collapse Window nodes with Project between them > --- > > Key: SPARK-34565 > URL: https://issues.apache.org/jira/browse/SPARK-34565 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Major > Fix For: 3.2.0 > > > The CollapseWindow optimizer rule can be improved to also collapse Window > nodes that have a Project between them. Such Window - Project - > Window chains arise when chaining dataframe.withColumn calls. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
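For illustration, the chain described above can be sketched as follows (column names are made up; the actual test cases are in the linked PR). Each withColumn over a window expression adds a Window node to the plan, and intermediate column pruning places a Project between consecutive Window nodes, which the improved rule can now see through:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Two withColumn calls over the same window spec: previously the optimized
// plan kept two Window nodes separated by a Project; with this change,
// CollapseWindow merges them into a single Window node.
val w = Window.partitionBy("key").orderBy("ts")
val out = df
  .withColumn("rn", row_number().over(w))
  .withColumn("prev", lag(col("v"), 1).over(w))
```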
[jira] [Resolved] (SPARK-35391) Memory leak in ExecutorAllocationListener breaks dynamic allocation under high load
[ https://issues.apache.org/jira/browse/SPARK-35391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-35391. --- Fix Version/s: 3.1.3 3.2.0 Resolution: Fixed > Memory leak in ExecutorAllocationListener breaks dynamic allocation under > high load > --- > > Key: SPARK-35391 > URL: https://issues.apache.org/jira/browse/SPARK-35391 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Vasily Kolpakov >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > ExecutorAllocationListener doesn't clean up its data properly: it performs > progressively slower and eventually fails > to process events in time. > There are two problems: > * a bug (typo?) in the totalRunningTasksPerResourceProfile() method: > getOrElseUpdate() is used instead of getOrElse(). > If the spark-dynamic-executor-allocation thread calls schedule() after the > SparkListenerTaskEnd event for the last task in a stage > but before the SparkListenerStageCompleted event for the stage, then > stageAttemptToNumRunningTask will not be cleaned up properly. > * resourceProfileIdToStageAttempt clean-up is broken: > if a SparkListenerTaskEnd event for the last task in a stage was processed > before SparkListenerStageCompleted for that stage, > then resourceProfileIdToStageAttempt will not be cleaned up properly. > > The bugs were introduced in this commit: > https://github.com/apache/spark/commit/496f6ac86001d284cbfb7488a63dd3a168919c0f > . 
> Steps to reproduce: > # Launch standalone master and worker with > 'spark.shuffle.service.enabled=true' > # Run spark-shell with --conf 'spark.shuffle.service.enabled=true' --conf > 'spark.dynamicAllocation.enabled=true' and paste this script > {code:java} > for (_ <- 0 until 10) { > Seq(1, 2, 3, 4, 5).toDF.repartition(100).agg("value" -> "sum").show() > } > {code} > # make a heap dump and examine > ExecutorAllocationListener.totalRunningTasksPerResourceProfile and > ExecutorAllocationListener.resourceProfileIdToStageAttempt fields > Expected: totalRunningTasksPerResourceProfile and > resourceProfileIdToStageAttempt(defaultResourceProfileId) are empty > Actual: totalRunningTasksPerResourceProfile and > resourceProfileIdToStageAttempt(defaultResourceProfileId) contain > non-relevant data > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
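The getOrElseUpdate-vs-getOrElse distinction behind the first bug can be shown with a plain mutable map (a standalone sketch, no Spark required): getOrElseUpdate re-inserts the key as a side effect, which is how an already-cleaned-up stage entry can silently reappear.

```scala
import scala.collection.mutable

val running = mutable.Map.empty[String, Int]
running.remove("stage-1")             // stage completed, entry cleaned up

running.getOrElse("stage-1", 0)       // read-only lookup: map stays empty
assert(running.isEmpty)

running.getOrElseUpdate("stage-1", 0) // side effect: re-inserts the key
assert(running.contains("stage-1"))   // the cleaned-up entry is back
```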
[jira] [Assigned] (SPARK-35391) Memory leak in ExecutorAllocationListener breaks dynamic allocation under high load
[ https://issues.apache.org/jira/browse/SPARK-35391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-35391: - Assignee: Vasily Kolpakov > Memory leak in ExecutorAllocationListener breaks dynamic allocation under > high load > --- > > Key: SPARK-35391 > URL: https://issues.apache.org/jira/browse/SPARK-35391 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Vasily Kolpakov >Assignee: Vasily Kolpakov >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > ExecutorAllocationListener doesn't clean up its data properly: it performs > progressively slower and eventually fails > to process events in time. > There are two problems: > * a bug (typo?) in the totalRunningTasksPerResourceProfile() method: > getOrElseUpdate() is used instead of getOrElse(). > If the spark-dynamic-executor-allocation thread calls schedule() after the > SparkListenerTaskEnd event for the last task in a stage > but before the SparkListenerStageCompleted event for the stage, then > stageAttemptToNumRunningTask will not be cleaned up properly. > * resourceProfileIdToStageAttempt clean-up is broken: > if a SparkListenerTaskEnd event for the last task in a stage was processed > before SparkListenerStageCompleted for that stage, > then resourceProfileIdToStageAttempt will not be cleaned up properly. > > The bugs were introduced in this commit: > https://github.com/apache/spark/commit/496f6ac86001d284cbfb7488a63dd3a168919c0f > . 
> Steps to reproduce: > # Launch standalone master and worker with > 'spark.shuffle.service.enabled=true' > # Run spark-shell with --conf 'spark.shuffle.service.enabled=true' --conf > 'spark.dynamicAllocation.enabled=true' and paste this script > {code:java} > for (_ <- 0 until 10) { > Seq(1, 2, 3, 4, 5).toDF.repartition(100).agg("value" -> "sum").show() > } > {code} > # make a heap dump and examine > ExecutorAllocationListener.totalRunningTasksPerResourceProfile and > ExecutorAllocationListener.resourceProfileIdToStageAttempt fields > Expected: totalRunningTasksPerResourceProfile and > resourceProfileIdToStageAttempt(defaultResourceProfileId) are empty > Actual: totalRunningTasksPerResourceProfile and > resourceProfileIdToStageAttempt(defaultResourceProfileId) contain > non-relevant data > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35775) Check all year-month interval types in aggregate expressions
[ https://issues.apache.org/jira/browse/SPARK-35775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35775. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32988 [https://github.com/apache/spark/pull/32988] > Check all year-month interval types in aggregate expressions > > > Key: SPARK-35775 > URL: https://issues.apache.org/jira/browse/SPARK-35775 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Check all supported combination of YearMonthIntervalType fields in the > aggregate expression: sum and avg. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
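For context, a sketch of the kind of aggregate being checked (illustrative only; the exact test queries live in the linked PR), using the ANSI year-month interval literals available in 3.2:

```scala
// sum and avg over YearMonthIntervalType values; both should resolve and
// return a year-month interval rather than fail analysis.
spark.sql(
  """SELECT sum(i) AS total, avg(i) AS mean
    |FROM VALUES (INTERVAL '1-2' YEAR TO MONTH),
    |            (INTERVAL '2-10' YEAR TO MONTH) AS t(i)
    |""".stripMargin).show()
```

Assumes a running SparkSession bound to `spark` on a 3.2.0 build.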
[jira] [Assigned] (SPARK-35775) Check all year-month interval types in aggregate expressions
[ https://issues.apache.org/jira/browse/SPARK-35775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35775: Assignee: Kousuke Saruta > Check all year-month interval types in aggregate expressions > > > Key: SPARK-35775 > URL: https://issues.apache.org/jira/browse/SPARK-35775 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > > Check all supported combination of YearMonthIntervalType fields in the > aggregate expression: sum and avg. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35842) Ignore all ".idea" directories
[ https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35842. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32998 [https://github.com/apache/spark/pull/32998] > Ignore all ".idea" directories > --- > > Key: SPARK-35842 > URL: https://issues.apache.org/jira/browse/SPARK-35842 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.2.0 > > > After https://github.com/apache/spark/pull/32337, all the `.idea/` in > submodules are treated as git difference again. > For example, when I open the project `resource-managers/yarn/` with IntelliJ, > the git status becomes > {code:java} > Untracked files: > (use "git add ..." to include in what will be committed) > resource-managers/yarn/.idea/ > {code} > The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ. > We should ignore all the ".idea" directories instead of the one under the > root path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
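The fix boils down to using an unanchored .gitignore pattern. A sketch of the difference (the actual patch is in the linked PR):

```gitignore
# Anchored: matches only the .idea directory at the repository root
/.idea/

# Unanchored: matches any .idea directory, including the ones IntelliJ
# creates when a submodule such as resource-managers/yarn/ is opened
.idea/
```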
[jira] [Commented] (SPARK-35787) Does anyone has performance issue after upgrade from 3.0 to 3.1?
[ https://issues.apache.org/jira/browse/SPARK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366632#comment-17366632 ] Vidmantas Drasutis commented on SPARK-35787: More details about our case (Scala code): we load a GeoJSON file with polygons. The data contains [Geohash|https://en.wikipedia.org/wiki/Geohash] values. From each geohash we take its central point and check whether that point is within the loaded polygons, using the "org.locationtech.jts.geom" library.
{code:java}
private def doWork(context: QueryContext, before: Stage.State, root: Node[PolygonDefinition], hitCacheEnabled: Boolean): Stage.State = {
  val getPolygons = udf((geohash: String) => {
    PolygonHitTest.getPolygonsForGeoHashFromHierarchy(geohash, root, hitCacheEnabled)
  })
  val result = for (input <- before.df) yield {
    input
      .withColumn(polygons, getPolygons(col(geohash)))
      .withColumn(polygon, explode(col(polygons)))
      .drop(geohash, polygons)
  }
  SparkDebug.show(result, "Polygon mapping")
  before.copy(df = result)
}

def getPolygonsForGeoHashFromHierarchy(geoHash: String, root: Node[PolygonDefinition], hitCacheEnabled: Boolean = false): Seq[String] = {
  val latLong = GeoHash.decodeHash(geoHash)
  val point = geometryFactory.createPoint(new Coordinate(latLong.getLon, latLong.getLat))
  root.data.id :: getPolygonsForPointFromHierarchyV2(point, root.children.toList, hitCacheEnabled)
}

private def getPolygonsForPointFromHierarchyV2(point: Point, hierarchy: List[Node[PolygonDefinition]], hitCacheEnabled: Boolean): List[String] = {
  for (node <- hierarchy) {
    if (node.data.isPointWithinBoundingBox(point)) {
      val hits = getPolygonsForPointFromHierarchyV2(point, node.children.toList, hitCacheEnabled)
      if (hits.isEmpty) {
        if (node.data.isPointWithinPolygon(point, hitCacheEnabled)) {
          return List(node.data.id)
        }
      } else {
        return node.data.id :: hits
      }
    }
  }
  return Nil
}

private[this] lazy val hitCache: mutable.Set[Point] = java.util.concurrent.ConcurrentHashMap.newKeySet[Point]().asScala
private[this] lazy val missCache: mutable.Set[Point] = java.util.concurrent.ConcurrentHashMap.newKeySet[Point]().asScala

def isPointWithinPolygon(point: Point, hitCacheEnabled: Boolean): Boolean = {
  if (hitCacheEnabled) {
    if (hitCache.contains(point)) {
      true
    } else if (missCache.contains(point)) {
      false
    } else {
      val hit = point.within(geometry)
      if (hit) {
        hitCache.add(point)
      } else {
        missCache.add(point)
      }
      hit
    }
  } else {
    point.within(geometry)
  }
}
{code}
*Note:* we have an option to enable caching (hitCacheEnabled), which we had always kept turned off because it requires more memory and does not always give any benefit. When I enabled this polygon-hit-test caching, query performance was the same (fast) on both the new and the old Spark. But we still have other parts of the product where there is nothing we can tweak, and we see the slowdown there. > Does anyone has performance issue after upgrade from 3.0 to 3.1? > > > Key: SPARK-35787 > URL: https://issues.apache.org/jira/browse/SPARK-35787 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Vidmantas Drasutis >Priority: Major > Attachments: Execution_plan_difference.png, > spark_3.0_execution_plan_details_fast.txt, > spark_3.1_execution_plan_details_slow.txt, spark_job_info_1.png, > spark_job_info_2.png > > > Hello. > > We had been using Spark 3.0.2 and the query executed in ~100 seconds. > After we upgraded Spark to 3.1.1 (we also tried 3.1.2 - same slow performance), > our query execution time grew to ~260 seconds, a huge increase of > 250-300% in execution time. > > We tried quite a simple query. > In the query we use a UDF (*org.apache.spark.sql.functions*) > which explodes data and does a polygon hit test. Nothing changed in our code > from the query perspective. 
> It is 1 VM box cluster > > Maybe anyone faced similar issue? > Attached some details from spark dashboard. > > *Looks like it is UDF related slowdown. As queries which does not use UDF`s > performance is same and which uses UDFs - starting from 3.1 performance > decreased.* > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35843) unify the file name between batch and streaming file writer
Wenchen Fan created SPARK-35843: --- Summary: unify the file name between batch and streaming file writer Key: SPARK-35843 URL: https://issues.apache.org/jira/browse/SPARK-35843 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35831) Handle PathOperationException in copyFileToRemote with force on the same src and dest
[ https://issues.apache.org/jira/browse/SPARK-35831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35831: -- Assignee: Dongjoon Hyun > Handle PathOperationException in copyFileToRemote with force on the same src > and dest > - > > Key: SPARK-35831 > URL: https://issues.apache.org/jira/browse/SPARK-35831 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35831) Handle PathOperationException in copyFileToRemote with force on the same src and dest
[ https://issues.apache.org/jira/browse/SPARK-35831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35831. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32983 [https://github.com/apache/spark/pull/32983] > Handle PathOperationException in copyFileToRemote with force on the same src > and dest > - > > Key: SPARK-35831 > URL: https://issues.apache.org/jira/browse/SPARK-35831 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35843) unify the file name between batch and streaming file writer
[ https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366685#comment-17366685 ] Apache Spark commented on SPARK-35843: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33002 > unify the file name between batch and streaming file writer > --- > > Key: SPARK-35843 > URL: https://issues.apache.org/jira/browse/SPARK-35843 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35843) unify the file name between batch and streaming file writer
[ https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35843: Assignee: (was: Apache Spark) > unify the file name between batch and streaming file writer > --- > > Key: SPARK-35843 > URL: https://issues.apache.org/jira/browse/SPARK-35843 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35843) unify the file name between batch and streaming file writer
[ https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35843: Assignee: Apache Spark > unify the file name between batch and streaming file writer > --- > > Key: SPARK-35843 > URL: https://issues.apache.org/jira/browse/SPARK-35843 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35843) unify the file name between batch and streaming file writer
[ https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366688#comment-17366688 ] Apache Spark commented on SPARK-35843: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33002 > unify the file name between batch and streaming file writer > --- > > Key: SPARK-35843 > URL: https://issues.apache.org/jira/browse/SPARK-35843 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
Dongjoon Hyun created SPARK-35844: - Summary: Add hadoop-cloud profile to PUBLISH_PROFILES Key: SPARK-35844 URL: https://issues.apache.org/jira/browse/SPARK-35844 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
[ https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35844: Assignee: Apache Spark > Add hadoop-cloud profile to PUBLISH_PROFILES > > > Key: SPARK-35844 > URL: https://issues.apache.org/jira/browse/SPARK-35844 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
[ https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366714#comment-17366714 ] Apache Spark commented on SPARK-35844: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33003 > Add hadoop-cloud profile to PUBLISH_PROFILES > > > Key: SPARK-35844 > URL: https://issues.apache.org/jira/browse/SPARK-35844 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
[ https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35844: Assignee: (was: Apache Spark) > Add hadoop-cloud profile to PUBLISH_PROFILES > > > Key: SPARK-35844 > URL: https://issues.apache.org/jira/browse/SPARK-35844 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35845) OuterReference resolution should reject ambiguous column names
Wenchen Fan created SPARK-35845: --- Summary: OuterReference resolution should reject ambiguous column names Key: SPARK-35845 URL: https://issues.apache.org/jira/browse/SPARK-35845 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35845) OuterReference resolution should reject ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35845: Assignee: Apache Spark (was: Wenchen Fan) > OuterReference resolution should reject ambiguous column names > -- > > Key: SPARK-35845 > URL: https://issues.apache.org/jira/browse/SPARK-35845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35845) OuterReference resolution should reject ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35845: Assignee: Wenchen Fan (was: Apache Spark) > OuterReference resolution should reject ambiguous column names > -- > > Key: SPARK-35845 > URL: https://issues.apache.org/jira/browse/SPARK-35845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35845) OuterReference resolution should reject ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366773#comment-17366773 ] Apache Spark commented on SPARK-35845: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33004 > OuterReference resolution should reject ambiguous column names > -- > > Key: SPARK-35845 > URL: https://issues.apache.org/jira/browse/SPARK-35845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
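The SPARK-35845 messages above concern correlated subqueries: when a column name inside a subquery cannot be resolved locally, it is looked up in the outer query, and that lookup should fail loudly when more than one outer relation exposes the name rather than silently picking one. A minimal conceptual sketch of that check (plain Python, not Spark's actual Analyzer code; the `Scope`-style dict and `resolve_outer_reference` name are illustrative):

```python
# Conceptual sketch of outer-reference resolution for a correlated
# subquery. The names here (resolve_outer_reference, the dict of
# relations) are hypothetical, not Spark's Analyzer API.

class AmbiguousColumnError(Exception):
    pass

def resolve_outer_reference(name, outer_relations):
    """Resolve `name` against the outer query's relations.

    `outer_relations` maps relation name -> list of column names.
    If more than one outer relation exposes the column, resolution
    must be rejected instead of arbitrarily choosing one.
    """
    matches = [rel for rel, cols in outer_relations.items() if name in cols]
    if len(matches) > 1:
        raise AmbiguousColumnError(
            f"Reference '{name}' is ambiguous: found in {sorted(matches)}")
    if not matches:
        return None  # not an outer reference; analysis fails elsewhere
    return (matches[0], name)

# Unambiguous: only `t1` has column `a`.
print(resolve_outer_reference("a", {"t1": ["a", "b"], "t2": ["c"]}))  # ('t1', 'a')
```

The interesting case is the rejection path: with `{"t1": ["a"], "t2": ["a"]}` the lookup raises instead of quietly binding to `t1`.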
[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
[ https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35844: - Assignee: Dongjoon Hyun > Add hadoop-cloud profile to PUBLISH_PROFILES > > > Key: SPARK-35844 > URL: https://issues.apache.org/jira/browse/SPARK-35844 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES
[ https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35844. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33003 [https://github.com/apache/spark/pull/33003] > Add hadoop-cloud profile to PUBLISH_PROFILES > > > Key: SPARK-35844 > URL: https://issues.apache.org/jira/browse/SPARK-35844 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35836) Remove reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite
[ https://issues.apache.org/jira/browse/SPARK-35836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-35836. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32992 [https://github.com/apache/spark/pull/32992] > Remove reference to spark.shuffle.push.based.enabled in > ShuffleBlockPusherSuite > --- > > Key: SPARK-35836 > URL: https://issues.apache.org/jira/browse/SPARK-35836 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Trivial > Fix For: 3.2.0 > > > The test suite for ShuffleBlockPusherSuite was added with SPARK-32917 and in > this suite, the configuration for push-based shuffle is incorrectly > referenced as {{spark.shuffle.push.based.enabled}}. We need to remove this > config from here. > {{ShuffleBlockPusher}} is created only when push based shuffle is enabled and > this suite is for {{ShuffleBlockPusher}}, so no other change is required. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35836) Remove reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite
[ https://issues.apache.org/jira/browse/SPARK-35836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-35836: --- Assignee: Chandni Singh > Remove reference to spark.shuffle.push.based.enabled in > ShuffleBlockPusherSuite > --- > > Key: SPARK-35836 > URL: https://issues.apache.org/jira/browse/SPARK-35836 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Trivial > > The test suite for ShuffleBlockPusherSuite was added with SPARK-32917 and in > this suite, the configuration for push-based shuffle is incorrectly > referenced as {{spark.shuffle.push.based.enabled}}. We need to remove this > config from here. > {{ShuffleBlockPusher}} is created only when push based shuffle is enabled and > this suite is for {{ShuffleBlockPusher}}, so no other change is required. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
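For context on SPARK-35836: the flag Spark actually reads to gate push-based shuffle is, as I understand it, `spark.shuffle.push.enabled`, so the suite was setting a key nothing consumes. Because a Spark conf accepts arbitrary key strings, a typo'd key is accepted silently and never fails loudly. A toy illustration of that failure mode (a plain dict stands in for SparkConf; this is not Spark code):

```python
# Toy illustration: setting a config key nobody reads is silently
# accepted, so a typo like "spark.shuffle.push.based.enabled" has no
# effect and no error. A dict stands in for SparkConf here.

def push_shuffle_enabled(conf):
    # Only the real key (assumed: spark.shuffle.push.enabled) is consulted.
    return conf.get("spark.shuffle.push.enabled", "false") == "true"

conf = {}
conf["spark.shuffle.push.based.enabled"] = "true"  # typo'd key: accepted, ignored

print(push_shuffle_enabled(conf))  # False: the typo'd key had no effect
```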
[jira] [Resolved] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes
[ https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35614. --- Fix Version/s: 3.2.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 32910 https://github.com/apache/spark/pull/32910 > Make the conversion to pandas data-type-based for ExtensionDtypes > - > > Key: SPARK-35614 > URL: https://issues.apache.org/jira/browse/SPARK-35614 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > The conversion from/to pandas includes logic for checking ExtensionDtypes > data types and behaving accordingly. > That makes code hard to change or maintain. > We want to introduce the Ops class per ExtensionDtypes data type, and then > make the conversion from/to pandas data-type-based for ExtensionDtypes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk
Chao Sun created SPARK-35846: Summary: Introduce ParquetReadState to track various states while reading a Parquet column chunk Key: SPARK-35846 URL: https://issues.apache.org/jira/browse/SPARK-35846 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Chao Sun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk
[ https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-35846: - Description: This is mostly refactoring work to complete SPARK-34859 > Introduce ParquetReadState to track various states while reading a Parquet > column chunk > --- > > Key: SPARK-35846 > URL: https://issues.apache.org/jira/browse/SPARK-35846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Priority: Minor > > This is mostly refactoring work to complete SPARK-34859 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35847) Manage InternalField in DataTypeOps.isnull
Takuya Ueshin created SPARK-35847: - Summary: Manage InternalField in DataTypeOps.isnull Key: SPARK-35847 URL: https://issues.apache.org/jira/browse/SPARK-35847 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin The result of {{DataTypeOps.isnull}} must always be non-nullable boolean. We should manage {{InternalField}} for this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
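The point of the issue above is that `isnull` maps every value, including missing ones, to a plain `True`/`False`, so the result column's metadata should record a non-nullable boolean rather than inheriting the input's nullability. A rough sketch of that bookkeeping (plain Python; `InternalField` here is a simplified stand-in for the pandas-on-Spark class of the same name, not its real definition):

```python
from dataclasses import dataclass

# Simplified stand-in for pandas-on-Spark's InternalField: tracks the
# Spark data type and nullability of a column.
@dataclass(frozen=True)
class InternalField:
    spark_type: str
    nullable: bool

def isnull_field(field: InternalField) -> InternalField:
    # isnull maps every value (nulls included) to True/False, so the
    # result is always a non-nullable boolean, whatever the input was.
    return InternalField(spark_type="boolean", nullable=False)

src = InternalField(spark_type="double", nullable=True)
print(isnull_field(src))  # InternalField(spark_type='boolean', nullable=False)
```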
[jira] [Created] (SPARK-35848) Spark Bloom Filter throws OutOfMemoryError
Sai Polisetty created SPARK-35848: - Summary: Spark Bloom Filter throws OutOfMemoryError Key: SPARK-35848 URL: https://issues.apache.org/jira/browse/SPARK-35848 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0, 2.0.0 Reporter: Sai Polisetty

When the Bloom filter stat function is invoked on a large dataframe that requires a BitArray of size >2GB, it results in a java.lang.OutOfMemoryError. As mentioned in a similar bug, this is due to the zero value passed to treeAggregate. Irrespective of the spark.serializer setting, it is serialized using JavaSerializer, which has a hard limit of 2GB. Using a solution similar to SPARK-26228 and setting spark.serializer to KryoSerializer can avoid this error.

Steps to reproduce:

val df = List.range(0, 10).toDF("Id")
val expectedNumItems = 2000000000L // 2 billion
val fpp = 0.03
val bf = df.stat.bloomFilter("Id", expectedNumItems, fpp)

Stack trace:

java.lang.OutOfMemoryError
 at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
 at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
 at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
 at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
 at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
 at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
 at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
 at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
 at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
 at org.apache.spark.SparkContext.clean(SparkContext.scala:2604)
 at org.apache.spark.rdd.PairRDDFunctions.$anonfun$combineByKeyWithClassTag$1(PairRDDFunctions.scala:86)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
 at org.apache.spark.rdd.PairRDDFunctions.combineByKeyWithClassTag(PairRDDFunctions.scala:75)
 at org.apache.spark.rdd.PairRDDFunctions.$anonfun$foldByKey$1(PairRDDFunctions.scala:218)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
 at org.apache.spark.rdd.PairRDDFunctions.foldByKey(PairRDDFunctions.scala:207)
 at org.apache.spark.rdd.RDD.$anonfun$treeAggregate$1(RDD.scala:1224)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
 at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1203)
 at org.apache.spark.sql.DataFrameStatFunctions.buildBloomFilter(DataFrameStatFunctions.scala:602)
 at org.apache.spark.sql.DataFrameStatFunctions.bloomFilter(DataFrameStatFunctions.scala:541)

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
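The size the SPARK-35848 report describes can be sanity-checked with the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2, which to the best of my knowledge is also what Spark's `BloomFilter.optimalNumOfBits` computes. For the reported two billion expected items at fpp 0.03 the bit array alone is on the order of 1.8 GB, so once Java serialization framing is added the byte stream can cross the 2 GiB `ByteArrayOutputStream` limit seen in the trace (the exact overhead is environment-dependent):

```python
import math

def optimal_num_of_bits(n: int, p: float) -> int:
    # Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits.
    return int(-n * math.log(p) / (math.log(2) ** 2))

n = 2_000_000_000  # expectedNumItems from the report ("2 billion")
p = 0.03           # fpp from the report

bits = optimal_num_of_bits(n, p)
raw_bytes = bits // 8
print(f"bit array: {raw_bytes / 1e9:.2f} GB")  # bit array: 1.82 GB
```

That 1.82 GB is the raw payload before Java serialization wraps it, which is why the failure surfaces inside JavaSerializer rather than in the Bloom filter code itself.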
[jira] [Commented] (SPARK-35847) Manage InternalField in DataTypeOps.isnull
[ https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366850#comment-17366850 ] Apache Spark commented on SPARK-35847: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33005 > Manage InternalField in DataTypeOps.isnull > -- > > Key: SPARK-35847 > URL: https://issues.apache.org/jira/browse/SPARK-35847 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > The result of {{DataTypeOps.isnull}} must always be non-nullable boolean. > We should manage {{InternalField}} for this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35847) Manage InternalField in DataTypeOps.isnull
[ https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35847: Assignee: (was: Apache Spark) > Manage InternalField in DataTypeOps.isnull > -- > > Key: SPARK-35847 > URL: https://issues.apache.org/jira/browse/SPARK-35847 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > The result of {{DataTypeOps.isnull}} must always be non-nullable boolean. > We should manage {{InternalField}} for this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35847) Manage InternalField in DataTypeOps.isnull
[ https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35847: Assignee: Apache Spark > Manage InternalField in DataTypeOps.isnull > -- > > Key: SPARK-35847 > URL: https://issues.apache.org/jira/browse/SPARK-35847 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > The result of {{DataTypeOps.isnull}} must always be non-nullable boolean. > We should manage {{InternalField}} for this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk
[ https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35846: Assignee: (was: Apache Spark) > Introduce ParquetReadState to track various states while reading a Parquet > column chunk > --- > > Key: SPARK-35846 > URL: https://issues.apache.org/jira/browse/SPARK-35846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Priority: Minor > > This is mostly refactoring work to complete SPARK-34859 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk
[ https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35846: Assignee: Apache Spark > Introduce ParquetReadState to track various states while reading a Parquet > column chunk > --- > > Key: SPARK-35846 > URL: https://issues.apache.org/jira/browse/SPARK-35846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Minor > > This is mostly refactoring work to complete SPARK-34859 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk
[ https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366852#comment-17366852 ] Apache Spark commented on SPARK-35846: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/33006 > Introduce ParquetReadState to track various states while reading a Parquet > column chunk > --- > > Key: SPARK-35846 > URL: https://issues.apache.org/jira/browse/SPARK-35846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Priority: Minor > > This is mostly refactoring work to complete SPARK-34859 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366890#comment-17366890 ] Mark Hamilton commented on SPARK-35790: --- Was able to properly do this and found this was a user-error. Please feel free to mark as resolved > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Major > > If one includes python files within several jars that comprise a python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > Then only one of packages is imported -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-35790: -- Priority: Trivial (was: Major) > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Trivial > > If one includes python files within several jars that comprise a python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > Then only one of packages is imported -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders
[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366892#comment-17366892 ] Erik Krogen commented on SPARK-35744: - [~steven.aerts] going a bit off topic from this JIRA, but out of curiosity -- is your work based off of SPARK-25789 / [PR #22878|https://github.com/apache/spark/pull/22878]? We (LinkedIn) also maintain an {{AvroEncoder}} for {{SpecificRecord}} classes which is based off of that PR. We've also been planning to make another effort to push this upstream since the attempt in #22878 eventually stalled. I'd be interested in learning more about your work and potentially collaborating here. > Performance degradation in avro SpecificRecordBuilders > -- > > Key: SPARK-35744 > URL: https://issues.apache.org/jira/browse/SPARK-35744 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Steven Aerts >Priority: Minor > > Creating this bug to let you know that when we tested out spark 3.2.0 we saw > a significant performance degradation where our code was handling Avro > Specific Record objects. This slowed down some of our jobs with a factor 4. > Spark 3.2.0 upsteps the avro version from 1.8.2 to 1.10.2. > The degradation was caused by a change introduced in avro 1.9.0. This change > degrades performance when creating avro specific records in certain > classloader topologies, like the ones used in spark. > We notified and [proposed|https://github.com/apache/avro/pull/1253] a simple > fix upstream in the avro project. (Links contain more details) > It is unclear for us how many other projects are using avro specific records > in a spark context and will be impacted by this degradation. > Feel free to close this issue if you think this issue is too much of a > corner case. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35344: - Summary: Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark (was: Make conversion from/to literals data-type-based) > Support creating a Column of numpy literal value Support creating a Column of > numpy literal value in pandas-on-Spark > > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Aim to achieve {{F.lit(np.int64(1))}}. > We can define {{def lit(literal) -> Column:}} under IntegralOps, for example. > Or we can define it in python/pyspark/pandas/spark/functions.py. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35344: - Description: Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support creating a Column out of numpy literal value. So `lit` function defined in `pyspark.pandas.spark.functions` should be adjusted in order to support that in pandas-on-Spark. was: Aim to achieve {{F.lit(np.int64(1))}}. We can define {{def lit(literal) -> Column:}} under IntegralOps, for example. Or we can define it in python/pyspark/pandas/spark/functions.py. > Support creating a Column of numpy literal value Support creating a Column of > numpy literal value in pandas-on-Spark > > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35344: - Summary: Support creating a Column of numpy literal value in pandas-on-Spark (was: Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark) > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35344: Assignee: Apache Spark > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366899#comment-17366899 ] Apache Spark commented on SPARK-35344: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32955 > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35344: Assignee: (was: Apache Spark) > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366901#comment-17366901 ] Apache Spark commented on SPARK-35344: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32955 > Support creating a Column of numpy literal value in pandas-on-Spark > --- > > Key: SPARK-35344 > URL: https://issues.apache.org/jira/browse/SPARK-35344 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support > creating a Column out of numpy literal value. > So `lit` function defined in `pyspark.pandas.spark.functions` should be > adjusted in order to support that in pandas-on-Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
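The adjustment the SPARK-35344 description asks for amounts to unwrapping a numpy scalar into the plain Python value that Spark's `lit` already understands; numpy scalar types expose `.item()` for exactly that conversion. A hedged sketch of the idea (the `lit_compat` name and the stub class standing in for `np.int64` are illustrative only, not the actual pandas-on-Spark code, which dispatches on `np.generic`):

```python
# Sketch: unwrap numpy-style scalars before delegating to Spark's lit().
# `lit_compat` is a hypothetical name; the stub class below stands in
# for numpy scalar types (np.int64 etc.), which expose .item() to
# return the equivalent plain Python value.

class FakeNumpyInt64:
    """Minimal stand-in for np.int64: wraps a value, exposes .item()."""
    def __init__(self, value):
        self._value = value
    def item(self):
        return int(self._value)

def lit_compat(literal):
    # numpy scalars all provide .item(); plain Python scalars do not,
    # so duck-typing on it is enough for this sketch.
    if hasattr(literal, "item") and callable(literal.item):
        literal = literal.item()
    return literal  # real code would pass this to pyspark.sql.functions.lit

print(lit_compat(FakeNumpyInt64(1)), lit_compat(42))  # 1 42
```

With this shape, `F.lit(np.int64(1))` reduces to `F.lit(1)`, which the existing Spark `lit` handles.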
[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366906#comment-17366906 ] Klaus Ma commented on SPARK-35623: -- That's interesting, I'd like to help with that :) > Volcano resource manager for Spark on Kubernetes > > > Key: SPARK-35623 > URL: https://issues.apache.org/jira/browse/SPARK-35623 > Project: Spark > Issue Type: Brainstorming > Components: Kubernetes >Affects Versions: 3.1.1, 3.1.2 >Reporter: Dipanjan Kailthya >Priority: Minor > Labels: kubernetes, resourcemanager > > Dear Spark Developers, > > Hello from the Netherlands! Posting this here as I still haven't been accepted to post on the spark dev mailing list. > > My team is planning to use Spark with Kubernetes support on our shared > (multi-tenant) on-premises Kubernetes cluster. However, we would like to have > certain scheduling features like fair-share and preemption which, as we > understand, are not yet built into the current spark-kubernetes resource manager. > We have been working on and are close to a first successful prototype > integration with Volcano ([https://volcano.sh/en/docs/]). Briefly, this means > a new resource manager component with much in common with the existing > spark-kubernetes resource manager, but instead of pods it launches Volcano > jobs, which delegate driver and executor pod creation and lifecycle > management to Volcano. We are interested in contributing this to open source, > either directly in Spark or as a separate project. > > So, two questions: > > 1. Do the Spark maintainers see this as a valuable contribution to the > mainline Spark codebase? If so, can we have some guidance on how to publish > the changes? > > 2. Are any other developers / organizations interested in contributing to this > effort? If so, please get in touch. 
> > Best, > Dipanjan
[jira] [Commented] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366912#comment-17366912 ] Hyukjin Kwon commented on SPARK-35790: -- Thanks for the confirmation [~mhamilton] > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Trivial > > If one includes python files within several jars that comprise a python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > then only one of the packages is imported
[jira] [Resolved] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35790. -- Resolution: Not A Problem > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Trivial > > If one includes python files within several jars that comprise a python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > then only one of the packages is imported
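For readers unfamiliar with PEP 420, the mechanism at issue can be demonstrated without Spark at all: two independent `sys.path` roots (analogous to two jars) each contribute submodules to one package, which works only when neither root ships an `__init__.py` for the package. The directory and module names below are invented for the demo:

```python
import importlib
import os
import sys
import tempfile

# Build two separate sys.path roots, each containing a "demo_nspkg"
# directory WITHOUT an __init__.py -- the PEP 420 namespace-package case,
# analogous to several jars each shipping python files under one package.
root = tempfile.mkdtemp()
for part, mod in [("a", "mod_a"), ("b", "mod_b")]:
    pkg_dir = os.path.join(root, part, "demo_nspkg")
    os.makedirs(pkg_dir)                       # note: no __init__.py created
    with open(os.path.join(pkg_dir, mod + ".py"), "w") as f:
        f.write(f"NAME = '{mod}'\n")
    sys.path.insert(0, os.path.join(root, part))

# Both submodules import, because the namespace package merges both roots
# into demo_nspkg.__path__.
mod_a = importlib.import_module("demo_nspkg.mod_a")
mod_b = importlib.import_module("demo_nspkg.mod_b")
```

If either root contained an `__init__.py`, `demo_nspkg` would instead become a regular package anchored to the first root found, and the other root's submodules would be shadowed -- which appears to be why the jar-based setup in the report surfaced only one package.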
[jira] [Created] (SPARK-35849) Make astype data-type-based for DecimalOps
Yikun Jiang created SPARK-35849: --- Summary: Make astype data-type-based for DecimalOps Key: SPARK-35849 URL: https://issues.apache.org/jira/browse/SPARK-35849 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Yikun Jiang Make DecimalOps astype data-type-based. See more in: [https://github.com/apache/spark/pull/32821#issuecomment-861119905]
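To make the "data-type-based" phrasing concrete, here is a hypothetical miniature of the dispatch idea: each data type gets its own Ops class that owns the `astype` conversion rules for its values, so Decimal-specific handling lives in a `DecimalOps` rather than in shared type-casting code. All class and function names below are illustrative only; the real pandas-on-Spark implementation in the linked PR differs in detail:

```python
from decimal import Decimal

class DataTypeOps:
    """Base class: default astype just applies the target constructor."""
    def astype(self, values, target):
        return [target(v) for v in values]

class DecimalOps(DataTypeOps):
    """Decimal-specific astype rules live with the Decimal type."""
    def astype(self, values, target):
        if target is bool:
            # Decimal -> bool handled explicitly: nonzero means True
            return [v != 0 for v in values]
        return super().astype(values, target)

def ops_for(values):
    # Pick the Ops implementation from the element type (the dispatch step).
    if values and isinstance(values[0], Decimal):
        return DecimalOps()
    return DataTypeOps()
```

The benefit of this structure is that adding or fixing a conversion for one data type touches only that type's Ops class.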
[jira] [Created] (SPARK-35850) Upgrade scala-maven-plugin to 4.5.3
William Hyun created SPARK-35850: Summary: Upgrade scala-maven-plugin to 4.5.3 Key: SPARK-35850 URL: https://issues.apache.org/jira/browse/SPARK-35850 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: William Hyun
[jira] [Commented] (SPARK-35850) Upgrade scala-maven-plugin to 4.5.3
[ https://issues.apache.org/jira/browse/SPARK-35850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366922#comment-17366922 ] Apache Spark commented on SPARK-35850: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33007 > Upgrade scala-maven-plugin to 4.5.3 > --- > > Key: SPARK-35850 > URL: https://issues.apache.org/jira/browse/SPARK-35850 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: William Hyun >Priority: Major >
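For context, an upgrade like this usually amounts to bumping the plugin version in the project's parent `pom.xml`. A sketch of the relevant declaration, using the plugin's standard Maven coordinates (the exact location and surrounding configuration in Spark's pom are omitted and may differ):

```xml
<!-- Sketch only: version bump for the Scala compilation plugin -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>4.5.3</version>
</plugin>
```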