[jira] [Created] (SPARK-35839) New SQL function: to_timestamp_ntz

2021-06-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35839:
--

 Summary: New SQL function: to_timestamp_ntz
 Key: SPARK-35839
 URL: https://issues.apache.org/jira/browse/SPARK-35839
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Implement a new SQL function: to_timestamp_ntz. It is similar to the built-in 
function to_timestamp, except that the result type is TimestampWithoutTZType.
The naming follows Snowflake: 
https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html
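
A usage sketch (hypothetical, since the function is only proposed here; assumes 
a spark-shell session where `spark` is predefined):

{code:scala}
// Parse a string into a timestamp without time zone. Unlike to_timestamp,
// the result would not depend on the session time zone.
spark.sql("SELECT to_timestamp_ntz('2021-06-21 10:11:12')").show()
{code}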



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35839) New SQL function: to_timestamp_ntz

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366427#comment-17366427
 ] 

Apache Spark commented on SPARK-35839:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32995

> New SQL function: to_timestamp_ntz
> --
>
> Key: SPARK-35839
> URL: https://issues.apache.org/jira/browse/SPARK-35839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Implement a new SQL function: to_timestamp_ntz. It is similar to the built-in 
> function to_timestamp, except that the result type is TimestampWithoutTZType.
> The naming follows Snowflake: 
> https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35839) New SQL function: to_timestamp_ntz

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35839:


Assignee: Gengliang Wang  (was: Apache Spark)

> New SQL function: to_timestamp_ntz
> --
>
> Key: SPARK-35839
> URL: https://issues.apache.org/jira/browse/SPARK-35839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Implement a new SQL function: to_timestamp_ntz. It is similar to the built-in 
> function to_timestamp, except that the result type is TimestampWithoutTZType.
> The naming follows Snowflake: 
> https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35839) New SQL function: to_timestamp_ntz

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35839:


Assignee: Apache Spark  (was: Gengliang Wang)

> New SQL function: to_timestamp_ntz
> --
>
> Key: SPARK-35839
> URL: https://issues.apache.org/jira/browse/SPARK-35839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Implement a new SQL function: to_timestamp_ntz. It is similar to the built-in 
> function to_timestamp, except that the result type is TimestampWithoutTZType.
> The naming follows Snowflake: 
> https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35611) Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source

2021-06-21 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-35611:
---

Assignee: Jungtaek Lim

> Introduce the strategy on mismatched offset for start offset timestamp on 
> Kafka data source
> ---
>
> Key: SPARK-35611
> URL: https://issues.apache.org/jira/browse/SPARK-35611
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> 1. Rationale
> We encountered a real-world case where Spark fails the query if some of the 
> partitions don't have a matching offset for the given timestamp.
> This is intended behavior, meant to avoid producing unintended output in 
> cases like:
> * timestamp 2 is specified as the timestamp offset, but some of the 
> partitions don't have a matching record yet
> * a record with timestamp 1 arrives "later", in the following micro-batch,
> which is possible since Kafka allows the timestamp to be specified in a record.
> Here the unintended output is the risk of reading the record with timestamp 1 
> in the next micro-batch even though the option specified timestamp 2.
> But in many cases end users simply assume the timestamp increases 
> monotonically, and the current behavior blocks these cases from making progress.
> 2. Proposal
> For cases where the timestamp is assumed to increase monotonically, it is safe 
> to consider the offset to be the latest (technically, the offset of the latest 
> record + 1) if there is no record matching the timestamp.
> This would be particularly helpful where there is skew between 
> partitions and some partitions only have older records.
> * AS-IS: Spark simply fails the query, and end users have to deal with 
> workarounds requiring manual steps.
> * TO-BE: Spark will assign the latest offset to these partitions, so that 
> Spark can read newer records from them in further micro-batches.
> To retain the existing behavior while also supporting the proposed "TO-BE" 
> behavior, we'd like to introduce a strategy for handling mismatched offsets 
> when starting from an offset timestamp.
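> A sketch of how the proposed strategy could be used (the option name follows
> the discussion around the PR and should be checked against the final docs):
> {code:scala}
> val df = spark.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "host1:9092")
>   .option("subscribe", "events")
>   // start offsets from a per-partition timestamp in milliseconds
>   .option("startingOffsetsByTimestamp", """{"events": {"0": 1624233600000}}""")
>   // proposed: fall back to the latest offset instead of failing the query
>   .option("startingOffsetsByTimestampStrategy", "latest")
>   .load()
> {code}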



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35611) Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source

2021-06-21 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-35611.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32747
[https://github.com/apache/spark/pull/32747]

> Introduce the strategy on mismatched offset for start offset timestamp on 
> Kafka data source
> ---
>
> Key: SPARK-35611
> URL: https://issues.apache.org/jira/browse/SPARK-35611
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.2.0
>
>
> 1. Rationale
> We encountered a real-world case where Spark fails the query if some of the 
> partitions don't have a matching offset for the given timestamp.
> This is intended behavior, meant to avoid producing unintended output in 
> cases like:
> * timestamp 2 is specified as the timestamp offset, but some of the 
> partitions don't have a matching record yet
> * a record with timestamp 1 arrives "later", in the following micro-batch,
> which is possible since Kafka allows the timestamp to be specified in a record.
> Here the unintended output is the risk of reading the record with timestamp 1 
> in the next micro-batch even though the option specified timestamp 2.
> But in many cases end users simply assume the timestamp increases 
> monotonically, and the current behavior blocks these cases from making progress.
> 2. Proposal
> For cases where the timestamp is assumed to increase monotonically, it is safe 
> to consider the offset to be the latest (technically, the offset of the latest 
> record + 1) if there is no record matching the timestamp.
> This would be particularly helpful where there is skew between 
> partitions and some partitions only have older records.
> * AS-IS: Spark simply fails the query, and end users have to deal with 
> workarounds requiring manual steps.
> * TO-BE: Spark will assign the latest offset to these partitions, so that 
> Spark can read newer records from them in further micro-batches.
> To retain the existing behavior while also supporting the proposed "TO-BE" 
> behavior, we'd like to introduce a strategy for handling mismatched offsets 
> when starting from an offset timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35835) Select filter query on table with struct complex type fails

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35835:


Assignee: Apache Spark

> Select filter query on table with struct complex type fails
> ---
>
> Key: SPARK-35835
> URL: https://issues.apache.org/jira/browse/SPARK-35835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
>Reporter: Chetan Bhat
>Assignee: Apache Spark
>Priority: Minor
>
> [Steps]:
> From Spark Beeline, create a Parquet or ORC table having complex-type data. 
> Load data into the table and execute a select query with a filter.
> 0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_INT_DOUBLE_STRING_DATE 
> struct,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> stored as parquet;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.161 seconds)
> 0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.09 seconds)
> 0: jdbc:hive2://vm2:22550/> SELECT 
> struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, 
> SUM(struct_int_double_string_date.id) AS Sum 
> FROM (select * from Struct_com) SUB_QRY 
> WHERE struct_int_double_string_date.id > 5700 
> GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country 
> ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, 
> struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
>  
> [Actual Issue]: The select filter query on the table with the struct complex 
> type fails. Running the SELECT above returns:
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 
> ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS 
> FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
> +- *(2) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as 
> bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, 
> Country#139899, Sum#139877L])
> +- Exchange hashpartitioning(_gen_alias_139928#139928, 
> _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
> +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], 
> functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], 
> output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
> +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS 
> _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS 
> _gen_alias_139931#139931]
> +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
> +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] 
> Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, 
> Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], 
> PartitionFilters: [], PushedFilters: 
> [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), 
> GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: 
> struct G_DATE:struct>
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftser

[jira] [Assigned] (SPARK-35835) Select filter query on table with struct complex type fails

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35835:


Assignee: (was: Apache Spark)

> Select filter query on table with struct complex type fails
> ---
>
> Key: SPARK-35835
> URL: https://issues.apache.org/jira/browse/SPARK-35835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
>Reporter: Chetan Bhat
>Priority: Minor
>
> [Steps]:
> From Spark Beeline, create a Parquet or ORC table having complex-type data. 
> Load data into the table and execute a select query with a filter.
> 0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_INT_DOUBLE_STRING_DATE 
> struct,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> stored as parquet;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.161 seconds)
> 0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.09 seconds)
> 0: jdbc:hive2://vm2:22550/> SELECT 
> struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, 
> SUM(struct_int_double_string_date.id) AS Sum 
> FROM (select * from Struct_com) SUB_QRY 
> WHERE struct_int_double_string_date.id > 5700 
> GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country 
> ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, 
> struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
>  
> [Actual Issue]: The select filter query on the table with the struct complex 
> type fails. Running the SELECT above returns:
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 
> ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS 
> FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
> +- *(2) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as 
> bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, 
> Country#139899, Sum#139877L])
> +- Exchange hashpartitioning(_gen_alias_139928#139928, 
> _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
> +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], 
> functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], 
> output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
> +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS 
> _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS 
> _gen_alias_139931#139931]
> +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
> +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] 
> Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, 
> Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], 
> PartitionFilters: [], PushedFilters: 
> [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), 
> GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: 
> struct G_DATE:struct>
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatement

[jira] [Commented] (SPARK-35835) Select filter query on table with struct complex type fails

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366435#comment-17366435
 ] 

Apache Spark commented on SPARK-35835:
--

User 'PavithraRamachandran' has created a pull request for this issue:
https://github.com/apache/spark/pull/32996

> Select filter query on table with struct complex type fails
> ---
>
> Key: SPARK-35835
> URL: https://issues.apache.org/jira/browse/SPARK-35835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
>Reporter: Chetan Bhat
>Priority: Minor
>
> [Steps]:
> From Spark Beeline, create a Parquet or ORC table having complex-type data. 
> Load data into the table and execute a select query with a filter.
> 0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_INT_DOUBLE_STRING_DATE 
> struct,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> stored as parquet;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.161 seconds)
> 0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.09 seconds)
> 0: jdbc:hive2://vm2:22550/> SELECT 
> struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, 
> SUM(struct_int_double_string_date.id) AS Sum 
> FROM (select * from Struct_com) SUB_QRY 
> WHERE struct_int_double_string_date.id > 5700 
> GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, 
> struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country 
> ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, 
> struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
>  
> [Actual Issue]: The select filter query on the table with the struct complex 
> type fails. Running the SELECT above returns:
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 
> ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS 
> FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
> +- *(2) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as 
> bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, 
> Country#139899, Sum#139877L])
> +- Exchange hashpartitioning(_gen_alias_139928#139928, 
> _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
> +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, 
> _gen_alias_139929#139929], 
> functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], 
> output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
> +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS 
> _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS 
> _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS 
> _gen_alias_139931#139931]
> +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
> +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] 
> Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), 
> (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, 
> Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], 
> PartitionFilters: [], PushedFilters: 
> [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), 
> GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: 
> struct G_DATE:struct>
> at 
> org

[jira] [Commented] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread Masayoshi Tsuzuki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366445#comment-17366445
 ] 

Masayoshi Tsuzuki commented on SPARK-35821:
---

Yes, our project hit problems like the DAG area not showing on the history 
page in IE11 several years ago, but we thought those were just because IE 
doesn't have sufficient HTML5 compatibility or something, so we simply 
avoided them by using Firefox. We didn't investigate the cause.

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Max Gekk (Jira)
Max Gekk created SPARK-35840:


 Summary: Add `apply()` for a single field to 
`YearMonthIntervalType` and `DayTimeIntervalType`
 Key: SPARK-35840
 URL: https://issues.apache.org/jira/browse/SPARK-35840
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk


Add 2 methods:
{code:scala}
  def apply(field: Byte): YearMonthIntervalType = YearMonthIntervalType(field, field)
  def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, field)
{code}
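
A usage sketch of the shorthand (assuming the existing field constants on the 
companion objects, e.g. `YearMonthIntervalType.MONTH` and `DayTimeIntervalType.DAY`):

{code:scala}
import org.apache.spark.sql.types.{DayTimeIntervalType, YearMonthIntervalType}

// Interval type whose start and end fields are the same:
val monthOnly = YearMonthIntervalType(YearMonthIntervalType.MONTH)
// equivalent to YearMonthIntervalType(YearMonthIntervalType.MONTH, YearMonthIntervalType.MONTH)
val dayOnly = DayTimeIntervalType(DayTimeIntervalType.DAY)
// equivalent to DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY)
{code}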




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366466#comment-17366466
 ] 

Apache Spark commented on SPARK-35840:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32997

> Add `apply()` for a single field to `YearMonthIntervalType` and 
> `DayTimeIntervalType`
> -
>
> Key: SPARK-35840
> URL: https://issues.apache.org/jira/browse/SPARK-35840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Add 2 methods:
> {code:scala}
>   def apply(field: Byte): YearMonthIntervalType = 
> YearMonthIntervalType(field, field)
>   def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, 
> field)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366465#comment-17366465
 ] 

Apache Spark commented on SPARK-35840:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32997

> Add `apply()` for a single field to `YearMonthIntervalType` and 
> `DayTimeIntervalType`
> -
>
> Key: SPARK-35840
> URL: https://issues.apache.org/jira/browse/SPARK-35840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Add 2 methods:
> {code:scala}
>   def apply(field: Byte): YearMonthIntervalType = 
> YearMonthIntervalType(field, field)
>   def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, 
> field)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35840:


Assignee: Apache Spark  (was: Max Gekk)

> Add `apply()` for a single field to `YearMonthIntervalType` and 
> `DayTimeIntervalType`
> -
>
> Key: SPARK-35840
> URL: https://issues.apache.org/jira/browse/SPARK-35840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Add 2 methods:
> {code:scala}
>   def apply(field: Byte): YearMonthIntervalType = 
> YearMonthIntervalType(field, field)
>   def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, 
> field)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35840:


Assignee: Max Gekk  (was: Apache Spark)

> Add `apply()` for a single field to `YearMonthIntervalType` and 
> `DayTimeIntervalType`
> -
>
> Key: SPARK-35840
> URL: https://issues.apache.org/jira/browse/SPARK-35840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Add 2 methods:
> {code:scala}
>   def apply(field: Byte): YearMonthIntervalType = 
> YearMonthIntervalType(field, field)
>   def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, 
> field)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35841) Casting string to decimal type doesn't work if the sum of the digits is greater than 38

2021-06-21 Thread Roberto Gelsi (Jira)
Roberto Gelsi created SPARK-35841:
-

 Summary: Casting string to decimal type doesn't work if the sum of 
the digits is greater than 38
 Key: SPARK-35841
 URL: https://issues.apache.org/jira/browse/SPARK-35841
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2, 3.1.1
 Environment: Tested in a Kubernetes Cluster with Spark 3.1.1 and Spark 
3.1.2 images

(Hadoop 3.2.1, Python 3.9, Scala 2.12.13)
Reporter: Roberto Gelsi


Since Spark 3.1.1, NULL is returned when casting a string with many digits to a 
decimal type. If the total number of digits before and after the decimal point 
is less than 39, a value is returned; from 39 digits on, however, NULL is 
returned.
This worked correctly up to Spark 3.0.x.

Code to reproduce:

* A string with 2 digits before the decimal point and 37 digits after it 
returns null

{code:python}
# imports needed when running this as a standalone PySpark snippet
from pyspark.sql.functions import col
from pyspark.sql.types import StringType

data = ['28.92599983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}

+-----+
|value|
+-----+
|null |
+-----+
 
* A string with 2 digits before the decimal point and 36 digits after it 
returns the number as a decimal

{code:python}
data = ['28.9259998379962562466971576213']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}

+--------+
|value   |
+--------+
|28.92600|
+--------+

* A string with 1 digit before the decimal point and 37 digits after it 
returns the number as a decimal

{code:python}
data = ['2.92599983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
{code}

+-------+
|value  |
+-------+
|2.92600|
+-------+
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35842) Ignore all ".idea" directory in submodules

2021-06-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35842:
--

 Summary: Ignore all ".idea" directory in submodules
 Key: SPARK-35842
 URL: https://issues.apache.org/jira/browse/SPARK-35842
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
submodules are treated as git differences again.
For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
the git status becomes

{code:java}
Untracked files:
  (use "git add ..." to include in what will be committed)
  resource-managers/yarn/.idea/
{code}
The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
We should ignore all the ".idea" directories instead of only the one under the 
root path.
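
A minimal sketch of the fix (the actual change in the PR may differ): in 
.gitignore, a pattern without a leading slash applies at any depth, so a single 
entry covers the root project and all submodules:

{code}
# Matches any directory named .idea at any depth, e.g. .idea/,
# resource-managers/yarn/.idea/, sql/hive-thriftserver/.idea/
.idea/
{code}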




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35842) Ignore all ".idea" directory in submodules

2021-06-21 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35842:
---
Priority: Minor  (was: Major)

> Ignore all ".idea" directory in submodules
> --
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35842:
---
Summary: Ignore all ".idea" directories   (was: Ignore all ".idea" 
directory in submodules)

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35820) Support cast between different DayTimeIntervalType

2021-06-21 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-35820:


Assignee: angerszhu

> Support cast between different DayTimeIntervalType
> --
>
> Key: SPARK-35820
> URL: https://issues.apache.org/jira/browse/SPARK-35820
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Support cast between different DayTimeIntervalType
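> A sketch of the intended behavior (ANSI interval literal syntax; not taken
> from the PR):
> {code:scala}
> // spark-shell: cast a day-time interval with DAY TO HOUR fields
> // to one with DAY TO SECOND fields
> spark.sql("SELECT CAST(INTERVAL '10 05' DAY TO HOUR AS INTERVAL DAY TO SECOND)").show()
> {code}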



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35820) Support cast between different DayTimeIntervalType

2021-06-21 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35820.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32975
[https://github.com/apache/spark/pull/32975]

> Support cast between different DayTimeIntervalType
> --
>
> Key: SPARK-35820
> URL: https://issues.apache.org/jira/browse/SPARK-35820
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Support cast between different DayTimeIntervalType



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366483#comment-17366483
 ] 

Apache Spark commented on SPARK-35842:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32998

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35842:


Assignee: Apache Spark  (was: Gengliang Wang)

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35842:


Assignee: Gengliang Wang  (was: Apache Spark)

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366485#comment-17366485
 ] 

Apache Spark commented on SPARK-35842:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32998

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories in 
> submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35727) Return INTERVAL DAY from dates subtraction

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35727:


Assignee: (was: Apache Spark)

> Return INTERVAL DAY from dates subtraction
> --
>
> Key: SPARK-35727
> URL: https://issues.apache.org/jira/browse/SPARK-35727
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> The type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, 
> DAY)).
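> A sketch of the expected behavior (spark-shell; the printed type name may
> differ in the final implementation):
> {code:scala}
> spark.sql("SELECT DATE'2021-06-21' - DATE'2021-06-01' AS diff").printSchema()
> // expected per this ticket: diff is typed as INTERVAL DAY,
> // i.e. DayTimeIntervalType(DAY, DAY), not a full day-time interval
> {code}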



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35727) Return INTERVAL DAY from dates subtraction

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366500#comment-17366500
 ] 

Apache Spark commented on SPARK-35727:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/32999

> Return INTERVAL DAY from dates subtraction
> --
>
> Key: SPARK-35727
> URL: https://issues.apache.org/jira/browse/SPARK-35727
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> The type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, 
> DAY)).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35727) Return INTERVAL DAY from dates subtraction

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366502#comment-17366502
 ] 

Apache Spark commented on SPARK-35727:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/32999

> Return INTERVAL DAY from dates subtraction
> --
>
> Key: SPARK-35727
> URL: https://issues.apache.org/jira/browse/SPARK-35727
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> The type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, 
> DAY)).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35727) Return INTERVAL DAY from dates subtraction

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35727:


Assignee: Apache Spark

> Return INTERVAL DAY from dates subtraction
> --
>
> Key: SPARK-35727
> URL: https://issues.apache.org/jira/browse/SPARK-35727
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The type of dates subtraction should be INTERVAL DAY (DayTimeIntervalType(DAY, 
> DAY)).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366507#comment-17366507
 ] 

Apache Spark commented on SPARK-35778:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33000

> Check multiply/divide of year-month intervals of any fields by numeric
> --
>
> Key: SPARK-35778
> URL: https://issues.apache.org/jira/browse/SPARK-35778
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Write tests that check multiplication/division of the following intervals by a 
> numeric (see the sketch after this list):
> # INTERVAL YEAR
> # INTERVAL YEAR TO MONTH
> # INTERVAL MONTH
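> A sketch of the expressions such tests would exercise (spark-shell; not taken
> from the PR):
> {code:scala}
> spark.sql("SELECT INTERVAL '2' YEAR * 3").show()
> spark.sql("SELECT INTERVAL '2-6' YEAR TO MONTH / 2").show()
> spark.sql("SELECT INTERVAL '11' MONTH * 1.5").show()
> {code}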



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35778:


Assignee: (was: Apache Spark)

> Check multiply/divide of year-month intervals of any fields by numeric
> --
>
> Key: SPARK-35778
> URL: https://issues.apache.org/jira/browse/SPARK-35778
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Write tests that check multiplication/division of the following intervals by a 
> numeric:
> # INTERVAL YEAR
> # INTERVAL YEAR TO MONTH
> # INTERVAL MONTH



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35778:


Assignee: Apache Spark

> Check multiply/divide of year-month intervals of any fields by numeric
> --
>
> Key: SPARK-35778
> URL: https://issues.apache.org/jira/browse/SPARK-35778
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Write tests that check multiplication/division of the following intervals by a 
> numeric:
> # INTERVAL YEAR
> # INTERVAL YEAR TO MONTH
> # INTERVAL MONTH



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366508#comment-17366508
 ] 

Apache Spark commented on SPARK-35778:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33000

> Check multiply/divide of year-month intervals of any fields by numeric
> --
>
> Key: SPARK-35778
> URL: https://issues.apache.org/jira/browse/SPARK-35778
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Write tests that check multiplication/division of the following intervals by a 
> numeric:
> # INTERVAL YEAR
> # INTERVAL YEAR TO MONTH
> # INTERVAL MONTH



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Attachment: dag_chrome.png

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: dag_IE.PNG, dag_chrome.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Attachment: dag_IE.PNG

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Attachment: Executortab_IE.PNG

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Attachment: Executortab_Chrome.png

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Description: 
Spark UI-Executor tab is empty in IE11

Spark UI-Stages DAG visualization is empty in IE11

other tabs look OK

Attaching some scrreshots

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> other tabs look OK
> Attaching some scrreshots



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Description: 
Spark UI-Executor tab is empty in IE11

Spark UI-Stages DAG visualization is empty in IE11

other tabs look OK

Attaching some screenshots

  was:
Spark UI-Executor tab is empty in IE11

Spark UI-Stages DAG visualization is empty in IE11

other tabs look OK

Attaching some scrreshots


> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> other tabs look OK
> Attaching some screenshots



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366531#comment-17366531
 ] 

jobit mathew commented on SPARK-35821:
--

[~hyukjin.kwon] I attached some screenshots. Could you please have a look?

> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> other tabs look OK
> Attaching some screenshots



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35821) Spark 3.1.1 Internet Explorer 11 compatibility issues

2021-06-21 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-35821:
-
Description: 
Spark UI-Executor tab is empty in IE11

Spark UI-Stages DAG visualization is empty in IE11

other tabs look OK.

Spark job history shows the completed and incomplete applications list. But when 
we go inside each application, the same issue may be present.

Attaching some screenshots

  was:
Spark UI-Executor tab is empty in IE11

Spark UI-Stages DAG visualization is empty in IE11

other tabs look OK

Attaching some screenshots


> Spark 3.1.1 Internet Explorer 11 compatibility issues
> -
>
> Key: SPARK-35821
> URL: https://issues.apache.org/jira/browse/SPARK-35821
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Executortab_Chrome.png, Executortab_IE.PNG, dag_IE.PNG, 
> dag_chrome.png
>
>
> Spark UI-Executor tab is empty in IE11
> Spark UI-Stages DAG visualization is empty in IE11
> other tabs look OK.
> Spark job history shows the completed and incomplete applications list. But 
> when we go inside each application, the same issue may be present.
> Attaching some screenshots



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35840) Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`

2021-06-21 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35840.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32997
[https://github.com/apache/spark/pull/32997]

> Add `apply()` for a single field to `YearMonthIntervalType` and 
> `DayTimeIntervalType`
> -
>
> Key: SPARK-35840
> URL: https://issues.apache.org/jira/browse/SPARK-35840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Add 2 methods:
> {code:scala}
>   def apply(field: Byte): YearMonthIntervalType = 
> YearMonthIntervalType(field, field)
>   def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, 
> field)
> {code}
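> For illustration, a minimal usage sketch of the proposed single-field apply 
> (the YEAR/MONTH and DAY/HOUR byte constants are assumed from the companion 
> objects):
> {code:scala}
> import org.apache.spark.sql.types.{DayTimeIntervalType, YearMonthIntervalType}
> 
> // shorthand for YearMonthIntervalType(YearMonthIntervalType.MONTH, YearMonthIntervalType.MONTH)
> val monthOnly = YearMonthIntervalType(YearMonthIntervalType.MONTH)
> // shorthand for DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.HOUR)
> val hourOnly = DayTimeIntervalType(DayTimeIntervalType.HOUR)
> {code}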



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2021-06-21 Thread Yik San Chan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366543#comment-17366543
 ] 

Yik San Chan commented on SPARK-26247:
--

[~aholler] Hi Anne, I wonder what prevented the proposal from being approved? I 
have no access to the Google Doc, so I am not sure what happened there. Thanks!

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP, bulk-closed
> Attachments: SPIPMlModelExtensionForOnlineServing.pdf, diff.out, 
> diff.reduceLoadLatency, diff.scoreInstance
>
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of Spark. We developed 
> our set of changes with respect to version 2.1.0 and can port them forward to 
> other versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-06-21 Thread Dipanjan Kailthya (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366555#comment-17366555
 ] 

Dipanjan Kailthya commented on SPARK-35623:
---

Hi [~pingsutw], thank you for expressing your interest! We are in the process 
of publishing a first draft. In the meantime, how can we contact you to give 
you a more detailed overview? Do you have a preferred email address?

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers, 
>   
>  Hello from the Netherlands! Posting this here as I still haven't been 
> accepted to post on the Spark dev mailing list.
>   
>  My team is planning to use Spark with Kubernetes support on our shared 
> (multi-tenant) on-premise Kubernetes cluster. However, we would like certain 
> scheduling features, like fair share and preemption, which, as we understand, 
> are not yet built into the current spark-kubernetes resource manager. We have 
> been working on, and are close to, a first successful prototype integration 
> with Volcano ([https://volcano.sh/en/docs/]). Briefly, this means a new 
> resource manager component with lots in common with the existing 
> spark-kubernetes resource manager, but instead of pods it launches Volcano 
> jobs, which delegate driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in Spark or as a separate project.
>   
>  So, two questions: 
>   
>  1. Do the Spark maintainers see this as a valuable contribution to the 
> mainline Spark codebase? If so, can we have some guidance on how to publish 
> the changes? 
>   
>  2. Are any other developers / organizations interested in contributing to 
> this effort? If so, please get in touch.
>   
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35700:


Assignee: Apache Spark

> spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with 
> varchar data type
> ---
>
> Key: SPARK-35700
> URL: https://issues.apache.org/jira/browse/SPARK-35700
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark, Spark Core
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1 on K8S
>Reporter: Arghya Saha
>Assignee: Apache Spark
>Priority: Major
>
> We are not able to upgrade from Spark 2.4.x to Spark 3.1.1, as a join on a 
> varchar column fails; this is unexpected, and it works on Spark 3.0.0. We 
> are trying to run it on Spark 3.1.1 (MR 3.2) on K8s.
> Below is my use case:
> The tables are external Hive tables and the files are stored as ORC. We do 
> have a varchar column, and when we try to perform a join on the varchar 
> column we get the exception.
> As I understand it, Spark 3.1.1 introduced the varchar data type, but it 
> seems it is not well tested with ORC and is not backward compatible. I have 
> even tried the config below, without luck:
> *spark.sql.legacy.charVarcharAsString: "true"*
> We are not getting the error when *spark.sql.orc.filterPushdown=false*
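> (For illustration, a hedged sketch of setting that workaround at session 
> level via the runtime config API; the same call works in PySpark:)
> {code:scala}
> spark.conf.set("spark.sql.orc.filterPushdown", "false")
> {code}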
> Below is the code (here col1 is of type varchar(32) in Hive):
> {code:java}
> df = spark.sql("select col1, col2 from table1 a inner join table2 b on 
> (a.col1 = b.col1 and a.col2 > b.col2)") 
> df.write.format("orc").option("compression", 
> "zlib").mode("Append").save("")
> {code}
> Below is the error:
>  
> {code:java}
> Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most 
> recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor 
> 5): java.lang.UnsupportedOperationException: DataType: varchar(32)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
>   at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
>   at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
>   at scala.collection.immutable.List.flatMap(List.scala:355)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
>   at 
> org.apache.sp

[jira] [Commented] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366564#comment-17366564
 ] 

Apache Spark commented on SPARK-35700:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33001

> spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with 
> varchar data type
> ---
>
> Key: SPARK-35700
> URL: https://issues.apache.org/jira/browse/SPARK-35700
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark, Spark Core
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1 on K8S
>Reporter: Arghya Saha
>Priority: Major
>
> We are not able to upgrade from Spark 2.4.x to Spark 3.1.1, as a join on a 
> varchar column fails; this is unexpected, and it works on Spark 3.0.0. We 
> are trying to run it on Spark 3.1.1 (MR 3.2) on K8s.
> Below is my use case:
> The tables are external Hive tables and the files are stored as ORC. We do 
> have a varchar column, and when we try to perform a join on the varchar 
> column we get the exception.
> As I understand it, Spark 3.1.1 introduced the varchar data type, but it 
> seems it is not well tested with ORC and is not backward compatible. I have 
> even tried the config below, without luck:
> *spark.sql.legacy.charVarcharAsString: "true"*
> We are not getting the error when *spark.sql.orc.filterPushdown=false*
> Below is the code (here col1 is of type varchar(32) in Hive):
> {code:java}
> df = spark.sql("select col1, col2 from table1 a inner join table2 b on 
> (a.col1 = b.col1 and a.col2 > b.col2)") 
> df.write.format("orc").option("compression", 
> "zlib").mode("Append").save("")
> {code}
> Below is the error:
>  
> {code:java}
> Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most 
> recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor 
> 5): java.lang.UnsupportedOperationException: DataType: varchar(32)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
>   at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
>   at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
>   at scala.collection.immutable.List.flatMap(List.scala:355)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.shuffl

[jira] [Assigned] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35700:


Assignee: (was: Apache Spark)

> spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with 
> varchar data type
> ---
>
> Key: SPARK-35700
> URL: https://issues.apache.org/jira/browse/SPARK-35700
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark, Spark Core
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1 on K8S
>Reporter: Arghya Saha
>Priority: Major
>
> We are not able to upgrade from Spark 2.4.x to Spark 3.1.1, as a join on a 
> varchar column fails; this is unexpected, and it works on Spark 3.0.0. We 
> are trying to run it on Spark 3.1.1 (MR 3.2) on K8s.
> Below is my use case:
> The tables are external Hive tables and the files are stored as ORC. We do 
> have a varchar column, and when we try to perform a join on the varchar 
> column we get the exception.
> As I understand it, Spark 3.1.1 introduced the varchar data type, but it 
> seems it is not well tested with ORC and is not backward compatible. I have 
> even tried the config below, without luck:
> *spark.sql.legacy.charVarcharAsString: "true"*
> We are not getting the error when *spark.sql.orc.filterPushdown=false*
> Below is the code (here col1 is of type varchar(32) in Hive):
> {code:java}
> df = spark.sql("select col1, col2 from table1 a inner join table2 b on 
> (a.col1 = b.col1 and a.col2 > b.col2)") 
> df.write.format("orc").option("compression", 
> "zlib").mode("Append").save("")
> {code}
> Below is the error:
>  
> {code:java}
> Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most 
> recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor 
> 5): java.lang.UnsupportedOperationException: DataType: varchar(32)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
>   at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
>   at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
>   at scala.collection.immutable.List.flatMap(List.scala:355)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
>   at 
> org.apache.spark.shuffle.ShuffleWriteP

[jira] [Resolved] (SPARK-34565) Collapse Window nodes with Project between them

2021-06-21 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-34565.
--
Fix Version/s: 3.2.0
 Assignee: Tanel Kiis
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/31677

> Collapse Window nodes with Project between them
> ---
>
> Key: SPARK-34565
> URL: https://issues.apache.org/jira/browse/SPARK-34565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
> Fix For: 3.2.0
>
>
> The CollapseWindow optimizer rule can be improved to also collapse Window 
> nodes that have a Project between them. This sort of Window - Project - 
> Window chain happens when chaining dataframe.withColumn calls, as sketched 
> below.
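> For illustration, a minimal sketch of such a chain (toy data; assumes a 
> spark-shell session with implicits in scope):
> {code:scala}
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.{lag, lead}
> import spark.implicits._
> 
> val df = Seq((1, 1L, 10), (1, 2L, 20)).toDF("key", "ts", "value")
> val w = Window.partitionBy("key").orderBy("ts")
> // each withColumn adds a Project on top of the previous Window node,
> // producing the Window - Project - Window shape that the rule can now collapse
> val out = df
>   .withColumn("prev", lag("value", 1).over(w))
>   .withColumn("next", lead("value", 1).over(w))
> {code}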



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35391) Memory leak in ExecutorAllocationListener breaks dynamic allocation under high load

2021-06-21 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-35391.
---
Fix Version/s: 3.1.3
   3.2.0
   Resolution: Fixed

> Memory leak in ExecutorAllocationListener breaks dynamic allocation under 
> high load
> ---
>
> Key: SPARK-35391
> URL: https://issues.apache.org/jira/browse/SPARK-35391
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: Vasily Kolpakov
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> ExecutorAllocationListener doesn't clean up its data properly; it becomes 
> progressively slower and eventually fails to process events in time.
> There are two problems:
>  * a bug (typo?) in the totalRunningTasksPerResourceProfile() method: 
> getOrElseUpdate() is used instead of getOrElse().
>  If the spark-dynamic-executor-allocation thread calls schedule() after the 
> SparkListenerTaskEnd event for the last task in a stage,
>  but before the SparkListenerStageCompleted event for that stage, then 
> stageAttemptToNumRunningTask will not be cleaned up properly.
>  * resourceProfileIdToStageAttempt clean-up is broken:
>  if the SparkListenerTaskEnd event for the last task in a stage was processed 
> before the SparkListenerStageCompleted event for that stage,
>  then resourceProfileIdToStageAttempt will not be cleaned up properly.
>  
> Bugs were introduced in this commit: 
> https://github.com/apache/spark/commit/496f6ac86001d284cbfb7488a63dd3a168919c0f
>  .
> Steps to reproduce:
>  # Launch standalone master and worker with 
> 'spark.shuffle.service.enabled=true'
>  # Run spark-shell with --conf 'spark.shuffle.service.enabled=true' --conf 
> 'spark.dynamicAllocation.enabled=true' and paste this script
> {code:java}
> for (_ <- 0 until 10) {
> Seq(1, 2, 3, 4, 5).toDF.repartition(100).agg("value" -> "sum").show()
> }
> {code}
>  # make a heap dump and examine 
> ExecutorAllocationListener.totalRunningTasksPerResourceProfile and 
> ExecutorAllocationListener.resourceProfileIdToStageAttempt fields
> Expected: totalRunningTasksPerResourceProfile and 
> resourceProfileIdToStageAttempt(defaultResourceProfileId) are empty
> Actual: totalRunningTasksPerResourceProfile and 
> resourceProfileIdToStageAttempt(defaultResourceProfileId) contain 
> non-relevant data
>  
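> As a minimal illustration of the getOrElseUpdate() pitfall (a sketch, not the 
> actual Spark code):
> {code:scala}
> import scala.collection.mutable
> 
> val running = mutable.Map.empty[Int, Int] // resource profile id -> running tasks
> // getOrElseUpdate re-inserts the key as a side effect of reading,
> // so a map that was just cleaned up fills back in:
> val n1 = running.getOrElseUpdate(0, 0)    // map now contains 0 -> 0
> running.clear()
> // getOrElse only reads, leaving the map empty:
> val n2 = running.getOrElse(0, 0)          // map stays empty
> {code}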



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35391) Memory leak in ExecutorAllocationListener breaks dynamic allocation under high load

2021-06-21 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-35391:
-

Assignee: Vasily Kolpakov

> Memory leak in ExecutorAllocationListener breaks dynamic allocation under 
> high load
> ---
>
> Key: SPARK-35391
> URL: https://issues.apache.org/jira/browse/SPARK-35391
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: Vasily Kolpakov
>Assignee: Vasily Kolpakov
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> ExecutorAllocationListener doesn't clean up its data properly; it becomes 
> progressively slower and eventually fails to process events in time.
> There are two problems:
>  * a bug (typo?) in the totalRunningTasksPerResourceProfile() method: 
> getOrElseUpdate() is used instead of getOrElse().
>  If the spark-dynamic-executor-allocation thread calls schedule() after the 
> SparkListenerTaskEnd event for the last task in a stage,
>  but before the SparkListenerStageCompleted event for that stage, then 
> stageAttemptToNumRunningTask will not be cleaned up properly.
>  * resourceProfileIdToStageAttempt clean-up is broken:
>  if the SparkListenerTaskEnd event for the last task in a stage was processed 
> before the SparkListenerStageCompleted event for that stage,
>  then resourceProfileIdToStageAttempt will not be cleaned up properly.
>  
> Bugs were introduced in this commit: 
> https://github.com/apache/spark/commit/496f6ac86001d284cbfb7488a63dd3a168919c0f
>  .
> Steps to reproduce:
>  # Launch standalone master and worker with 
> 'spark.shuffle.service.enabled=true'
>  # Run spark-shell with --conf 'spark.shuffle.service.enabled=true' --conf 
> 'spark.dynamicAllocation.enabled=true' and paste this script
> {code:java}
> for (_ <- 0 until 10) {
> Seq(1, 2, 3, 4, 5).toDF.repartition(100).agg("value" -> "sum").show()
> }
> {code}
>  # make a heap dump and examine 
> ExecutorAllocationListener.totalRunningTasksPerResourceProfile and 
> ExecutorAllocationListener.resourceProfileIdToStageAttempt fields
> Expected: totalRunningTasksPerResourceProfile and 
> resourceProfileIdToStageAttempt(defaultResourceProfileId) are empty
> Actual: totalRunningTasksPerResourceProfile and 
> resourceProfileIdToStageAttempt(defaultResourceProfileId) contain 
> non-relevant data
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35775) Check all year-month interval types in aggregate expressions

2021-06-21 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35775.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32988
[https://github.com/apache/spark/pull/32988]

> Check all year-month interval types in aggregate expressions
> 
>
> Key: SPARK-35775
> URL: https://issues.apache.org/jira/browse/SPARK-35775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.2.0
>
>
> Check all supported combinations of YearMonthIntervalType fields in the 
> aggregate expressions sum and avg; see the sketch below.
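> For illustration, a hedged sketch of the kind of query covered (ANSI interval 
> literals; the exact test queries are in the pull request):
> {code:scala}
> spark.sql(
>   """SELECT sum(ym), avg(ym)
>     |FROM VALUES (INTERVAL '1-2' YEAR TO MONTH), (INTERVAL '2-10' YEAR TO MONTH) AS t(ym)
>     |""".stripMargin).show()
> {code}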



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35775) Check all year-month interval types in aggregate expressions

2021-06-21 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-35775:


Assignee: Kousuke Saruta

> Check all year-month interval types in aggregate expressions
> 
>
> Key: SPARK-35775
> URL: https://issues.apache.org/jira/browse/SPARK-35775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Kousuke Saruta
>Priority: Major
>
> Check all supported combinations of YearMonthIntervalType fields in the 
> aggregate expressions sum and avg.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35842) Ignore all ".idea" directories

2021-06-21 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35842.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32998
[https://github.com/apache/spark/pull/32998]

> Ignore all ".idea" directories 
> ---
>
> Key: SPARK-35842
> URL: https://issues.apache.org/jira/browse/SPARK-35842
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.2.0
>
>
> After https://github.com/apache/spark/pull/32337, all the `.idea/` directories 
> in submodules are treated as git differences again.
> For example, when I open the project `resource-managers/yarn/` with IntelliJ, 
> the git status becomes
> {code:java}
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   resource-managers/yarn/.idea/
> {code}
> The same issue happens on opening `sql/hive-thriftserver/` with IntelliJ.
> We should ignore all the ".idea" directories instead of only the one under the 
> root path; see the sketch below.
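> For illustration, a hedged sketch of the kind of .gitignore change (the exact 
> original entry is an assumption):
> {code}
> # before: anchored, so only the repository-root .idea directory is ignored
> /.idea/
> # after: unanchored, so .idea directories are ignored at any depth
> .idea/
> {code}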



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35787) Does anyone has performance issue after upgrade from 3.0 to 3.1?

2021-06-21 Thread Vidmantas Drasutis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366632#comment-17366632
 ] 

Vidmantas Drasutis commented on SPARK-35787:


More details about our case - Scala code.
We loaded a GeoJSON file with polygons.
In the data we have geohashes ([https://en.wikipedia.org/wiki/Geohash]). From 
those geohashes we get the central point of each geohash and check whether that 
point is within the loaded polygons, using the org.locationtech.jts.geom 
library.

{code:java}
private def doWork(context: QueryContext, before: Stage.State, root: 
Node[PolygonDefinition], hitCacheEnabled : Boolean): Stage.State = {

  val getPolygons = udf((geohash: String) => {
PolygonHitTest.getPolygonsForGeoHashFromHierarchy(geohash, root, 
hitCacheEnabled)
  })

  val result =
for (input <- before.df) yield {
  input
.withColumn(polygons, getPolygons(col(geohash)))
.withColumn(polygon, explode(col(polygons)))
.drop(geohash, polygons)
}

  SparkDebug.show(result, "Polygon mapping")
  before.copy(df = result)
}

def getPolygonsForGeoHashFromHierarchy(geoHash: String, root: 
Node[PolygonDefinition], hitCacheEnabled: Boolean = false): Seq[String] = {
  val latLong = GeoHash.decodeHash(geoHash)
  val point = geometryFactory.createPoint(new Coordinate(latLong.getLon, 
latLong.getLat))
  root.data.id :: getPolygonsForPointFromHierarchyV2(point, 
root.children.toList, hitCacheEnabled)
}

private def getPolygonsForPointFromHierarchyV2(point: Point, hierarchy: 
List[Node[PolygonDefinition]], hitCacheEnabled: Boolean): List[String] = {
  for (node <- hierarchy) {
if (node.data.isPointWithinBoundingBox(point)) {
  val hits = getPolygonsForPointFromHierarchyV2(point, 
node.children.toList, hitCacheEnabled)
  if (hits.isEmpty) {
if (node.data.isPointWithinPolygon(point, hitCacheEnabled)) {
  return List(node.data.id)
}
  } else {
return node.data.id :: hits
  }
}
  }
  return Nil
}



private[this] lazy val hitCache: mutable.Set[Point] = 
java.util.concurrent.ConcurrentHashMap.newKeySet[Point]().asScala

private[this] lazy val missCache: mutable.Set[Point] = 
java.util.concurrent.ConcurrentHashMap.newKeySet[Point]().asScala

def isPointWithinPolygon(point: Point, hitCacheEnabled: Boolean): Boolean = {
  if (hitCacheEnabled) {
if (hitCache.contains(point)) {
  true
} else if (missCache.contains(point)) {
  false
} else {
  val hit = point.within(geometry)
  if (hit) {
hitCache.add(point)
  } else {
missCache.add(point)
  }
  hit
}
  } else {
point.within(geometry)
  }
}

{code}

*Note:* we have an option to enable some caching (hitCacheEnabled) that we had 
always turned off, as the processing requires more memory and the caching does 
not always give any benefit. But when I enabled this polygon hit-test caching, 
the query performance of the new and old Spark was/is the same: fast.

But we still have other parts of the product where we have nothing to tweak, 
and there we are seeing the slowdown.

 

> Does anyone has performance issue after upgrade from 3.0 to 3.1?
> 
>
> Key: SPARK-35787
> URL: https://issues.apache.org/jira/browse/SPARK-35787
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Vidmantas Drasutis
>Priority: Major
> Attachments: Execution_plan_difference.png, 
> spark_3.0_execution_plan_details_fast.txt, 
> spark_3.1_execution_plan_details_slow.txt, spark_job_info_1.png, 
> spark_job_info_2.png
>
>
> Hello.
>  
> We had been using Spark 3.0.2 and the query executed in ~100 seconds.
> After we upgraded Spark to 3.1.1 (we also tried 3.1.2: same slow performance), 
> our query execution time grew to ~260 seconds, a huge increase of 250-300% in 
> execution time.
>  
> We tried a quite simple query.
> In the query we use a UDF (*org.apache.spark.sql.functions*) which explodes 
> data and does a polygon hit test. Nothing changed in our code from the query 
> perspective.
>  It is a one-VM-box cluster.
>  
> Has anyone faced a similar issue?
> Attached are some details from the Spark dashboard.
>  
> *It looks like a UDF-related slowdown: queries that do not use UDFs perform 
> the same, while queries that use UDFs show decreased performance starting 
> from 3.1.*
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35843) unify the file name between batch and streaming file writer

2021-06-21 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-35843:
---

 Summary: unify the file name between batch and streaming file 
writer
 Key: SPARK-35843
 URL: https://issues.apache.org/jira/browse/SPARK-35843
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35831) Handle PathOperationException in copyFileToRemote with force on the same src and dest

2021-06-21 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35831:
--

Assignee: Dongjoon Hyun

> Handle PathOperationException in copyFileToRemote with force on the same src 
> and dest
> -
>
> Key: SPARK-35831
> URL: https://issues.apache.org/jira/browse/SPARK-35831
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35831) Handle PathOperationException in copyFileToRemote with force on the same src and dest

2021-06-21 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35831.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32983
[https://github.com/apache/spark/pull/32983]

> Handle PathOperationException in copyFileToRemote with force on the same src 
> and dest
> -
>
> Key: SPARK-35831
> URL: https://issues.apache.org/jira/browse/SPARK-35831
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35843) unify the file name between batch and streaming file writer

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366685#comment-17366685
 ] 

Apache Spark commented on SPARK-35843:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33002

> unify the file name between batch and streaming file writer
> ---
>
> Key: SPARK-35843
> URL: https://issues.apache.org/jira/browse/SPARK-35843
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35843) unify the file name between batch and streaming file writer

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35843:


Assignee: (was: Apache Spark)

> unify the file name between batch and streaming file writer
> ---
>
> Key: SPARK-35843
> URL: https://issues.apache.org/jira/browse/SPARK-35843
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35843) unify the file name between batch and streaming file writer

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35843:


Assignee: Apache Spark

> unify the file name between batch and streaming file writer
> ---
>
> Key: SPARK-35843
> URL: https://issues.apache.org/jira/browse/SPARK-35843
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35843) unify the file name between batch and streaming file writer

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366688#comment-17366688
 ] 

Apache Spark commented on SPARK-35843:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33002

> unify the file name between batch and streaming file writer
> ---
>
> Key: SPARK-35843
> URL: https://issues.apache.org/jira/browse/SPARK-35843
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35844:
-

 Summary: Add hadoop-cloud profile to PUBLISH_PROFILES
 Key: SPARK-35844
 URL: https://issues.apache.org/jira/browse/SPARK-35844
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35844:


Assignee: Apache Spark

> Add hadoop-cloud profile to PUBLISH_PROFILES
> 
>
> Key: SPARK-35844
> URL: https://issues.apache.org/jira/browse/SPARK-35844
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366714#comment-17366714
 ] 

Apache Spark commented on SPARK-35844:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33003

> Add hadoop-cloud profile to PUBLISH_PROFILES
> 
>
> Key: SPARK-35844
> URL: https://issues.apache.org/jira/browse/SPARK-35844
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35844:


Assignee: (was: Apache Spark)

> Add hadoop-cloud profile to PUBLISH_PROFILES
> 
>
> Key: SPARK-35844
> URL: https://issues.apache.org/jira/browse/SPARK-35844
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35845) OuterReference resolution should reject ambiguous column names

2021-06-21 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-35845:
---

 Summary: OuterReference resolution should reject ambiguous column 
names
 Key: SPARK-35845
 URL: https://issues.apache.org/jira/browse/SPARK-35845
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35845) OuterReference resolution should reject ambiguous column names

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35845:


Assignee: Apache Spark  (was: Wenchen Fan)

> OuterReference resolution should reject ambiguous column names
> --
>
> Key: SPARK-35845
> URL: https://issues.apache.org/jira/browse/SPARK-35845
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35845) OuterReference resolution should reject ambiguous column names

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35845:


Assignee: Wenchen Fan  (was: Apache Spark)

> OuterReference resolution should reject ambiguous column names
> --
>
> Key: SPARK-35845
> URL: https://issues.apache.org/jira/browse/SPARK-35845
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35845) OuterReference resolution should reject ambiguous column names

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366773#comment-17366773
 ] 

Apache Spark commented on SPARK-35845:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33004

> OuterReference resolution should reject ambiguous column names
> --
>
> Key: SPARK-35845
> URL: https://issues.apache.org/jira/browse/SPARK-35845
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35844:
-

Assignee: Dongjoon Hyun

> Add hadoop-cloud profile to PUBLISH_PROFILES
> 
>
> Key: SPARK-35844
> URL: https://issues.apache.org/jira/browse/SPARK-35844
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35844) Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35844.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33003
[https://github.com/apache/spark/pull/33003]

> Add hadoop-cloud profile to PUBLISH_PROFILES
> 
>
> Key: SPARK-35844
> URL: https://issues.apache.org/jira/browse/SPARK-35844
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35836) Remove reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite

2021-06-21 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-35836.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32992
[https://github.com/apache/spark/pull/32992]

> Remove reference to spark.shuffle.push.based.enabled in 
> ShuffleBlockPusherSuite
> ---
>
> Key: SPARK-35836
> URL: https://issues.apache.org/jira/browse/SPARK-35836
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: 3.2.0
>
>
> The test suite for ShuffleBlockPusherSuite was added with SPARK-32917 and in 
> this suite, the configuration for push-based shuffle is incorrectly 
> referenced as {{spark.shuffle.push.based.enabled}}. We need to remove this 
> config from here.
> {{ShuffleBlockPusher}} is created only when push-based shuffle is enabled and 
> this suite is for {{ShuffleBlockPusher}}, so no other change is required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35836) Remove reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite

2021-06-21 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-35836:
---

Assignee: Chandni Singh

> Remove reference to spark.shuffle.push.based.enabled in 
> ShuffleBlockPusherSuite
> ---
>
> Key: SPARK-35836
> URL: https://issues.apache.org/jira/browse/SPARK-35836
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Trivial
>
> The test suite for ShuffleBlockPusherSuite was added with SPARK-32917 and in 
> this suite, the configuration for push-based shuffle is incorrectly 
> referenced as {{spark.shuffle.push.based.enabled}}. We need to remove this 
> config from here.
> {{ShuffleBlockPusher}} is created only when push-based shuffle is enabled and 
> this suite is for {{ShuffleBlockPusher}}, so no other change is required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35614) Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-21 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35614.
---
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 32910
https://github.com/apache/spark/pull/32910

> Make the conversion to pandas data-type-based for ExtensionDtypes
> -
>
> Key: SPARK-35614
> URL: https://issues.apache.org/jira/browse/SPARK-35614
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> The conversion from/to pandas includes logic for checking ExtensionDtypes 
> data types and behaving accordingly.
> That makes the code hard to change or maintain.
> We want to introduce an Ops class per ExtensionDtypes data type, and then 
> make the conversion from/to pandas data-type-based for ExtensionDtypes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-21 Thread Chao Sun (Jira)
Chao Sun created SPARK-35846:


 Summary: Introduce ParquetReadState to track various states while 
reading a Parquet column chunk
 Key: SPARK-35846
 URL: https://issues.apache.org/jira/browse/SPARK-35846
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Chao Sun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-21 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-35846:
-
Description: This is mostly refactoring work to complete SPARK-34859

> Introduce ParquetReadState to track various states while reading a Parquet 
> column chunk
> ---
>
> Key: SPARK-35846
> URL: https://issues.apache.org/jira/browse/SPARK-35846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Minor
>
> This is mostly refactoring work to complete SPARK-34859



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35847) Manage InternalField in DataTypeOps.isnull

2021-06-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35847:
-

 Summary: Manage InternalField in DataTypeOps.isnull
 Key: SPARK-35847
 URL: https://issues.apache.org/jira/browse/SPARK-35847
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


The result of {{DataTypeOps.isnull}} must always be a non-nullable boolean.
We should manage {{InternalField}} for this case.
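
A minimal PySpark sketch of the invariant (assuming a running SparkSession; 
the {{InternalField}} bookkeeping itself lives inside pandas-on-Spark):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (None,)], "v int")

# Catalyst's IsNull expression is defined as non-nullable, so the Spark
# schema already reflects the invariant; this issue is about carrying the
# same fact in the pandas-on-Spark InternalField metadata.
df.select(F.isnull("v").alias("is_null")).printSchema()
# root
#  |-- is_null: boolean (nullable = false)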



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35848) Spark Bloom Filter throws OutOfMemoryError

2021-06-21 Thread Sai Polisetty (Jira)
Sai Polisetty created SPARK-35848:
-

 Summary: Spark Bloom Filter throws OutOfMemoryError
 Key: SPARK-35848
 URL: https://issues.apache.org/jira/browse/SPARK-35848
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0, 2.0.0
Reporter: Sai Polisetty


When the Bloom filter stat function is invoked on a large dataframe that 
requires a BitArray of size >2GB, it results in a java.lang.OutOfMemoryError. 
As mentioned in a similar bug, this is due to the zero value passed to 
treeAggregate. Irrespective of the spark.serializer value, the zero value is 
serialized using JavaSerializer, which has a hard limit of 2GB. Applying a 
solution similar to SPARK-26228 and setting spark.serializer to KryoSerializer 
avoids this error.
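
A sketch of how the suggested serializer setting would be applied in PySpark. 
Note that, per the report, this helps only together with a code change like 
SPARK-26228's, since the zero value is otherwise Java-serialized regardless:

from pyspark.sql import SparkSession

# The serializer must be configured before the session (and its
# SparkContext) is created; it cannot be changed on a running session.
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)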

 

Steps to reproduce:

{{val df = List.range(0, 10).toDF("Id")}}
{{val expectedNumItems = 2000000000L // 2 billion}}
{{val fpp = 0.03}}
{{val bf = df.stat.bloomFilter("Id", expectedNumItems, fpp)}}

Stack trace:

java.lang.OutOfMemoryError
  at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
  at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
  at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
  at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2604)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$combineByKeyWithClassTag$1(PairRDDFunctions.scala:86)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
  at org.apache.spark.rdd.PairRDDFunctions.combineByKeyWithClassTag(PairRDDFunctions.scala:75)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$foldByKey$1(PairRDDFunctions.scala:218)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
  at org.apache.spark.rdd.PairRDDFunctions.foldByKey(PairRDDFunctions.scala:207)
  at org.apache.spark.rdd.RDD.$anonfun$treeAggregate$1(RDD.scala:1224)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
  at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1203)
  at org.apache.spark.sql.DataFrameStatFunctions.buildBloomFilter(DataFrameStatFunctions.scala:602)
  at org.apache.spark.sql.DataFrameStatFunctions.bloomFilter(DataFrameStatFunctions.scala:541)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35847) Manage InternalField in DataTypeOps.isnull

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366850#comment-17366850
 ] 

Apache Spark commented on SPARK-35847:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33005

> Manage InternalField in DataTypeOps.isnull
> --
>
> Key: SPARK-35847
> URL: https://issues.apache.org/jira/browse/SPARK-35847
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> The result of {{DataTypeOps.isnull}} must always be a non-nullable boolean.
> We should manage {{InternalField}} for this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35847) Manage InternalField in DataTypeOps.isnull

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35847:


Assignee: (was: Apache Spark)

> Manage InternalField in DataTypeOps.isnull
> --
>
> Key: SPARK-35847
> URL: https://issues.apache.org/jira/browse/SPARK-35847
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> The result of {{DataTypeOps.isnull}} must always be a non-nullable boolean.
> We should manage {{InternalField}} for this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35847) Manage InternalField in DataTypeOps.isnull

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35847:


Assignee: Apache Spark

> Manage InternalField in DataTypeOps.isnull
> --
>
> Key: SPARK-35847
> URL: https://issues.apache.org/jira/browse/SPARK-35847
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> The result of {{DataTypeOps.isnull}} must always be a non-nullable boolean.
> We should manage {{InternalField}} for this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35846:


Assignee: (was: Apache Spark)

> Introduce ParquetReadState to track various states while reading a Parquet 
> column chunk
> ---
>
> Key: SPARK-35846
> URL: https://issues.apache.org/jira/browse/SPARK-35846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Minor
>
> This is mostly refactoring work to complete SPARK-34859



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35846:


Assignee: Apache Spark

> Introduce ParquetReadState to track various states while reading a Parquet 
> column chunk
> ---
>
> Key: SPARK-35846
> URL: https://issues.apache.org/jira/browse/SPARK-35846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Minor
>
> This is mostly refactoring work to complete SPARK-34859



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35846) Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366852#comment-17366852
 ] 

Apache Spark commented on SPARK-35846:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/33006

> Introduce ParquetReadState to track various states while reading a Parquet 
> column chunk
> ---
>
> Key: SPARK-35846
> URL: https://issues.apache.org/jira/browse/SPARK-35846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Priority: Minor
>
> This is mostly refactoring work to complete SPARK-34859



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35790) Spark Package Python Import does not work for namespace packages

2021-06-21 Thread Mark Hamilton (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366890#comment-17366890
 ] 

Mark Hamilton commented on SPARK-35790:
---

I was able to get this working after all and found that this was user error. 
Please feel free to mark this as resolved.

> Spark Package Python Import does not work for namespace packages
> 
>
> Key: SPARK-35790
> URL: https://issues.apache.org/jira/browse/SPARK-35790
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark, Spark Submit
>Affects Versions: 3.0.0, 3.1.2
>Reporter: Mark Hamilton
>Priority: Major
>
> If one includes Python files in several jars that together comprise a Python 
> "namespace package" ([https://www.python.org/dev/peps/pep-0420/]), 
> then only one of the packages is imported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35790) Spark Package Python Import does not work for namespace packages

2021-06-21 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-35790:
--
Priority: Trivial  (was: Major)

> Spark Package Python Import does not work for namespace packages
> 
>
> Key: SPARK-35790
> URL: https://issues.apache.org/jira/browse/SPARK-35790
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark, Spark Submit
>Affects Versions: 3.0.0, 3.1.2
>Reporter: Mark Hamilton
>Priority: Trivial
>
> If one includes Python files in several jars that together comprise a Python 
> "namespace package" ([https://www.python.org/dev/peps/pep-0420/]), 
> then only one of the packages is imported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35744) Performance degradation in avro SpecificRecordBuilders

2021-06-21 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366892#comment-17366892
 ] 

Erik Krogen commented on SPARK-35744:
-

[~steven.aerts] going a bit off topic from this JIRA, but out of curiosity -- 
is your work based off of SPARK-25789 / [PR 
#22878|https://github.com/apache/spark/pull/22878]? We (LinkedIn) also maintain 
an {{AvroEncoder}} for {{SpecificRecord}} classes which is based off of that 
PR. We've also been planning to make another effort to push this upstream since 
the attempt in #22878 eventually stalled. I'd be interested in learning more 
about your work and potentially collaborating here.

> Performance degradation in avro SpecificRecordBuilders
> --
>
> Key: SPARK-35744
> URL: https://issues.apache.org/jira/browse/SPARK-35744
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Steven Aerts
>Priority: Minor
>
> Creating this bug to let you know that when we tested out Spark 3.2.0, we saw 
> a significant performance degradation where our code handles Avro specific 
> record objects. This slowed down some of our jobs by a factor of 4.
> Spark 3.2.0 bumps the Avro version from 1.8.2 to 1.10.2.
> The degradation is caused by a change introduced in Avro 1.9.0. This change 
> degrades performance when creating Avro specific records in certain 
> classloader topologies, like the ones used in Spark.
> We notified the Avro project and 
> [proposed|https://github.com/apache/avro/pull/1253] a simple fix upstream. 
> (The links contain more details.)
> It is unclear to us how many other projects use Avro specific records in a 
> Spark context and will be impacted by this degradation.
> Feel free to close this issue if you think it is too much of a corner case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35344:
-
Summary: Support creating a Column of numpy literal value Support creating 
a Column of numpy literal value in pandas-on-Spark  (was: Make conversion 
from/to literals data-type-based)

> Support creating a Column of numpy literal value Support creating a Column of 
> numpy literal value in pandas-on-Spark
> 
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Aim to achieve {{F.lit(np.int64(1))}}.
> We can define {{def lit(literal) -> Column:}} under {{IntegralOps}}, for example.
> Or we can define it in python/pyspark/pandas/spark/functions.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35344:
-
Description: 
Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
creating a Column out of a numpy literal value.


So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
adjusted to support that in pandas-on-Spark.

  was:
Aim to achieve {{F.lit(np.int64(1))}}.

We can define {{def lit(literal) -> Column:}} under {{IntegralOps}}, for example.

Or we can define it in python/pyspark/pandas/spark/functions.py.
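
For illustration, a minimal repro of the limitation described above (this 
assumes a running SparkSession; the exact error message varies by version):

import numpy as np
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

F.lit(1)            # a plain Python int works
F.lit(np.int64(1))  # fails: numpy scalars are not accepted literal values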


> Support creating a Column of numpy literal value Support creating a Column of 
> numpy literal value in pandas-on-Spark
> 
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35344:
-
Summary: Support creating a Column of numpy literal value in 
pandas-on-Spark  (was: Support creating a Column of numpy literal value Support 
creating a Column of numpy literal value in pandas-on-Spark)

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35344:


Assignee: Apache Spark

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366899#comment-17366899
 ] 

Apache Spark commented on SPARK-35344:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/32955

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35344:


Assignee: (was: Apache Spark)

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35344) Support creating a Column of numpy literal value in pandas-on-Spark

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366901#comment-17366901
 ] 

Apache Spark commented on SPARK-35344:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/32955

> Support creating a Column of numpy literal value in pandas-on-Spark
> ---
>
> Key: SPARK-35344
> URL: https://issues.apache.org/jira/browse/SPARK-35344
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Spark (the `lit` function defined in `pyspark.sql.functions`) doesn't support 
> creating a Column out of a numpy literal value.
> So the `lit` function defined in `pyspark.pandas.spark.functions` should be 
> adjusted to support that in pandas-on-Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-06-21 Thread Klaus Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366906#comment-17366906
 ] 

Klaus Ma commented on SPARK-35623:
--

That's interesting; I'd like to help with that :)

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers,
>
>  Hello from the Netherlands! Posting this here as I still haven't been 
> accepted to post on the spark dev mailing list.
>
>  My team is planning to use Spark with Kubernetes support on our shared 
> (multi-tenant) on-premise Kubernetes cluster. However, we would like to have 
> certain scheduling features, like fair-share and preemption, which as we 
> understand are not yet built into the current spark-kubernetes resource 
> manager. We have been working on, and are close to, a first successful 
> prototype integration with Volcano ([https://volcano.sh/en/docs/]). Briefly, 
> this means a new resource manager component with a lot in common with the 
> existing spark-kubernetes resource manager, but instead of pods it launches 
> Volcano jobs, which delegate driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in Spark or as a separate project.
>
>  So, two questions:
>
>  1. Do the Spark maintainers see this as a valuable contribution to the 
> mainline Spark codebase? If so, can we have some guidance on how to publish 
> the changes?
>
>  2. Are any other developers / organizations interested in contributing to 
> this effort? If so, please get in touch.
>
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35790) Spark Package Python Import does not work for namespace packages

2021-06-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366912#comment-17366912
 ] 

Hyukjin Kwon commented on SPARK-35790:
--

Thanks for confirmation [~mhamilton]

> Spark Package Python Import does not work for namespace packages
> 
>
> Key: SPARK-35790
> URL: https://issues.apache.org/jira/browse/SPARK-35790
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark, Spark Submit
>Affects Versions: 3.0.0, 3.1.2
>Reporter: Mark Hamilton
>Priority: Trivial
>
> If one includes python files within several jars that comprise a python 
> "namespace package"
> [https://www.python.org/dev/peps/pep-0420/]
> Then only one of packages is imported



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35790) Spark Package Python Import does not work for namespace packages

2021-06-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35790.
--
Resolution: Not A Problem

> Spark Package Python Import does not work for namespace packages
> 
>
> Key: SPARK-35790
> URL: https://issues.apache.org/jira/browse/SPARK-35790
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark, Spark Submit
>Affects Versions: 3.0.0, 3.1.2
>Reporter: Mark Hamilton
>Priority: Trivial
>
> If one includes python files within several jars that comprise a python 
> "namespace package"
> [https://www.python.org/dev/peps/pep-0420/]
> Then only one of packages is imported



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35849) Make astype data-type-based for DecimalOps

2021-06-21 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-35849:
---

 Summary: Make astype data-type-based for DecimalOps
 Key: SPARK-35849
 URL: https://issues.apache.org/jira/browse/SPARK-35849
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Yikun Jiang


Make DecimalOps astype data-type-based.

See more in:

[https://github.com/apache/spark/pull/32821#issuecomment-861119905]
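
A hypothetical sketch of the intended shape (illustrative names, not the 
actual pandas-on-Spark code): the decimal Ops class owns its own astype 
instead of going through the generic conversion logic.

from pyspark.sql import Column

class DecimalOps:
    def astype(self, col: Column, spark_type: str) -> Column:
        # Casting rules specific to DecimalType live here, so the generic
        # astype path no longer needs decimal special cases.
        return col.cast(spark_type)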

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35850) Upgrade scala-maven-plugin to 4.5.3

2021-06-21 Thread William Hyun (Jira)
William Hyun created SPARK-35850:


 Summary: Upgrade scala-maven-plugin to 4.5.3
 Key: SPARK-35850
 URL: https://issues.apache.org/jira/browse/SPARK-35850
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: William Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35850) Upgrade scala-maven-plugin to 4.5.3

2021-06-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366922#comment-17366922
 ] 

Apache Spark commented on SPARK-35850:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33007

> Upgrade scala-maven-plugin to 4.5.3
> ---
>
> Key: SPARK-35850
> URL: https://issues.apache.org/jira/browse/SPARK-35850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


