[jira] [Created] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
Rui Wang created SPARK-40971: Summary: Imports more from connect proto package to avoid calling `proto.` for Connect DSL Key: SPARK-40971 URL: https://issues.apache.org/jira/browse/SPARK-40971 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
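For context, a minimal sketch of the kind of change the title describes; `Relation` is a real message type in the `org.apache.spark.connect.proto` package, but the exact imports the patch touches are not stated here, so treat the code as illustrative:

{code:scala}
import org.apache.spark.connect.proto
// Before: every proto type is spelled through the package alias.
val before: proto.Relation = proto.Relation.newBuilder().build()

import org.apache.spark.connect.proto.Relation
// After: importing the type directly drops the repeated `proto.` prefix in DSL code.
val after: Relation = Relation.newBuilder().build()
{code}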
[jira] [Commented] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626440#comment-17626440 ] Apache Spark commented on SPARK-40971: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38445 > Imports more from connect proto package to avoid calling `proto.` for Connect > DSL > - > > Key: SPARK-40971 > URL: https://issues.apache.org/jira/browse/SPARK-40971 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40971: Assignee: Apache Spark > Imports more from connect proto package to avoid calling `proto.` for Connect > DSL > - > > Key: SPARK-40971 > URL: https://issues.apache.org/jira/browse/SPARK-40971 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40971: Assignee: (was: Apache Spark) > Imports more from connect proto package to avoid calling `proto.` for Connect > DSL > - > > Key: SPARK-40971 > URL: https://issues.apache.org/jira/browse/SPARK-40971 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
Mingming Ge created SPARK-40972: --- Summary: OptimizeLocalShuffleReader causing data skew Key: SPARK-40972 URL: https://issues.apache.org/jira/browse/SPARK-40972 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Mingming Ge !image-2022-10-31-15-49-36-559.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Description: !image-2022-10-31-15-50-36-435.png! (was: !image-2022-10-31-15-49-36-559.png!) > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png > > > !image-2022-10-31-15-50-36-435.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Attachment: image-2022-10-31-15-50-36-435.png > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png > > > !image-2022-10-31-15-49-36-559.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40973) Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT
[ https://issues.apache.org/jira/browse/SPARK-40973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626449#comment-17626449 ] Haejoon Lee commented on SPARK-40973: - I'm working on it > Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT > > > Key: SPARK-40973 > URL: https://issues.apache.org/jira/browse/SPARK-40973 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Update the `_LEGACY_ERROR_TEMP_0055` error class to use a proper name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40973) Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT
Haejoon Lee created SPARK-40973: --- Summary: Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT Key: SPARK-40973 URL: https://issues.apache.org/jira/browse/SPARK-40973 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Haejoon Lee Update the `_LEGACY_ERROR_TEMP_0055` error class to use a proper name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
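For concreteness, a rough sketch of the user-visible effect of the rename, assuming a ScalaTest suite with a SparkSession named `spark` in scope; the SQL text is illustrative:

{code:scala}
import org.apache.spark.sql.catalyst.parser.ParseException

// An unclosed /* bracketed comment fails parsing; after the rename the error
// should carry the descriptive class name instead of the
// _LEGACY_ERROR_TEMP_0055 placeholder.
val e = intercept[ParseException] {
  spark.sql("SELECT 1 /* this comment is never closed").collect()
}
assert(e.getErrorClass == "UNCLOSED_BRACKETED_COMMENT")
{code}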
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Attachment: image-2022-10-31-15-51-39-430.png > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png > > > !image-2022-10-31-15-50-36-435.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Description: !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! was:!image-2022-10-31-15-50-36-435.png! > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png > > > > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Attachment: image-2022-10-31-15-53-19-751.png > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png > > > !image-2022-10-31-15-50-36-435.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Description: Because there are many empty files in the table, the partition num of OptimizeLocalShuffleReader to optimize shuffle is 1 !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! was: !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png > > > Because there are many empty files in the table, the partition num of > OptimizeLocalShuffleReader to optimize shuffle is 1 > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Attachment: image-2022-10-31-15-57-41-599.png > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png, > image-2022-10-31-15-57-41-599.png > > > Because there are many empty files in the table, the partition num of > OptimizeLocalShuffleReader to optimize shuffle is 1 > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Description: Because there are many empty files in the table, the partition num of OptimizeLocalShuffleReader to optimize shuffle is 1 !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-57-41-599.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! was: Because there are many empty files in the table, the partition num of OptimizeLocalShuffleReader to optimize shuffle is 1 !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png, > image-2022-10-31-15-57-41-599.png > > > Because there are many empty files in the table, the partition num of > OptimizeLocalShuffleReader to optimize shuffle is 1 > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-57-41-599.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingming Ge updated SPARK-40972: Description: Because there are many empty files in the table, the partition num of OptimizeLocalShuffleReader to optimize partition num is 1 !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-57-41-599.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! was: Because there are many empty files in the table, the partition num of OptimizeLocalShuffleReader to optimize shuffle is 1 !image-2022-10-31-15-53-19-751.png! !image-2022-10-31-15-57-41-599.png! !image-2022-10-31-15-50-36-435.png! !image-2022-10-31-15-51-39-430.png! > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png, > image-2022-10-31-15-57-41-599.png > > > Because there are many empty files in the table, the partition num of > OptimizeLocalShuffleReader to optimize partition num is 1 > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-57-41-599.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
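A possible mitigation while the root cause is investigated (this is not proposed in the ticket itself): AQE's local shuffle reader can be disabled so the join keeps the regular shuffled reader. The configuration key is a real Spark 3.x setting:

{code:scala}
// Turns off the OptimizeLocalShuffleReader rule for this session. This trades
// away the local-read optimization in exchange for keeping the shuffle's
// partitioning, so evaluate it per workload rather than setting it globally.
spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", "false")
{code}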
[jira] [Commented] (SPARK-40794) Upgrade Netty from 4.1.80 to 4.1.84
[ https://issues.apache.org/jira/browse/SPARK-40794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626455#comment-17626455 ] Apache Spark commented on SPARK-40794: -- User 'clairezhuang' has created a pull request for this issue: https://github.com/apache/spark/pull/38446 > Upgrade Netty from 4.1.80 to 4.1.84 > --- > > Key: SPARK-40794 > URL: https://issues.apache.org/jira/browse/SPARK-40794 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > * https://netty.io/news/2022/09/08/4-1-81-Final.html > * https://netty.io/news/2022/09/13/4-1-82-Final.html > * https://netty.io/news/2022/10/11/4-1-84-Final.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40973) Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT
[ https://issues.apache.org/jira/browse/SPARK-40973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40973: Assignee: (was: Apache Spark) > Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT > > > Key: SPARK-40973 > URL: https://issues.apache.org/jira/browse/SPARK-40973 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Update the `_LEGACY_ERROR_TEMP_0055` error class to use a proper name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40973) Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT
[ https://issues.apache.org/jira/browse/SPARK-40973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40973: Assignee: Apache Spark > Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT > > > Key: SPARK-40973 > URL: https://issues.apache.org/jira/browse/SPARK-40973 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > Update the `_LEGACY_ERROR_TEMP_0055` error class to use a proper name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40973) Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT
[ https://issues.apache.org/jira/browse/SPARK-40973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626456#comment-17626456 ] Apache Spark commented on SPARK-40973: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38447 > Rename _LEGACY_ERROR_TEMP_0055 to UNCLOSED_BRACKETED_COMMENT > > > Key: SPARK-40973 > URL: https://issues.apache.org/jira/browse/SPARK-40973 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Update the `_LEGACY_ERROR_TEMP_0055` error class to use a proper name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40974) EXPLODE function selects outer column
Omar Ismail created SPARK-40974: --- Summary: EXPLODE function selects outer column Key: SPARK-40974 URL: https://issues.apache.org/jira/browse/SPARK-40974 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Omar Ismail

I'm trying to determine whether indirectly selecting an outer column is a bug or an intended feature of the EXPLODE function.

If I run the following SQL statement:

```
SELECT
  (SELECT FIRST(name_element_)
   FROM LATERAL VIEW EXPLODE(name) AS name_element_)
FROM patient
```

it fails with:

```
Accessing outer query column is not allowed in:
Generate explode(outer(name#9628))
```

However, if I add a "cheeky select" (the innermost `SELECT name AS name_element_` below), the SQL query is valid and runs:

```
SELECT (
  SELECT FIRST(name_element_)
  FROM (SELECT EXPLODE(name_element_) AS name_element_
        FROM (SELECT name AS name_element_))
)
FROM patient
```

From the viewpoint of the EXPLODE function, it seems that the column name_element_ no longer comes from an outer column. Is this an intended feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
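The report can be reproduced with a small table; the sketch below invents the `patient` schema (an array-typed `name` column), since the original does not give it:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Invented stand-in for the reporter's table: `name` is an array of strings.
Seq(Seq("Ada", "Lovelace"), Seq("Alan")).toDF("name")
  .createOrReplaceTempView("patient")

// Fails: the lateral view references the outer column `name` directly.
spark.sql("""
  SELECT (SELECT FIRST(name_element_)
          FROM LATERAL VIEW EXPLODE(name) AS name_element_)
  FROM patient""").show()

// Runs: aliasing `name` through an inner select hides the outer reference
// from the Generate operator.
spark.sql("""
  SELECT (SELECT FIRST(name_element_)
          FROM (SELECT EXPLODE(name_element_) AS name_element_
                FROM (SELECT name AS name_element_)))
  FROM patient""").show()
{code}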
[jira] [Commented] (SPARK-40974) EXPLODE function selects outer column
[ https://issues.apache.org/jira/browse/SPARK-40974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626489#comment-17626489 ] Apache Spark commented on SPARK-40974: -- User 'clairezhuang' has created a pull request for this issue: https://github.com/apache/spark/pull/38446

> EXPLODE function selects outer column
> -------------------------------------
>
>                 Key: SPARK-40974
>                 URL: https://issues.apache.org/jira/browse/SPARK-40974
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Omar Ismail
>            Priority: Minor
>
> I'm trying to determine whether indirectly selecting an outer column is a bug
> or an intended feature of the EXPLODE function.
>
> If I run the following SQL statement:
>
> ```
> SELECT
>   (SELECT FIRST(name_element_)
>    FROM LATERAL VIEW EXPLODE(name) AS name_element_)
> FROM patient
> ```
>
> it fails with:
>
> ```
> Accessing outer query column is not allowed in:
> Generate explode(outer(name#9628))
> ```
>
> However, if I add a "cheeky select" (the innermost `SELECT name AS
> name_element_` below), the SQL query is valid and runs:
>
> ```
> SELECT (
>   SELECT FIRST(name_element_)
>   FROM (SELECT EXPLODE(name_element_) AS name_element_
>         FROM (SELECT name AS name_element_))
> )
> FROM patient
> ```
>
> From the viewpoint of the EXPLODE function, it seems that the column
> name_element_ no longer comes from an outer column. Is this an intended
> feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40974) EXPLODE function selects outer column
[ https://issues.apache.org/jira/browse/SPARK-40974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40974: Assignee: (was: Apache Spark)

> EXPLODE function selects outer column
> -------------------------------------
>
>                 Key: SPARK-40974
>                 URL: https://issues.apache.org/jira/browse/SPARK-40974
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Omar Ismail
>            Priority: Minor
>
> I'm trying to determine whether indirectly selecting an outer column is a bug
> or an intended feature of the EXPLODE function.
>
> If I run the following SQL statement:
>
> ```
> SELECT
>   (SELECT FIRST(name_element_)
>    FROM LATERAL VIEW EXPLODE(name) AS name_element_)
> FROM patient
> ```
>
> it fails with:
>
> ```
> Accessing outer query column is not allowed in:
> Generate explode(outer(name#9628))
> ```
>
> However, if I add a "cheeky select" (the innermost `SELECT name AS
> name_element_` below), the SQL query is valid and runs:
>
> ```
> SELECT (
>   SELECT FIRST(name_element_)
>   FROM (SELECT EXPLODE(name_element_) AS name_element_
>         FROM (SELECT name AS name_element_))
> )
> FROM patient
> ```
>
> From the viewpoint of the EXPLODE function, it seems that the column
> name_element_ no longer comes from an outer column. Is this an intended
> feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40974) EXPLODE function selects outer column
[ https://issues.apache.org/jira/browse/SPARK-40974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40974: Assignee: Apache Spark

> EXPLODE function selects outer column
> -------------------------------------
>
>                 Key: SPARK-40974
>                 URL: https://issues.apache.org/jira/browse/SPARK-40974
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Omar Ismail
>            Assignee: Apache Spark
>            Priority: Minor
>
> I'm trying to determine whether indirectly selecting an outer column is a bug
> or an intended feature of the EXPLODE function.
>
> If I run the following SQL statement:
>
> ```
> SELECT
>   (SELECT FIRST(name_element_)
>    FROM LATERAL VIEW EXPLODE(name) AS name_element_)
> FROM patient
> ```
>
> it fails with:
>
> ```
> Accessing outer query column is not allowed in:
> Generate explode(outer(name#9628))
> ```
>
> However, if I add a "cheeky select" (the innermost `SELECT name AS
> name_element_` below), the SQL query is valid and runs:
>
> ```
> SELECT (
>   SELECT FIRST(name_element_)
>   FROM (SELECT EXPLODE(name_element_) AS name_element_
>         FROM (SELECT name AS name_element_))
> )
> FROM patient
> ```
>
> From the viewpoint of the EXPLODE function, it seems that the column
> name_element_ no longer comes from an outer column. Is this an intended
> feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40972) OptimizeLocalShuffleReader causing data skew
[ https://issues.apache.org/jira/browse/SPARK-40972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626490#comment-17626490 ] Yuming Wang commented on SPARK-40972: - cc [~michaelzhang-db] > OptimizeLocalShuffleReader causing data skew > > > Key: SPARK-40972 > URL: https://issues.apache.org/jira/browse/SPARK-40972 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Mingming Ge >Priority: Major > Attachments: image-2022-10-31-15-50-36-435.png, > image-2022-10-31-15-51-39-430.png, image-2022-10-31-15-53-19-751.png, > image-2022-10-31-15-57-41-599.png > > > Because there are many empty files in the table, the partition num of > OptimizeLocalShuffleReader to optimize partition num is 1 > !image-2022-10-31-15-53-19-751.png! > !image-2022-10-31-15-57-41-599.png! > !image-2022-10-31-15-50-36-435.png! > > > !image-2022-10-31-15-51-39-430.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40975) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021
Max Gekk created SPARK-40975: Summary: Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021 Key: SPARK-40975 URL: https://issues.apache.org/jira/browse/SPARK-40975 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40975) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021
[ https://issues.apache.org/jira/browse/SPARK-40975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40975: Assignee: Apache Spark (was: Max Gekk) > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021 > --- > > Key: SPARK-40975 > URL: https://issues.apache.org/jira/browse/SPARK-40975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40975) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021
[ https://issues.apache.org/jira/browse/SPARK-40975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626532#comment-17626532 ] Apache Spark commented on SPARK-40975: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38448 > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021 > --- > > Key: SPARK-40975 > URL: https://issues.apache.org/jira/browse/SPARK-40975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40975) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021
[ https://issues.apache.org/jira/browse/SPARK-40975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40975: Assignee: Max Gekk (was: Apache Spark) > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021 > --- > > Key: SPARK-40975 > URL: https://issues.apache.org/jira/browse/SPARK-40975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40975) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021
[ https://issues.apache.org/jira/browse/SPARK-40975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626533#comment-17626533 ] Apache Spark commented on SPARK-40975: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38448 > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_0021 > --- > > Key: SPARK-40975 > URL: https://issues.apache.org/jira/browse/SPARK-40975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40971. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38445 [https://github.com/apache/spark/pull/38445] > Imports more from connect proto package to avoid calling `proto.` for Connect > DSL > - > > Key: SPARK-40971 > URL: https://issues.apache.org/jira/browse/SPARK-40971 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40971) Imports more from connect proto package to avoid calling `proto.` for Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40971: --- Assignee: Rui Wang > Imports more from connect proto package to avoid calling `proto.` for Connect > DSL > - > > Key: SPARK-40971 > URL: https://issues.apache.org/jira/browse/SPARK-40971 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40798) Alter partition should verify value
[ https://issues.apache.org/jira/browse/SPARK-40798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626556#comment-17626556 ] Apache Spark commented on SPARK-40798: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/38449 > Alter partition should verify value > --- > > Key: SPARK-40798 > URL: https://issues.apache.org/jira/browse/SPARK-40798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.4.0 > > > > {code:java} > CREATE TABLE t (c int) USING PARQUET PARTITIONED BY(p int); > -- This DDL should fail but worked: > ALTER TABLE t ADD PARTITION(p='aaa'); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40663) Migrate execution errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626557#comment-17626557 ] Apache Spark commented on SPARK-40663: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38450 > Migrate execution errors onto error classes > --- > > Key: SPARK-40663 > URL: https://issues.apache.org/jira/browse/SPARK-40663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the execution exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40663) Migrate execution errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626558#comment-17626558 ] Apache Spark commented on SPARK-40663: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38450 > Migrate execution errors onto error classes > --- > > Key: SPARK-40663 > URL: https://issues.apache.org/jira/browse/SPARK-40663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the execution exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34210) Cannot create a record reader because of a previous error when Spark accesses the Hive on HBase table
[ https://issues.apache.org/jira/browse/SPARK-34210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626576#comment-17626576 ] Mehul Thakkar commented on SPARK-34210: --- Do you mean we have to download the Spark source code from the master branch and update it with the fix to make it work for Spark 3?

> Cannot create a record reader because of a previous error when Spark accesses
> the Hive on HBase table
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-34210
>                 URL: https://issues.apache.org/jira/browse/SPARK-34210
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: zhangzhanchang
>            Priority: Major
>
> Using Spark SQL to access a Hive on HBase table works normally on version
> 2.4.6. After upgrading to Spark 3.0.1, it fails with the following exception:
>
> java.io.IOException: Cannot create a record reader because of a previous
> error. Please look at the previous logs lines from the task's full log for
> more details.
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:252)
>   at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
> Caused by: java.lang.IllegalStateException: The input format instance has not
> been properly initialized. Ensure you call initializeTable either in your
> constructor or initialize method
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:585)
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:247)
>   ... 59 more
> java.io.IOException: Cannot create a record reader because of a previous
> error. Please look at the previous logs lines from the task's full log for
> more details.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
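The IllegalStateException above states the contract directly: initializeTable must be called from the constructor or from initialize. A hedged sketch of an input format that satisfies it; the class name is invented, while `hbase.mapreduce.inputtable` is the property TableInputFormat itself reads:

{code:scala}
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
import org.apache.hadoop.mapreduce.JobContext

// Eagerly initializes the table so getSplits() finds it set up.
class EagerTableInputFormat extends TableInputFormatBase {
  override def initialize(context: JobContext): Unit = {
    val conf = context.getConfiguration
    val connection = ConnectionFactory.createConnection(conf)
    initializeTable(connection,
      TableName.valueOf(conf.get("hbase.mapreduce.inputtable")))
  }
}
{code}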
[jira] [Comment Edited] (SPARK-34210) Cannot create a record reader because of a previous error when Spark accesses the Hive on HBase table
[ https://issues.apache.org/jira/browse/SPARK-34210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626576#comment-17626576 ] Mehul Thakkar edited comment on SPARK-34210 at 10/31/22 12:55 PM: -- Do you mean we have to download the Spark source code from the master branch and update it with the fix to make it work for Spark 3? Could you please elaborate more on the bug in Hadoop?

was (Author: JIRAUSER297345): Do you mean we have to download the Spark source code from the master branch and update it with the fix to make it work for Spark 3?

> Cannot create a record reader because of a previous error when Spark accesses
> the Hive on HBase table
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-34210
>                 URL: https://issues.apache.org/jira/browse/SPARK-34210
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: zhangzhanchang
>            Priority: Major
>
> Using Spark SQL to access a Hive on HBase table works normally on version
> 2.4.6. After upgrading to Spark 3.0.1, it fails with the following exception:
>
> java.io.IOException: Cannot create a record reader because of a previous
> error. Please look at the previous logs lines from the task's full log for
> more details.
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:252)
>   at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
> Caused by: java.lang.IllegalStateException: The input format instance has not
> been properly initialized. Ensure you call initializeTable either in your
> constructor or initialize method
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:585)
>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:247)
>   ... 59 more
> java.io.IOException: Cannot create a record reader because of a previous
> error. Please look at the previous logs lines from the task's full log for
> more details.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40976) Upgrade sbt to 1.7.3
Yang Jie created SPARK-40976: Summary: Upgrade sbt to 1.7.3 Key: SPARK-40976 URL: https://issues.apache.org/jira/browse/SPARK-40976 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40976: Assignee: (was: Apache Spark) > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626630#comment-17626630 ] Apache Spark commented on SPARK-40976: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38451 > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40976: Assignee: Apache Spark > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626631#comment-17626631 ] Apache Spark commented on SPARK-40976: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38451 > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40974) EXPLODE function selects outer column
[ https://issues.apache.org/jira/browse/SPARK-40974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626642#comment-17626642 ] Apache Spark commented on SPARK-40974: -- User 'clairezhuang' has created a pull request for this issue: https://github.com/apache/spark/pull/38446

> EXPLODE function selects outer column
> -------------------------------------
>
>                 Key: SPARK-40974
>                 URL: https://issues.apache.org/jira/browse/SPARK-40974
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Omar Ismail
>            Priority: Minor
>
> I'm trying to determine whether indirectly selecting an outer column is a bug
> or an intended feature of the EXPLODE function.
>
> If I run the following SQL statement:
>
> ```
> SELECT
>   (SELECT FIRST(name_element_)
>    FROM LATERAL VIEW EXPLODE(name) AS name_element_)
> FROM patient
> ```
>
> it fails with:
>
> ```
> Accessing outer query column is not allowed in:
> Generate explode(outer(name#9628))
> ```
>
> However, if I add a "cheeky select" (the innermost `SELECT name AS
> name_element_` below), the SQL query is valid and runs:
>
> ```
> SELECT (
>   SELECT FIRST(name_element_)
>   FROM (SELECT EXPLODE(name_element_) AS name_element_
>         FROM (SELECT name AS name_element_))
> )
> FROM patient
> ```
>
> From the viewpoint of the EXPLODE function, it seems that the column
> name_element_ no longer comes from an outer column. Is this an intended
> feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40974) EXPLODE function selects outer column
[ https://issues.apache.org/jira/browse/SPARK-40974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626644#comment-17626644 ] Apache Spark commented on SPARK-40974: -- User 'clairezhuang' has created a pull request for this issue: https://github.com/apache/spark/pull/38446

> EXPLODE function selects outer column
> -------------------------------------
>
>                 Key: SPARK-40974
>                 URL: https://issues.apache.org/jira/browse/SPARK-40974
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Omar Ismail
>            Priority: Minor
>
> I'm trying to determine whether indirectly selecting an outer column is a bug
> or an intended feature of the EXPLODE function.
>
> If I run the following SQL statement:
>
> ```
> SELECT
>   (SELECT FIRST(name_element_)
>    FROM LATERAL VIEW EXPLODE(name) AS name_element_)
> FROM patient
> ```
>
> it fails with:
>
> ```
> Accessing outer query column is not allowed in:
> Generate explode(outer(name#9628))
> ```
>
> However, if I add a "cheeky select" (the innermost `SELECT name AS
> name_element_` below), the SQL query is valid and runs:
>
> ```
> SELECT (
>   SELECT FIRST(name_element_)
>   FROM (SELECT EXPLODE(name_element_) AS name_element_
>         FROM (SELECT name AS name_element_))
> )
> FROM patient
> ```
>
> From the viewpoint of the EXPLODE function, it seems that the column
> name_element_ no longer comes from an outer column. Is this an intended
> feature or a bug?

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40916) udf could not filter null value cause npe
[ https://issues.apache.org/jira/browse/SPARK-40916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-40916: Description:

{code:sql}
select t22.uid,
from (
  SELECT code, count(distinct uid) cnt
  FROM (
    SELECT uid, code, lng, lat
    FROM (
      select riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) as code,
             uid, lng, lat, dt as event_time
      from (
        select param['timestamp'] as dt,
               get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid,
               get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng,
               get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat
        from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log
        and get_json_object(get_json_object(param['input'],'$.bizExtents'),'$.productId')='2001'
      ) a
      and lng is not null and lat is not null
    ) t2
    group by uid, code, lng, lat
  ) t1
  GROUP BY code
  having count(DISTINCT uid) >= 10
) t11
join (
  SELECT uid, code, lng, lat
  FROM (
    select riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) as code,
           uid, lng, lat, dt as event_time
    from (
      select param['timestamp'] as dt,
             get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid,
             get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng,
             get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat
      from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log
      and get_json_object(get_json_object(param['input'],'$.bizExtents'),'$.productId')='2001'
    ) a
    and lng is not null and lat is not null
  ) t2
  where substr(code,0,6) <> 'wx4ey3'
  group by uid, code, lng, lat
) t22 on t11.code = t22.code
group by t22.uid
{code}

This SQL can't run because `riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8)` throws an NPE (`Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String com.xiaoju.automarket.GeohashEncode.evaluate(java.lang.Double,java.lang.Double,java.lang.Integer) with arguments {null,null,8}:null`), even though I filter out nulls in my condition; the UDF manhattan_dw.aes_decode returns null if lng or lat is null. *However, after I remove the condition `where substr(code,0,6)<>'wx4ey3'`, it runs normally.*

Complete stack trace:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String com.xiaoju.automarket.GeohashEncode.evaluate(java.lang.Double,java.lang.Double,java.lang.Integer) with arguments {null,null,8}:null
  at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1049)
  at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:102)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_3$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
  at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:275)
  at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:274)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:515)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
{code}

was:

```
select t22.uid,
from (
  SELECT code, count(distinct uid) cnt
  FROM (
    SELECT uid, code, lng, lat
    FROM (
      select riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) as code,
             uid, lng, lat, dt as event_time
      from (
        select param['timestamp'] as dt,
               get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid,
```
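Worth noting for readers hitting the same NPE: Spark does not guarantee that filters are evaluated in the order they are written, so a `lng is not null` predicate elsewhere in the query does not ensure the UDF never sees null. A defensive sketch (the table name is illustrative; the UDF names are the reporter's):

{code:scala}
// Guard the UDF call itself instead of relying on a separate filter;
// CASE WHEN only evaluates a branch when its condition holds.
spark.sql("""
  SELECT CASE WHEN lng IS NOT NULL AND lat IS NOT NULL
              THEN riskmanage_dw.GEOHASH_ENCODE(
                     manhattan_dw.aes_decode(lng),
                     manhattan_dw.aes_decode(lat), 8)
         END AS code
  FROM some_table""")
{code}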
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626743#comment-17626743 ] Apache Spark commented on SPARK-40802: -- User 'Mingli-Rui' has created a pull request for this issue: https://github.com/apache/spark/pull/38452 > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. With preparing the statement, the query is parsed and compiled, but is > not executed. It will be more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
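The description's point is easy to demonstrate with plain JDBC: preparing the probe query compiles it, and the ResultSetMetaData is then available without an execution round trip. A standalone sketch with invented connection details; note that some drivers may return null from getMetaData() before execution, which a real implementation would need to handle:

{code:scala}
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:postgresql://localhost/db", "user", "pass")
try {
  val stmt = conn.prepareStatement("SELECT * FROM some_table WHERE 1=0")
  // getMetaData() parses and compiles the query but never runs it;
  // executeQuery() would round-trip an empty result set to reach the
  // same column information.
  val md = stmt.getMetaData
  for (i <- 1 to md.getColumnCount)
    println(s"${md.getColumnName(i)}: ${md.getColumnTypeName(i)}")
} finally conn.close()
{code}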
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626742#comment-17626742 ] Apache Spark commented on SPARK-40802: -- User 'Mingli-Rui' has created a pull request for this issue: https://github.com/apache/spark/pull/38452 > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. With preparing the statement, the query is parsed and compiled, but is > not executed. It will be more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40802: Assignee: (was: Apache Spark) > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. With preparing the statement, the query is parsed and compiled, but is > not executed. It will be more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40802: Assignee: Apache Spark > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Assignee: Apache Spark >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. With preparing the statement, the query is parsed and compiled, but is > not executed. It will be more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626753#comment-17626753 ] Vivek Garg commented on SPARK-40569: The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [Salesforce Marketing Cloud Certification|[https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626753#comment-17626753 ] Vivek Garg edited comment on SPARK-40569 at 10/31/22 6:38 PM: -- The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [[https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]Salesforce Marketing Cloud Certification] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. was (Author: JIRAUSER294516): The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [Salesforce Marketing Cloud Certification|[https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626753#comment-17626753 ] Vivek Garg edited comment on SPARK-40569 at 10/31/22 6:38 PM: -- https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/";>SAP analytics cloud training [https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/](SAP analytics cloud training) (https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/)[SAP analytics cloud training] [url=https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/]SAP analytics cloud training[/url] [https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/ SAP analytics cloud training] [SAP analytics cloud training](https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/) (SAP analytics cloud training)[https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/] was (Author: JIRAUSER294516): The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [[https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]Salesforce Marketing Cloud Certification] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626753#comment-17626753 ] Vivek Garg edited comment on SPARK-40569 at 10/31/22 6:39 PM: -- The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [Salesforce Marketing Cloud Certification|[http://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. was (Author: JIRAUSER294516): https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/";>SAP analytics cloud training [https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/](SAP analytics cloud training) (https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/)[SAP analytics cloud training] [url=https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/]SAP analytics cloud training[/url] [https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/ SAP analytics cloud training] [SAP analytics cloud training](https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/) (SAP analytics cloud training)[https://www.igmguru.com/erp-training/sac-analytics-cloud-online-training/] > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569 ] Vivek Garg deleted comment on SPARK-40569: was (Author: JIRAUSER294516): The Salesforce Marketing Cloud training offered by IgmGuru is created by instructors who are experts in the field using the most recent curriculum. The [Salesforce Marketing Cloud Certification|[http://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]] Course credential is intended for people who want to show that they have knowledge, expertise, and experience in the following areas: best practices for email marketing, message design, subscriber and data management, inbox delivery, email automation, and tracking and reporting metrics within the Marketing Cloud Email application. > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626759#comment-17626759 ] Vivek Garg commented on SPARK-33807: Thank [you|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Vivek Garg deleted comment on SPARK-22588: was (Author: JIRAUSER294516): Thank [you|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - > > Key: SPARK-22588 > URL: https://issues.apache.org/jira/browse/SPARK-22588 > Project: Spark > Issue Type: Question > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Saanvi Sharma >Priority: Minor > Labels: dynamodb, spark > Original Estimate: 24h > Remaining Estimate: 24h > > I am using spark 2.1 on EMR and i have a dataframe like this: > ClientNum | Value_1 | Value_2 | Value_3 | Value_4 > 14 |A |B| C | null > 19 |X |Y| null| null > 21 |R | null | null| null > I want to load data into DynamoDB table with ClientNum as key fetching: > Analyze Your Data on Amazon DynamoDB with apche Spark11 > Using Spark SQL for ETL3 > here is my code that I tried to solve: > var jobConf = new JobConf(sc.hadoopConfiguration) > jobConf.set("dynamodb.servicename", "dynamodb") > jobConf.set("dynamodb.input.tableName", "table_name") > jobConf.set("dynamodb.output.tableName", "table_name") > jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") > jobConf.set("dynamodb.regionid", "eu-west-1") > jobConf.set("dynamodb.throughput.read", "1") > jobConf.set("dynamodb.throughput.read.percent", "1") > jobConf.set("dynamodb.throughput.write", "1") > jobConf.set("dynamodb.throughput.write.percent", "1") > > jobConf.set("mapred.output.format.class", > "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") > jobConf.set("mapred.input.format.class", > "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") > #Import Data > val df = > sqlContext.read.format("com.databricks.spark.csv").option("header", > "true").option("inferSchema", "true").load(path) > I performed a transformation to have an RDD that matches the types that the > DynamoDB custom output format knows how to write. The custom output format > expects a tuple containing the Text and DynamoDBItemWritable types. > Create a new RDD with those types in it, in the following map call: > #Convert the dataframe to rdd > val df_rdd = df.rdd > > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > MapPartitionsRDD[10] at rdd at :41 > > #Print first rdd > df_rdd.take(1) > > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) > var ddbInsertFormattedRDD = df_rdd.map(a => { > var ddbMap = new HashMap[String, AttributeValue]() > var ClientNum = new AttributeValue() > ClientNum.setN(a.get(0).toString) > ddbMap.put("ClientNum", ClientNum) > var Value_1 = new AttributeValue() > Value_1.setS(a.get(1).toString) > ddbMap.put("Value_1", Value_1) > var Value_2 = new AttributeValue() > Value_2.setS(a.get(2).toString) > ddbMap.put("Value_2", Value_2) > var Value_3 = new AttributeValue() > Value_3.setS(a.get(3).toString) > ddbMap.put("Value_3", Value_3) > var Value_4 = new AttributeValue() > Value_4.setS(a.get(4).toString) > ddbMap.put("Value_4", Value_4) > var item = new DynamoDBItemWritable() > item.setItem(ddbMap) > (new Text(""), item) > }) > This last call uses the job configuration that defines the EMR-DDB connector > to write out the new RDD you created in the expected format: > ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) > fails with the follwoing error: > Caused by: java.lang.NullPointerException > null values caused the error, if I try with ClientNum and Value_1 it works > data is correctly inserted on DynamoDB table. 
> Thanks for your help !! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626758#comment-17626758 ] Vivek Garg commented on SPARK-22588: Thank [you|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - > > Key: SPARK-22588 > URL: https://issues.apache.org/jira/browse/SPARK-22588 > Project: Spark > Issue Type: Question > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Saanvi Sharma >Priority: Minor > Labels: dynamodb, spark > Original Estimate: 24h > Remaining Estimate: 24h > > I am using spark 2.1 on EMR and i have a dataframe like this: > ClientNum | Value_1 | Value_2 | Value_3 | Value_4 > 14 |A |B| C | null > 19 |X |Y| null| null > 21 |R | null | null| null > I want to load data into DynamoDB table with ClientNum as key fetching: > Analyze Your Data on Amazon DynamoDB with apche Spark11 > Using Spark SQL for ETL3 > here is my code that I tried to solve: > var jobConf = new JobConf(sc.hadoopConfiguration) > jobConf.set("dynamodb.servicename", "dynamodb") > jobConf.set("dynamodb.input.tableName", "table_name") > jobConf.set("dynamodb.output.tableName", "table_name") > jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") > jobConf.set("dynamodb.regionid", "eu-west-1") > jobConf.set("dynamodb.throughput.read", "1") > jobConf.set("dynamodb.throughput.read.percent", "1") > jobConf.set("dynamodb.throughput.write", "1") > jobConf.set("dynamodb.throughput.write.percent", "1") > > jobConf.set("mapred.output.format.class", > "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") > jobConf.set("mapred.input.format.class", > "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") > #Import Data > val df = > sqlContext.read.format("com.databricks.spark.csv").option("header", > "true").option("inferSchema", "true").load(path) > I performed a transformation to have an RDD that matches the types that the > DynamoDB custom output format knows how to write. The custom output format > expects a tuple containing the Text and DynamoDBItemWritable types. 
> Create a new RDD with those types in it, in the following map call: > #Convert the dataframe to rdd > val df_rdd = df.rdd > > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > MapPartitionsRDD[10] at rdd at :41 > > #Print first rdd > df_rdd.take(1) > > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) > var ddbInsertFormattedRDD = df_rdd.map(a => { > var ddbMap = new HashMap[String, AttributeValue]() > var ClientNum = new AttributeValue() > ClientNum.setN(a.get(0).toString) > ddbMap.put("ClientNum", ClientNum) > var Value_1 = new AttributeValue() > Value_1.setS(a.get(1).toString) > ddbMap.put("Value_1", Value_1) > var Value_2 = new AttributeValue() > Value_2.setS(a.get(2).toString) > ddbMap.put("Value_2", Value_2) > var Value_3 = new AttributeValue() > Value_3.setS(a.get(3).toString) > ddbMap.put("Value_3", Value_3) > var Value_4 = new AttributeValue() > Value_4.setS(a.get(4).toString) > ddbMap.put("Value_4", Value_4) > var item = new DynamoDBItemWritable() > item.setItem(ddbMap) > (new Text(""), item) > }) > This last call uses the job configuration that defines the EMR-DDB connector > to write out the new RDD you created in the expected format: > ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) > fails with the follwoing error: > Caused by: java.lang.NullPointerException > null values caused the error, if I try with ClientNum and Value_1 it works > data is correctly inserted on DynamoDB table. > Thanks for your help !! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
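Editor's note: the NPE in the snippet quoted above comes from calling `.toString` on a null column value before `AttributeValue.setS`. Below is a minimal null-safe rewrite of the map step, a sketch rather than a tested fix, assuming the same EMR DynamoDB connector classes (`DynamoDBItemWritable` from emr-dynamodb-hadoop, `AttributeValue` from the AWS SDK v1) that the quoted code uses.
{code:scala}
import java.util.HashMap
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.io.Text
import org.apache.spark.sql.Row

def toDynamoItem(row: Row): (Text, DynamoDBItemWritable) = {
  val ddbMap = new HashMap[String, AttributeValue]()
  val clientNum = new AttributeValue()
  clientNum.setN(row.get(0).toString)
  ddbMap.put("ClientNum", clientNum)
  // Only set attributes whose source value is non-null: calling toString on a
  // null value is exactly the NullPointerException reported above, and
  // DynamoDB simply omits absent attributes.
  Seq("Value_1", "Value_2", "Value_3", "Value_4").zipWithIndex.foreach {
    case (name, i) =>
      val v = row.get(i + 1)
      if (v != null) {
        val attr = new AttributeValue()
        attr.setS(v.toString)
        ddbMap.put(name, attr)
      }
  }
  val item = new DynamoDBItemWritable()
  item.setItem(ddbMap)
  (new Text(""), item)
}
{code}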
[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626759#comment-17626759 ] Vivek Garg edited comment on SPARK-33807 at 10/31/22 6:42 PM: -- Thank [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. was (Author: JIRAUSER294516): Thank [you|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626759#comment-17626759 ] Vivek Garg edited comment on SPARK-33807 at 10/31/22 6:43 PM: -- Great job. [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. was (Author: JIRAUSER294516): Thank [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23521) SPIP: Standardize SQL logical plans with DataSourceV2
[ https://issues.apache.org/jira/browse/SPARK-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626762#comment-17626762 ] Vivek Garg commented on SPARK-23521: IgmGuru [Mulesoft Online Training|https://www.igmguru.com/digital-marketing-programming/mulesoft-training/] is created with the Mulesoft certification exam in mind to ensure that the applicant passes the test on their first try. > SPIP: Standardize SQL logical plans with DataSourceV2 > - > > Key: SPARK-23521 > URL: https://issues.apache.org/jira/browse/SPARK-23521 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Ryan Blue >Priority: Major > Labels: SPIP > Attachments: SPIP_ Standardize logical plans.pdf > > > Executive Summary: This SPIP is based on [discussion about the DataSourceV2 > implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E] > on the dev list. The proposal is to standardize the logical plans used for > write operations to make the planner more maintainable and to make Spark's > write behavior predictable and reliable. It proposes the following principles: > # Use well-defined logical plan nodes for all high-level operations: insert, > create, CTAS, overwrite table, etc. > # Use planner rules that match on these high-level nodes, so that it isn’t > necessary to create rules to match each eventual code path individually. > # Clearly define Spark’s behavior for these logical plan nodes. Physical > nodes should implement that behavior so that all code paths eventually make > the same guarantees. > # Specialize implementation when creating a physical plan, not logical > plans. This will avoid behavior drift and ensure planner code is shared > across physical implementations. > The SPIP doc presents a small but complete set of those high-level logical > operations, most of which are already defined in SQL or implemented by some > write path in Spark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER294516): Great job. [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post. +[https://www.igmguru.com/digital-marketing-programming/react-native-training/]+ > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40977) Complete Support for Union in Python client
Rui Wang created SPARK-40977: Summary: Complete Support for Union in Python client Key: SPARK-40977 URL: https://issues.apache.org/jira/browse/SPARK-40977 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626799#comment-17626799 ] Apache Spark commented on SPARK-40977: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38453 > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626800#comment-17626800 ] Apache Spark commented on SPARK-40977: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38453 > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40977: Assignee: Apache Spark > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40977) Complete Support for Union in Python client
[ https://issues.apache.org/jira/browse/SPARK-40977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40977: Assignee: (was: Apache Spark) > Complete Support for Union in Python client > --- > > Key: SPARK-40977 > URL: https://issues.apache.org/jira/browse/SPARK-40977 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40947) Upgrade pandas to 1.5.1
[ https://issues.apache.org/jira/browse/SPARK-40947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40947: - Assignee: Haejoon Lee > Upgrade pandas to 1.5.1 > --- > > Key: SPARK-40947 > URL: https://issues.apache.org/jira/browse/SPARK-40947 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.5.1 is released, we should support latest pandas. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40947) Upgrade pandas to 1.5.1
[ https://issues.apache.org/jira/browse/SPARK-40947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40947. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38420 [https://github.com/apache/spark/pull/38420] > Upgrade pandas to 1.5.1 > --- > > Key: SPARK-40947 > URL: https://issues.apache.org/jira/browse/SPARK-40947 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Pandas 1.5.1 is released, we should support latest pandas. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40966) FIX `read_parquet` with `pandas_metadata`
[ https://issues.apache.org/jira/browse/SPARK-40966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40966: - Assignee: Haejoon Lee > FIX `read_parquet` with `pandas_metadata` > - > > Key: SPARK-40966 > URL: https://issues.apache.org/jira/browse/SPARK-40966 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > test_parquet_read_with_pandas_metadata is broken with pandas 1.5.1. > should fix it -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40966) FIX `read_parquet` with `pandas_metadata`
[ https://issues.apache.org/jira/browse/SPARK-40966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40966. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38420 [https://github.com/apache/spark/pull/38420] > FIX `read_parquet` with `pandas_metadata` > - > > Key: SPARK-40966 > URL: https://issues.apache.org/jira/browse/SPARK-40966 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > test_parquet_read_with_pandas_metadata is broken with pandas 1.5.1. > should fix it -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40976. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38451 [https://github.com/apache/spark/pull/38451] > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40976) Upgrade sbt to 1.7.3
[ https://issues.apache.org/jira/browse/SPARK-40976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40976: - Assignee: Yang Jie > Upgrade sbt to 1.7.3 > > > Key: SPARK-40976 > URL: https://issues.apache.org/jira/browse/SPARK-40976 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > https://github.com/sbt/sbt/releases/tag/v1.7.3 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
Max Gekk created SPARK-40978: Summary: Migrate failAnalysis() w/o context onto error classes Key: SPARK-40978 URL: https://issues.apache.org/jira/browse/SPARK-40978 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.4.0 Call `failAnalysis()` with an error class instead of `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-40978: - Description: Call `failAnalysis()` w/o context but with an error class instead of `failAnalysis()` w/ a message. (was: Call `failAnalysis()` with an error class instead of `failAnalysis()` w/ a message.) > Migrate failAnalysis() w/o context onto error classes > - > > Key: SPARK-40978 > URL: https://issues.apache.org/jira/browse/SPARK-40978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Call `failAnalysis()` w/o context but with an error class instead of > `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
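Editor's note: for readers unfamiliar with the error-class migration, the shape of the change is roughly the following. This is a self-contained toy analog, not Spark's actual `CheckAnalysis` code; the error-class name and parameter are illustrative.
{code:scala}
// Toy stand-ins for Spark's error-class machinery, for illustration only.
case class AnalysisError(errorClass: String, messageParameters: Map[String, String])
  extends Exception(
    s"[$errorClass] ${messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", ")}")

def failAnalysis(errorClass: String, messageParameters: Map[String, String]): Nothing =
  throw AnalysisError(errorClass, messageParameters)

// Before the migration, a call site builds an ad-hoc message string:
//   failAnalysis(s"Table or view not found: $name")
// After the migration, it passes a stable error class plus named parameters,
// so tooling can match on the class instead of parsing message text.
try {
  failAnalysis(
    errorClass = "TABLE_OR_VIEW_NOT_FOUND",
    messageParameters = Map("relationName" -> "`db`.`tbl`"))
} catch {
  case e: AnalysisError => println(e.getMessage)
}
{code}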
[jira] [Created] (SPARK-40979) Keep removed executor info in decommission state
Dongjoon Hyun created SPARK-40979: - Summary: Keep removed executor info in decommission state Key: SPARK-40979 URL: https://issues.apache.org/jira/browse/SPARK-40979 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40979) Keep removed executor info in decommission state
[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40979: -- Reporter: Zhongwei Zhu (was: Dongjoon Hyun) > Keep removed executor info in decommission state > > > Key: SPARK-40979 > URL: https://issues.apache.org/jira/browse/SPARK-40979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31776) Literal lit() supports lists and numpy arrays
[ https://issues.apache.org/jira/browse/SPARK-31776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626844#comment-17626844 ] Xinrong Meng commented on SPARK-31776: -- `lit` supports Python list and NumPy arrays in https://issues.apache.org/jira/browse/SPARK-39405 in Spark 3.4.0. > Literal lit() supports lists and numpy arrays > - > > Key: SPARK-31776 > URL: https://issues.apache.org/jira/browse/SPARK-31776 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiangrui Meng >Priority: Major > > In ML workload, it is common to replace null feature vectors with some > default value. However, lit() does not support Python list and numpy arrays > at input. Users cannot simply use fillna() to get the job done. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40979) Keep removed executor info in decommission state
[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40979: Assignee: (was: Apache Spark) > Keep removed executor info in decommission state > > > Key: SPARK-40979 > URL: https://issues.apache.org/jira/browse/SPARK-40979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40979) Keep removed executor info in decommission state
[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40979: Assignee: Apache Spark > Keep removed executor info in decommission state > > > Key: SPARK-40979 > URL: https://issues.apache.org/jira/browse/SPARK-40979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40979) Keep removed executor info in decommission state
[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626845#comment-17626845 ] Apache Spark commented on SPARK-40979: -- User 'warrenzhu25' has created a pull request for this issue: https://github.com/apache/spark/pull/38441 > Keep removed executor info in decommission state > > > Key: SPARK-40979 > URL: https://issues.apache.org/jira/browse/SPARK-40979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6857) Python SQL schema inference should support numpy types
[ https://issues.apache.org/jira/browse/SPARK-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626847#comment-17626847 ] Xinrong Meng commented on SPARK-6857: - Hi, we have NumPy input support https://issues.apache.org/jira/browse/SPARK-39405 in Spark 3.4.0. > Python SQL schema inference should support numpy types > -- > > Key: SPARK-6857 > URL: https://issues.apache.org/jira/browse/SPARK-6857 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark, SQL >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Priority: Major > > **UPDATE**: Closing this JIRA since a better fix will be better UDT support. > See discussion in comments. > If you try to use SQL's schema inference to create a DataFrame out of a list > or RDD of numpy types (such as numpy.float64), SQL will not recognize the > numpy types. It would be handy if it did. > E.g.: > {code} > import numpy > from collections import namedtuple > from pyspark.sql import SQLContext > MyType = namedtuple('MyType', 'x') > myValues = map(lambda x: MyType(x), numpy.random.randint(100, size=10)) > sqlContext = SQLContext(sc) > data = sqlContext.createDataFrame(myValues) > {code} > The above code fails with: > {code} > Traceback (most recent call last): > File "", line 1, in > File "/Users/josephkb/spark/python/pyspark/sql/context.py", line 331, in > createDataFrame > return self.inferSchema(data, samplingRatio) > File "/Users/josephkb/spark/python/pyspark/sql/context.py", line 205, in > inferSchema > schema = self._inferSchema(rdd, samplingRatio) > File "/Users/josephkb/spark/python/pyspark/sql/context.py", line 160, in > _inferSchema > schema = _infer_schema(first) > File "/Users/josephkb/spark/python/pyspark/sql/types.py", line 660, in > _infer_schema > fields = [StructField(k, _infer_type(v), True) for k, v in items] > File "/Users/josephkb/spark/python/pyspark/sql/types.py", line 637, in > _infer_type > raise ValueError("not supported type: %s" % type(obj)) > ValueError: not supported type: > {code} > But if we cast to int (not numpy types) first, it's OK: > {code} > myNativeValues = map(lambda x: MyType(int(x.x)), myValues) > data = sqlContext.createDataFrame(myNativeValues) # OK > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37697) Make it easier to convert numpy arrays to Spark Dataframes
[ https://issues.apache.org/jira/browse/SPARK-37697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626850#comment-17626850 ] Xinrong Meng commented on SPARK-37697: -- Hi, we have NumPy input support https://issues.apache.org/jira/browse/SPARK-39405 in Spark 3.4.0. > Make it easier to convert numpy arrays to Spark Dataframes > -- > > Key: SPARK-37697 > URL: https://issues.apache.org/jira/browse/SPARK-37697 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.2 >Reporter: Douglas Moore >Priority: Major > > Make it easier to convert numpy arrays to dataframes. > Often we receive errors: > > {code:java} > df = spark.createDataFrame(numpy.arange(10)) > Can not infer schema for type: > {code} > > OR > {code:java} > df = spark.createDataFrame(numpy.arange(10.)) > Can not infer schema for type: > {code} > > Today (Spark 3.x) we have to: > {code:java} > spark.createDataFrame(pd.DataFrame(numpy.arange(10.))) {code} > Make this easier with a direct conversion from Numpy arrays to Spark > Dataframes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40979) Keep removed executor info in decommission state
[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-40979: - Description: Executors removed due to decommission should be kept in a separate set. To avoid OOM, the set size will be limited to 1K or 10K. FetchFailed caused by a decommissioned executor falls into 2 categories:
# When the FetchFailed reaches the DAGScheduler, the executor is still alive, or is lost but the loss info hasn't reached TaskSchedulerImpl yet. This is already handled in SPARK-40979.
# The FetchFailed is caused by the loss of a decommissioned executor, so the decommission info has already been removed from TaskSchedulerImpl. Keeping such info for a short period is good enough. Even if we limit the set of removed executors to 10K entries, that is at most about 10MB of memory. In practice it's rare to have a cluster of over 10K nodes, and the chance that all of these executors are decommissioned and lost at the same time is small.
> Keep removed executor info in decommission state > > > Key: SPARK-40979 > URL: https://issues.apache.org/jira/browse/SPARK-40979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Major > > Executors removed due to decommission should be kept in a separate set. To > avoid OOM, the set size will be limited to 1K or 10K. > FetchFailed caused by a decommissioned executor falls into 2 categories: > # When the FetchFailed reaches the DAGScheduler, the executor is still alive, or is > lost but the loss info hasn't reached TaskSchedulerImpl yet. This is already > handled in SPARK-40979. > # The FetchFailed is caused by the loss of a decommissioned executor, so the decommission info has already been > removed from TaskSchedulerImpl. Keeping such info for a short period is > good enough. Even if we limit the set of removed executors to 10K entries, that is > at most about 10MB of memory. In practice it's rare to have a cluster > of over 10K nodes, and the chance that all of these executors are decommissioned and lost at the > same time is small. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
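Editor's note: a minimal sketch of the proposed bookkeeping, assuming a simple insertion-ordered cap. It illustrates the size bound described above; it is not the actual patch, and the class and method names are invented for the example.
{code:scala}
import scala.collection.mutable

// Remember recently removed, decommissioned executors in a capped set, so a
// FetchFailed from an already-removed executor can still be attributed to
// decommissioning without unbounded memory growth.
class RemovedDecommissionedExecutors(maxSize: Int = 10000) {
  private val ids = mutable.LinkedHashSet.empty[String]

  def record(executorId: String): Unit = {
    ids += executorId
    while (ids.size > maxSize) {
      ids -= ids.head // evict the oldest entry first
    }
  }

  def contains(executorId: String): Boolean = ids.contains(executorId)
}

// 10K executor IDs at well under 1KB each stays in the low-MB range,
// matching the estimate in the description.
{code}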
[jira] [Assigned] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40978: Assignee: Max Gekk (was: Apache Spark) > Migrate failAnalysis() w/o context onto error classes > - > > Key: SPARK-40978 > URL: https://issues.apache.org/jira/browse/SPARK-40978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Call `failAnalysis()` w/o context but with an error class instead of > `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626853#comment-17626853 ] Apache Spark commented on SPARK-40978: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38454 > Migrate failAnalysis() w/o context onto error classes > - > > Key: SPARK-40978 > URL: https://issues.apache.org/jira/browse/SPARK-40978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Call `failAnalysis()` w/o context but with an error class instead of > `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40978: Assignee: Apache Spark (was: Max Gekk) > Migrate failAnalysis() w/o context onto error classes > - > > Key: SPARK-40978 > URL: https://issues.apache.org/jira/browse/SPARK-40978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > > Call `failAnalysis()` w/o context but with an error class instead of > `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40978) Migrate failAnalysis() w/o context onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626854#comment-17626854 ] Apache Spark commented on SPARK-40978: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38454 > Migrate failAnalysis() w/o context onto error classes > - > > Key: SPARK-40978 > URL: https://issues.apache.org/jira/browse/SPARK-40978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Call `failAnalysis()` w/o context but with an error class instead of > `failAnalysis()` w/ a message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37946) Use error classes in the execution errors related to partitions
[ https://issues.apache.org/jira/browse/SPARK-37946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626866#comment-17626866 ] Khalid Mammadov commented on SPARK-37946: - Hi [~maxgekk], I see this one is not done yet: partitionColumnNotFoundInSchemaError. Can I look into it? Also, there are some more waiting to be done in QueryExecutionErrors.scala, e.g. stateNotDefinedOrAlreadyRemovedError, cannotSetTimeoutDurationError, cannotGetEventTimeWatermarkError, cannotSetTimeoutTimestampError, and batchMetadataFileNotFoundError. Shall I look into these as well?
> Use error classes in the execution errors related to partitions > --- > > Key: SPARK-37946 > URL: https://issues.apache.org/jira/browse/SPARK-37946 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors:
> * unableToDeletePartitionPathError
> * unableToCreatePartitionPathError
> * unableToRenamePartitionPathError
> * notADatasourceRDDPartitionError
> * cannotClearPartitionDirectoryError
> * failedToCastValueToDataTypeForPartitionColumnError
> * unsupportedPartitionTransformError
> * cannotCreateJDBCTableWithPartitionsError
> * requestedPartitionsMismatchTablePartitionsError
> * dynamicPartitionKeyNotAmongWrittenPartitionPathsError
> * cannotRemovePartitionDirError
> * alterTableWithDropPartitionAndPurgeUnsupportedError
> * invalidPartitionFilterError
> * getPartitionMetadataByFilterError
> * illegalLocationClauseForViewPartitionError
> * partitionColumnNotFoundInSchemaError
> * cannotAddMultiPartitionsOnNonatomicPartitionTableError
> * cannotDropMultiPartitionsOnNonatomicPartitionTableError
> * truncateMultiPartitionUnsupportedError
> * dynamicPartitionOverwriteUnsupportedByTableError
> * writePartitionExceedConfigSizeWhenDynamicPartitionError
> onto error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits
[ https://issues.apache.org/jira/browse/SPARK-40815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40815. --- Fix Version/s: 3.4.0 Assignee: Ivan Sadikov Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/38277 > SymlinkTextInputFormat returns incorrect result due to enabled > spark.hadoopRDD.ignoreEmptySplits > > > Key: SPARK-40815 > URL: https://issues.apache.org/jira/browse/SPARK-40815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
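Editor's note: the issue body gives no details, but the title points at the interaction with `spark.hadoopRDD.ignoreEmptySplits`, which is enabled by default on the affected versions. On such versions one could presumably disable that pruning when reading symlink-based tables. This is a hedged sketch inferred from the title, not a documented workaround; the config is read when the Hadoop RDD is created, so it is set at session build time.
{code:scala}
import org.apache.spark.sql.SparkSession

// Possible mitigation on affected versions: turn off empty-split pruning.
val spark = SparkSession.builder()
  .config("spark.hadoopRDD.ignoreEmptySplits", "false")
  .getOrCreate()
{code}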
[jira] [Commented] (SPARK-40951) pyspark-connect tests should be skipped if pandas doesn't exist
[ https://issues.apache.org/jira/browse/SPARK-40951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626878#comment-17626878 ] Rui Wang commented on SPARK-40951: -- [~dongjoon] Is this JIRA fully resolved already? Can we close this JIRA now? > pyspark-connect tests should be skipped if pandas doesn't exist > --- > > Key: SPARK-40951 > URL: https://issues.apache.org/jira/browse/SPARK-40951 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40944) Relax ordering constraint for CREATE TABLE column options
[ https://issues.apache.org/jira/browse/SPARK-40944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-40944: -- Assignee: Daniel > Relax ordering constraint for CREATE TABLE column options > - > > Key: SPARK-40944 > URL: https://issues.apache.org/jira/browse/SPARK-40944 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > > Currently the grammar for each CREATE TABLE column is:
> createOrReplaceTableColType
>     : colName=errorCapturingIdentifier dataType (NOT NULL)? defaultExpression? commentSpec?
>     ;
> This enforces a constraint on the order of (NOT NULL, DEFAULT value, COMMENT value). We can update the grammar to allow these options in any order instead, to improve usability. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40944) Relax ordering constraint for CREATE TABLE column options
[ https://issues.apache.org/jira/browse/SPARK-40944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-40944. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38418 [https://github.com/apache/spark/pull/38418] > Relax ordering constraint for CREATE TABLE column options > - > > Key: SPARK-40944 > URL: https://issues.apache.org/jira/browse/SPARK-40944 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.4.0 > > > Currently the grammar for each CREATE TABLE column is:
> createOrReplaceTableColType
>     : colName=errorCapturingIdentifier dataType (NOT NULL)? defaultExpression? commentSpec?
>     ;
> This enforces a constraint on the order of (NOT NULL, DEFAULT value, COMMENT value). We can update the grammar to allow these options in any order instead, to improve usability. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
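To make the usability point concrete, here is a sketch of what the relaxed ordering allows. It assumes the fix accepts any permutation of the three options and that the target data source supports DEFAULT values; the table names are hypothetical.
{code:java}
import org.apache.spark.sql.SparkSession

// Illustrative only: after SPARK-40944, both orderings below should parse,
// whereas previously only the (NOT NULL, DEFAULT, COMMENT) order was
// accepted. Assumes DEFAULT column support is enabled for the data source.
val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

spark.sql(
  """CREATE TABLE t1 (
    |  id INT NOT NULL DEFAULT 0 COMMENT 'primary id'
    |) USING parquet""".stripMargin)

spark.sql(
  """CREATE TABLE t2 (
    |  id INT COMMENT 'primary id' DEFAULT 0 NOT NULL
    |) USING parquet""".stripMargin)
{code}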
[jira] [Commented] (SPARK-29683) Job failed due to executor failures all available nodes are blacklisted
[ https://issues.apache.org/jira/browse/SPARK-29683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626884#comment-17626884 ] Attila Zsolt Piros commented on SPARK-29683: [~srowen] I think we can close this, as this commit resolved the issue: https://github.com/apache/spark/commit/e70df2cea46f71461d8d401a420e946f999862c1 What do you think? > Job failed due to executor failures all available nodes are blacklisted > --- > > Key: SPARK-29683 > URL: https://issues.apache.org/jira/browse/SPARK-29683 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.0.0 >Reporter: Genmao Yu >Priority: Major > > My streaming job will fail *due to executor failures all available nodes are blacklisted*. This exception is thrown only when all nodes are blacklisted:
> {code:java}
> def isAllNodeBlacklisted: Boolean = currentBlacklistedYarnNodes.size >= numClusterNodes
> val allBlacklistedNodes = excludeNodes ++ schedulerBlacklist ++ allocatorBlacklist.keySet
> {code}
> After diving into the code, I found some critical conditions that are not handled properly:
> - unchecked `excludeNodes`: it comes from user config. If not set properly, it may lead to "currentBlacklistedYarnNodes.size >= numClusterNodes". For example, we may list nodes that are not in the YARN cluster at all:
> {code:java}
> excludeNodes = (invalid1, invalid2, invalid3)
> clusterNodes = (valid1, valid2)
> {code}
> - `numClusterNodes` may equal 0: when HA YARN fails over, it takes some time for all NodeManagers to re-register with the ResourceManager. In that window, `numClusterNodes` may equal 0 or some other small number, and the Spark driver fails.
> - too strong a condition check: the Spark driver fails as soon as "currentBlacklistedYarnNodes.size >= numClusterNodes". This condition does not necessarily indicate an unrecoverable fatal state; for example, some NodeManagers may simply be restarting, so we could allow a waiting period before failing the job. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
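The "waiting period" idea from the last bullet can be sketched as below. This is a design sketch only, not Spark's actual YARN allocator code; `BlacklistTrackerSketch`, `graceMillis`, and `firstFullyBlacklistedAt` are hypothetical names.
{code:java}
// Hypothetical sketch: only fail the job once the cluster has stayed fully
// blacklisted for longer than a grace window, and guard against the empty
// cluster view and bogus excludeNodes entries described in the ticket.
class BlacklistTrackerSketch(graceMillis: Long) {
  private var firstFullyBlacklistedAt: Option[Long] = None

  def shouldFailJob(blacklistedNodes: Set[String], clusterNodes: Set[String]): Boolean = {
    // Count only blacklisted nodes that are actually cluster members, so
    // misconfigured excludeNodes entries cannot inflate the count.
    val effective = blacklistedNodes.intersect(clusterNodes)
    val allBlacklisted = clusterNodes.nonEmpty && effective.size >= clusterNodes.size

    if (!allBlacklisted) {
      firstFullyBlacklistedAt = None  // cluster recovered; reset the timer
      false
    } else {
      val now = System.currentTimeMillis()
      firstFullyBlacklistedAt match {
        case None =>
          firstFullyBlacklistedAt = Some(now)  // start the grace window
          false
        case Some(start) =>
          now - start >= graceMillis  // fail only after the grace window elapses
      }
    }
  }
}
{code}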
[jira] [Updated] (SPARK-40933) Reimplement df.stat.{cov, corr} with built-in sql functions
[ https://issues.apache.org/jira/browse/SPARK-40933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-40933: -- Summary: Reimplement df.stat.{cov, corr} with built-in sql functions (was: Make df.stat.{cov, corr} consistent with sql functions) > Reimplement df.stat.{cov, corr} with built-in sql functions > --- > > Key: SPARK-40933 > URL: https://issues.apache.org/jira/browse/SPARK-40933 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
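As a rough illustration of the direction here (an assumption about scope: that the ticket covers the sample covariance and Pearson correlation that `df.stat` already exposes), the existing results can be reproduced with the built-in aggregate functions:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{corr, covar_samp}

// Sketch: df.stat.{cov, corr} can be expressed with the built-in SQL
// aggregates covar_samp and corr. Whether the internal rewrite in
// SPARK-40933 matches this exactly is an assumption.
val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 5.5)).toDF("a", "b")

df.stat.cov("a", "b")                      // existing DataFrameStatFunctions API
df.select(covar_samp($"a", $"b")).first()  // same quantity via a built-in aggregate
df.stat.corr("a", "b")
df.select(corr($"a", $"b")).first()
{code}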
[jira] [Assigned] (SPARK-40827) Re-enable the DataFrame.corrwith test after fixing in future pandas.
[ https://issues.apache.org/jira/browse/SPARK-40827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40827: Assignee: Apache Spark > Re-enable the DataFrame.corrwith test after fixing in future pandas. > > > Key: SPARK-40827 > URL: https://issues.apache.org/jira/browse/SPARK-40827 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should re-enable the skipped test that is commented with "Regression in pandas 1.5.0" once the behavior is fixed in a future pandas release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org