[jira] [Assigned] (SPARK-27171) Support Full-Partition limit in the first scan
[ https://issues.apache.org/jira/browse/SPARK-27171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27171: Assignee: Apache Spark > Support Full-Partition limit in the first scan > - > > Key: SPARK-27171 > URL: https://issues.apache.org/jira/browse/SPARK-27171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: deshanxiao >Assignee: Apache Spark >Priority: Major > > SparkPlan#executeTake must pick elements starting from a single partition. > This can make some queries slow, even though Spark is primarily designed for batch > queries. It would be worthwhile to add a switch that lets users scan all partitions > on the first pass of a limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27171) Support Full-Partition limit in the first scan
[ https://issues.apache.org/jira/browse/SPARK-27171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27171: Assignee: (was: Apache Spark) > Support Full-Partition limit in the first scan > - > > Key: SPARK-27171 > URL: https://issues.apache.org/jira/browse/SPARK-27171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: deshanxiao >Priority: Major > > SparkPlan#executeTake must pick elements starting from a single partition. > This can make some queries slow, even though Spark is primarily designed for batch > queries. It would be worthwhile to add a switch that lets users scan all partitions > on the first pass of a limit.
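The incremental behaviour SPARK-27171 complains about can be illustrated with a small sketch. This is not Spark's actual implementation (SparkPlan#executeTake is Scala inside Spark); it is a hypothetical pure-Python model of the strategy: scan one partition first, then progressively larger batches of partitions until the limit is satisfied. The switch proposed in the ticket would amount to making the first batch cover all partitions.

```python
# Hypothetical pure-Python model of SparkPlan#executeTake's strategy
# (not Spark source): scan partitions in growing batches until n rows
# have been collected, starting with a single partition.
def take(partitions, n, scale_up_factor=4):
    """Collect up to n rows, scanning partitions in growing batches."""
    results = []
    scanned = 0
    num_to_scan = 1  # the first round touches a single partition
    while scanned < len(partitions) and len(results) < n:
        for part in partitions[scanned:scanned + num_to_scan]:
            results.extend(part[:n - len(results)])
            scanned += 1
            if len(results) >= n:
                break
        num_to_scan *= scale_up_factor  # widen the next round
    return results
```

If the first partitions happen to be empty or small (as in a selective filter), several rounds are needed, which is the slowness the reporter describes; scanning all partitions in the first round trades extra work for lower latency on such queries.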
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Description: Can we upgrade the embedded jetty servlet on spark 1.6.2? Will any dependencies be affected if we do upgrade it? The reason for doing this is that we would like to patch the CRLF injection vulnerability found by our scan. Please refer to the information below. Description: This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. CWE #: CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP Response Splitting') was: Can we upgrade embedded jetty servlet on spark 1.6.2? As per our vulnerability scan embedded jetty servlet is vulnerable with CRLF injection attacks. Please do refer below information. Description: This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. 
HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. CWE #; CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP Response Splitting') > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Dependency upgrade > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade the embedded jetty servlet on spark 1.6.2? Will any > dependencies be affected if we do upgrade it? The reason for doing this is > that we would like to patch the CRLF injection vulnerability found by our > scan. Please refer to the information below. > Description: > This script is possibly vulnerable to CRLF injection attacks. HTTP headers > have the structure "Key: Value", where each line is separated by the CRLF > combination. If the user input is injected into the value section without > properly escaping/removing CRLF characters it is possible to alter the HTTP > headers structure. HTTP Response Splitting is a new application attack > technique which enables various new attacks such as web cache poisoning, > cross user defacement, hijacking pages with sensitive user information and > cross-site scripting (XSS). The attacker sends a single HTTP request that > forces the web server to form an output stream, which is then interpreted by > the target as two HTTP responses instead of one response. 
> CWE #: > CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP > Response Splitting')
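The standard mitigation for CWE-113, independent of any Jetty upgrade, is to strip or reject CR/LF before user input reaches a header value. A minimal sketch, using a hypothetical helper (this is not Jetty's actual API, just an illustration of the technique the scanner recommends):

```python
# Hypothetical helper (not Jetty's API) showing the standard CWE-113
# mitigation: remove CR (0x0D) and LF (0x0A) from user input before it is
# echoed into an HTTP header, so the input cannot start a new header line.
def sanitize_header_value(value: str) -> str:
    return value.replace("\r", "").replace("\n", "")

# Payload shaped like the scan report: a query parameter trying to smuggle
# "SomeCustomInjectedHeader: injected_by_wvs" into the response headers.
payload = "3\r\nSomeCustomInjectedHeader: injected_by_wvs"
safe = sanitize_header_value(payload)
# "safe" contains no line break, so no extra header line can be injected.
```

Rejecting the request outright (rather than silently stripping) is an equally valid design choice when a header value should never legitimately contain control characters.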
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on the system if the current /static/jquery-1.11.1.min.js is updated to the latest version? As per our vulnerability scan, the Javascript library we are currently using is vulnerable, and we want to address this vulnerability. We would appreciate any help we could get from the community. *Description:* You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported. *CWE #:* CWE-16 - Category - configuration Thank you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per our vulnerability scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. *Description:* You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported. *CWE #:* CWE-16 - Category - configuration Thank you, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on the system if the current > /static/jquery-1.11.1.min.js is updated to the latest version? 
> As per our vulnerability scan, the Javascript library we are currently using > is vulnerable, and we want to address this vulnerability. We would appreciate any > help we could get from the community. > *Description:* > You are using a vulnerable Javascript library. One or more vulnerabilities > were reported for this version of the Javascript library. Consult Attack > details and Web References for more information about the affected library > and the vulnerabilities that were reported. > *CWE #:* > CWE-16 - Category - configuration > > Thank you,
[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27156: - Issue Type: Bug (was: Question) > why is the "http://:18080/static" browsable? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know whether there is a way to disable the Spark history server /static > folder. Please refer to the attachment provided. The reason for asking is > security.
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Issue Type: Dependency upgrade (was: Question) > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Dependency upgrade > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade the embedded jetty servlet on spark 1.6.2? As per our > vulnerability scan, the embedded jetty servlet is vulnerable to CRLF injection > attacks. Please refer to the information below. > Description: > This script is possibly vulnerable to CRLF injection attacks. HTTP headers > have the structure "Key: Value", where each line is separated by the CRLF > combination. If the user input is injected into the value section without > properly escaping/removing CRLF characters it is possible to alter the HTTP > headers structure. HTTP Response Splitting is a new application attack > technique which enables various new attacks such as web cache poisoning, > cross user defacement, hijacking pages with sensitive user information and > cross-site scripting (XSS). The attacker sends a single HTTP request that > forces the web server to form an output stream, which is then interpreted by > the target as two HTTP responses instead of one response. > CWE #: > CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP > Response Splitting')
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Description: Can we upgrade the embedded jetty servlet on spark 1.6.2? As per our vulnerability scan, the embedded jetty servlet is vulnerable to CRLF injection attacks. Please refer to the information below. Description: This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. CWE #: CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP Response Splitting') was: Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or will there be any impact if we do upgrade it ? Please do refer on description of the vulnerability provided: Description: This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). 
The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. CWE #; CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP Response Splitting') > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade the embedded jetty servlet on spark 1.6.2? As per our > vulnerability scan, the embedded jetty servlet is vulnerable to CRLF injection > attacks. Please refer to the information below. > Description: > This script is possibly vulnerable to CRLF injection attacks. HTTP headers > have the structure "Key: Value", where each line is separated by the CRLF > combination. If the user input is injected into the value section without > properly escaping/removing CRLF characters it is possible to alter the HTTP > headers structure. HTTP Response Splitting is a new application attack > technique which enables various new attacks such as web cache poisoning, > cross user defacement, hijacking pages with sensitive user information and > cross-site scripting (XSS). The attacker sends a single HTTP request that > forces the web server to form an output stream, which is then interpreted by > the target as two HTTP responses instead of one response. > CWE #: > CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP > Response Splitting')
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Issue Type: Dependency upgrade (was: Question) > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our vulnerability scan, the Javascript library we are currently using > is vulnerable, and we want to address this vulnerability. We would appreciate any > help we could get from the community. > *Description:* > You are using a vulnerable Javascript library. One or more vulnerabilities > were reported for this version of the Javascript library. Consult Attack > details and Web References for more information about the affected library > and the vulnerabilities that were reported. > *CWE #:* > CWE-16 - Category - configuration > > Thank you,
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js is updated to the latest version? As per our vulnerability scan, the Javascript library we are currently using is vulnerable, and we want to address this vulnerability. We would appreciate any help we could get from the community. *Description:* You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported. *CWE #:* CWE-16 - Category - configuration Thank you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per our vulnerability scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer on the attachment provided. Thank you, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our vulnerability scan, the Javascript library we are currently using > is vulnerable, and we want to address this vulnerability. We would appreciate any > help we could get from the community. > *Description:* > You are using a vulnerable Javascript library. One or more vulnerabilities > were reported for this version of the Javascript library. 
Consult Attack > details and Web References for more information about the affected library > and the vulnerabilities that were reported. > *CWE #:* > CWE-16 - Category - configuration > > Thank you,
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Attachment: (was: Vulnerability Javascript library.xlsx) > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our vulnerability scan, the Javascript library we are currently using > is vulnerable, and we want to address this vulnerability. We would appreciate any > help we could get from the community. > Please refer to the attachment provided. > > Thank you,
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Description: Can we upgrade the embedded jetty servlet on spark 1.6.2? Is this possible, or will there be any impact if we do upgrade it? Please refer to the description of the vulnerability provided: Description: This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response. CWE #: CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP Response Splitting') was: Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or will there be any impact if we do upgrade it ? Please do refer on the provided attachment for more information. > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade the embedded jetty servlet on spark 1.6.2? Is this possible, or > will there be any impact if we do upgrade it? > Please refer to the description of the vulnerability provided: > Description: > This script is possibly vulnerable to CRLF injection attacks. 
HTTP headers > have the structure "Key: Value", where each line is separated by the CRLF > combination. If the user input is injected into the value section without > properly escaping/removing CRLF characters it is possible to alter the HTTP > headers structure. HTTP Response Splitting is a new application attack > technique which enables various new attacks such as web cache poisoning, > cross user defacement, hijacking pages with sensitive user information and > cross-site scripting (XSS). The attacker sends a single HTTP request that > forces the web server to form an output stream, which is then interpreted by > the target as two HTTP responses instead of one response. > > CWE #: > CWE-113: Improper Neutralization of CRLF Sequences in HTTP Headers ('HTTP > Response Splitting')
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Attachment: (was: CRLF injection - Sheet1.pdf) > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade the embedded jetty servlet on spark 1.6.2? Is this possible, or > will there be any impact if we do upgrade it? > Please refer to the provided attachment for more information.
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Attachment: Vulnerability Javascript library.xlsx > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Vulnerability Javascript library.xlsx > > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our VA scan, the javascript library we are currently using is vulnerable, > and we want to address this vulnerability. We would appreciate any help we could get > from the community. > Please refer to the attachment provided. > > Thank you,
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js is updated to the latest version? As per our vulnerability scan, the javascript library we are currently using is vulnerable, and we want to address this vulnerability. We would appreciate any help we could get from the community. Please refer to the attachment provided. Thank you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer on the attachment provided. Thank you, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Vulnerability Javascript library.xlsx > > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our vulnerability scan, the javascript library we are currently using > is vulnerable, and we want to address this vulnerability. We would appreciate any > help we could get from the community. > Please refer to the attachment provided. > > Thank you,
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js is updated to the latest version? As per our VA scan, the javascript library we are currently using is vulnerable, and we want to address this vulnerability. We would appreciate any help we could get from the community. Please refer to the attachment provided. Thank you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer below for more information: |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported.|Consult References for more information.|Upgrade to the latest version.|/static/jquery-1.11.1.min.js Details Detected Javascript library jquery version 1.11.1. The version was detected from filename.|References: [https://github.com/jquery/jquery/issues/2432] [http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/] [https://snyk.io/test/npm/jquery/1.11.1] related reference not directly with spark: [https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html]| Thank you, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? 
> - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js is updated to the latest version? > As per our VA scan, the javascript library we are currently using is vulnerable, > and we want to address this vulnerability. We would appreciate any help we could get > from the community. > Please refer to the attachment provided. > > Thank you,
[jira] [Commented] (SPARK-22506) Spark thrift server can not impersonate user in kerberos
[ https://issues.apache.org/jira/browse/SPARK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793308#comment-16793308 ] Wataru Yukawa commented on SPARK-22506: --- Hi, the Spark thrift server can impersonate a user in our kerberized Hadoop cluster with Spark 2.1.1 (HDP-2.6.2.0) and the following setting when I execute a SELECT query {code:java} hive.server2.enable.doAs=true {code} But it can't impersonate in the CREATE query case. For example, if you execute the following query, /apps/hive/warehouse/hoge.db/piyo in HDFS is owned by hive. {code:java} create table hoge.piyo(str string) {code} Thanks > Spark thrift server can not impersonate user in kerberos > - > > Key: SPARK-22506 > URL: https://issues.apache.org/jira/browse/SPARK-22506 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 2.2.0 >Reporter: sydt >Priority: Major > Attachments: screenshot-1.png > > > The Spark thrift server cannot impersonate a user in a kerberos environment. > I launch the spark thrift server in *yarn-client* mode as user *hive*, which is > allowed to impersonate other users. > User *jt_jzyx_project7* submitted a sql statement to query its own table located > in the hdfs catalog /user/jt_jzyx_project7, and hit this error: > Permission denied: *user=hive*, access=EXECUTE, > inode=*"/user/jt_jzyx_project7*":hdfs:jt_jzyx_project7:drwxrwx---:user:g_dcpt_project1:rwx,group::rwx > Obviously, the spark thrift server didn't proxy user jt_jzyx_project7 in hdfs. > And this happened at the task stage, which means it passed the hive authorization. > !screenshot-1.png!
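The setting quoted in the comment above is normally placed in hive-site.xml. A minimal fragment, shown only to illustrate the doAs switch being discussed (deployment details such as proxy-user settings in core-site.xml are also needed and are omitted here):

```xml
<!-- hive-site.xml fragment: with doAs enabled, HiveServer2 / the Spark
     thrift server is expected to run queries as the submitting user
     rather than as the hive service user. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```

The comment's observation is that this impersonation holds for the read path (SELECT) but not for the write path (CREATE TABLE), where the resulting HDFS directory is still owned by the hive service user.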
[jira] [Created] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
Jerry Garcia created SPARK-27172: Summary: CRLF Injection/HTTP response splitting on spark embedded jetty servlet. Key: SPARK-27172 URL: https://issues.apache.org/jira/browse/SPARK-27172 Project: Spark Issue Type: Question Components: Web UI Affects Versions: 1.6.2 Reporter: Jerry Garcia Can we upgrade the embedded jetty servlet on spark 1.6.2? Is this possible, or will there be any impact if we do upgrade it? Please do refer to the provided attachment for more information. |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |CRLF injection/HTTP response splitting|Medium|This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters, it is possible to alter the HTTP header structure. HTTP Response Splitting is an application attack technique which enables various new attacks such as web cache poisoning, cross-user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response.|It is possible for a remote attacker to inject custom HTTP headers. For example, an attacker can inject session cookies or HTML code. 
This may lead to vulnerabilities like XSS (cross-site scripting) or session fixation.|You need to restrict CR (0x0D) and LF (0x0A) from the user input or properly encode the output in order to prevent the injection of custom HTTP headers.|Web Server Details URL encoded GET input page was set to %c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs Injected header found: SomeCustomInjectedHeader: injected_by_wvs Request headers GET /?page=%c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs&showIncomplete=false HTTP/1.1 Referer: https://app30.goldmine.bdo.com.ph Host: app30.goldmine.bdo.com.ph Connection: Keep-alive Accept-Encoding: gzip,deflate User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21 Acunetix-Product: WVS/11.0 (Acunetix - WVSE) Acunetix-Scanning-agreement: Third Party Scanning PROHIBITED Acunetix-User-agreement: http://www.acunetix.com/wvs/disc.htm Accept: */* Web Server Details URL encoded GET input showIncomplete was set to %c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs Injected header found: SomeCustomInjectedHeader: injected_by_wvs Request headers GET /?page=3&showIncomplete=%c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs HTTP/1.1 Referer: https://app30.goldmine.bdo.com.ph Host: app30.goldmine.bdo.com.ph Connection: Keep-alive Accept-Encoding: gzip,deflate User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21 Acunetix-Product: WVS/11.0 (Acunetix - WVSE) Acunetix-Scanning-agreement: Third Party Scanning PROHIBITED Acunetix-User-agreement: http://www.acunetix.com/wvs/disc.htm Accept: */*|Acunetix CRLF Injection Attack (http://www.acunetix.com/websitesecurity/crlf-injection.htm) Whitepaper - HTTP Response Splitting (http://packetstormsecurity.org/papers/general/whitepaper_httpresponse.pdf) Introduction to HTTP Response Splitting (http://www.securiteam.com/securityreviews/5WP0E2KFGK.html) 
https://www.cvedetails.com/cve/CVE-2007-5615/ https://cwe.mitre.org/data/definitions/113.html| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
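The remediation the scan report recommends (stripping CR/LF before user input reaches a response header) can be sketched as follows. This is an illustrative, standalone Python sketch, not Spark or Jetty code; the function name and the exact set of stripped characters are assumptions for the example.

```python
def sanitize_header_value(value: str) -> str:
    """Remove line-break characters so user input cannot split an HTTP response.

    Strips real CR (0x0D) and LF (0x0A), plus the U+010D/U+010A characters that
    the scanner's %c4%8d%c4%8a probe decodes to, since some stacks truncate
    those to CR/LF when emitting the header bytes.
    """
    for ch in ("\r", "\n", "\u010d", "\u010a"):
        value = value.replace(ch, "")
    return value

# The injected probe from the report collapses into a harmless single-line value:
sanitize_header_value("3\r\nSomeCustomInjectedHeader: injected_by_wvs")
```

Modern servlet containers reject CR/LF in header values outright, which is the more robust fix than per-call sanitization.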
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Attachment: CRLF injection - Sheet1.pdf > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > Attachments: CRLF injection - Sheet1.pdf > > > Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or > will there be any impact if we do upgrade it ? > Please do refer on the provided attachment for more information. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27132) Improve file source V2 framework
[ https://issues.apache.org/jira/browse/SPARK-27132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27132. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24066 [https://github.com/apache/spark/pull/24066] > Improve file source V2 framework > > > Key: SPARK-27132 > URL: https://issues.apache.org/jira/browse/SPARK-27132 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > > During the migration of CSV V2, I find that we can improve the file source v2 > framework by: > 1. check duplicated column names in both read and write > 2. Not all the file sources support filter push down. So remove > `SupportsPushDownFilters` from FileScanBuilder > 3. The method `isSplitable` might require data source options. Add a new > member `options` to FileScan. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
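Point 1 above, checking duplicated column names on both read and write, can be sketched in a few lines. This is a hypothetical standalone Python illustration rather than the actual Scala implementation; the function name and the case-insensitive default (analogous to spark.sql.caseSensitive=false) are assumptions.

```python
def check_duplicate_columns(names, case_sensitive=False):
    """Raise if a schema contains duplicated column names, mirroring the
    read/write-side check described for the file source V2 framework."""
    seen = {}
    for name in names:
        key = name if case_sensitive else name.lower()
        seen.setdefault(key, []).append(name)
    dups = [group for group in seen.values() if len(group) > 1]
    if dups:
        raise ValueError(f"Found duplicate column(s): {dups}")

check_duplicate_columns(["id", "name"])  # no duplicates: passes silently
```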
[jira] [Assigned] (SPARK-27132) Improve file source V2 framework
[ https://issues.apache.org/jira/browse/SPARK-27132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27132: --- Assignee: Gengliang Wang > Improve file source V2 framework > > > Key: SPARK-27132 > URL: https://issues.apache.org/jira/browse/SPARK-27132 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > During the migration of CSV V2, I find that we can improve the file source v2 > framework by: > 1. check duplicated column names in both read and write > 2. Not all the file sources support filter push down. So remove > `SupportsPushDownFilters` from FileScanBuilder > 3. The method `isSplitable` might require data source options. Add a new > member `options` to FileScan. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27136) Remove data source option check_files_exist
[ https://issues.apache.org/jira/browse/SPARK-27136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27136: --- Assignee: Gengliang Wang > Remove data source option check_files_exist > --- > > Key: SPARK-27136 > URL: https://issues.apache.org/jira/browse/SPARK-27136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > The data source option check_files_exist is introduced in In > https://github.com/apache/spark/pull/23383 when the file source V2 framework > is implemented. In the PR, FileIndex was created as a member of FileTable, so > that we could implement partition pruning like 0f9fcab in the future. At that > time FileIndexes will always be created for file writes, so we needed the > option to decide whether to check file existence. > After https://github.com/apache/spark/pull/23774, the option is not needed > anymore. This PR is to clean the option. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27172) CRLF Injection/HTTP response splitting on spark embedded jetty servlet.
[ https://issues.apache.org/jira/browse/SPARK-27172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27172: - Description: Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or will there be any impact if we do upgrade it ? Please do refer on the provided attachment for more information. was: Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or will there be any impact if we do upgrade it ? Please do refer on the provided attachment for more information. |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |CRLF injection/HTTP response splitting|Medium|This script is possibly vulnerable to CRLF injection attacks. HTTP headers have the structure "Key: Value", where each line is separated by the CRLF combination. If the user input is injected into the value section without properly escaping/removing CRLF characters it is possible to alter the HTTP headers structure. HTTP Response Splitting is a new application attack technique which enables various new attacks such as web cache poisoning, cross user defacement, hijacking pages with sensitive user information and cross-site scripting (XSS). The attacker sends a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one response.|Is it possible for a remote attacker to inject custom HTTP headers. For example, an attacker can inject session cookies or HTML code. 
This may conduct to vulnerabilities like XSS (cross-site scripting) or session fixation.|You need to restrict CR(0x13) and LF(0x10) from the user input or properly encode the output in order to prevent the injection of custom HTTP headers.|Web Server Details URL encoded GET input page was set to %c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs Injected header found: SomeCustomInjectedHeader: injected_by_wvs Request headers GET /?page=%c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs&showIncomplete=false HTTP/1.1 Referer: https://app30.goldmine.bdo.com.ph Host: app30.goldmine.bdo.com.ph Connection: Keep-alive Accept-Encoding: gzip,deflate User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21 Acunetix-Product: WVS/11.0 (Acunetix - WVSE) Acunetix-Scanning-agreement: Third Party Scanning PROHIBITED Acunetix-User-agreement: http://www.acunetix.com/wvs/disc.htm Accept: */* Web Server Details URL encoded GET input showIncomplete was set to %c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs Injected header found: SomeCustomInjectedHeader: injected_by_wvs Request headers GET /?page=3&showIncomplete=%c4%8d%c4%8aSomeCustomInjectedHeader:%20injected_by_wvs HTTP/1.1 Referer: https://app30.goldmine.bdo.com.ph Host: app30.goldmine.bdo.com.ph Connection: Keep-alive Accept-Encoding: gzip,deflate User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21 Acunetix-Product: WVS/11.0 (Acunetix - WVSE) Acunetix-Scanning-agreement: Third Party Scanning PROHIBITED Acunetix-User-agreement: http://www.acunetix.com/wvs/disc.htm Accept: */*|Acunetix CRLF Injection Attack (http://www.acunetix.com/websitesecurity/crlf-injection.htm) Whitepaper - HTTP Response Splitting (http://packetstormsecurity.org/papers/general/whitepaper_httpresponse.pdf) Introduction to HTTP Response Splitting (http://www.securiteam.com/securityreviews/5WP0E2KFGK.html) 
https://www.cvedetails.com/cve/CVE-2007-5615/ https://cwe.mitre.org/data/definitions/113.html| > CRLF Injection/HTTP response splitting on spark embedded jetty servlet. > --- > > Key: SPARK-27172 > URL: https://issues.apache.org/jira/browse/SPARK-27172 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Major > > Can we upgrade embedded jetty servlet on spark 1.6.2? Is this possible or > will there be any impact if we do upgrade it ? > Please do refer on the provided attachment for more information. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27166) Improve `printSchema` to print up to the given level
[ https://issues.apache.org/jira/browse/SPARK-27166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27166. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24098 > Improve `printSchema` to print up to the given level > > > Key: SPARK-27166 > URL: https://issues.apache.org/jira/browse/SPARK-27166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > > This issue aims to improve `printSchema` to be able to print up to the given > level of the schema. > {code:java} > scala> val df = Seq((1,(2,(3,4.toDF > df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: int, _2: > struct<_1: int, _2: int>>] > scala> df.printSchema > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false) > scala> df.printSchema(1) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > scala> df.printSchema(2) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > scala> df.printSchema(3) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false){code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
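The depth-limited traversal behind printSchema(level) can be modeled outside Spark with a nested-list schema. A hedged Python sketch follows; representing a schema as (name, type) pairs and the fixed nullable flags are simplifications, not Spark's actual StructType API.

```python
def print_schema(schema, level=None):
    """Render a nested schema like DataFrame.printSchema, pruning below `level`.

    `schema` is a list of (name, type) pairs; a struct's type is itself
    another such list. `level=None` prints the full tree.
    """
    lines = ["root"]

    def walk(fields, depth, prefix):
        if level is not None and depth > level:
            return  # prune everything deeper than the requested level
        for name, dtype in fields:
            if isinstance(dtype, list):
                lines.append(f"{prefix}|-- {name}: struct (nullable = true)")
                walk(dtype, depth + 1, prefix + "|    ")
            else:
                lines.append(f"{prefix}|-- {name}: {dtype} (nullable = false)")

    walk(schema, 1, " ")
    return "\n".join(lines)
```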
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer below for more information: |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported.|Consult References for more information.|Upgrade to the latest version.|/static/jquery-1.11.1.min.js Details Detected Javascript library jquery version 1.11.1. The version was detected from filename.|References: [https://github.com/jquery/jquery/issues/2432] [http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/] [https://snyk.io/test/npm/jquery/1.11.1] related reference not directly with spark: [https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html]| Thank you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer below for more information: |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. 
Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported.|Consult References for more information.|Upgrade to the latest version.|/static/jquery-1.11.1.min.js Details Detected Javascript library jquery version 1.11.1. The version was detected from filename.|References: https://github.com/jquery/jquery/issues/2432 http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/ https://snyk.io/test/npm/jquery/1.11.1 related reference not directly with spark: https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html| Thanks you, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? > - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js will be update to latest version ? > As per VA scan javascript library that we are currently using is vulnerable > and we wanted to address this vulnerability. Appreciate any help we could get > from the community. > Please do refer below for more information: > |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| > |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript > library. One or more vulnerabilities were reported for this version of the > Javascript library. Consult Attack details and Web References for more > information about the affected library and the vulnerabilities that were > reported.|Consult References for more information.|Upgrade to the latest > version.|/static/jquery-1.11.1.min.js > > Details > Detected Javascript library jquery version 1.11.1. 
The version was detected > from filename.|References: > [https://github.com/jquery/jquery/issues/2432] > [http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/] > > [https://snyk.io/test/npm/jquery/1.11.1] > > related reference not directly with spark: > > [https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html]| > > Thank you, > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27136) Remove data source option check_files_exist
[ https://issues.apache.org/jira/browse/SPARK-27136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27136. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24069 [https://github.com/apache/spark/pull/24069] > Remove data source option check_files_exist > --- > > Key: SPARK-27136 > URL: https://issues.apache.org/jira/browse/SPARK-27136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > > The data source option check_files_exist is introduced in In > https://github.com/apache/spark/pull/23383 when the file source V2 framework > is implemented. In the PR, FileIndex was created as a member of FileTable, so > that we could implement partition pruning like 0f9fcab in the future. At that > time FileIndexes will always be created for file writes, so we needed the > option to decide whether to check file existence. > After https://github.com/apache/spark/pull/23774, the option is not needed > anymore. This PR is to clean the option. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
[ https://issues.apache.org/jira/browse/SPARK-27167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27167: - Description: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Please do refer below for more information: |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript library. One or more vulnerabilities were reported for this version of the Javascript library. Consult Attack details and Web References for more information about the affected library and the vulnerabilities that were reported.|Consult References for more information.|Upgrade to the latest version.|/static/jquery-1.11.1.min.js Details Detected Javascript library jquery version 1.11.1. The version was detected from filename.|References: https://github.com/jquery/jquery/issues/2432 http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/ https://snyk.io/test/npm/jquery/1.11.1 related reference not directly with spark: https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html| Thanks you, was: Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js will be update to latest version ? As per VA scan javascript library that we are currently using is vulnerable and we wanted to address this vulnerability. Appreciate any help we could get from the community. Thanks, > What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? 
> - > > Key: SPARK-27167 > URL: https://issues.apache.org/jira/browse/SPARK-27167 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > > Will there be a big impact on my system if my current > /static/jquery-1.11.1.min.js will be update to latest version ? > As per VA scan javascript library that we are currently using is vulnerable > and we wanted to address this vulnerability. Appreciate any help we could get > from the community. > Please do refer below for more information: > |CVS|Severity|Description|Impact|Recommendation|Affected|Reference:| > |Vulnerable Javascript library|Medium|You are using a vulnerable Javascript > library. One or more vulnerabilities were reported for this version of the > Javascript library. Consult Attack details and Web References for more > information about the affected library and the vulnerabilities that were > reported.|Consult References for more information.|Upgrade to the latest > version.|/static/jquery-1.11.1.min.js > > Details > Detected Javascript library jquery version 1.11.1. The version was detected > from filename.|References: > https://github.com/jquery/jquery/issues/2432 > http://blog.jquery.com/2016/01/08/jquery-2-2-and-1-12-released/ > > https://snyk.io/test/npm/jquery/1.11.1 > > related reference not directly with spark: > https://community.hortonworks.com/questions/89874/ambari-jquery-172-upgrade-to-jquery191.html| > > Thanks you, > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27107. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 3.0.0 2.4.2 This is resolved via [https://github.com/apache/spark/pull/24096] and [https://github.com/apache/spark/pull/24097] . > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.4.2, 3.0.0 > > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at 
com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply
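The "Buffer overflow. Available: 0, required: 9" failure occurs when Kryo's Output refuses a write because its fixed maximum capacity is exhausted; per the linked fix, ORC-476 makes that buffer size configurable. A toy Python model of the mechanism, with invented class and method names (not the Kryo or ORC API):

```python
class FixedOutput:
    """Toy model of a Kryo-style output buffer with a hard maximum capacity."""

    def __init__(self, max_capacity):
        self.buf = bytearray()
        self.max_capacity = max_capacity

    def require(self, n):
        # Mirrors Output.require: fail when remaining space can't hold n bytes.
        available = self.max_capacity - len(self.buf)
        if available < n:
            raise OverflowError(
                f"Buffer overflow. Available: {available}, required: {n}")

    def write_long(self, v):
        self.require(9)  # a Kryo variable-length long needs up to 9 bytes
        self.buf += v.to_bytes(9, "big", signed=True)

# A 9-byte buffer fits one long; the second write fails exactly like the stack
# trace above, while a larger (configurable) capacity absorbs the same writes.
```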
[jira] [Resolved] (SPARK-27165) Upgrade Apache ORC to 1.5.5
[ https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27165. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 3.0.0 2.4.2 This is resolved via [https://github.com/apache/spark/pull/24096] and [https://github.com/apache/spark/pull/24097] . > Upgrade Apache ORC to 1.5.5 > --- > > Key: SPARK-27165 > URL: https://issues.apache.org/jira/browse/SPARK-27165 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.4.2, 3.0.0 > > > This issue aims to update Apache ORC dependency to fix SPARK-27107 . > {code:java} > [ORC-452] Support converting MAP column from JSON to ORC > Improvement > [ORC-447] Change the docker scripts to keep a persistent m2 cache > [ORC-463] Add `version` command > [ORC-475] ORC reader should lazily get filesystem > [ORC-476] Make SearchAgument kryo buffer size configurable{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27171) Support Full-Partiton limit in the first scan
deshanxiao created SPARK-27171: -- Summary: Support Full-Partiton limit in the first scan Key: SPARK-27171 URL: https://issues.apache.org/jira/browse/SPARK-27171 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 2.3.2 Reporter: deshanxiao SparkPlan#executeTake must pick elements starting from a single partition, which can be slow for some queries. Although Spark is geared toward batch queries, it would be worthwhile to add a switch that allows users to scan all partitions in the first pass of a limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
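The incremental scan described above, plus the proposed full-partition switch, can be sketched as follows. This is a hedged Python model, not SparkPlan code: the function and parameter names are invented, and the geometric growth factor stands in for spark.sql.limit.scaleUpFactor (which defaults to 4).

```python
def execute_take(partitions, n, scale_up_factor=4, full_partition=False):
    """Model of SparkPlan#executeTake's incremental scan: start with one
    partition and grow the number of scanned partitions geometrically until
    n rows are collected. The proposed switch scans everything at once."""
    rows, scanned = [], 0
    num_to_scan = len(partitions) if full_partition else 1
    while scanned < len(partitions) and len(rows) < n:
        for part in partitions[scanned:scanned + num_to_scan]:
            rows.extend(part)
        scanned += num_to_scan
        num_to_scan *= scale_up_factor  # stands in for spark.sql.limit.scaleUpFactor
    return rows[:n]
```

The trade-off the report points at: the incremental scheme is cheap when the first partitions already satisfy the limit, but pays repeated round trips when rows are concentrated in later partitions, where a single full scan would be faster.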
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793268#comment-16793268 ] Gengliang Wang commented on SPARK-27142: +1 on the proposal. > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently for Monitoring Spark application SQL information is not available > from REST but only via UI. REST provides only > applications,jobs,stages,environment. This Jira is targeted to provide a REST > API so that SQL level information can be found > > Details: > https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27170) Better error message for syntax error with extraneous comma in the SQL parser
Wataru Yukawa created SPARK-27170: - Summary: Better error message for syntax error with extraneous comma in the SQL parser Key: SPARK-27170 URL: https://issues.apache.org/jira/browse/SPARK-27170 Project: Spark Issue Type: Wish Components: SQL Affects Versions: 2.4.0 Reporter: Wataru Yukawa [~maropu], [~smilegator] It was great to talk with you at Hadoop / Spark Conference Japan 2019. Thanks in advance! I am filing the issue we discussed there. We sometimes write SQL with a syntax error caused by an extraneous comma. For example, here is SQL with an extraneous comma in line 2. {code} SELECT distinct ,a ,b ,c FROM ...' LIMIT 100 {code} Spark 2.4.0 produces an error message, but I find it a little hard to understand because the line number is wrong. {code} cannot resolve '`distinct`' given input columns: [...]; line 1 pos 7; 'GlobalLimit 100 +- 'LocalLimit 100 +- 'Project ['distinct, ...] +- Filter (...) +- SubqueryAlias ... +- HiveTableRelation ... {code} By the way, here is the error message from prestosql 305 for the same SQL. The line number is correct, and I think the error message is better than Spark SQL's. {code} line 2:5: mismatched input ','. Expecting: '*', , {code} It would be great if the Spark SQL error message could be improved. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26778) Implement file source V2 partitioning pruning
[ https://issues.apache.org/jira/browse/SPARK-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793263#comment-16793263 ] Gengliang Wang commented on SPARK-26778: Sorry, I meant file source partition pruning. I have updated the title. > Implement file source V2 partitioning pruning > - > > Key: SPARK-26778 > URL: https://issues.apache.org/jira/browse/SPARK-26778 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26778) Implement file source V2 partitioning pruning
[ https://issues.apache.org/jira/browse/SPARK-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-26778: --- Summary: Implement file source V2 partitioning pruning (was: Implement file source V2 partitioning ) > Implement file source V2 partitioning pruning > - > > Key: SPARK-26778 > URL: https://issues.apache.org/jira/browse/SPARK-26778 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26343) Speed up running the kubernetes integration tests locally
[ https://issues.apache.org/jira/browse/SPARK-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26343. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23380 [https://github.com/apache/spark/pull/23380] > Speed up running the kubernetes integration tests locally > - > > Key: SPARK-26343 > URL: https://issues.apache.org/jira/browse/SPARK-26343 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.0.0 >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > Fix For: 3.0.0 > > > The Kubernetes integration tests right now allow you to specify a docker tag > but even when you do it also requires a tgz to extract, but then it doesn't > really need that extracted version. We could make it easier/faster for folks > to run the integration tests locally by not requiring a distribution tar ball. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27169) number of active tasks is negative on executors page
[ https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] acupple updated SPARK-27169: Attachment: QQ20190315-102235.png > number of active tasks is negative on executors page > > > Key: SPARK-27169 > URL: https://issues.apache.org/jira/browse/SPARK-27169 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2 >Reporter: acupple >Priority: Minor > Attachments: QQ20190315-102215.png, QQ20190315-102235.png > > > I use spark to process some data in hdfs and hbase, and the concurrency is > 16. > but when run some time, the active jobs will be thousands, and number of > active tasks are negative. > Actually, these jobs are already done when I check driver logs > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27169) number of active tasks is negative on executors page
[ https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] acupple updated SPARK-27169: Description: I use Spark to process some data in HDFS and HBase. One thread consumes messages from a queue and then submits them to a fixed-size thread pool (16 threads) for Spark processing. But after running for some time, the active jobs number in the thousands, and the number of active tasks is negative. Actually, these jobs are already done when I check the driver logs. was: I use spark to process some data in hdfs and hbase, and the concurrency is 16. but when run some time, the active jobs will be thousands, and number of active tasks are negative. Actually, these jobs are already done when I check driver logs > number of active tasks is negative on executors page > > > Key: SPARK-27169 > URL: https://issues.apache.org/jira/browse/SPARK-27169 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2 >Reporter: acupple >Priority: Minor > Attachments: QQ20190315-102215.png, QQ20190315-102235.png > > > I use Spark to process some data in HDFS and HBase. One thread consumes > messages from a queue and then submits them to a fixed-size thread pool (16 > threads) for Spark processing. > But after running for some time, the active jobs number in the thousands, and > the number of active tasks is negative. > Actually, these jobs are already done when I check the driver logs. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27169) number of active tasks is negative on executors page
[ https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] acupple updated SPARK-27169: Attachment: QQ20190315-102215.png > number of active tasks is negative on executors page > > > Key: SPARK-27169 > URL: https://issues.apache.org/jira/browse/SPARK-27169 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2 >Reporter: acupple >Priority: Minor > Attachments: QQ20190315-102215.png, QQ20190315-102235.png > > > I use spark to process some data in hdfs and hbase, and the concurrency is > 16. > but when run some time, the active jobs will be thousands, and number of > active tasks are negative. > Actually, these jobs are already done when I check driver logs > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27169) number of active tasks is negative on executors page
[ https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] acupple updated SPARK-27169: Description: I use spark to process some data in hdfs and hbase, and the concurrency is 16. but when run some time, the active jobs will be thousands, and number of active tasks are negative. Actually, these jobs are already done when I check driver logs was: I use spark to process some data in hdfs and hbase, and the concurrency is 16. but when run some time, the active jobs will be thousands, and number of active tasks are negative. Actually, these jobs are already done when I check driver logs !image-2019-03-15-10-20-36-998.png|width=576,height=242! !image-2019-03-15-10-21-16-478.png|width=577,height=258! > number of active tasks is negative on executors page > > > Key: SPARK-27169 > URL: https://issues.apache.org/jira/browse/SPARK-27169 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2 >Reporter: acupple >Priority: Minor > > I use spark to process some data in hdfs and hbase, and the concurrency is > 16. > but when run some time, the active jobs will be thousands, and number of > active tasks are negative. > Actually, these jobs are already done when I check driver logs > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27169) number of active tasks is negative on executors page
acupple created SPARK-27169: --- Summary: number of active tasks is negative on executors page Key: SPARK-27169 URL: https://issues.apache.org/jira/browse/SPARK-27169 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.2 Reporter: acupple I use Spark to process some data in HDFS and HBase, and the concurrency is 16. But after running for some time, the active jobs number in the thousands, and the number of active tasks is negative. Actually, these jobs are already done when I check the driver logs. !image-2019-03-15-10-20-36-998.png|width=576,height=242! !image-2019-03-15-10-21-16-478.png|width=577,height=258! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-27141) Use ConfigEntry for hardcoded configs Yarn
[ https://issues.apache.org/jira/browse/SPARK-27141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangjiaochun reopened SPARK-27141: -- > Use ConfigEntry for hardcoded configs Yarn > -- > > Key: SPARK-27141 > URL: https://issues.apache.org/jira/browse/SPARK-27141 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: wangjiaochun >Priority: Major > Fix For: 3.0.0 > > > Some configs in the following YARN-related files do not use ConfigEntry > values; try to replace them. > ApplicationMaster > YarnAllocatorSuite > ApplicationMasterSuite > BaseYarnClusterSuite > YarnClusterSuite -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27152) Column equality does not work for aliased columns.
[ https://issues.apache.org/jira/browse/SPARK-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793251#comment-16793251 ] Hyukjin Kwon commented on SPARK-27152: -- So, in which case is it important? > Column equality does not work for aliased columns. > -- > > Key: SPARK-27152 > URL: https://issues.apache.org/jira/browse/SPARK-27152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Ryan Radtke >Priority: Minor > > assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false > assert($"zip" equals $"zip") will return true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
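The equality result reported above is consistent with how Catalyst handles renames: `as(...)` wraps the column's expression in an Alias node, and each Alias is minted with a fresh, unique expression ID, so two structurally identical aliases compare unequal. A toy pure-Python model of that behavior (class and field names are illustrative, not Spark's actual API):

```python
import itertools

_next_id = itertools.count()  # stand-in for Catalyst's global expr-id counter

class Attr:
    """Bare column reference; equality is by name, so $"zip" == $"zip"."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Attr) and self.name == other.name

class Alias:
    """Renamed column; carries a fresh unique expr_id per instance,
    which is what makes two identical-looking aliases unequal."""
    def __init__(self, child, name):
        self.child, self.name = child, name
        self.expr_id = next(_next_id)
    def __eq__(self, other):
        return (isinstance(other, Alias) and self.child == other.child
                and self.name == other.name and self.expr_id == other.expr_id)

assert Attr("zip") == Attr("zip")                              # plain columns: equal
assert Alias(Attr("zip"), "zip_code") != Alias(Attr("zip"), "zip_code")  # aliases: not
```

This suggests the observed behavior is by design (identity of the alias, not its shape), which is presumably why the question above asks for the use case.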
[jira] [Assigned] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27164: Assignee: Apache Spark > RDD.countApprox on empty RDDs schedules jobs which never complete > -- > > Key: SPARK-27164 > URL: https://issues.apache.org/jira/browse/SPARK-27164 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.3, 2.4.0 > Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 > Also observed on: > macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 >Reporter: Ryan Moore >Assignee: Apache Spark >Priority: Major > Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png > > > When calling `countApprox` on an RDD which has no partitions (such as those > created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 > tasks. That job appears under the "Active Jobs" in the Spark UI until it is > either killed or the Spark context is shut down. > > {code:java} > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) > Type in expressions to have them evaluated. > Type :help for more information. 
> scala> val ints = sc.makeRDD(Seq(1)) > ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at > :24 > scala> ints.countApprox(1000) > res0: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [1.000, 1.000]) > // PartialResult is returned, Scheduled job completed > scala> ints.filter(_ => false).countApprox(1000) > res1: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job completed > scala> sc.emptyRDD[Int].countApprox(1000) > res5: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) > res16: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27164: Assignee: (was: Apache Spark) > RDD.countApprox on empty RDDs schedules jobs which never complete > -- > > Key: SPARK-27164 > URL: https://issues.apache.org/jira/browse/SPARK-27164 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.3, 2.4.0 > Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 > Also observed on: > macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 >Reporter: Ryan Moore >Priority: Major > Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png > > > When calling `countApprox` on an RDD which has no partitions (such as those > created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 > tasks. That job appears under the "Active Jobs" in the Spark UI until it is > either killed or the Spark context is shut down. > > {code:java} > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) > Type in expressions to have them evaluated. > Type :help for more information. 
> scala> val ints = sc.makeRDD(Seq(1)) > ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at > :24 > scala> ints.countApprox(1000) > res0: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [1.000, 1.000]) > // PartialResult is returned, Scheduled job completed > scala> ints.filter(_ => false).countApprox(1000) > res1: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job completed > scala> sc.emptyRDD[Int].countApprox(1000) > res5: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) > res16: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
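One plausible fix direction for the report above (a sketch only, not the actual Spark patch) is to short-circuit when the RDD has zero partitions, returning the exact bounds without submitting a job at all, since a job with zero tasks is what gets stuck as "Active" in the UI. A pure-Python model:

```python
def count_approx(partitions, submit_job):
    """Model of RDD.countApprox returning ((low, high), job_submitted).
    Guard: with no partitions there is nothing to run, so answer
    immediately instead of scheduling a zero-task job."""
    if not partitions:
        return (0.0, 0.0), False          # exact answer, no job scheduled
    total = sum(submit_job(p) for p in partitions)
    return (float(total), float(total)), True

jobs_run = []
def job(part):
    jobs_run.append(part)                 # record that a task actually ran
    return len(part)

assert count_approx([], job) == ((0.0, 0.0), False)
assert jobs_run == []                     # empty RDD: no job submitted
assert count_approx([[1, 2], [3]], job) == ((3.0, 3.0), True)
```

The same guard would cover `sc.union(Nil)`, which also produces a zero-partition RDD.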
[jira] [Assigned] (SPARK-27168) Add docker integration test for MsSql Server
[ https://issues.apache.org/jira/browse/SPARK-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27168: Assignee: Apache Spark > Add docker integration test for MsSql Server > > > Key: SPARK-27168 > URL: https://issues.apache.org/jira/browse/SPARK-27168 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Zhu, Lipeng >Assignee: Apache Spark >Priority: Major > > Add docker integration test for MsSql Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27168) Add docker integration test for MsSql Server
[ https://issues.apache.org/jira/browse/SPARK-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27168: Assignee: (was: Apache Spark) > Add docker integration test for MsSql Server > > > Key: SPARK-27168 > URL: https://issues.apache.org/jira/browse/SPARK-27168 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Zhu, Lipeng >Priority: Major > > Add docker integration test for MsSql Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27168) Add docker integration test for MsSql Server
[ https://issues.apache.org/jira/browse/SPARK-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793234#comment-16793234 ] Apache Spark commented on SPARK-27168: -- User 'lipzhu' has created a pull request for this issue: https://github.com/apache/spark/pull/24099 > Add docker integration test for MsSql Server > > > Key: SPARK-27168 > URL: https://issues.apache.org/jira/browse/SPARK-27168 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Zhu, Lipeng >Priority: Major > > Add docker integration test for MsSql Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27100) dag-scheduler-event-loop" java.lang.StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-27100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793233#comment-16793233 ] KaiXu commented on SPARK-27100: --- Hi [~hyukjin.kwon], the workload I'm running is ALS from HiBench; the code can be obtained from [here|https://github.com/intel-hadoop/HiBench/blob/master/sparkbench/ml/src/main/scala/com/intel/sparkbench/ml/ALSExample.scala], and here is the [doc|https://github.com/intel-hadoop/HiBench/blob/master/docs/run-sparkbench.md] on how to build and run. Steps to reproduce: # Follow the doc above to configure HiBench for your cluster. # Edit \{HIBENCH_HOME}/conf/benchmarks.lst, keeping ml.als in the file to run ALS only. # Edit \{HIBENCH_HOME}/conf/hibench.conf, changing the value of hibench.scale.profile to gigantic. # Edit \{HIBENCH_HOME}/conf/workloads/ml/al.conf, changing the value of hibench.als.rank to 200 and hibench.als.numIterations to 100. # Run \{HIBENCH_HOME}/conf/run_all.sh to start the test. # Wait until about 30 iterations; it will fail with a StackOverflowError. > dag-scheduler-event-loop" java.lang.StackOverflowError > -- > > Key: SPARK-27100 > URL: https://issues.apache.org/jira/browse/SPARK-27100 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.1.3, 2.3.3 >Reporter: KaiXu >Priority: Major > Attachments: stderr > > > ALS in Spark MLlib causes StackOverflow: > /opt/sparkml/spark213/bin/spark-submit --properties-file > /opt/HiBench/report/als/spark/conf/sparkbench/spark.conf --class > com.intel.hibench.sparkbench.ml.ALSExample --master yarn-client > --num-executors 3 --executor-memory 322g > /opt/HiBench/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar > --numUsers 4 --numProducts 6 --rank 100 --numRecommends 20 > --numIterations 100 --kryo false --implicitPrefs true --numProductBlocks -1 > --numUserBlocks -1 --lambda 1.0 hdfs://bdw-slave20:8020/HiBench/ALS/Input > > Exception in thread "dag-scheduler-event-loop" 
java.lang.StackOverflowError > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1534) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468) > at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > 
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468) > at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(Objec
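The repeating serialization frames in the trace above are Java serialization recursively walking an RDD lineage that grows with every ALS iteration; the usual mitigation for iterative algorithms is periodic checkpointing to truncate the lineage (ML ALS exposes a checkpoint interval setting, which requires a checkpoint directory to be set on the SparkContext). A toy model of how checkpointing bounds lineage depth (all names here are illustrative, not Spark's API):

```python
class RDDNode:
    """One lineage step; depth = how many parents serialization must recurse through."""
    def __init__(self, parent=None):
        self.parent = parent
    def depth(self):
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d

def iterate(num_iterations, checkpoint_interval=None):
    """Each iteration extends the lineage; checkpointing resets it to a fresh root."""
    rdd = RDDNode()
    for i in range(1, num_iterations + 1):
        rdd = RDDNode(parent=rdd)
        if checkpoint_interval and i % checkpoint_interval == 0:
            rdd = RDDNode()  # checkpoint: lineage truncated, recursion depth capped
    return rdd.depth()

assert iterate(100) == 100                       # depth grows without bound
assert iterate(100, checkpoint_interval=10) <= 10  # depth stays bounded
```

With ~30 iterations at rank 200 the unbounded chain is deep enough that serializing the task closure overflows the default JVM stack, matching the reproduction steps in the comment.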
[jira] [Updated] (SPARK-27168) Add docker integration test for MsSql Server
[ https://issues.apache.org/jira/browse/SPARK-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu, Lipeng updated SPARK-27168: Issue Type: Test (was: Bug) > Add docker integration test for MsSql Server > > > Key: SPARK-27168 > URL: https://issues.apache.org/jira/browse/SPARK-27168 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Zhu, Lipeng >Priority: Major > > Add docker integration test for MsSql Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27168) Add docker integration test for MsSql Server
Zhu, Lipeng created SPARK-27168: --- Summary: Add docker integration test for MsSql Server Key: SPARK-27168 URL: https://issues.apache.org/jira/browse/SPARK-27168 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Zhu, Lipeng Add docker integration test for MsSql Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793222#comment-16793222 ] Ajith S commented on SPARK-27164: - i will be working on this > RDD.countApprox on empty RDDs schedules jobs which never complete > -- > > Key: SPARK-27164 > URL: https://issues.apache.org/jira/browse/SPARK-27164 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.3, 2.4.0 > Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 > Also observed on: > macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 >Reporter: Ryan Moore >Priority: Major > Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png > > > When calling `countApprox` on an RDD which has no partitions (such as those > created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 > tasks. That job appears under the "Active Jobs" in the Spark UI until it is > either killed or the Spark context is shut down. > > {code:java} > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) > Type in expressions to have them evaluated. > Type :help for more information. 
> scala> val ints = sc.makeRDD(Seq(1)) > ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at > :24 > scala> ints.countApprox(1000) > res0: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [1.000, 1.000]) > // PartialResult is returned, Scheduled job completed > scala> ints.filter(_ => false).countApprox(1000) > res1: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job completed > scala> sc.emptyRDD[Int].countApprox(1000) > res5: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) > res16: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27070) DefaultPartitionCoalescer can lock up driver for hours
[ https://issues.apache.org/jira/browse/SPARK-27070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-27070: - Assignee: Yuli Fiterman > DefaultPartitionCoalescer can lock up driver for hours > -- > > Key: SPARK-27070 > URL: https://issues.apache.org/jira/browse/SPARK-27070 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1, 2.3.2, 2.4.0 >Reporter: Yuli Fiterman >Assignee: Yuli Fiterman >Priority: Major > > We're running Spark on EMR reading large datasets from S3. When trying to > coalesce a UnionRDD of two large FileScanRDDs (each with a few million > partitions) into around 8k partitions, the driver can stall for over an hour. > > A profiler shows that over 90% of the time is spent in TimSort, which is > invoked by `pickBin`. This seems like a very inefficient way to find the > least occupied PartitionGroup. IMO a better way would be to just use the > `min` method on the ArrayBuffer of `PartitionGroup`s -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27070) DefaultPartitionCoalescer can lock up driver for hours
[ https://issues.apache.org/jira/browse/SPARK-27070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-27070. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23986 [https://github.com/apache/spark/pull/23986] > DefaultPartitionCoalescer can lock up driver for hours > -- > > Key: SPARK-27070 > URL: https://issues.apache.org/jira/browse/SPARK-27070 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1, 2.3.2, 2.4.0 >Reporter: Yuli Fiterman >Assignee: Yuli Fiterman >Priority: Major > Fix For: 3.0.0 > > > We're running Spark on EMR reading large datasets from S3. When trying to > coalesce a UnionRDD of two large FileScanRDDs (each with a few million > partitions) into around 8k partitions, the driver can stall for over an hour. > > A profiler shows that over 90% of the time is spent in TimSort, which is > invoked by `pickBin`. This seems like a very inefficient way to find the > least occupied PartitionGroup. IMO a better way would be to just use the > `min` method on the ArrayBuffer of `PartitionGroup`s -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
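The change the reporter proposes is easy to illustrate: finding the least-occupied PartitionGroup only needs one linear pass, not a full sort on every call. A sketch of the two approaches (pure Python, not the merged Scala patch):

```python
def pick_bin_by_sort(groups):
    """O(n log n) per call: sort all groups by occupancy, take the first.
    With millions of partitions this sort runs once per partition placed."""
    return sorted(groups, key=len)[0]

def pick_bin_by_min(groups):
    """O(n) per call: a single pass finds the least-occupied group,
    analogous to calling `min` on the ArrayBuffer of PartitionGroups."""
    return min(groups, key=len)

groups = [["p1", "p2"], ["p3"], ["p4", "p5", "p6"]]
assert pick_bin_by_sort(groups) is groups[1]  # both pick the same group...
assert pick_bin_by_min(groups) is groups[1]   # ...but min avoids the sort
```

Since `pickBin` runs once per partition being placed, dropping the per-call sort turns the hot path from O(n² log n) overall to O(n²), which is the speedup the issue is after.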
[jira] [Updated] (SPARK-26176) Verify column name when creating table via `STORED AS`
[ https://issues.apache.org/jira/browse/SPARK-26176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26176: -- Issue Type: Improvement (was: Bug) > Verify column name when creating table via `STORED AS` > -- > > Key: SPARK-26176 > URL: https://issues.apache.org/jira/browse/SPARK-26176 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Xiao Li >Priority: Major > Labels: starter > > We can issue a reasonable exception when we creating Parquet native tables, > {code:java} > CREATE TABLE TAB1TEST USING PARQUET AS SELECT COUNT(ID) FROM TAB1; > {code} > {code:java} > org.apache.spark.sql.AnalysisException: Attribute name "count(ID)" contains > invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.; > {code} > However, the error messages are misleading when we create a table using the > Hive serde "STORED AS" > {code:java} > CREATE TABLE TAB1TEST STORED AS PARQUET AS SELECT COUNT(ID) FROM TAB1; > {code} > {code:java} > 18/11/26 09:04:44 ERROR SparkSQLDriver: Failed in [CREATE TABLE TAB2TEST > stored as parquet AS SELECT COUNT(col1) FROM TAB1] > org.apache.spark.SparkException: Job aborted. 
> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196) > at > org.apache.spark.sql.hive.execution.SaveAsHiveFile.saveAsHiveFile(SaveAsHiveFile.scala:97) > at > org.apache.spark.sql.hive.execution.SaveAsHiveFile.saveAsHiveFile$(SaveAsHiveFile.scala:48) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:66) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:201) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99) > at > org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:86) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:113) > at > org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:201) > at > org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3270) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3266) > at org.apache.spark.sql.Dataset.(Dataset.scala:201) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:86) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:655) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:685) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:852) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 3.0 failed 1 times, mo
[jira] [Updated] (SPARK-26176) Verify column name when creating table via `STORED AS`
[ https://issues.apache.org/jira/browse/SPARK-26176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26176: -- Priority: Minor (was: Major) > Verify column name when creating table via `STORED AS` > -- > > Key: SPARK-26176 > URL: https://issues.apache.org/jira/browse/SPARK-26176 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Xiao Li >Priority: Minor > Labels: starter > > We can issue a reasonable exception when creating Parquet native tables: > {code:java} > CREATE TABLE TAB1TEST USING PARQUET AS SELECT COUNT(ID) FROM TAB1; > {code} > {code:java} > org.apache.spark.sql.AnalysisException: Attribute name "count(ID)" contains > invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.; > {code} > However, the error messages are misleading when we create a table using the > Hive serde "STORED AS": > {code:java} > CREATE TABLE TAB1TEST STORED AS PARQUET AS SELECT COUNT(ID) FROM TAB1; > {code} > {code:java} > 18/11/26 09:04:44 ERROR SparkSQLDriver: Failed in [CREATE TABLE TAB2TEST > stored as parquet AS SELECT COUNT(col1) FROM TAB1] > org.apache.spark.SparkException: Job aborted. 
> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196) > at > org.apache.spark.sql.hive.execution.SaveAsHiveFile.saveAsHiveFile(SaveAsHiveFile.scala:97) > at > org.apache.spark.sql.hive.execution.SaveAsHiveFile.saveAsHiveFile$(SaveAsHiveFile.scala:48) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:66) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:201) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99) > at > org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:86) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:113) > at > org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:201) > at > org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3270) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3266) > at org.apache.spark.sql.Dataset.(Dataset.scala:201) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:86) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:655) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:685) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:852) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 3.0 failed 1 times, most rec
[jira] [Assigned] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188
[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-26990: --- Assignee: Gengliang Wang > Difference in handling of mixed-case partition column names after SPARK-26188 > - > > Key: SPARK-26990 > URL: https://issues.apache.org/jira/browse/SPARK-26990 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: Bruce Robbins >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > I noticed that the [PR for > SPARK-26188|https://github.com/apache/spark/pull/23165] changed how > mixed-cased partition columns are handled when the user provides a schema. > Say I have this file structure (note that each instance of `pS` is mixed > case): > {noformat} > bash-3.2$ find partitioned5 -type d > partitioned5 > partitioned5/pi=2 > partitioned5/pi=2/pS=foo > partitioned5/pi=2/pS=bar > partitioned5/pi=1 > partitioned5/pi=1/pS=foo > partitioned5/pi=1/pS=bar > bash-3.2$ > {noformat} > If I load the file with a user-provided schema in 2.4 (before the PR was > committed) or 2.3, I see: > {noformat} > scala> val df = spark.read.schema("intField int, pi int, ps > string").parquet("partitioned5") > df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field] > scala> df.printSchema > root > |-- intField: integer (nullable = true) > |-- pi: integer (nullable = true) > |-- ps: string (nullable = true) > scala> > {noformat} > However, using 2.4 after the PR was committed. I see: > {noformat} > scala> val df = spark.read.schema("intField int, pi int, ps > string").parquet("partitioned5") > df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 
1 more field] > scala> df.printSchema > root > |-- intField: integer (nullable = true) > |-- pi: integer (nullable = true) > |-- pS: string (nullable = true) > scala> > {noformat} > Spark is picking up the mixed-case column name {{pS}} from the directory > name, not the lower-case {{ps}} from my specified schema. > In all tests, {{spark.sql.caseSensitive}} is set to the default (false). > Not sure if this is a bug, but it is a difference. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
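The behavior difference described above can be illustrated with a small, hypothetical sketch (plain Python, not Spark's actual resolution code): with case-insensitive resolution, a partition column discovered from the directory name (`pS`) should resolve to the user-specified name (`ps`), which is what the pre-SPARK-26188 versions did:

```python
# Hypothetical sketch of case-insensitive schema reconciliation. With
# case_sensitive=False (the spark.sql.caseSensitive default), a column name
# discovered from a partition directory resolves to the matching name in the
# user-provided schema; with case_sensitive=True it must match exactly.
def reconcile(user_schema, discovered, case_sensitive=False):
    if case_sensitive:
        return discovered if discovered in user_schema else None
    lookup = {c.lower(): c for c in user_schema}  # user schema wins on case
    return lookup.get(discovered.lower())

assert reconcile(["intField", "pi", "ps"], "pS") == "ps"  # 2.3 behavior
```

The reported change is that, after the PR, the directory-derived `pS` survives into the result schema instead of the user-specified `ps`.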
[jira] [Created] (SPARK-27167) What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ?
Jerry Garcia created SPARK-27167: Summary: What would be the possible impact if I upgrade /static/jquery-1.11.1.min.js ? Key: SPARK-27167 URL: https://issues.apache.org/jira/browse/SPARK-27167 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 1.6.2 Reporter: Jerry Garcia Will there be a big impact on my system if my current /static/jquery-1.11.1.min.js is updated to the latest version? As per a VA scan, the JavaScript library we are currently using is vulnerable, and we want to address this vulnerability. We would appreciate any help from the community. Thanks, -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27166) Improve `printSchema` to print up to the given level
[ https://issues.apache.org/jira/browse/SPARK-27166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27166: Assignee: (was: Apache Spark) > Improve `printSchema` to print up to the given level > > > Key: SPARK-27166 > URL: https://issues.apache.org/jira/browse/SPARK-27166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > This issue aims to improve `printSchema` to be able to print up to the given > level of the schema. > {code:java} > scala> val df = Seq((1,(2,(3,4)))).toDF > df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: int, _2: > struct<_1: int, _2: int>>] > scala> df.printSchema > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false) > scala> df.printSchema(1) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > scala> df.printSchema(2) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > scala> df.printSchema(3) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false){code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27158) dev/mima and dev/scalastyle support dynamic profiles
[ https://issues.apache.org/jira/browse/SPARK-27158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27158. -- Resolution: Fixed Fix Version/s: 3.0.0 Fixed in https://github.com/apache/spark/pull/24089 > dev/mima and dev/scalastyle support dynamic profiles > > > Key: SPARK-27158 > URL: https://issues.apache.org/jira/browse/SPARK-27158 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27158) dev/mima and dev/scalastyle support dynamic profiles
[ https://issues.apache.org/jira/browse/SPARK-27158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27158: Assignee: Yuming Wang > dev/mima and dev/scalastyle support dynamic profiles > > > Key: SPARK-27158 > URL: https://issues.apache.org/jira/browse/SPARK-27158 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27166) Improve `printSchema` to print up to the given level
Dongjoon Hyun created SPARK-27166: - Summary: Improve `printSchema` to print up to the given level Key: SPARK-27166 URL: https://issues.apache.org/jira/browse/SPARK-27166 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Dongjoon Hyun This issue aims to improve `printSchema` to be able to print up to the given level of the schema. {code:java} scala> val df = Seq((1,(2,(3,4)))).toDF df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: int, _2: struct<_1: int, _2: int>>] scala> df.printSchema root |-- _1: integer (nullable = false) |-- _2: struct (nullable = true) | |-- _1: integer (nullable = false) | |-- _2: struct (nullable = true) | | |-- _1: integer (nullable = false) | | |-- _2: integer (nullable = false) scala> df.printSchema(1) root |-- _1: integer (nullable = false) |-- _2: struct (nullable = true) scala> df.printSchema(2) root |-- _1: integer (nullable = false) |-- _2: struct (nullable = true) | |-- _1: integer (nullable = false) | |-- _2: struct (nullable = true) scala> df.printSchema(3) root |-- _1: integer (nullable = false) |-- _2: struct (nullable = true) | |-- _1: integer (nullable = false) | |-- _2: struct (nullable = true) | | |-- _1: integer (nullable = false) | | |-- _2: integer (nullable = false){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27166) Improve `printSchema` to print up to the given level
[ https://issues.apache.org/jira/browse/SPARK-27166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27166: Assignee: Apache Spark > Improve `printSchema` to print up to the given level > > > Key: SPARK-27166 > URL: https://issues.apache.org/jira/browse/SPARK-27166 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > > This issue aims to improve `printSchema` to be able to print up to the given > level of the schema. > {code:java} > scala> val df = Seq((1,(2,(3,4)))).toDF > df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: int, _2: > struct<_1: int, _2: int>>] > scala> df.printSchema > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false) > scala> df.printSchema(1) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > scala> df.printSchema(2) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > scala> df.printSchema(3) > root > |-- _1: integer (nullable = false) > |-- _2: struct (nullable = true) > | |-- _1: integer (nullable = false) > | |-- _2: struct (nullable = true) > | | |-- _1: integer (nullable = false) > | | |-- _2: integer (nullable = false){code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
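The depth-limited printing proposed in SPARK-27166 above can be sketched outside Spark. This is a hypothetical plain-Python model (nested structs as lists of `(name, type)` pairs), not Spark's actual `printSchema` implementation:

```python
# Hypothetical sketch of a depth-limited printSchema: a schema is a list of
# (name, type) pairs where a nested struct's type is itself such a list.
# Recursion stops descending once `level` is exhausted; level=None prints all.
def print_schema(fields, level=None):
    lines = []

    def walk(fs, depth, prefix):
        for name, typ in fs:
            if isinstance(typ, list):
                lines.append(f"{prefix}-- {name}: struct (nullable = true)")
                if level is None or depth < level:
                    walk(typ, depth + 1, prefix + " |")
            else:
                lines.append(f"{prefix}-- {name}: {typ} (nullable = false)")

    walk(fields, 1, " |")
    return "root\n" + "\n".join(lines)

# Mirrors the Seq((1,(2,(3,4)))).toDF example from the issue.
schema = [("_1", "integer"),
          ("_2", [("_1", "integer"),
                  ("_2", [("_1", "integer"), ("_2", "integer")])])]
print(print_schema(schema, level=1))
# root
#  |-- _1: integer (nullable = false)
#  |-- _2: struct (nullable = true)
```

With `level=3` (or no level) the full three-level tree from the issue description is reproduced.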
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793118#comment-16793118 ] Apache Spark commented on SPARK-27107: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/24096 > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.
[jira] [Assigned] (SPARK-27165) Upgrade Apache ORC to 1.5.5
[ https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27165: Assignee: Apache Spark > Upgrade Apache ORC to 1.5.5 > --- > > Key: SPARK-27165 > URL: https://issues.apache.org/jira/browse/SPARK-27165 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > This issue aims to update Apache ORC dependency to fix SPARK-27107 . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793125#comment-16793125 ] Apache Spark commented on SPARK-27107: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/24097 > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.
[jira] [Assigned] (SPARK-27165) Upgrade Apache ORC to 1.5.5
[ https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27165: Assignee: (was: Apache Spark) > Upgrade Apache ORC to 1.5.5 > --- > > Key: SPARK-27165 > URL: https://issues.apache.org/jira/browse/SPARK-27165 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to update Apache ORC dependency to fix SPARK-27107 . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27165) Upgrade Apache ORC to 1.5.5
[ https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27165: -- Description: This issue aims to update Apache ORC dependency to fix SPARK-27107 . {code:java} [ORC-452] Support converting MAP column from JSON to ORC Improvement [ORC-447] Change the docker scripts to keep a persistent m2 cache [ORC-463] Add `version` command [ORC-475] ORC reader should lazily get filesystem [ORC-476] Make SearchAgument kryo buffer size configurable{code} was:This issue aims to update Apache ORC dependency to fix SPARK-27107 . > Upgrade Apache ORC to 1.5.5 > --- > > Key: SPARK-27165 > URL: https://issues.apache.org/jira/browse/SPARK-27165 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to update Apache ORC dependency to fix SPARK-27107 . > {code:java} > [ORC-452] Support converting MAP column from JSON to ORC > Improvement > [ORC-447] Change the docker scripts to keep a persistent m2 cache > [ORC-463] Add `version` command > [ORC-475] ORC reader should lazily get filesystem > [ORC-476] Make SearchAgument kryo buffer size configurable{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27107: Assignee: Apache Spark > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Assignee: Apache Spark >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) >
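The stack trace above fails inside `OrcInputFormat.setSearchArgument`, which Spark only reaches when ORC predicate pushdown is enabled. Until the dependency fix (tracked in SPARK-27165) is available, one hedged workaround is to disable pushdown so the SearchArgument is never Kryo-serialized. A minimal sketch, assuming the pushdown path is indeed the trigger (`workaround_conf` is an illustrative name; the option key itself is a real Spark SQL setting):

```python
# Workaround sketch (assumption: the overflow is only reached via the ORC
# predicate-pushdown path, as the setSearchArgument frames suggest).
# Setting "spark.sql.orc.filterPushdown" to "false" keeps
# OrcInputFormat.setSearchArgument from being invoked at all.
workaround_conf = {
    "spark.sql.orc.filterPushdown": "false",
}

# With a live session this would be applied as
#   spark.conf.set("spark.sql.orc.filterPushdown", "false")
# or passed on the command line; printed here in --conf form:
for key, value in workaround_conf.items():
    print(f"--conf {key}={value}")
```

Note the trade-off: disabling pushdown means ORC files are scanned without row-group filtering, so this is a stopgap rather than a fix.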
[jira] [Assigned] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27107: Assignee: (was: Apache Spark) > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.r
[jira] [Updated] (SPARK-27165) Upgrade Apache ORC to 1.5.5
[ https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27165: -- Description: This issue aims to update Apache ORC dependency to fix SPARK-27107. (was: This issue aims to update Apache ORC dependency to fix SPARK-27160.) > Upgrade Apache ORC to 1.5.5 > --- > > Key: SPARK-27165 > URL: https://issues.apache.org/jira/browse/SPARK-27165 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.1, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to update Apache ORC dependency to fix SPARK-27107.
[jira] [Created] (SPARK-27165) Upgrade Apache ORC to 1.5.5
Dongjoon Hyun created SPARK-27165: - Summary: Upgrade Apache ORC to 1.5.5 Key: SPARK-27165 URL: https://issues.apache.org/jira/browse/SPARK-27165 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.1, 3.0.0 Reporter: Dongjoon Hyun This issue aims to update Apache ORC dependency to fix SPARK-27160.
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793110#comment-16793110 ] Dongjoon Hyun commented on SPARK-27107: --- The vote passed. I'm preparing the PRs. > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.appl
[jira] [Comment Edited] (SPARK-27098) Flaky missing file parts when writing to Ceph without error
[ https://issues.apache.org/jira/browse/SPARK-27098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793082#comment-16793082 ] Martin Loncaric edited comment on SPARK-27098 at 3/14/19 9:28 PM: -- [~ste...@apache.org] Does this make more sense to you? This seems to suggest a bug in either Spark or Hadoop, but do you have a more specific idea of where to look? was (Author: mwlon): [~ste...@apache.org] Does this make more sense to you? This seems to suggest a bug in either Spark or Hadoop, but do you have a better idea of where to look? > Flaky missing file parts when writing to Ceph without error > --- > > Key: SPARK-27098 > URL: https://issues.apache.org/jira/browse/SPARK-27098 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > Attachments: sanitized_stdout_1.txt > > > https://stackoverflow.com/questions/54935822/spark-s3a-write-omits-upload-part-without-failure/55031233?noredirect=1#comment96835218_55031233 > Using 2.4.0 with Hadoop 2.7, hadoop-aws 2.7.5, and the Ceph S3 endpoint. > occasionally a file part will be missing; i.e. 
part 3 here: > ``` > > aws s3 ls my-bucket/folder/ > 2019-02-28 13:07:21 0 _SUCCESS > 2019-02-28 13:06:58 79428651 > part-0-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:06:59 79586172 > part-1-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:00 79561910 > part-2-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:01 79192617 > part-4-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:07 79364413 > part-5-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:08 79623254 > part-6-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79445030 > part-7-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79474923 > part-8-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:11 79477310 > part-9-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:12 79331453 > part-00010-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79567600 > part-00011-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79388012 > part-00012-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:14 79308387 > part-00013-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:15 79455483 > part-00014-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:17 79512342 > part-00015-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79403307 > part-00016-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79617769 > part-00017-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:19 79333534 > part-00018-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:20 79543324 > part-00019-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > ``` > However, the write 
succeeds and leaves a _SUCCESS file. > This can be caught by additionally checking afterward whether the number of > written file parts agrees with the number of partitions, but Spark should at > least fail on its own and leave a meaningful stack trace in this case.
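The post-write check the reporter describes can be sketched without any Spark dependency: given the listed object names and the expected partition count, flag any partition index that has no matching part file. The regex assumes Spark's conventional `part-NNNNN-<uuid>` five-digit naming (the listing in this report has lost its leading zeros in transit), and `missing_parts` is an illustrative helper name:

```python
import re

# Match Spark's conventional output-file names, e.g.
# part-00003-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
PART_RE = re.compile(r"^part-(\d{5})-")

def missing_parts(listing, expected_partitions):
    """Return the partition indices that have no part file in `listing`."""
    present = set()
    for name in listing:
        match = PART_RE.match(name)
        if match:
            present.add(int(match.group(1)))
    return sorted(set(range(expected_partitions)) - present)

# Mirror of the report: 20 partitions written, part 3 absent.
names = [f"part-{i:05d}-5789ebf5-c000.snappy.parquet"
         for i in range(20) if i != 3]
print(missing_parts(names, 20))  # [3]
```

A caller would list the output prefix after the write completes and assert the returned list is empty, with `df.rdd.getNumPartitions()` as the expected count.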
[jira] [Commented] (SPARK-27098) Flaky missing file parts when writing to Ceph without error
[ https://issues.apache.org/jira/browse/SPARK-27098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793082#comment-16793082 ] Martin Loncaric commented on SPARK-27098: - [~ste...@apache.org] Does this make more sense to you? This seems to suggest a bug in either Spark or Hadoop, but do you have a better idea of where to look? > Flaky missing file parts when writing to Ceph without error > --- > > Key: SPARK-27098 > URL: https://issues.apache.org/jira/browse/SPARK-27098 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > Attachments: sanitized_stdout_1.txt > > > https://stackoverflow.com/questions/54935822/spark-s3a-write-omits-upload-part-without-failure/55031233?noredirect=1#comment96835218_55031233 > Using 2.4.0 with Hadoop 2.7, hadoop-aws 2.7.5, and the Ceph S3 endpoint. > occasionally a file part will be missing; i.e. part 3 here: > ``` > > aws s3 ls my-bucket/folder/ > 2019-02-28 13:07:21 0 _SUCCESS > 2019-02-28 13:06:58 79428651 > part-0-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:06:59 79586172 > part-1-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:00 79561910 > part-2-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:01 79192617 > part-4-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:07 79364413 > part-5-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:08 79623254 > part-6-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79445030 > part-7-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79474923 > part-8-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:11 79477310 > part-9-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:12 79331453 > 
part-00010-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79567600 > part-00011-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79388012 > part-00012-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:14 79308387 > part-00013-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:15 79455483 > part-00014-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:17 79512342 > part-00015-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79403307 > part-00016-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79617769 > part-00017-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:19 79333534 > part-00018-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:20 79543324 > part-00019-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > ``` > However, the write succeeds and leaves a _SUCCESS file. > This can be caught by additionally checking afterward whether the number of > written file parts agrees with the number of partitions, but Spark should at > least fail on its own and leave a meaningful stack trace in this case.
[jira] [Commented] (SPARK-27098) Flaky missing file parts when writing to Ceph without error
[ https://issues.apache.org/jira/browse/SPARK-27098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793080#comment-16793080 ] Martin Loncaric commented on SPARK-27098: - I've gotten the debug logs for (1.), but can't make much of them. In this case, `part-0-` was missing: {{Exception in thread "main" java.lang.AssertionError: assertion failed: Expected to write dataframe with 20 partitions in s3a://my-bucket/my_folder but instead found 19 written parts! 1552587026347 82681618 part-1-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587027399 82631123 part-2-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587028592 82513038 part-3-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587029544 82325322 part-4-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587030573 82497917 part-5-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587031590 82736624 part-6-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587032449 82573267 part-7-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587033351 82590538 part-8-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587034582 82617979 part-9-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587035817 82430474 part-00010-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587036808 82688230 part-00011-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587037744 8252 part-00012-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587039017 82434976 part-00013-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587039919 82535772 part-00014-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587040884 82612890 part-00015-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587041898 82535110 part-00016-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587042829 82735449 
part-00017-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587043744 82460648 part-00018-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet 1552587044641 82658185 part-00019-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet at scala.Predef$.assert(Predef.scala:170)}} Looking at stdout for the driver, I find that there is absolutely no mention of part-0, but the other parts (i.e. part-1) have various logs, including the "rename path" ones you mentioned, like so: {{2019-03-14 18:10:26 DEBUG S3AFileSystem:449 - Rename path s3a://my-bucket/my/folder/_temporary/0/task_20190314180906_0016_m_01/part-1-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet to s3a://my-bucket/my/folder/part-1-5e21727b-508e-4246-b47c-c68c98c04f50-c000.snappy.parquet}} I have attached all the debugging related to part-1 here. As mentioned, there is nothing for the missing part-0 (in other runs, it was a different part missing, so there is nothing special about 0, just coincidence). [^sanitized_stdout_1.txt] > Flaky missing file parts when writing to Ceph without error > --- > > Key: SPARK-27098 > URL: https://issues.apache.org/jira/browse/SPARK-27098 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > Attachments: sanitized_stdout_1.txt > > > https://stackoverflow.com/questions/54935822/spark-s3a-write-omits-upload-part-without-failure/55031233?noredirect=1#comment96835218_55031233 > Using 2.4.0 with Hadoop 2.7, hadoop-aws 2.7.5, and the Ceph S3 endpoint. > occasionally a file part will be missing; i.e. 
part 3 here: > ``` > > aws s3 ls my-bucket/folder/ > 2019-02-28 13:07:21 0 _SUCCESS > 2019-02-28 13:06:58 79428651 > part-0-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:06:59 79586172 > part-1-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:00 79561910 > part-2-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:01 79192617 > part-4-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:07 79364413 > part-5-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:08 79623254 > part-6-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79445030 > part-7-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79474923 > part-8-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:11 79477310 > part-9-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:12 79331453 > part-00010-5789
[jira] [Updated] (SPARK-27098) Flaky missing file parts when writing to Ceph without error
[ https://issues.apache.org/jira/browse/SPARK-27098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Loncaric updated SPARK-27098: Attachment: sanitized_stdout_1.txt > Flaky missing file parts when writing to Ceph without error > --- > > Key: SPARK-27098 > URL: https://issues.apache.org/jira/browse/SPARK-27098 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > Attachments: sanitized_stdout_1.txt > > > https://stackoverflow.com/questions/54935822/spark-s3a-write-omits-upload-part-without-failure/55031233?noredirect=1#comment96835218_55031233 > Using 2.4.0 with Hadoop 2.7, hadoop-aws 2.7.5, and the Ceph S3 endpoint. > occasionally a file part will be missing; i.e. part 3 here: > ``` > > aws s3 ls my-bucket/folder/ > 2019-02-28 13:07:21 0 _SUCCESS > 2019-02-28 13:06:58 79428651 > part-0-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:06:59 79586172 > part-1-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:00 79561910 > part-2-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:01 79192617 > part-4-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:07 79364413 > part-5-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:08 79623254 > part-6-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79445030 > part-7-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:10 79474923 > part-8-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:11 79477310 > part-9-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:12 79331453 > part-00010-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79567600 > part-00011-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:13 79388012 > 
part-00012-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:14 79308387 > part-00013-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:15 79455483 > part-00014-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:17 79512342 > part-00015-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79403307 > part-00016-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:18 79617769 > part-00017-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:19 79333534 > part-00018-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > 2019-02-28 13:07:20 79543324 > part-00019-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet > ``` > However, the write succeeds and leaves a _SUCCESS file. > This can be caught by additionally checking afterward whether the number of > written file parts agrees with the number of partitions, but Spark should at > least fail on its own and leave a meaningful stack trace in this case.
[jira] [Updated] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Moore updated SPARK-27164: --- Attachment: Screen Shot 2019-03-14 at 1.49.19 PM.png > RDD.countApprox on empty RDDs schedules jobs which never complete > -- > > Key: SPARK-27164 > URL: https://issues.apache.org/jira/browse/SPARK-27164 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.3, 2.4.0 > Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 > Also observed on: > macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 >Reporter: Ryan Moore >Priority: Major > Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png > > > When calling `countApprox` on an RDD which has no partitions (such as those > created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 > tasks. That job appears under the "Active Jobs" in the Spark UI until it is > either killed or the Spark context is shut down. > > {code:java} > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) > Type in expressions to have them evaluated. > Type :help for more information. 
> scala> val ints = sc.makeRDD(Seq(1)) > ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at > :24 > scala> ints.countApprox(1000) > res0: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [1.000, 1.000]) > // PartialResult is returned, Scheduled job completed > scala> ints.filter(_ => false).countApprox(1000) > res1: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job completed > scala> sc.emptyRDD[Int].countApprox(1000) > res5: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) > res16: > org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] > = (final: [0.000, 0.000]) > // PartialResult is returned, Scheduled job is ACTIVE but never completes > {code}
[jira] [Created] (SPARK-27164) RDD.countApprox on empty RDDs schedules jobs which never complete
Ryan Moore created SPARK-27164: -- Summary: RDD.countApprox on empty RDDs schedules jobs which never complete Key: SPARK-27164 URL: https://issues.apache.org/jira/browse/SPARK-27164 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0, 2.2.3 Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1 Also observed on: macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151 Reporter: Ryan Moore When calling `countApprox` on an RDD which has no partitions (such as those created by `sparkContext.emptyRDD`) a job is scheduled with 0 stages and 0 tasks. That job appears under the "Active Jobs" in the Spark UI until it is either killed or the Spark context is shut down. {code:java} Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1) Type in expressions to have them evaluated. Type :help for more information. scala> val ints = sc.makeRDD(Seq(1)) ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at :24 scala> ints.countApprox(1000) res0: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [1.000, 1.000]) // PartialResult is returned, Scheduled job completed scala> ints.filter(_ => false).countApprox(1000) res1: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000]) // PartialResult is returned, Scheduled job completed scala> sc.emptyRDD[Int].countApprox(1000) res5: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000]) // PartialResult is returned, Scheduled job is ACTIVE but never completes scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000) res16: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000]) // PartialResult is returned, Scheduled job is ACTIVE but never completes {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
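A caller-side workaround for the report above is to avoid submitting a job at all when the RDD has no partitions. The sketch below models that guard in plain Python so it runs stand-alone; `FakeRDD` and `safe_count_approx` are illustrative names, not Spark API. With a real RDD the same check would be `rdd.getNumPartitions() == 0` before calling `countApprox`.

```python
# Illustrative sketch, not Spark's API: skip job submission entirely when
# an RDD has no partitions, since the zero-task job is what lingers under
# "Active Jobs". Partitions are modeled as plain lists.
class FakeRDD:
    def __init__(self, partitions):
        self.partitions = partitions  # list of lists, one per partition

    def get_num_partitions(self):
        return len(self.partitions)

    def count_approx(self):
        # Stand-in for the real countApprox job submission.
        return sum(len(p) for p in self.partitions)


def safe_count_approx(rdd):
    if rdd.get_num_partitions() == 0:
        return 0  # nothing to schedule; avoid submitting a job that never completes
    return rdd.count_approx()
```

With this guard, `sc.emptyRDD`-style inputs return immediately instead of leaving a stuck entry in the UI.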
[jira] [Resolved] (SPARK-27145) Close store after test, in the SQLAppStatusListenerSuite
[ https://issues.apache.org/jira/browse/SPARK-27145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-27145. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24079 [https://github.com/apache/spark/pull/24079] > Close store after test, in the SQLAppStatusListenerSuite > > > Key: SPARK-27145 > URL: https://issues.apache.org/jira/browse/SPARK-27145 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: shahid >Assignee: shahid >Priority: Minor > Fix For: 3.0.0 > > > We create many stores in the SQLAppStatusListenerSuite, but we need to > close the store after each test.
[jira] [Assigned] (SPARK-27145) Close store after test, in the SQLAppStatusListenerSuite
[ https://issues.apache.org/jira/browse/SPARK-27145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-27145: -- Assignee: shahid > Close store after test, in the SQLAppStatusListenerSuite > > > Key: SPARK-27145 > URL: https://issues.apache.org/jira/browse/SPARK-27145 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: shahid >Assignee: shahid >Priority: Minor > > We create many stores in the SQLAppStatusListenerSuite, but we need to > close the store after each test.
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792975#comment-16792975 ] Marcelo Vanzin commented on SPARK-27142: I'm not sure I understand your point, Sean. We expose all the data about jobs and streaming in the REST API, why would we not want to expose SQL? > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently, SQL-level information for monitoring a Spark application is not > available from the REST API, only via the UI; REST exposes only > applications, jobs, stages, and environment. This Jira targets providing a REST > API so that SQL-level information can be retrieved. > > Details: > https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728
[jira] [Comment Edited] (SPARK-5997) Increase partition count without performing a shuffle
[ https://issues.apache.org/jira/browse/SPARK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792964#comment-16792964 ] nirav patel edited comment on SPARK-5997 at 3/14/19 6:56 PM: - Adding another possible use case for this ask - I am hitting an IllegalArgumentException: Size exceeds Integer.MAX_VALUE error when trying to write an unpartitioned Dataframe to parquet. The error is due to a data block exceeding 2GB in size before being written to disk. The solution is to repartition the Dataframe (Dataset). I can do that, but I don't want to cause a shuffle when I increase the number of partitions with the repartition API. was (Author: tenstriker): Adding another possible use case for this ask - I am hitting IllegalArgumentException: Size exceeds Integer.MAX_VALUE error when trying to write unpartitioned Dataframe to parquet. Error is due to shuffleblock exceed 2GB in size. Solution is to repartition the Dataframe (Dataset) . I can do it but I don't want to cause shuffle when I increase number of partitions with repartition API. > Increase partition count without performing a shuffle > - > > Key: SPARK-5997 > URL: https://issues.apache.org/jira/browse/SPARK-5997 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Ash >Priority: Major > > When decreasing partition count with rdd.repartition() or rdd.coalesce(), the > user has the ability to choose whether or not to perform a shuffle. However > when increasing partition count there is no option of whether to perform a > shuffle or not -- a shuffle always occurs. > This Jira is to create a {{rdd.repartition(largeNum, shuffle=false)}} call > that performs a repartition to a higher partition count without a shuffle. > The motivating use case is to decrease the size of an individual partition > enough that the .toLocalIterator has significantly reduced memory pressure on > the driver, as it loads a partition at a time into the driver.
[jira] [Updated] (SPARK-27006) SPIP: .NET bindings for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Kim updated SPARK-27006: -- Priority: Major (was: Minor) > SPIP: .NET bindings for Apache Spark > > > Key: SPARK-27006 > URL: https://issues.apache.org/jira/browse/SPARK-27006 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Terry Kim >Priority: Major > Original Estimate: 4,032h > Remaining Estimate: 4,032h > > h4. Background and Motivation: > Apache Spark provides programming language support for Scala/Java (native), > and extensions for Python and R. While a variety of other language extensions > are possible to include in Apache Spark, .NET would bring one of the largest > developer communities to the table. Presently, no good Big Data solution exists > for .NET developers in open source. This SPIP aims at discussing how we can > bring Apache Spark goodness to the .NET development platform. > .NET is a free, cross-platform, open source developer platform for building > many different types of applications. With .NET, you can use multiple > languages, editors, and libraries to build for web, mobile, desktop, gaming, > and IoT types of applications. Even with .NET serving millions of developers, > there is no good Big Data solution that exists today, which this SPIP aims to > address. > The .NET developer community is one of the largest programming language > communities in the world.
Its flagship programming language C# is listed as > one of the most popular programming languages in a variety of articles and > statistics: > * Most popular Technologies on Stack Overflow: > [https://insights.stackoverflow.com/survey/2018/#most-popular-technologies|https://insights.stackoverflow.com/survey/2018/] > > * Most popular languages on GitHub 2018: > [https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10#2-java-9|https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10] > > * 1M+ new developers last 1 year > * Second most demanded technology on LinkedIn > * Top 30 High velocity OSS projects on GitHub > Including a C# language extension in Apache Spark will enable millions of > .NET developers to author Big Data applications in their preferred > programming language, developer environment, and tooling support. We aim to > promote the .NET bindings for Spark through engagements with the Spark > community (e.g., we are scheduled to present an early prototype at the SF > Spark Summit 2019) and the .NET developer community (e.g., similar > presentations will be held at .NET developer conferences this year). As > such, we believe that our efforts will help grow the Spark community by > making it accessible to the millions of .NET developers. > Furthermore, our early discussions with some large .NET development teams got > an enthusiastic reception. > We recognize that earlier attempts at this goal (specifically Mobius > [https://github.com/Microsoft/Mobius]) were unsuccessful primarily due to the > lack of communication with the Spark community. Therefore, another goal of > this proposal is to not only develop .NET bindings for Spark in open source, > but also continuously seek feedback from the Spark community via posted > Jira’s (like this one) and the Spark developer mailing list. 
Our hope is that > through these engagements, we can build a community of developers that are > eager to contribute to this effort or want to leverage the resulting .NET > bindings for Spark in their respective Big Data applications. > h4. Target Personas: > .NET developers looking to build big data solutions. > h4. Goals: > Our primary goal is to help grow Apache Spark by making it accessible to the > large .NET developer base and ecosystem. We will also look for opportunities > to generalize the interop layers for Spark for adding other language > extensions in the future. [SPARK-26257]( > https://issues.apache.org/jira/browse/SPARK-26257) proposes such a > generalized interop layer, which we hope to address over the course of this > project. > Another important goal for us is to not only enable Spark as an application > solution for .NET developers, but also opening the door for .NET developers > to make contributions to Apache Spark itself. > Lastly, we aim to develop a .NET extension in the open, while continually > engaging with the Spark community for feedback on designs and code. We will > welcome PRs from the Spark community throughout this project and aim to grow > a community of developers that want to contribute to this project. > h4. Non-Goals: > This proposal is focused on adding .NET bindings to Apache Spark, a
[jira] [Commented] (SPARK-5997) Increase partition count without performing a shuffle
[ https://issues.apache.org/jira/browse/SPARK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792964#comment-16792964 ] nirav patel commented on SPARK-5997: Adding another possible use case for this ask - I am hitting an IllegalArgumentException: Size exceeds Integer.MAX_VALUE error when trying to write an unpartitioned Dataframe to parquet. The error is due to a shuffle block exceeding 2GB in size. The solution is to repartition the Dataframe (Dataset). I can do that, but I don't want to cause a shuffle when I increase the number of partitions with the repartition API. > Increase partition count without performing a shuffle > - > > Key: SPARK-5997 > URL: https://issues.apache.org/jira/browse/SPARK-5997 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Ash >Priority: Major > > When decreasing partition count with rdd.repartition() or rdd.coalesce(), the > user has the ability to choose whether or not to perform a shuffle. However > when increasing partition count there is no option of whether to perform a > shuffle or not -- a shuffle always occurs. > This Jira is to create a {{rdd.repartition(largeNum, shuffle=false)}} call > that performs a repartition to a higher partition count without a shuffle. > The motivating use case is to decrease the size of an individual partition > enough that the .toLocalIterator has significantly reduced memory pressure on > the driver, as it loads a partition at a time into the driver.
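The requested `rdd.repartition(largeNum, shuffle=false)` amounts to splitting each existing partition locally. The plain-Python model below (the function name and the list-of-lists partition model are illustrative, not Spark's implementation) shows why no shuffle is needed: the partition count grows, yet no element ever crosses an original partition boundary.

```python
# Model of shuffle-free partition growth (not Spark's implementation):
# each input partition is cut into at most k contiguous local pieces.
import math


def split_without_shuffle(partitions, k):
    result = []
    for part in partitions:
        if not part:
            result.append([])  # keep empty partitions as-is
            continue
        piece = max(1, math.ceil(len(part) / k))
        # Contiguous local slices: data never moves between partitions.
        result.extend(part[i:i + piece] for i in range(0, len(part), piece))
    return result


# Two partitions become four; every element stays with neighbors
# from its original partition.
split_without_shuffle([[1, 2, 3, 4], [5, 6, 7, 8]], 2)
# -> [[1, 2], [3, 4], [5, 6], [7, 8]]
```

This also illustrates the limitation: the split is purely local, so it reduces per-partition size (helpful for `.toLocalIterator` memory pressure or the 2GB block limit) but cannot rebalance skew across the original partitions.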
[jira] [Comment Edited] (SPARK-27006) SPIP: .NET bindings for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792939#comment-16792939 ] Tyson Condie edited comment on SPARK-27006 at 3/14/19 6:23 PM: --- I would like to briefly illuminate what I think this SPIP is trying to accomplish. I have worked in the Apache community for the better part of my career. Early on doing research at UC Berkeley related to Hadoop, then joining the Pig team at Yahoo! Research, and being part of the Microsoft CISL team that created Apache REEF, which turned out to be Microsoft’s first ever top-level Apache project and remains so to this day. I also had the brief pleasure of working with the Structured Streaming team at Databricks and witnessed first-hand some of the exceptional minds behind Apache Spark. So, what is this SPIP about? In my honest opinion, it is about bringing two very large communities together under a common shared goal: *to democratize data for all developers*. Given my roots, I am a Java developer at heart, but I see tremendous value in the .NET stack and in its languages. Not surprisingly then, I see a significant barrier to entry when telling longtime .NET developers that if they want to use Apache Spark, they must code in either Scala/Java, Python, or R. The .NET team conducted a survey (with 1000+ responses) revealing a strong desire from the .NET developer community to learn and use Spark. This SPIP is about making that process much more familiar, but that’s not all it’s about. This SPIP is about the Microsoft community wanting to learn and contribute to the Apache Spark community, and we are fully funded to do just that. Our leadership team includes Michael Rys and Rahul Potharaju from the Big Data organization, along with Ankit Asthana and Dan Moseley from the .NET organization.
Our development team includes Terry Kim, Steve Suh, Stephen Toub, Eric Erhardt, Aaron Robinson, and me, where I am again in the company of equally exceptional minds. Together, our goal is to develop .NET bindings for Spark in accordance with best practices from the Apache Foundation and Spark guidelines. We would welcome the opportunity to partner with leaders in the Apache Spark community, not only for their guidance on the work items described in this SPIP, but also on engagements that will bring our communities closer together and lead us to mutually beneficial outcomes. Regarding the work items in this SPIP, as recommended by earlier comments, we will develop externally (and openly) on a fork of Apache Spark. We only ask that a shepherd be available to provide us with occasional guidance towards getting our fork in a state that is acceptable for a contribution back to Apache Spark master. We recognize that such a contribution will not happen overnight, and that we will need to prove to the Spark community that we will continue to maintain it for the foreseeable future. That is why building a +diverse+ community is a very high priority for us, as it will ensure the future investments in .NET bindings for Apache Spark. All of this will take time. For now, we only ask if there is a Spark PMC member who is willing to step up and be our shepherd. Thank you for reading this far and we look forward to seeing you at the SF Spark Summit in April where we will be presenting our early progress on enabling .NET bindings for Apache Spark. was (Author: tcondie): I would like to briefly illuminate what I think this SPIP is trying to accomplish. I have worked in the Apache community for the better part of my career. Early on doing research at UC Berkeley related to Hadoop, then joining the Pig team at Yahoo!
Research, and being part of the Microsoft CISL team that created Apache REEF, which turned out to be Microsoft’s first ever top-level Apache project and remains so to this day. I also had the brief pleasure of working with the Structured Stream team at Databricks and witnessed first-hand some of the exceptional minds behind Apache Spark. So, what is this SPIP about? In my honest opinion, it is about bringing two very large communities together under a common shared goal: *to democratize data for all developers*. Given my roots, I am a Java developer at heart, but I see a tremendous value in the .NET stack and in its languages. Not surprisingly then, I see a significant barrier of entry when telling long time .NET developers that if they want to use Apache Spark, they must code in either Scala/Java, Python, or R. The .NET team conducted a survey---with 1000+ responses---revealing a strong desire from the .NET developer community to learn and use Spark. This SPIP is about making that process much more familiar, but that’s not all its about. This SPIP is about the Microsoft community wanting to learn and contribute to the Apache Spark community, and we are fully funded to
[jira] [Commented] (SPARK-27006) SPIP: .NET bindings for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792939#comment-16792939 ] Tyson Condie commented on SPARK-27006: -- I would like to briefly illuminate what I think this SPIP is trying to accomplish. I have worked in the Apache community for the better part of my career. Early on doing research at UC Berkeley related to Hadoop, then joining the Pig team at Yahoo! Research, and being part of the Microsoft CISL team that created Apache REEF, which turned out to be Microsoft’s first ever top-level Apache project and remains so to this day. I also had the brief pleasure of working with the Structured Streaming team at Databricks and witnessed first-hand some of the exceptional minds behind Apache Spark. So, what is this SPIP about? In my honest opinion, it is about bringing two very large communities together under a common shared goal: *to democratize data for all developers*. Given my roots, I am a Java developer at heart, but I see tremendous value in the .NET stack and in its languages. Not surprisingly then, I see a significant barrier to entry when telling longtime .NET developers that if they want to use Apache Spark, they must code in either Scala/Java, Python, or R. The .NET team conducted a survey---with 1000+ responses---revealing a strong desire from the .NET developer community to learn and use Spark. This SPIP is about making that process much more familiar, but that’s not all it’s about. This SPIP is about the Microsoft community wanting to learn and contribute to the Apache Spark community, and we are fully funded to do just that. Our leadership team includes Michael Rys and Rahul Potharaju from the Big Data organization, along with Ankit Asthana and Dan Moseley from the .NET organization. Our development team includes Terry Kim, Steve Suh, Stephen Toub, Eric Erhardt, Aaron Robinson, and me, where I am again in the company of equally exceptional minds.
Together, our goal is to develop .NET bindings for Spark in accordance with best practices from the Apache Foundation and Spark guidelines. We would welcome the opportunity to partner with leaders in the Apache Spark community, not only for their guidance on the work items described in this SPIP, but also on engagements that will bring our communities closer together and lead us to mutually beneficial outcomes. Regarding the work items in this SPIP, as recommended by earlier comments, we will develop externally (and openly) on a fork of Apache Spark. We only ask that a shepherd be available to provide us with occasional guidance towards getting our fork in a state that is acceptable for a contribution back to Apache Spark master. We recognize that such a contribution will not happen overnight, and that we will need to prove to the Spark community that we will continue to maintain it for the foreseeable future. That is why building a +diverse+ community is a very high priority for us, as it will ensure the future investments in .NET bindings for Apache Spark. All of this will take time. For now, we only ask if there is a Spark PMC member who is willing to step up and be our shepherd. Thank you for reading this far and we look forward to seeing you at the SF Spark Summit in April where we will be presenting our early progress on enabling .NET bindings for Apache Spark. > SPIP: .NET bindings for Apache Spark > > > Key: SPARK-27006 > URL: https://issues.apache.org/jira/browse/SPARK-27006 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Terry Kim >Priority: Minor > Original Estimate: 4,032h > Remaining Estimate: 4,032h > > h4. Background and Motivation: > Apache Spark provides programming language support for Scala/Java (native), > and extensions for Python and R.
While a variety of other language extensions > are possible to include in Apache Spark, .NET would bring one of the largest > developer communities to the table. Presently, no good Big Data solution exists > for .NET developers in open source. This SPIP aims at discussing how we can > bring Apache Spark goodness to the .NET development platform. > .NET is a free, cross-platform, open source developer platform for building > many different types of applications. With .NET, you can use multiple > languages, editors, and libraries to build for web, mobile, desktop, gaming, > and IoT types of applications. Even with .NET serving millions of developers, > there is no good Big Data solution that exists today, which this SPIP aims to > address. > The .NET developer community is one of the largest programming language > communities in the world. Its flagship programming language C# is listed as > one of the most popular p
[jira] [Assigned] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality
[ https://issues.apache.org/jira/browse/SPARK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27163: Assignee: (was: Apache Spark) > Cleanup and consolidate Pandas UDF functionality > > > Key: SPARK-27163 > URL: https://issues.apache.org/jira/browse/SPARK-27163 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Bryan Cutler >Priority: Minor > > Some of the code for Pandas UDFs can be cleaned up and consolidated to remove > duplicated parts.
[jira] [Assigned] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality
[ https://issues.apache.org/jira/browse/SPARK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27163: Assignee: Apache Spark > Cleanup and consolidate Pandas UDF functionality > > > Key: SPARK-27163 > URL: https://issues.apache.org/jira/browse/SPARK-27163 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Bryan Cutler >Assignee: Apache Spark >Priority: Minor > > Some of the code for Pandas UDFs can be cleaned up and consolidated to remove > duplicated parts.
[jira] [Updated] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality
[ https://issues.apache.org/jira/browse/SPARK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27163: - Priority: Minor (was: Major) > Cleanup and consolidate Pandas UDF functionality > > > Key: SPARK-27163 > URL: https://issues.apache.org/jira/browse/SPARK-27163 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Bryan Cutler >Priority: Minor > > Some of the code for Pandas UDFs can be cleaned up and consolidated to remove > duplicated parts.
[jira] [Created] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality
Bryan Cutler created SPARK-27163: Summary: Cleanup and consolidate Pandas UDF functionality Key: SPARK-27163 URL: https://issues.apache.org/jira/browse/SPARK-27163 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 2.4.0 Reporter: Bryan Cutler Some of the code for Pandas UDFs can be cleaned up and consolidated to remove duplicated parts.
[jira] [Commented] (SPARK-26778) Implement file source V2 partitioning
[ https://issues.apache.org/jira/browse/SPARK-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792843#comment-16792843 ] Ryan Blue commented on SPARK-26778: --- [~Gengliang.Wang], can you clarify what this issue is tracking? > Implement file source V2 partitioning > -- > > Key: SPARK-26778 > URL: https://issues.apache.org/jira/browse/SPARK-26778 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major >
[jira] [Updated] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-26742: --- Fix Version/s: 2.4.2 > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Assignee: Jiaxin Shan >Priority: Major > Labels: easyfix > Fix For: 2.4.2, 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x, which is quite old; the master > branch has 4.0. The client should be upgraded to 4.1.2 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix
[jira] [Updated] (SPARK-27158) dev/mima and dev/scalastyle support dynamic profiles
[ https://issues.apache.org/jira/browse/SPARK-27158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27158: Issue Type: Sub-task (was: Improvement) Parent: SPARK-23710 > dev/mima and dev/scalastyle support dynamic profiles > > > Key: SPARK-27158 > URL: https://issues.apache.org/jira/browse/SPARK-27158 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Created] (SPARK-27162) Add new method getOriginalMap in CaseInsensitiveStringMap
Gengliang Wang created SPARK-27162: -- Summary: Add new method getOriginalMap in CaseInsensitiveStringMap Key: SPARK-27162 URL: https://issues.apache.org/jira/browse/SPARK-27162 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.0.0 Reporter: Gengliang Wang Currently, DataFrameReader/DataFrameWriter supports setting Hadoop configurations via method `.option()`. E.g. ``` class TestFileFilter extends PathFilter { override def accept(path: Path): Boolean = path.getParent.getName != "p=2" } withTempPath { dir => val path = dir.getCanonicalPath val df = spark.range(2) df.write.orc(path + "/p=1") df.write.orc(path + "/p=2") assert(spark.read.orc(path).count() === 4) val extraOptions = Map( "mapred.input.pathFilter.class" -> classOf[TestFileFilter].getName, "mapreduce.input.pathFilter.class" -> classOf[TestFileFilter].getName ) assert(spark.read.options(extraOptions).orc(path).count() === 2) } ``` While Hadoop Configurations are case sensitive, the current data source V2 APIs are using `CaseInsensitiveStringMap` in TableProvider. To create Hadoop configurations correctly, I suggest adding a method `getOriginalMap` in `CaseInsensitiveStringMap`.
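The proposal can be illustrated with a small self-contained model (a plain-Python stand-in, not Spark's actual `CaseInsensitiveStringMap` implementation): lookups remain case-insensitive, while `get_original_map` returns the entries with the caller's original key casing, which case-sensitive consumers such as Hadoop configuration keys need.

```python
# Minimal stand-in for the proposed behavior (not Spark's real class):
# case-insensitive lookup plus access to the untouched original entries.
class CaseInsensitiveStringMap:
    def __init__(self, original):
        self._original = dict(original)
        # Lowercased copy used for case-insensitive lookups.
        self._lowered = {k.lower(): v for k, v in original.items()}

    def get(self, key):
        # Existing behavior: keys match regardless of case.
        return self._lowered.get(key.lower())

    def get_original_map(self):
        # Proposed addition: keys exactly as the caller supplied them,
        # so case-sensitive settings survive the round trip.
        return dict(self._original)
```

Without something like `get_original_map`, only the lowercased keys are recoverable, so an option such as `mapred.input.pathFilter.class` would reach a Hadoop Configuration with the wrong casing and be silently ignored.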
[jira] [Updated] (SPARK-23710) Upgrade the built-in Hive to 2.3.4 for hadoop-3.1
[ https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23710: Issue Type: Umbrella (was: Improvement) > Upgrade the built-in Hive to 2.3.4 for hadoop-3.1 > - > > Key: SPARK-23710 > URL: https://issues.apache.org/jira/browse/SPARK-23710 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Critical > > Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop > 3.x to be an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for more > details. So we need to upgrade the built-in Hive for Hadoop-3.x. This is an > umbrella JIRA to track this upgrade. > > *Upgrade Plan*: > # SPARK-27054 Remove the Calcite dependency. This can avoid some jar > conflicts. > # SPARK-23749 Replace built-in Hive API (isSub/toKryo) and remove > OrcProto.Type usage > # SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing profiles > when testing > # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and > compilation passes on Hive 2.3.4 > # Add an empty hive-thriftserverV2 module, so that we can run all test cases > in the next step > # Make Hadoop-3.1 with Hive 2.3.4 pass all tests > # Adapt hive-thriftserverV2 from hive-thriftserver to Hive 2.3.4's > [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift] > > I have completed the [initial > work|https://github.com/apache/spark/pull/24044] and plan to finish this > upgrade step by step. > >
[jira] [Assigned] (SPARK-27162) Add new method getOriginalMap in CaseInsensitiveStringMap
[ https://issues.apache.org/jira/browse/SPARK-27162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27162: Assignee: Apache Spark > Add new method getOriginalMap in CaseInsensitiveStringMap > - > > Key: SPARK-27162 > URL: https://issues.apache.org/jira/browse/SPARK-27162 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Currently, DataFrameReader/DataFrameWriter supports setting Hadoop > configurations via method `.option()`. > E.g. > ``` > class TestFileFilter extends PathFilter { > override def accept(path: Path): Boolean = path.getParent.getName != "p=2" > } > withTempPath { dir => > val path = dir.getCanonicalPath > val df = spark.range(2) > df.write.orc(path + "/p=1") > df.write.orc(path + "/p=2") > assert(spark.read.orc(path).count() === 4) > val extraOptions = Map( > "mapred.input.pathFilter.class" -> classOf[TestFileFilter].getName, > "mapreduce.input.pathFilter.class" -> classOf[TestFileFilter].getName > ) > assert(spark.read.options(extraOptions).orc(path).count() === 2) > } > ``` > While Hadoop Configurations are case sensitive, the current data source V2 > APIs are using `CaseInsensitiveStringMap` in TableProvider. > To create Hadoop configurations correctly, I suggest adding a method > `getOriginalMap` in `CaseInsensitiveStringMap`.
[jira] [Assigned] (SPARK-27162) Add new method getOriginalMap in CaseInsensitiveStringMap
[ https://issues.apache.org/jira/browse/SPARK-27162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27162: Assignee: (was: Apache Spark)

> Add new method getOriginalMap in CaseInsensitiveStringMap
> ---------------------------------------------------------
>
> Key: SPARK-27162
> URL: https://issues.apache.org/jira/browse/SPARK-27162
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Gengliang Wang
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27130) Automatically select profile when executing sbt-checkstyle
[ https://issues.apache.org/jira/browse/SPARK-27130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27130: Issue Type: Sub-task (was: Improvement) Parent: SPARK-23710

> Automatically select profile when executing sbt-checkstyle
> ----------------------------------------------------------
>
> Key: SPARK-27130
> URL: https://issues.apache.org/jira/browse/SPARK-27130
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Minor
> Fix For: 3.0.0

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27054) Remove Calcite dependency
[ https://issues.apache.org/jira/browse/SPARK-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27054: Issue Type: Sub-task (was: Improvement) Parent: SPARK-23710

> Remove Calcite dependency
> -------------------------
>
> Key: SPARK-27054
> URL: https://issues.apache.org/jira/browse/SPARK-27054
> Project: Spark
> Issue Type: Sub-task
> Components: Build, SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
> Calcite is only used by [runSqlHive|https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L699-L705] when {{hive.cbo.enable=true}} ([SemanticAnalyzer|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280]). So we can disable {{hive.cbo.enable}} and remove the Calcite dependency.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
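For context on the mechanism: Hive's SemanticAnalyzerFactory only takes the Calcite-based planner path when CBO is enabled, so keeping the flag off means Calcite classes are never loaded. The standard Hive property involved is shown below as an illustrative hive-site.xml fragment (how Spark actually forces the flag off may differ, e.g. via HiveConf in code):

```xml
<!-- hive-site.xml: with CBO disabled, SemanticAnalyzerFactory returns the
     plain SemanticAnalyzer instead of CalcitePlanner, so Calcite classes
     are never loaded. -->
<property>
  <name>hive.cbo.enable</name>
  <value>false</value>
</property>
```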
[jira] [Updated] (SPARK-23710) Upgrade the built-in Hive to 2.3.4 for hadoop-3.1
[ https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23710:

Description:
Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 3.x an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for details. We therefore need to upgrade the built-in Hive for Hadoop 3.x. This is an umbrella JIRA to track the upgrade.

*Upgrade Plan*:
# SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
# SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove OrcProto.Type usage.
# SPARK-27158, SPARK-27130 Update dev/* to support changing profiles dynamically when testing.
# Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
# Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
# Make the Hadoop-3.1 build with Hive 2.3.4 pass all tests.
# Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].

I have completed the [initial work|https://github.com/apache/spark/pull/24044] and plan to finish this upgrade step by step.

was:
Upgrade the built-in Hive to 2.3.4 for Hadoop-3.1 (please note that this upgrade is only for Hadoop-3.1). To achieve this, we need to change at least the sql/core, sql/hive, and sql/hive-thriftserver modules:
*sql/core*: Add two source directories (sql/core/v1.2.1 and sql/core/v2.3.4) to distinguish the code for the different built-in Hive versions.
*sql/hive*: Use Java reflection or shims to support Hive 1.2.1 and Hive 2.3.4 at the same time.
*sql/hive-thriftserver*: Add a new thrift server named hive-thriftserverV2 built from Hive 2.3.4's [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].
Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 3.x an unknown Hadoop version; see [SPARK-18673|https://issues.apache.org/jira/browse/SPARK-18673] and [HIVE-16081|https://issues.apache.org/jira/browse/HIVE-16081] for details.
Upgrade Plan:
# SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
# SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove OrcProto.Type usage.
# SPARK-27158, SPARK-27130 Update dev/* to support changing profiles dynamically when testing.
# Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
# Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
# Make the Hadoop-3.1 build with Hive 2.3.4 pass all tests.
# Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].

> Upgrade the built-in Hive to 2.3.4 for hadoop-3.1
> -------------------------------------------------
>
> Key: SPARK-23710
> URL: https://issues.apache.org/jira/browse/SPARK-23710
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Yuming Wang
> Priority: Critical
>
> Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 3.x an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for details. We therefore need to upgrade the built-in Hive for Hadoop 3.x. This is an umbrella JIRA to track the upgrade.
>
> *Upgrade Plan*:
> # SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
> # SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove OrcProto.Type usage.
> # SPARK-27158, SPARK-27130 Update dev/* to support changing profiles dynamically when testing.
> # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
> # Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
> # Make the Hadoop-3.1 build with Hive 2.3.4 pass all tests.
> # Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].
>
> I have completed the [initial work|https://github.com/apache/spark/pull/24044] and plan to finish this upgrade step by step.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27152) Column equality does not work for aliased columns.
[ https://issues.apache.org/jira/browse/SPARK-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792793#comment-16792793 ] Ryan Radtke edited comment on SPARK-27152 at 3/14/19 3:36 PM:

If you are abstracting elt then it is important. Also, it's just sloppy. Probably not a major issue though. I changed it to minor.

was (Author: ryanwradtke-thmbprnt): If you are abstracting elt then it important. Also, its just sloppy.

> Column equality does not work for aliased columns.
> --------------------------------------------------
>
> Key: SPARK-27152
> URL: https://issues.apache.org/jira/browse/SPARK-27152
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Ryan Radtke
> Priority: Minor
>
> assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false
> assert($"zip" equals $"zip") will return true.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
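The pitfall in the report above can be modeled in a self-contained way. The `Col`/`Alias` classes below are hypothetical illustrations, not Spark's classes: they show how a wrapper node that does not define structural equality makes two identical aliased columns compare unequal, and how overriding `equals`/`hashCode` restores it. (In Spark itself the alias expression also carries a freshly generated expression id, which has a similar effect; this sketch only shows the general mechanism.)

```java
import java.util.Objects;

// A bare column compares structurally: two Col("zip") instances are equal.
class Col {
    final String name;
    Col(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Col && ((Col) o).name.equals(name);
    }
    @Override public int hashCode() { return name.hashCode(); }
}

// An alias wrapper WITHOUT equals/hashCode: inherits Object's reference
// equality, so two structurally identical aliases are never equal. This
// mirrors the reported $"zip".as("zip_code") != $"zip".as("zip_code").
class Alias {
    final Col child;
    final String alias;
    Alias(Col child, String alias) { this.child = child; this.alias = alias; }
}

// The fix for this model: compare child and alias name structurally.
class StructuralAlias extends Alias {
    StructuralAlias(Col child, String alias) { super(child, alias); }
    @Override public boolean equals(Object o) {
        if (!(o instanceof StructuralAlias)) return false;
        StructuralAlias other = (StructuralAlias) o;
        return child.equals(other.child) && alias.equals(other.alias);
    }
    @Override public int hashCode() { return Objects.hash(child.name, alias); }
}
```

This is why `assert($"zip" equals $"zip")` passes while the aliased comparison fails: equality stops being structural the moment a wrapper without its own `equals` enters the tree.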