[jira] [Commented] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239069#comment-17239069
 ] 

Prashant Sharma commented on SPARK-32223:
-

One problem I can think of with the driver template is that, even though it is
possible to mount a config map, it is not as straightforward to mount it as
SPARK_CONF_DIR. One may have to copy spark.properties during container
init.


> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenges with this is that spark.properties is not user provided 
> and is calculated based on certain factors. So a user-provided config map 
> cannot be mounted as SPARK_CONF_DIR as is; it will have to be augmented 
> somehow with the correct spark.properties.
> Q. Do we support updates to config map properties for an already running job?
> Ans: No. Since spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly, and that is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the use cases where supplying SPARK_CONF_DIR via a config map 
> helps?
> One use case I can think of is programmatically submitting a `spark 
> on k8s` job - e.g. spark as a service on a cloud deployment may find this 
> feature useful.
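
A minimal sketch of the kind of augmentation step described above - merging a
user-provided config map's contents with the spark.properties generated at
submission time into a single SPARK_CONF_DIR - with hypothetical paths and
assuming simple key=value property lines:

{code:python}
# Hypothetical illustration, not Spark's actual implementation.
import os
import shutil

def build_spark_conf_dir(user_conf_dir, generated_properties, out_dir):
    """Copy the user-provided conf files, then overlay the generated
    spark.properties so that submission-time settings win."""
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(user_conf_dir):
        src = os.path.join(user_conf_dir, name)
        if os.path.isfile(src):
            shutil.copy(src, out_dir)

    merged = {}
    user_props = os.path.join(user_conf_dir, "spark.properties")
    for path in (user_props, generated_properties):
        if not os.path.exists(path):
            continue
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    merged[key.strip()] = value.strip()

    with open(os.path.join(out_dir, "spark.properties"), "w") as f:
        for key, value in sorted(merged.items()):
            f.write("{}={}\n".format(key, value))
{code}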



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33568) install coverage for pypy3

2020-11-25 Thread Shane Knapp (Jira)
Shane Knapp created SPARK-33568:
---

 Summary: install coverage for pypy3
 Key: SPARK-33568
 URL: https://issues.apache.org/jira/browse/SPARK-33568
 Project: Spark
  Issue Type: Bug
  Components: Build, PySpark
Affects Versions: 3.0.0
Reporter: Shane Knapp


from:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-1.2/1002/console

 
Coverage is not installed in Python executable 'pypy3' but 
'COVERAGE_PROCESS_START' environment variable is set, exiting.
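
For context, a minimal sketch (not the actual run-tests code) of how one might
probe whether a given interpreter can import coverage before enabling
COVERAGE_PROCESS_START for it; the interpreter names below are just examples:

{code:python}
import shutil
import subprocess

def has_coverage(python_exec):
    """Return True if the interpreter exists and can import 'coverage'."""
    if shutil.which(python_exec) is None:
        return False
    result = subprocess.run(
        [python_exec, "-c", "import coverage"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

for exe in ["python3.6", "pypy3"]:      # example interpreter names
    print(exe, "can import coverage:", has_coverage(exe))
{code}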



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239061#comment-17239061
 ] 

Apache Spark commented on SPARK-32915:
--

User 'Victsm' has created a pull request for this issue:
https://github.com/apache/spark/pull/30513

> RPC implementation to support pushing and merging shuffle blocks
> 
>
> Key: SPARK-32915
> URL: https://issues.apache.org/jira/browse/SPARK-32915
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Min Shen
>Priority: Major
> Fix For: 3.1.0
>
>
> RPC implementation for the basic functionality in network-common and 
> network-shuffle module to enable pushing blocks on the client side and 
> merging received blocks on the server side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239060#comment-17239060
 ] 

Apache Spark commented on SPARK-32915:
--

User 'Victsm' has created a pull request for this issue:
https://github.com/apache/spark/pull/30513

> RPC implementation to support pushing and merging shuffle blocks
> 
>
> Key: SPARK-32915
> URL: https://issues.apache.org/jira/browse/SPARK-32915
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Min Shen
>Priority: Major
> Fix For: 3.1.0
>
>
> RPC implementation for the basic functionality in network-common and 
> network-shuffle module to enable pushing blocks on the client side and 
> merging received blocks on the server side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239049#comment-17239049
 ] 

Hyukjin Kwon edited comment on SPARK-33566 at 11/26/20, 3:57 AM:
-

Here is the output from running mvn clean test:

Running org.test.CommaTest

{code}
2020-11-25 17:55:45,728 INFO [CommaTest:12]
OpenCsv
2020-11-25 17:55:45,758 INFO [CommaTest:19] h1 h3 h2
2020-11-25 17:55:45,758 INFO [CommaTest:19] one three two
2020-11-25 17:55:45,760 INFO [CommaTest:19] abc xyz ^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!
2020-11-25 17:55:45,763 INFO [CommaTest:26]

spark
2020-11-25 17:55:46,464 WARN [NativeCodeLoader:62] Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2020-11-25 17:55:55,299 INFO [CommaTest:36] Count: 2
2020-11-25 17:55:55,449 INFO [CommaTest:41] one three two
2020-11-25 17:55:55,449 INFO [CommaTest:41] abc sans-serif;"">Referral from Joe 
Smith. Fred is hard working. Super smart "^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!
{code}



was (Author: moresmores):
Here is the output from running mvn clean test:

Running org.test.CommaTest
{code}
2020-11-25 17:55:45,728 INFO [CommaTest:12]
OpenCsv
2020-11-25 17:55:45,758 INFO [CommaTest:19] h1 h3 h2
2020-11-25 17:55:45,758 INFO [CommaTest:19] one three two
2020-11-25 17:55:45,760 INFO [CommaTest:19] abc xyz ^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!
2020-11-25 17:55:45,763 INFO [CommaTest:26]

spark
2020-11-25 17:55:46,464 WARN [NativeCodeLoader:62] Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2020-11-25 17:55:55,299 INFO [CommaTest:36] Count: 2
2020-11-25 17:55:55,449 INFO [CommaTest:41] one three two
2020-11-25 17:55:55,449 INFO [CommaTest:41] abc sans-serif;"">Referral from Joe 
Smith. Fred is hard working. Super smart "^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
> sample CSV file.
> Spark is not correctly parsing the sample CSV file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239049#comment-17239049
 ] 

Hyukjin Kwon edited comment on SPARK-33566 at 11/26/20, 3:57 AM:
-

Here is the output from running mvn clean test:

Running org.test.CommaTest
{code}
2020-11-25 17:55:45,728 INFO [CommaTest:12]
OpenCsv
2020-11-25 17:55:45,758 INFO [CommaTest:19] h1 h3 h2
2020-11-25 17:55:45,758 INFO [CommaTest:19] one three two
2020-11-25 17:55:45,760 INFO [CommaTest:19] abc xyz ^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!
2020-11-25 17:55:45,763 INFO [CommaTest:26]

spark
2020-11-25 17:55:46,464 WARN [NativeCodeLoader:62] Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2020-11-25 17:55:55,299 INFO [CommaTest:36] Count: 2
2020-11-25 17:55:55,449 INFO [CommaTest:41] one three two
2020-11-25 17:55:55,449 INFO [CommaTest:41] abc sans-serif;"">Referral from Joe 
Smith. Fred is hard working. Super smart "^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!


was (Author: moresmores):
{{Here is the output from running mvn clean test}}

 

{{Running org.test.CommaTest}}
{{2020-11-25 17:55:45,728 INFO [CommaTest:12] }}
{{OpenCsv}}
{{2020-11-25 17:55:45,758 INFO [CommaTest:19] h1 h3 h2}}
{{2020-11-25 17:55:45,758 INFO [CommaTest:19] one three two}}
{{2020-11-25 17:55:45,760 INFO [CommaTest:19] abc xyz ^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!}}
{{2020-11-25 17:55:45,763 INFO [CommaTest:26] }}
{{spark}}
{{2020-11-25 17:55:46,464 WARN [NativeCodeLoader:62] Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable}}
{{2020-11-25 17:55:55,299 INFO [CommaTest:36] Count: 2}}
{{2020-11-25 17:55:55,449 INFO [CommaTest:41] one three two}}
{{2020-11-25 17:55:55,449 INFO [CommaTest:41] abc sans-serif;"">Referral from 
Joe Smith. Fred is hard working. Super smart "^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!}}

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
> sample CSV file.
> Spark is not correctly parsing the sample CSV file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-33562:
---
Affects Version/s: (was: 3.0.1)
   (was: 3.0.0)
   3.1.0

> Improve the style of the checkbox in executor page
> --
>
> Key: SPARK-33562
> URL: https://issues.apache.org/jira/browse/SPARK-33562
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> The width of class `container-fluid-div` is set as 200px after 
> https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
> executor page messy.
> We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Stephen More (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239049#comment-17239049
 ] 

Stephen More commented on SPARK-33566:
--

{{Here is the output from running mvn clean test}}

 

{{Running org.test.CommaTest}}
{{2020-11-25 17:55:45,728 INFO [CommaTest:12] }}
{{OpenCsv}}
{{2020-11-25 17:55:45,758 INFO [CommaTest:19] h1 h3 h2}}
{{2020-11-25 17:55:45,758 INFO [CommaTest:19] one three two}}
{{2020-11-25 17:55:45,760 INFO [CommaTest:19] abc xyz ^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!}}
{{2020-11-25 17:55:45,763 INFO [CommaTest:26] }}
{{spark}}
{{2020-11-25 17:55:46,464 WARN [NativeCodeLoader:62] Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable}}
{{2020-11-25 17:55:55,299 INFO [CommaTest:36] Count: 2}}
{{2020-11-25 17:55:55,449 INFO [CommaTest:41] one three two}}
{{2020-11-25 17:55:55,449 INFO [CommaTest:41] abc sans-serif;"">Referral from 
Joe Smith. Fred is hard working. Super smart "^@Referral from Joe Smith. Fred is 
hard working. Super smart, though you wouldnt know it at first. 6 
months, and we sold this project. Phooey he said to me! Whats up 
with you people. Youll say anything for a sale! Until he met me of 
coursehaar haar!Internet is spottyWorking while at home so. Will be 
applied this weekend. On Bill Recovery and 20 yr warranty 
added.Kindness made this deal happen!}}

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
> sample CSV file.
> Spark is not correctly parsing the sample CSV file.
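
Not from the reporter's test, but a minimal PySpark sketch of the options
usually involved when a quoted field with embedded newlines and commas (like
the one above) gets split into extra records; the file path and option choices
are assumptions, not a confirmed diagnosis:

{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[1]")
         .appName("csv-parse-check")
         .getOrCreate())

path = "/tmp/sample.csv"   # assumed location of the problematic file

default_df = spark.read.csv(path, header=True)
multiline_df = spark.read.csv(path, header=True, multiLine=True, escape='"')

print("default options count:  ", default_df.count())
print("multiLine/escape count: ", multiline_df.count())
multiline_df.show(truncate=False)
{code}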



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33551) Do not use custom shuffle reader for repartition

2020-11-25 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-33551.
-
Fix Version/s: 3.1.0
 Assignee: Wei Xue
   Resolution: Fixed

> Do not use custom shuffle reader for repartition
> 
>
> Key: SPARK-33551
> URL: https://issues.apache.org/jira/browse/SPARK-33551
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Wei Xue
>Assignee: Wei Xue
>Priority: Major
> Fix For: 3.1.0
>
>
> We should have a more thorough fix for all sorts of custom shuffle readers 
> when the original query has a repartition shuffle, based on the discussions 
> on the initial PR: [https://github.com/apache/spark/pull/29797].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33559) Column pruning with monotonically_increasing_id

2020-11-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239042#comment-17239042
 ] 

Hyukjin Kwon commented on SPARK-33559:
--

Can you try this in Spark 3.0.0? I remember this was fixed.

> Column pruning with monotonically_increasing_id
> ---
>
> Key: SPARK-33559
> URL: https://issues.apache.org/jira/browse/SPARK-33559
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Gaetan
>Priority: Minor
>
> {code:java}
> df = ss.read.parquet("/path/to/parquet/dataset") 
> df.select("partnerid").withColumn("index", 
> sf.monotonically_increasing_id()).explain(True){code}
>  {{We should expect to only read partnerid from parquet dataset but we 
> actually read the whole dataset:}}
> {code:java}
> ... == Physical Plan == Project [partnerid#6794, 
> monotonically_increasing_id() AS index#24939L] +- FileScan parquet 
> [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
>  566 more fields] ...{code}
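
Relatedly, a small sketch (assuming the same placeholder dataset path as above)
of how to confirm whether pruning applies after upgrading, by checking the
ReadSchema of the FileScan in the explain output:

{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as sf

spark = (SparkSession.builder
         .master("local[1]")
         .appName("pruning-check")
         .getOrCreate())

df = spark.read.parquet("/path/to/parquet/dataset")   # placeholder path from the report
pruned = df.select("partnerid").withColumn("index", sf.monotonically_increasing_id())

pruned.explain(True)
# In the physical plan, check the FileScan's ReadSchema: when pruning applies
# it should be roughly struct<partnerid:string> rather than the 500+ column
# schema shown in the report.
{code}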



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239038#comment-17239038
 ] 

Hyukjin Kwon commented on SPARK-33564:
--

cc [~dongjoon] FYI

> Prometheus metrics for Master and Worker isn't working 
> ---
>
> Key: SPARK-33564
> URL: https://issues.apache.org/jira/browse/SPARK-33564
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Paulo Roberto de Oliveira Castro
>Priority: Major
>  Labels: Metrics, metrics, prometheus
>
> Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
> the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  
> (also tested with 3.0.0), uncompressed the tgz and created a file called 
> {{metrics.properties}} adding this content:
> {quote}{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
>  {{*.sink.prometheusServlet.path=/metrics/prometheus}}
> master.sink.prometheusServlet.path=/metrics/master/prometheus
> applications.sink.prometheusServlet.path=/metrics/applications/prometheus
> {quote}
> Then I ran: 
> {quote}{{$ sbin/start-master.sh}}
>  {{$ sbin/start-slave.sh spark://`hostname`:7077}}
>  {{$ bin/spark-shell --master spark://`hostname`:7077 
> --files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
> {quote}
> {{The Spark shell opens without problems:}}
> {quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable}}
> {{Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties}}
> {{Setting default log level to "WARN".}}
> {{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).}}
> {{Spark context Web UI available at 
> [http://192.168.0.6:4040|http://192.168.0.6:4040/]}}
> {{Spark context available as 'sc' (master = 
> spark://MacBook-Pro-de-Paulo-2.local:7077, app id = 
> app-20201125173618-0002).}}
> {{Spark session available as 'spark'.}}
> {{Welcome to}}
> {{                    __}}
> {{     / __/_   _/ /__}}
> {{    _\ \/ _ \/ _ `/ __/  '_/}}
> {{   /___/ .__/_,_/_/ /_/_\   version 3.0.0}}
> {{      /_/}}
> {{         }}
> {{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}
> {{Type in expressions to have them evaluated.}}
> {{Type :help for more information. }}
> {{scala>}}
> {quote}
> {{And when I try to fetch prometheus metrics for driver, everything works 
> fine:}}
> {quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5
> metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
>  0
> metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
>  0
> metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
>  732
> metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
>  732
> metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
>  0
> {quote}
> *The problem appears when I try accessing master metrics*, and I get the 
> following problem:
> {quote}{{$ curl -s [http://localhost:8080/metrics/master/prometheus]}}
> (The response is the Spark Master web UI HTML page - "Spark Master at 
> spark://MacBook-Pro-de-Paulo-2.local:7077", Spark version 3.0.0 - rather than 
> Prometheus metrics; the raw HTML is omitted here.)
>  ...
> {quote}
> The same happens for all of those here:
> {quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
>  {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
> {quote}
> Instead, *I expected metrics in Prometheus format*. All related JSON 
> endpoints seem 
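
One quick way to see which endpoints fall back to the HTML UI instead of the
Prometheus text format is to probe them directly; a minimal sketch using only
the standard library, with the host and ports taken from the report above:

{code:python}
from urllib.request import urlopen

endpoints = [
    "http://localhost:4040/metrics/prometheus/",         # driver (works)
    "http://localhost:8080/metrics/master/prometheus",   # master (returns HTML)
    "http://localhost:8081/metrics/prometheus/",         # worker
]

for url in endpoints:
    try:
        body = urlopen(url, timeout=5).read().decode("utf-8", "replace")
    except Exception as exc:
        print("{}: request failed ({})".format(url, exc))
        continue
    kind = "HTML page" if body.lstrip().lower().startswith("<") else "Prometheus text"
    print("{}: {}".format(url, kind))
{code}
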

[jira] [Commented] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239037#comment-17239037
 ] 

Hyukjin Kwon commented on SPARK-33566:
--

[~moresmores] can you copy and paste the output here? 

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
> sample CSV file.
> Spark is not correctly parsing the sample CSV file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28645) Throw an error on window redefinition

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28645:


Assignee: (was: Apache Spark)

> Throw an error on window redefinition
> -
>
> Key: SPARK-28645
> URL: https://issues.apache.org/jira/browse/SPARK-28645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently in Spark one could redefine a window. For instance:
> {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w 
> AS (ORDER BY unique1);{code}
> The window `w` is defined twice. In PgSQL, on the other hand, an error will 
> be thrown:
> {code:sql}ERROR:  window "w" is already defined{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28645) Throw an error on window redefinition

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239032#comment-17239032
 ] 

Apache Spark commented on SPARK-28645:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30512

> Throw an error on window redefinition
> -
>
> Key: SPARK-28645
> URL: https://issues.apache.org/jira/browse/SPARK-28645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently in Spark one could redefine a window. For instance:
> {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w 
> AS (ORDER BY unique1);{code}
> The window `w` is defined twice. In PgSQL, on the other hand, an error will 
> be thrown:
> {code:sql}ERROR:  window "w" is already defined{code}
>  
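
The same redefinition expressed through the Python API, as a minimal sketch
(the tenk1 view below is just a stand-in); once the redefinition is rejected,
the second WINDOW definition should surface as an analysis error:

{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[1]")
         .appName("window-redefinition")
         .getOrCreate())

# Stand-in for the tenk1 table used in the example.
spark.range(10).toDF("unique1").createOrReplaceTempView("tenk1")

query = """
    SELECT count(*) OVER w
    FROM tenk1
    WINDOW w AS (ORDER BY unique1), w AS (ORDER BY unique1)
"""

try:
    spark.sql(query).show()        # currently accepted
except Exception as exc:
    print("Analysis error:", exc)  # expected once redefinition is rejected
{code}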



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28645) Throw an error on window redefinition

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28645:


Assignee: Apache Spark

> Throw an error on window redefinition
> -
>
> Key: SPARK-28645
> URL: https://issues.apache.org/jira/browse/SPARK-28645
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Assignee: Apache Spark
>Priority: Major
>
> Currently in Spark one could redefine a window. For instance:
> {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w 
> AS (ORDER BY unique1);{code}
> The window `w` is defined twice. In PgSQL, on the other hand, an error will 
> be thrown:
> {code:sql}ERROR:  window "w" is already defined{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238998#comment-17238998
 ] 

Apache Spark commented on SPARK-33565:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30511

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  
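
As a plain illustration of the quoted selection logic (using the standard
library's shutil.which as a stand-in for the script's own helper), this is why
the bare system python3.8 on Ubuntu 20.04 ends up being picked even though the
Anaconda-managed interpreter with the test dependencies is missing:

{code:python}
from shutil import which

candidates = ["python3.6", "python3.8", "pypy3"]     # list from master's run-tests.py
python_execs = [x for x in candidates if which(x)]
print(python_execs)
# On Ubuntu 20.04 the bare /usr/bin/python3.8 exists, so it is selected even
# when the Anaconda-managed python3.8 (with the test dependencies) is absent.
{code}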



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238997#comment-17238997
 ] 

Apache Spark commented on SPARK-33565:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30511

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33562.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/30500

> Improve the style of the checkbox in executor page
> --
>
> Key: SPARK-33562
> URL: https://issues.apache.org/jira/browse/SPARK-33562
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> The width of class `container-fluid-div` is set as 200px after 
> https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
> executor page messy.
> We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238993#comment-17238993
 ] 

Apache Spark commented on SPARK-33565:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30510

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238992#comment-17238992
 ] 

Apache Spark commented on SPARK-33565:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30510

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33104) Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil`

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238988#comment-17238988
 ] 

Apache Spark commented on SPARK-33104:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30508

> Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in 
> SparkHadoopUtil`
> 
>
> Key: SPARK-33104
> URL: https://issues.apache.org/jira/browse/SPARK-33104
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Fix For: 3.1.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1377/testReport/org.apache.spark.deploy.yarn/YarnClusterSuite/yarn_cluster_should_respect_conf_overrides_in_SparkHadoopUtil__SPARK_16414__SPARK_23630_/
> {code}
> 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: 
> Exit code from container container_1602245728426_0006_02_01 is : 15
> 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: 
> Exception from container-launch with container ID: 
> container_1602245728426_0006_02_01 and exit code: 15
> ExitCodeException exitCode=15: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>   at org.apache.hadoop.util.Shell.run(Shell.java:482)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/09 05:18:13.211 ContainersLauncher #0 WARN ContainerLaunch: Container 
> exited with a non-zero exit code 15
> 20/10/09 05:18:13.237 AsyncDispatcher event handler WARN NMAuditLogger: 
> USER=jenkins  OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1602245728426_0006
> CONTAINERID=container_1602245728426_0006_02_01
> 20/10/09 05:18:13.244 Socket Reader #1 for port 37112 INFO Server: Auth 
> successful for appattempt_1602245728426_0006_02 (auth:SIMPLE)
> 20/10/09 05:18:13.326 IPC Parameter Sending Thread #0 DEBUG Client: IPC 
> Client (1123559518) connection to 
> amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins sending #37
> 20/10/09 05:18:13.327 IPC Client (1123559518) connection to 
> amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins DEBUG Client: IPC 
> Client (1123559518) connection to 
> amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins got value #37
> 20/10/09 05:18:13.328 main DEBUG ProtobufRpcEngine: Call: 
> getApplicationReport took 2ms
> 20/10/09 05:18:13.328 main INFO Client: Application report for 
> application_1602245728426_0006 (state: FINISHED)
> 20/10/09 05:18:13.328 main DEBUG Client: 
>client token: N/A
>diagnostics: User class threw exception: 
> org.scalatest.exceptions.TestFailedException: null was not equal to 
> "testvalue"
>   at 
> org.scalatest.matchers.MatchersHelper$.indicateFailure(MatchersHelper.scala:344)
>   at 
> org.scalatest.matchers.should.Matchers$ShouldMethodHelperClass.shouldMatcher(Matchers.scala:6778)
>   at 
> org.scalatest.matchers.should.Matchers$AnyShouldWrapper.should(Matchers.scala:6822)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.$anonfun$main$2(YarnClusterSuite.scala:383)
>   at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.main(YarnClusterSuite.scala:382)
>   at 
> org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf.main(YarnClusterSuite.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  

[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238975#comment-17238975
 ] 

Apache Spark commented on SPARK-33565:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/30509

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33212) Move to shaded clients for Hadoop 3.x profile

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238974#comment-17238974
 ] 

Apache Spark commented on SPARK-33212:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30508

> Move to shaded clients for Hadoop 3.x profile
> -
>
> Key: SPARK-33212
> URL: https://issues.apache.org/jira/browse/SPARK-33212
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Spark Submit, SQL, YARN
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.1.0
>
>
> Hadoop 3.x+ offers shaded client jars: hadoop-client-api and 
> hadoop-client-runtime, which shade 3rd party dependencies such as Guava, 
> protobuf, jetty etc. This Jira switches Spark to use these jars instead of 
> hadoop-common, hadoop-client etc. Benefits include:
>  * It unblocks Spark from upgrading to Hadoop 3.2.2/3.3.0+. The newer 
> versions of Hadoop have migrated to Guava 27.0+, and in order to resolve Guava 
> conflicts, Spark depends on Hadoop not leaking its dependencies.
>  * It makes the Spark/Hadoop dependency graph cleaner. Currently Spark uses both 
> client-side and server-side Hadoop APIs from modules such as hadoop-common, 
> hadoop-yarn-server-common etc. Moving to hadoop-client-api allows us to use 
> only the public/client API from the Hadoop side.
>  * It provides better isolation from Hadoop dependencies. In the future Spark can 
> evolve without worrying about dependencies pulled in from the Hadoop side 
> (which used to be a lot).
> *There are some behavior changes introduced with this JIRA, when people use 
> Spark compiled with Hadoop 3.x:*
> - Users now need to make sure the class path contains the `hadoop-client-api` and 
> `hadoop-client-runtime` jars when they deploy Spark with the 
> `hadoop-provided` option. In addition, it is highly recommended that they put 
> these two jars before other Hadoop jars in the class path. Otherwise, 
> conflicts such as those from Guava could happen if classes are loaded from the 
> other, non-shaded Hadoop jars.
> - Since the new shaded Hadoop clients no longer include 3rd-party 
> dependencies, users who used to depend on these now need to explicitly put 
> the jars in their class path.
> Ideally the above should go to release notes.
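
Not part of the JIRA text, but a small sketch of the recommended ordering:
emit spark-defaults.conf entries that put the two shaded client jars ahead of
the remaining Hadoop jars. The install layout and version below are
placeholders:

{code:python}
hadoop_lib = "/opt/hadoop/share/hadoop"   # assumed install layout
shaded_first = ":".join([
    hadoop_lib + "/client/hadoop-client-api-3.2.2.jar",
    hadoop_lib + "/client/hadoop-client-runtime-3.2.2.jar",
    hadoop_lib + "/common/*",             # remaining (non-shaded) Hadoop jars last
])

for key in ("spark.driver.extraClassPath", "spark.executor.extraClassPath"):
    print("{}  {}".format(key, shaded_first))
{code}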



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-33565.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30506
[https://github.com/apache/spark/pull/30506]

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238971#comment-17238971
 ] 

Dongjoon Hyun commented on SPARK-26645:
---

This landed in `branch-2.4` via https://github.com/apache/spark/pull/30503 .

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.7
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.8, 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:python}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?
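
On the "what does it mean?" question: "1.18927098E9" has 9 significant digits
and a positive exponent, which in java.math.BigDecimal terms is precision 9 and
scale -1. A small illustration using Python's decimal module as a stand-in:

{code:python}
from decimal import Decimal

d = Decimal("1.18927098E9")
sign, digits, exponent = d.as_tuple()

precision = len(digits)    # 9 significant digits
scale = -exponent          # BigDecimal-style scale: here -1
print(precision, scale)    # -> 9 -1, i.e. decimal(9,-1)
{code}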



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26645:
--
Fix Version/s: 2.4.8

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.7
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.8, 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:python}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Stephen More (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen More updated SPARK-33566:
-
Description: 
Here is a test case: 

[https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]

It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
sample CSV file.

Spark is not correctly parsing the sample CSV file.

  was:
Here is a test case: 

[https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]

It shows how I believe OpenCSV correctly parses the CSV file and Spark does not.


> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how I believe Apache Commons CSV and OpenCSV correctly parse the 
> sample CSV file.
> Spark is not correctly parsing the sample CSV file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33567:


Assignee: (was: Apache Spark)

> DSv2: Use callback instead of passing Spark session and v2 relation for 
> refreshing cache
> 
>
> Key: SPARK-33567
> URL: https://issues.apache.org/jira/browse/SPARK-33567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> As discussed [https://github.com/apache/spark/pull/30429], it's better to not 
> pass Spark session and DataSourceV2Relation through Spark plans. Instead we 
> can use a callback which makes the interface cleaner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33567:


Assignee: Apache Spark

> DSv2: Use callback instead of passing Spark session and v2 relation for 
> refreshing cache
> 
>
> Key: SPARK-33567
> URL: https://issues.apache.org/jira/browse/SPARK-33567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> As discussed [https://github.com/apache/spark/pull/30429], it's better to not 
> pass Spark session and DataSourceV2Relation through Spark plans. Instead we 
> can use a callback which makes the interface cleaner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238963#comment-17238963
 ] 

Apache Spark commented on SPARK-33567:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30491

> DSv2: Use callback instead of passing Spark session and v2 relation for 
> refreshing cache
> 
>
> Key: SPARK-33567
> URL: https://issues.apache.org/jira/browse/SPARK-33567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> As discussed [https://github.com/apache/spark/pull/30429], it's better to not 
> pass Spark session and DataSourceV2Relation through Spark plans. Instead we 
> can use a callback which makes the interface cleaner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-25 Thread Chao Sun (Jira)
Chao Sun created SPARK-33567:


 Summary: DSv2: Use callback instead of passing Spark session and 
v2 relation for refreshing cache
 Key: SPARK-33567
 URL: https://issues.apache.org/jira/browse/SPARK-33567
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


As discussed [https://github.com/apache/spark/pull/30429], it's better to not 
pass Spark session and DataSourceV2Relation through Spark plans. Instead we can 
use a callback which makes the interface cleaner.
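
A toy sketch (plain Python, not Spark's actual classes) of the callback style
being proposed: the plan node only receives a zero-argument refresh callback,
while the planner, which has the session and relation in scope, decides what
that callback does:

{code:python}
from typing import Callable

class RefreshTableExec:
    """Toy stand-in for a physical plan node (not Spark's actual class)."""

    def __init__(self, refresh_cache: Callable[[], None]):
        self._refresh_cache = refresh_cache

    def run(self) -> None:
        # ... the node does its real work here ...
        self._refresh_cache()   # no session or relation held by the node itself

# The planner, which does have the cache (or session) in scope, wires it up:
cached_plans = {"db.table": "<cached plan>"}
node = RefreshTableExec(lambda: cached_plans.pop("db.table", None))
node.run()
print(cached_plans)   # -> {} : the entry was invalidated via the callback
{code}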





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-25 Thread Stephen More (Jira)
Stephen More created SPARK-33566:


 Summary: Incorrectly Parsing CSV file
 Key: SPARK-33566
 URL: https://issues.apache.org/jira/browse/SPARK-33566
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.7
Reporter: Stephen More


Here is a test case: 

[https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]

It shows how I believe OpenCSV correctly parses the CSV file and Spark does not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238955#comment-17238955
 ] 

Apache Spark commented on SPARK-33565:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/30507

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238954#comment-17238954
 ] 

Apache Spark commented on SPARK-33565:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/30507

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238951#comment-17238951
 ] 

Apache Spark commented on SPARK-33565:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/30506

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33565:


Assignee: Apache Spark

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238952#comment-17238952
 ] 

Apache Spark commented on SPARK-33565:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/30506

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33565:


Assignee: (was: Apache Spark)

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33565:


Assignee: Apache Spark

> python/run-tests.py calling python3.8
> -
>
> Key: SPARK-33565
> URL: https://issues.apache.org/jira/browse/SPARK-33565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.1
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
>
> this line in run-tests.py on master:
> |python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|
>  
> and this line in branch-3.0:
> python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
> which(x)]
> ...are currently breaking builds on the new ubuntu 20.04LTS workers.
> the default  system python is /usr/bin/python3.8 and we do NOT have a working 
> python3.8 anaconda deployment yet.  this is causing python test breakages.
> PRs incoming
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33565) python/run-tests.py calling python3.8

2020-11-25 Thread Shane Knapp (Jira)
Shane Knapp created SPARK-33565:
---

 Summary: python/run-tests.py calling python3.8
 Key: SPARK-33565
 URL: https://issues.apache.org/jira/browse/SPARK-33565
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.1
Reporter: Shane Knapp


this line in run-tests.py on master:
|python_execs = [x for x in ["python3.6", "python3.8", "pypy3"] if which(x)]|

 

and this line in branch-3.0:

python_execs = [x for x in ["python3.8", "python2.7", "pypy3", "pypy"] if 
which(x)]

...are currently breaking builds on the new ubuntu 20.04LTS workers.

The default system python is /usr/bin/python3.8 and we do NOT have a working 
python3.8 anaconda deployment yet. This is causing python test breakages.

PRs incoming
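
A hypothetical sketch (not the actual PR) of one way the interpreter list could be 
guarded, so an interpreter is only picked up when the modules the tests need 
actually import (module and executable names are examples):

{code:python}
import subprocess
from shutil import which

def can_import(python_exec, module):
    # True if `python_exec` can import `module`; lets us skip interpreters
    # such as a bare system python3.8 that has no test dependencies installed.
    return subprocess.call(
        [python_exec, "-c", "import " + module],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

python_execs = [x for x in ["python3.6", "python3.8", "pypy3"]
                if which(x) and can_import(x, "numpy")]
print(python_execs)
{code}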

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33489) Support null for conversion from and to Arrow type

2020-11-25 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238950#comment-17238950
 ] 

Bryan Cutler commented on SPARK-33489:
--

Yes, Arrow supports the null type. It should be pretty straightforward to add in 
Scala and Python. Is this something you are interested in working on, 
[~cactice]?
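
A rough sketch of the mapping in question (assuming pyarrow is available; this is 
not the eventual patch, just the shape of the new branch):

{code:python}
import pyarrow as pa
from pyspark.sql.types import NullType

def from_arrow_type_with_null(at):
    # Only the new branch is shown; all other Arrow types would be
    # handled exactly as from_arrow_type() does today.
    if pa.types.is_null(at):
        return NullType()
    raise TypeError("Unsupported type in conversion from Arrow: " + str(at))

print(from_arrow_type_with_null(pa.null()))  # NullType()
{code}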

> Support null for conversion from and to Arrow type
> --
>
> Key: SPARK-33489
> URL: https://issues.apache.org/jira/browse/SPARK-33489
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.1
>Reporter: Yuya Kanai
>Priority: Minor
>
> I got the following error when using from_arrow_type() in pyspark.sql.pandas.types:
> {{Unsupported type in conversion from Arrow: null}}
> I noticed NullType exists under pyspark.sql.types, so it seems possible to 
> convert from the pyarrow null type to the PySpark NullType and vice versa.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus}}
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
{quote}
Then I ran: 
{quote}{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).}}

{{Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]}}

{{Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).}}

{{Spark session available as 'spark'.}}

{{Welcome to}}

{{                    __}}

{{     / __/_   _/ /__}}

{{    _\ \/ _ \/ _ `/ __/  '_/}}

{{   /___/ .__/_,_/_/ /_/_\   version 3.0.0}}

{{      /_/}}

{{         }}

{{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}

{{Type in expressions to have them evaluated.}}

{{Type :help for more information. }}

{{scala>}}
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}{{$ curl -s [http://localhost:8080/metrics/master/prometheus]}}

{{(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)}}
 ...
{quote}
The same happens for all of those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
 {{*.sink.prometheusServlet.path=/metrics/prometheus}}
 \{{ master.sink.prometheusServlet.path=/metrics/master/prometheus}}
 \{{ applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}
{quote}
Then I ran: 
{quote}{{$ sbin/start-master.sh}}
{{$ sbin/start-slave.sh spark://`hostname`:7077}}
{{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust 

[jira] [Assigned] (SPARK-33525) Upgrade hive-service-rpc to 3.1.2

2020-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33525:
-

Assignee: Yuming Wang

> Upgrade hive-service-rpc to 3.1.2
> -
>
> Key: SPARK-33525
> URL: https://issues.apache.org/jira/browse/SPARK-33525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> We support Hive metastore versions 0.12.0 through 3.1.2, but we only support 
> hive-jdbc versions 0.12.0 through 2.3.7. It will throw a TProtocolException if 
> we use hive-jdbc 3.x:
> {noformat}
> [root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u 
> jdbc:hive2://localhost:1/default
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
> Driver: Hive JDBC (version 3.1.2)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.2 by Apache Hive
> 0: jdbc:hive2://localhost:1/default> create table t1(id int) using 
> parquet;
> Unexpected end of file when reading from HS2 server. The root cause might be 
> too many concurrent connections. Please ask the administrator to check the 
> number of active connections, and adjust 
> hive.server2.thrift.max.worker.threads if applicable.
> Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
> {noformat}
> {noformat}
> org.apache.thrift.protocol.TProtocolException: Missing version in 
> readMessageBegin, old client?
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {noformat}
> We can upgrade hive-service-rpc to 3.1.2 to fix this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33525) Upgrade hive-service-rpc to 3.1.2

2020-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33525.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30478
[https://github.com/apache/spark/pull/30478]

> Upgrade hive-service-rpc to 3.1.2
> -
>
> Key: SPARK-33525
> URL: https://issues.apache.org/jira/browse/SPARK-33525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> We support Hive metastore versions 0.12.0 through 3.1.2, but we only support 
> hive-jdbc versions 0.12.0 through 2.3.7. It will throw a TProtocolException if 
> we use hive-jdbc 3.x:
> {noformat}
> [root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u 
> jdbc:hive2://localhost:1/default
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
> Driver: Hive JDBC (version 3.1.2)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.2 by Apache Hive
> 0: jdbc:hive2://localhost:1/default> create table t1(id int) using 
> parquet;
> Unexpected end of file when reading from HS2 server. The root cause might be 
> too many concurrent connections. Please ask the administrator to check the 
> number of active connections, and adjust 
> hive.server2.thrift.max.worker.threads if applicable.
> Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
> {noformat}
> {noformat}
> org.apache.thrift.protocol.TProtocolException: Missing version in 
> readMessageBegin, old client?
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {noformat}
> We can upgrade hive-service-rpc to 3.1.2 to fix this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33523) Add predicate related benchmark to SubExprEliminationBenchmark

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238929#comment-17238929
 ] 

Apache Spark commented on SPARK-33523:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30505

> Add predicate related benchmark to SubExprEliminationBenchmark
> --
>
> Key: SPARK-33523
> URL: https://issues.apache.org/jira/browse/SPARK-33523
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
>
> This is for the task to add predicate related benchmark to 
> SubExprEliminationBenchmark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
 {{*.sink.prometheusServlet.path=/metrics/prometheus}}
 \{{ master.sink.prometheusServlet.path=/metrics/master/prometheus}}
 \{{ applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}
{quote}
Then I ran: 
{quote}{{$ sbin/start-master.sh}}
{{$ sbin/start-slave.sh spark://`hostname`:7077}}
{{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).}}

{{Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]}}

{{Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).}}

{{Spark session available as 'spark'.}}

{{Welcome to}}

{{                    __}}

{{     / __/_   _/ /__}}

{{    _\ \/ _ \/ _ `/ __/  '_/}}

{{   /___/ .__/_,_/_/ /_/_\   version 3.0.0}}

{{      /_/}}

{{         }}

{{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}

{{Type in expressions to have them evaluated.}}

{{Type :help for more information. }}

{{scala>}}
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}{{$ curl -s [http://localhost:8080/metrics/master/prometheus]}}

{{(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)}}
 ...
{quote}
The same happens for all of those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
{{*.sink.prometheusServlet.path=/metrics/prometheus}}
{{ master.sink.prometheusServlet.path=/metrics/master/prometheus}}
{{ applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}
{quote}
Then I ran: 
{quote}$ sbin/start-master.sh
{{ {{$ sbin/start-slave.sh spark://`hostname`:7077
{{ {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to 

[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
{{*.sink.prometheusServlet.path=/metrics/prometheus}}
{{ master.sink.prometheusServlet.path=/metrics/master/prometheus}}
{{ applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}
{quote}
Then I ran: 
{quote}$ sbin/start-master.sh
{{ {{$ sbin/start-slave.sh spark://`hostname`:7077
{{ {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).}}

{{Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]}}

{{Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).}}

{{Spark session available as 'spark'.}}

{{Welcome to}}

{{                    __}}

{{     / __/_   _/ /__}}

{{    _\ \/ _ \/ _ `/ __/  '_/}}

{{   /___/ .__/_,_/_/ /_/_\   version 3.0.0}}

{{      /_/}}

{{         }}

{{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}

{{Type in expressions to have them evaluated.}}

{{Type :help for more information. }}

{{scala>}}
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}{{$ curl -s [http://localhost:8080/metrics/master/prometheus]}}

{{(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)}}
 ...
{quote}
The same happens for all of those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
\{{metrics.properties __ }}adding this content:

{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus
 master.sink.prometheusServlet.path=/metrics/master/prometheus
 applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran: 

{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging 

[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
\{{metrics.properties __ }}adding this content:

{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus
 master.sink.prometheusServlet.path=/metrics/master/prometheus
 applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran: 

{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).}}

{{Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]}}

{{Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).}}

{{Spark session available as 'spark'.}}

{{Welcome to}}

{{                    __}}

{{     / __/_   _/ /__}}

{{    _\ \/ _ \/ _ `/ __/  '_/}}

{{   /___/ .__/_,_/_/ /_/_\   version 3.0.0}}

{{      /_/}}

{{         }}

{{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}

{{Type in expressions to have them evaluated.}}

{{Type :help for more information. }}

{{scala>}}
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}$ curl -s [http://localhost:8080/metrics/master/prometheus]

(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)
 ...
{quote}
The same happens for all of those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
\{{metrics.properties __ }}adding this content:
 
{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus
 master.sink.prometheusServlet.path=/metrics/master/prometheus
 applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran: 

{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}
{quote}20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).

Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]


[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
\{{metrics.properties __ }}adding this content:
 
{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus
 master.sink.prometheusServlet.path=/metrics/master/prometheus
 applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran: 

{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}
{quote}20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).

Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]

Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).

Spark session available as 'spark'.

Welcome to

                    __

     / __/_   _/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/_,_/_/ /_/_\   version 3.0.0

      /_/

         

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)

Type in expressions to have them evaluated.

Type :help for more information. 

scala>
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}$ curl -s [http://localhost:8080/metrics/master/prometheus]

(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)
 ...
{quote}
The same happens for all of those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties __ }}adding this content:
{{}} 

{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
{{*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran:

 

{{$ sbin/start-master.sh}}
{{$ sbin/start-slave.sh spark://`hostname`:7077}}
{{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}

{{}}
{quote}20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

{{}}

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

{{}}

Setting default log level to "WARN".

{{}}

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).

{{}}

Spark context Web UI available at http://192.168.0.6:4040

{{}}

Spark context available as 'sc' (master = 

[jira] [Created] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-25 Thread Paulo Roberto de Oliveira Castro (Jira)
Paulo Roberto de Oliveira Castro created SPARK-33564:


 Summary: Prometheus metrics for Master and Worker isn't working 
 Key: SPARK-33564
 URL: https://issues.apache.org/jira/browse/SPARK-33564
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell
Affects Versions: 3.0.1, 3.0.0
Reporter: Paulo Roberto de Oliveira Castro


Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties __ }}adding this content:
{{}} 

{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
{{*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus}}

Then I ran:

 

{{$ sbin/start-master.sh}}
{{$ sbin/start-slave.sh spark://`hostname`:7077}}
{{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}

{{The Spark shell opens without problems:}}

{quote}20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).

Spark context Web UI available at http://192.168.0.6:4040

Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).

Spark session available as 'spark'.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)

Type in expressions to have them evaluated.

Type :help for more information.

scala>
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s http://localhost:4040/metrics/prometheus/ | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}

*The problem appears when I try accessing master metrics*, and I get the 
following problem:


{quote}$ curl -s http://localhost:8080/metrics/master/prometheus

(the endpoint returns the HTML of the Spark Master web UI page, "Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077", version 3.0.0, instead of Prometheus metrics; HTML markup omitted)
...
{quote}
The same happens for all of those here:

{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
{{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.
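
A small probe (a sketch, not part of Spark) that makes the symptom easy to see: 
the driver endpoint answers with Prometheus text while the master/worker 
endpoints answer with the web UI HTML:

{code:python}
import urllib.request

endpoints = [
    "http://localhost:4040/metrics/prometheus/",        # driver: Prometheus text
    "http://localhost:8080/metrics/master/prometheus",  # master: returns HTML
    "http://localhost:8081/metrics/prometheus/",        # worker: returns HTML
]
for url in endpoints:
    with urllib.request.urlopen(url) as resp:
        head = resp.read(200).decode("utf-8", "replace").lstrip()
        kind = "HTML page" if head.lower().startswith("<") else "metrics text"
        print(url, "->", kind)
{code}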




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33544) explode should not filter when used with CreateArray

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238892#comment-17238892
 ] 

Apache Spark commented on SPARK-33544:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/30504

> explode should not filter when used with CreateArray
> 
>
> Key: SPARK-33544
> URL: https://issues.apache.org/jira/browse/SPARK-33544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-32295 added an optimization that 
> inserts a filter for "not null and size > 0" when using inner explode/inline. 
> This is fine in most cases, but the extra filter is not needed when the explode 
> is applied to a CreateArray that does not use Literals (the Literal case is 
> already handled). In that case we know the values aren't null and the array has 
> a size; the empty-array case is also already handled.
> For instance:
> val df = someDF.selectExpr("number", "explode(array(word, col3))")
> So in this case we shouldn't insert the extra Filter, which can also get pushed 
> down into, e.g., a Parquet reader. It just causes extra overhead.
>  
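
A quick way to see whether the extra Filter is inserted (a sketch using a toy 
DataFrame with the columns from the example above):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
someDF = spark.createDataFrame([(1, "a", "b")], ["number", "word", "col3"])

df = someDF.selectExpr("number", "explode(array(word, col3))")
# In the optimized plan, look for a Filter on size(array(word, col3)) > 0;
# with the proposed change it should not appear for a non-Literal CreateArray.
df.explain(True)
{code}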



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33544) explode should not filter when used with CreateArray

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33544:


Assignee: Apache Spark

> explode should not filter when used with CreateArray
> 
>
> Key: SPARK-33544
> URL: https://issues.apache.org/jira/browse/SPARK-33544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-32295 added an optimization that 
> inserts a filter for "not null and size > 0" when using inner explode/inline. 
> This is fine in most cases, but the extra filter is not needed when the explode 
> is applied to a CreateArray that does not use Literals (the Literal case is 
> already handled). In that case we know the values aren't null and the array has 
> a size; the empty-array case is also already handled.
> For instance:
> val df = someDF.selectExpr("number", "explode(array(word, col3))")
> So in this case we shouldn't insert the extra Filter, which can also get pushed 
> down into, e.g., a Parquet reader. It just causes extra overhead.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33544) explode should not filter when used with CreateArray

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33544:


Assignee: (was: Apache Spark)

> explode should not filter when used with CreateArray
> 
>
> Key: SPARK-33544
> URL: https://issues.apache.org/jira/browse/SPARK-33544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-32295 added an optimization that 
> inserts a filter for "not null and size > 0" when using inner explode/inline. 
> This is fine in most cases, but the extra filter is not needed when the explode 
> is applied to a CreateArray that does not use Literals (the Literal case is 
> already handled). In that case we know the values aren't null and the array has 
> a size; the empty-array case is also already handled.
> For instance:
> val df = someDF.selectExpr("number", "explode(array(word, col3))")
> So in this case we shouldn't insert the extra Filter, which can also get pushed 
> down into, e.g., a Parquet reader. It just causes extra overhead.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26645:
--
Affects Version/s: 2.4.7

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.7
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?
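
A common workaround while the inference path is broken (a sketch): skip inference 
and supply the schema explicitly, so the scientific-notation value is read as a 
double rather than an ill-formed decimal ("value" is an illustrative column name):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
# /tmp/t1 is the directory from the report containing the single line "1.18927098E9".
df = spark.read.schema("value DOUBLE").csv("/tmp/t1", sep="\t")
print(df.dtypes)  # [('value', 'double')]
{code}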



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238878#comment-17238878
 ] 

Apache Spark commented on SPARK-26645:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30503

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238876#comment-17238876
 ] 

Dongjoon Hyun commented on SPARK-26645:
---

Okay. Let me make a backporting PR to branch-2.4, [~bullsoverbears].

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238872#comment-17238872
 ] 

Dongjoon Hyun commented on SPARK-32223:
---

Is driver/executor template not enough for that purpose, [~prashant]?

> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenge with this is, spark.properties is not user provided and 
> is calculated based on certain factors. So a user provided config map, cannot 
> be used as is to mount as SPARK_CONF_DIR, so it will have to be somehow 
> augmented with the correct spark.properties.
> Q, Do we support update to config map properties for an already running job?
> Ans: No, since the spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly and it is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map 
> helps?
> One of the use case, I can think of is programmatically submitting a `spark 
> on k8s` job - e.g. spark as a service on a cloud deployment may find this 
> feature useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2020-11-25 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238411#comment-17238411
 ] 

Asif edited comment on SPARK-19875 at 11/25/20, 6:02 PM:
-

[~maropu], [~sameerag]  [~jay.pranavamurthi] I have opened a PR for 
SPARK-33152 which fixes the OOM and the unreasonable compile times in these queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I have not been able to get anybody to review the code.

The explanation of the logic used is in the PR.

If needed, we can go through the code together. This is going to be used by 
Workday in production.


was (Author: ashahid7):
[~maropu], [~sameerag]  [~jay.pranavamurthi] I have generated a PR for 
SPARK-3152 which fixes the OOM or unreasonable compile time in queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I cannot get any body for code review.

The explanation of the logic used is in the PR.

If needed we can go through the code together. This is going to be used by 
workday in production.

> Map->filter on many columns gets stuck in constraint inference optimization 
> code
> 
>
> Key: SPARK-19875
> URL: https://issues.apache.org/jira/browse/SPARK-19875
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>Priority: Major
>  Labels: bulk-closed
> Attachments: TestFilter.scala, test10cols.csv, test50cols.csv
>
>
> The attached code (TestFilter.scala) works with a 10-column csv dataset, but 
> gets stuck with a 50-column csv dataset. Both datasets are attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238835#comment-17238835
 ] 

Apache Spark commented on SPARK-33561:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30502

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.1.0
>
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238834#comment-17238834
 ] 

Apache Spark commented on SPARK-33561:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30502

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.1.0
>
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238833#comment-17238833
 ] 

Apache Spark commented on SPARK-33563:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/30501

> Expose inverse hyperbolic trig functions in PySpark and SparkR
> --
>
> Key: SPARK-33563
> URL: https://issues.apache.org/jira/browse/SPARK-33563
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> {{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
> Spark 3.1 (SPARK-33061).
> For consistency, we should expose these in Python and R as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33563:


Assignee: (was: Apache Spark)

> Expose inverse hyperbolic trig functions in PySpark and SparkR
> --
>
> Key: SPARK-33563
> URL: https://issues.apache.org/jira/browse/SPARK-33563
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> {{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
> Spark 3.1 (SPARK-33061).
> For consistency, we should expose these in Python and R as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33563:


Assignee: Apache Spark

> Expose inverse hyperbolic trig functions in PySpark and SparkR
> --
>
> Key: SPARK-33563
> URL: https://issues.apache.org/jira/browse/SPARK-33563
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> {{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
> Spark 3.1 (SPARK-33061).
> For consistency, we should expose these in Python and R as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31257) Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238832#comment-17238832
 ] 

Apache Spark commented on SPARK-31257:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30502

> Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes
> --
>
> Key: SPARK-31257
> URL: https://issues.apache.org/jira/browse/SPARK-31257
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.1.0
>
>
> There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
> CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
> unifying the create table syntax.
> [https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E]
> We should ensure the new "single" create table syntax is very deterministic 
> to both devs and end users.
> This work also includes how to pass the extra CREATE TABLE information (from 
> the Hive style CREATE TABLE) to the v2 catalog:
> 1.  turn the Hive serde info to table properties with `option.` prefix.
> 2. add a new v2 table option `external` to indicate CREATE EXTERNAL TABLE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33540) Subexpression elimination for interpreted predicate

2020-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33540.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30497
[https://github.com/apache/spark/pull/30497]

> Subexpression elimination for interpreted predicate
> ---
>
> Key: SPARK-33540
> URL: https://issues.apache.org/jira/browse/SPARK-33540
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
>
> We can support subexpression elimination for interpreted predicate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33562:


Assignee: Gengliang Wang  (was: Apache Spark)

> Improve the style of the checkbox in executor page
> --
>
> Key: SPARK-33562
> URL: https://issues.apache.org/jira/browse/SPARK-33562
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> The width of class `container-fluid-div` is set as 200px after 
> https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
> executor page messy.
> We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238813#comment-17238813
 ] 

Apache Spark commented on SPARK-33562:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30500

> Improve the style of the checkbox in executor page
> --
>
> Key: SPARK-33562
> URL: https://issues.apache.org/jira/browse/SPARK-33562
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> The width of class `container-fluid-div` is set as 200px after 
> https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
> executor page messy.
> We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33562:


Assignee: Apache Spark  (was: Gengliang Wang)

> Improve the style of the checkbox in executor page
> --
>
> Key: SPARK-33562
> URL: https://issues.apache.org/jira/browse/SPARK-33562
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> The width of class `container-fluid-div` is set as 200px after 
> https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
> executor page messy.
> We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-25 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-33563:
--

 Summary: Expose inverse hyperbolic trig functions in PySpark and 
SparkR
 Key: SPARK-33563
 URL: https://issues.apache.org/jira/browse/SPARK-33563
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SparkR, SQL
Affects Versions: 3.1.0
Reporter: Maciej Szymkiewicz


{{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
Spark 3.1 (SPARK-33061).

For consistency, we should expose these in Python and R as well.
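
Until the wrappers are added, a hedged PySpark sketch of the workaround through {{expr}}, assuming a Spark 3.x session where the SQL built-ins referenced by SPARK-33061 are available:

{code:python}
# Sketch only: call the SQL functions via expr() until dedicated PySpark
# wrappers (e.g. F.asinh) are exposed by this ticket.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1, 4).withColumn("x", F.col("id").cast("double"))
df.select(
    F.expr("asinh(x)").alias("asinh_x"),
    F.expr("acosh(x)").alias("acosh_x"),           # defined for x >= 1
    F.expr("atanh(1.0 / (x + 1))").alias("atanh")  # argument kept in (-1, 1)
).show()
{code}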



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33562) Improve the style of the checkbox in executor page

2020-11-25 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33562:
--

 Summary: Improve the style of the checkbox in executor page
 Key: SPARK-33562
 URL: https://issues.apache.org/jira/browse/SPARK-33562
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.1, 3.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


The width of class `container-fluid-div` is set as 200px after 
https://github.com/apache/spark/pull/21688 . This makes the checkbox in the 
executor page messy.
We should remove the width style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32165) SessionState leaks SparkListener with multiple SparkSession

2020-11-25 Thread Vikas Kushwaha (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238799#comment-17238799
 ] 

Vikas Kushwaha commented on SPARK-32165:


gentle ping.

> SessionState leaks SparkListener with multiple SparkSession
> ---
>
> Key: SPARK-32165
> URL: https://issues.apache.org/jira/browse/SPARK-32165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xianjin YE
>Priority: Major
>
> Copied from 
> [https://github.com/apache/spark/pull/28128#issuecomment-653102770]
>  
> {code:java}
>   test("SPARK-31354: SparkContext only register one SparkSession 
> ApplicationEnd listener") {
> val conf = new SparkConf()
>   .setMaster("local")
>   .setAppName("test-app-SPARK-31354-1")
> val context = new SparkContext(conf)
> SparkSession
>   .builder()
>   .sparkContext(context)
>   .master("local")
>   .getOrCreate()
>   .sessionState // this touches the sessionState
> val postFirstCreation = context.listenerBus.listeners.size()
> SparkSession.clearActiveSession()
> SparkSession.clearDefaultSession()
> SparkSession
>   .builder()
>   .sparkContext(context)
>   .master("local")
>   .getOrCreate()
>   .sessionState // this touches the sessionState
> val postSecondCreation = context.listenerBus.listeners.size()
> SparkSession.clearActiveSession()
> SparkSession.clearDefaultSession()
> assert(postFirstCreation == postSecondCreation)
>   }
> {code}
> The problem can be reproduced by the above code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Thread Punit Shah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238798#comment-17238798
 ] 

Punit Shah commented on SPARK-26645:


Hello [~dongjoon], if we can get this PR, it would be tremendously helpful.

> CSV infer schema bug infers decimal(9,-1)
> -
>
> Key: SPARK-26645
> URL: https://issues.apache.org/jira/browse/SPARK-26645
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ohad Raviv
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 3.0.0
>
>
> we have a file /tmp/t1/file.txt that contains only one line "1.18927098E9".
> running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw it is legal to have a decimal with negative scale in the code 
> (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
> // `DecimalType` conversion can fail when
> //   1. The precision is bigger than 38.
> //   2. scale is bigger than precision.
> DecimalType(bigDecimal.precision, bigDecimal.scale)
>   } 
> {code}
> but what does it mean?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33559) Column pruning with monotonically_increasing_id

2020-11-25 Thread Gaetan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaetan updated SPARK-33559:
---
Affects Version/s: (was: 3.0.1)
   2.4.0

> Column pruning with monotonically_increasing_id
> ---
>
> Key: SPARK-33559
> URL: https://issues.apache.org/jira/browse/SPARK-33559
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Gaetan
>Priority: Minor
>
> {code:java}
> df = ss.read.parquet("/path/to/parquet/dataset") 
> df.select("partnerid").withColumn("index", 
> sf.monotonically_increasing_id()).explain(True){code}
>  {{We should expect to only read partnerid from parquet dataset but we 
> actually read the whole dataset:}}
> {code:java}
> ... == Physical Plan == Project [partnerid#6794, 
> monotonically_increasing_id() AS index#24939L] +- FileScan parquet 
> [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
>  566 more fields] ...{code}
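
A self-contained version of the repro, with the {{ss}}/{{sf}} names resolved to the usual imports (the dataset path is illustrative):

{code:python}
# Runnable form of the snippet above; only the selected column should be
# needed, so the FileScan ought to list just partnerid rather than all
# 566+ fields of the dataset.
from pyspark.sql import SparkSession, functions as sf

ss = SparkSession.builder.getOrCreate()
df = ss.read.parquet("/path/to/parquet/dataset")
df.select("partnerid") \
  .withColumn("index", sf.monotonically_increasing_id()) \
  .explain(True)
{code}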



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33496) Improve error message of ANSI explicit cast

2020-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-33496.

Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30440
[https://github.com/apache/spark/pull/30440]

> Improve error message of ANSI explicit cast
> ---
>
> Key: SPARK-33496
> URL: https://issues.apache.org/jira/browse/SPARK-33496
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> After https://github.com/apache/spark/pull/30260, there are some type 
> conversions disallowed under ANSI mode.
> To make it more user-friendly, we should tell users what they can do if they 
> have to use the casting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33561:
---

Assignee: Ryan Blue

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.1.0
>
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31257) Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31257:
---

Assignee: Ryan Blue

> Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes
> --
>
> Key: SPARK-31257
> URL: https://issues.apache.org/jira/browse/SPARK-31257
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.1.0
>
>
> There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
> CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
> unifying the create table syntax.
> [https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E]
> We should ensure the new "single" create table syntax is very deterministic 
> to both devs and end users.
> This work also includes how to pass the extra CREATE TABLE information (from 
> the Hive style CREATE TABLE) to the v2 catalog:
> 1.  turn the Hive serde info to table properties with `option.` prefix.
> 2. add a new v2 table option `external` to indicate CREATE EXTERNAL TABLE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33561:


Assignee: Apache Spark

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31257) Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31257.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28026
[https://github.com/apache/spark/pull/28026]

> Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes
> --
>
> Key: SPARK-31257
> URL: https://issues.apache.org/jira/browse/SPARK-31257
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
> Fix For: 3.1.0
>
>
> There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
> CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
> unifying the create table syntax.
> [https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E]
> We should ensure the new "single" create table syntax is very deterministic 
> to both devs and end users.
> This work also includes how to pass the extra CREATE TABLE information (from 
> the Hive style CREATE TABLE) to the v2 catalog:
> 1.  turn the Hive serde info to table properties with `option.` prefix.
> 2. add a new v2 table option `external` to indicate CREATE EXTERNAL TABLE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33561.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28026
[https://github.com/apache/spark/pull/28026]

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33561:


Assignee: (was: Apache Spark)

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238781#comment-17238781
 ] 

Apache Spark commented on SPARK-33561:
--

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/28026

> distinguish table properties and options in V2 catalog API
> --
>
> Key: SPARK-33561
> URL: https://issues.apache.org/jira/browse/SPARK-33561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Right now, user-specified table properties and options are merged and passed 
> to `TableCatalog.createTable` as table properties, because v2 catalog does 
> not have the table options concept.
> We need to distinguish table properties and options if we want to use v2 
> catalog API to implement hive catalog. We can add an `option.` prefix to 
> table options when merging it into table properties. For backward 
> compatibility, we can also merge the original table options without the 
> `option.` prefix into table properties as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31257) Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31257:

Description: 
There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
unifying the create table syntax.

[https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E]

We should ensure the new "single" create table syntax is very deterministic to 
both devs and end users.

This work also includes how to pass the extra CREATE TABLE information (from the 
Hive style CREATE TABLE) to the v2 catalog:

1.  turn the Hive serde info to table properties with `option.` prefix.

2. add a new v2 table option `external` to indicate CREATE EXTERNAL TABLE.

  was:
There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
unifying the create table syntax.

https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E

We should ensure the new "single" create table syntax is very deterministic to 
both devs and end users.


> Unify create table syntax to fix ambiguous two different CREATE TABLE syntaxes
> --
>
> Key: SPARK-31257
> URL: https://issues.apache.org/jira/browse/SPARK-31257
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> There's a discussion in dev@ mailing list to point out ambiguous syntaxes for 
> CREATE TABLE DDL. This issue tracks the efforts to resolve the root issue via 
> unifying the create table syntax.
> [https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E]
> We should ensure the new "single" create table syntax is very deterministic 
> to both devs and end users.
> This work also includes how to pass the extra CREATE TABLE information (from 
> the Hive style CREATE TABLE) to the v2 catalog:
> 1.  turn the Hive serde info to table properties with `option.` prefix.
> 2. add a new v2 table option `external` to indicate CREATE EXTERNAL TABLE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33561) distinguish table properties and options in V2 catalog API

2020-11-25 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-33561:
---

 Summary: distinguish table properties and options in V2 catalog API
 Key: SPARK-33561
 URL: https://issues.apache.org/jira/browse/SPARK-33561
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Wenchen Fan


Right now, user-specified table properties and options are merged and passed to 
`TableCatalog.createTable` as table properties, because v2 catalog does not 
have the table options concept.

We need to distinguish table properties and options if we want to use v2 
catalog API to implement hive catalog. We can add an `option.` prefix to table 
options when merging it into table properties. For backward compatibility, we 
can also merge the original table options without the `option.` prefix into 
table properties as well.
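
As a rough illustration of the proposed merge rule (plain Python, illustrative names only, not the actual TableCatalog code):

{code:python}
# Sketch: keep options under an "option." prefix so a v2 catalog can tell
# them apart from table properties, and also copy them un-prefixed for
# backward compatibility.
def merge_into_properties(table_properties, table_options):
    merged = dict(table_properties)
    for key, value in table_options.items():
        merged["option." + key] = value   # distinguishable by the catalog
        merged.setdefault(key, value)     # backward-compatible plain copy
    return merged

print(merge_into_properties({"provider": "parquet"}, {"path": "/tmp/t"}))
# {'provider': 'parquet', 'option.path': '/tmp/t', 'path': '/tmp/t'}
{code}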



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29302) dynamic partition overwrite with speculation enabled

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-29302:
---

Assignee: Du Ripeng

> dynamic partition overwrite with speculation enabled
> 
>
> Key: SPARK-29302
> URL: https://issues.apache.org/jira/browse/SPARK-29302
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: Du Ripeng
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Now, for a dynamic partition overwrite operation, the filename of a task's 
> output is deterministic.
> So, if speculation is enabled, would a task conflict with its corresponding 
> speculative task?
> Would the two tasks concurrently write to the same file?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-27194:
---

Assignee: Du Ripeng

> Job failures when task attempts do not clean up spark-staging parquet files
> ---
>
> Key: SPARK-27194
> URL: https://issues.apache.org/jira/browse/SPARK-27194
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.1, 2.3.2, 2.3.3
>Reporter: Reza Safi
>Assignee: Du Ripeng
>Priority: Major
> Fix For: 3.1.0
>
>
> When a container fails for some reason (for example when killed by yarn for 
> exceeding memory limits), the subsequent task attempts for the tasks that 
> were running on that container all fail with a FileAlreadyExistsException. 
> The original task attempt does not seem to successfully call abortTask (or at 
> least its "best effort" delete is unsuccessful) and clean up the parquet file 
> it was writing to, so when later task attempts try to write to the same 
> spark-staging directory using the same file name, the job fails.
> Here is what transpires in the logs:
> The container where task 200.0 is running is killed and the task is lost:
> {code}
> 19/02/20 09:33:25 ERROR cluster.YarnClusterScheduler: Lost executor y on 
> t.y.z.com: Container killed by YARN for exceeding memory limits. 8.1 GB of 8 
> GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>  19/02/20 09:33:25 WARN scheduler.TaskSetManager: Lost task 200.0 in stage 
> 0.0 (TID xxx, t.y.z.com, executor 93): ExecutorLostFailure (executor 93 
> exited caused by one of the running tasks) Reason: Container killed by YARN 
> for exceeding memory limits. 8.1 GB of 8 GB physical memory used. Consider 
> boosting spark.yarn.executor.memoryOverhead.
> {code}
> The task is re-attempted on a different executor and fails because the 
> part-00200-blah-blah.c000.snappy.parquet file from the first task attempt 
> already exists:
> {code}
> 19/02/20 09:35:01 WARN scheduler.TaskSetManager: Lost task 200.1 in stage 0.0 
> (TID 594, tn.y.z.com, executor 70): org.apache.spark.SparkException: Task 
> failed while writing rows.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet
>  for client a.b.c.d already exists
> {code}
> The job fails when the configured task attempts (spark.task.maxFailures) 
> have failed with the same error:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 200 
> in stage 0.0 failed 20 times, most recent failure: Lost task 284.19 in stage 
> 0.0 (TID yyy, tm.y.z.com, executor 16): org.apache.spark.SparkException: Task 
> failed while writing rows.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
>  ...
>  Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet
>  for client i.p.a.d already exists
> {code}
> SPARK-26682 wasn't the root cause here, since there wasn't any stage 
> reattempt.
> This issue seems to happen when 
> spark.sql.sources.partitionOverwriteMode=dynamic. 
>  
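
For reference, a hedged sketch of the configuration under which this is reported (values illustrative; the retry count and memory-overhead hint come from the logs quoted above):

{code:python}
# Sketch: session configuration matching the reported failure mode --
# dynamic partition overwrite plus task retries after container kills.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
         .config("spark.task.maxFailures", "20")
         .config("spark.yarn.executor.memoryOverhead", "2g")
         .getOrCreate())
{code}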



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33560) Add "unused import" check to Maven compilation process

2020-11-25 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33560:
-
Summary: Add "unused import" check to Maven compilation process  (was: Add 
"unused import" check to Maven compilation check)

> Add "unused import" check to Maven compilation process
> --
>
> Key: SPARK-33560
> URL: https://issues.apache.org/jira/browse/SPARK-33560
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to SPARK-33441, we need to add an "unused import" check to the Maven pom.
> The blocker is how to achieve the same effect as the SBT compiler check: it seems 
> that adding the "-P:silencer:globalFilters=.*deprecated.*" configuration to 
> "scala-maven-plugin" is not supported at present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33560) Add "unused import" check to Maven compilation check

2020-11-25 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33560:
-
Summary: Add "unused import" check to Maven compilation check  (was: Add 
"unused import" check during Maven compilation)

> Add "unused import" check to Maven compilation check
> 
>
> Key: SPARK-33560
> URL: https://issues.apache.org/jira/browse/SPARK-33560
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to SPARK-33441, we need to add an "unused import" check to the Maven pom.
> The blocker is how to achieve the same effect as the SBT compiler check: it seems 
> that adding the "-P:silencer:globalFilters=.*deprecated.*" configuration to 
> "scala-maven-plugin" is not supported at present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33560) Add "unused import" check during Maven compilation

2020-11-25 Thread Yang Jie (Jira)
Yang Jie created SPARK-33560:


 Summary: Add "unused import" check during Maven compilation
 Key: SPARK-33560
 URL: https://issues.apache.org/jira/browse/SPARK-33560
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie


Similar to SPARK-33441, we need to add an "unused import" check to the Maven pom.

The blocker is how to achieve the same effect as the SBT compiler check: it seems 
that adding the "-P:silencer:globalFilters=.*deprecated.*" configuration to 
"scala-maven-plugin" is not supported at present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33559) Column pruning with monotonically_increasing_id

2020-11-25 Thread Gaetan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaetan updated SPARK-33559:
---
Description: 
{code:java}
df = ss.read.parquet("/path/to/parquet/dataset") 
df.select("partnerid").withColumn("index", 
sf.monotonically_increasing_id()).explain(True){code}

 {{We should expect to only read partnerid from parquet dataset but we actually 
read the whole dataset:}}


{code:java}
... == Physical Plan == Project [partnerid#6794, monotonically_increasing_id() 
AS index#24939L] +- FileScan parquet 
[impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
 566 more fields] ...{code}

  was:
{code:java}
df = ss.read.parquet("/path/to/parquet/dataset") 
df.select("partnerid").withColumn("index", 
sf.monotonically_increasing_id()).explain(True){code}
{{We should expect to only read partnerid from parquet dataset but we actually 
read the whole dataset:}}
{code:java}
... == Physical Plan == Project [partnerid#6794, monotonically_increasing_id() 
AS index#24939L] +- FileScan parquet 
[impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
 566 more fields] ...{code}


> Column pruning with monotonically_increasing_id
> ---
>
> Key: SPARK-33559
> URL: https://issues.apache.org/jira/browse/SPARK-33559
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Gaetan
>Priority: Minor
>
> {code:java}
> df = ss.read.parquet("/path/to/parquet/dataset") 
> df.select("partnerid").withColumn("index", 
> sf.monotonically_increasing_id()).explain(True){code}
>  {{We should expect to only read partnerid from parquet dataset but we 
> actually read the whole dataset:}}
> {code:java}
> ... == Physical Plan == Project [partnerid#6794, 
> monotonically_increasing_id() AS index#24939L] +- FileScan parquet 
> [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
>  566 more fields] ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33559) Column pruning with monotonically_increasing_id

2020-11-25 Thread Gaetan (Jira)
Gaetan created SPARK-33559:
--

 Summary: Column pruning with monotonically_increasing_id
 Key: SPARK-33559
 URL: https://issues.apache.org/jira/browse/SPARK-33559
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Gaetan


{code:java}
df = ss.read.parquet("/path/to/parquet/dataset") 
df.select("partnerid").withColumn("index", 
sf.monotonically_increasing_id()).explain(True){code}
{{We should expect to only read partnerid from parquet dataset but we actually 
read the whole dataset:}}
{code:java}
... == Physical Plan == Project [partnerid#6794, monotonically_increasing_id() 
AS index#24939L] +- FileScan parquet 
[impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,...
 566 more fields] ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29302) dynamic partition overwrite with speculation enabled

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29302.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29000
[https://github.com/apache/spark/pull/29000]

> dynamic partition overwrite with speculation enabled
> 
>
> Key: SPARK-29302
> URL: https://issues.apache.org/jira/browse/SPARK-29302
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: feiwang
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Now, for a dynamic partition overwrite operation, the filename of a task's 
> output is deterministic.
> So, if speculation is enabled, would a task conflict with its corresponding 
> speculative task?
> Would the two tasks concurrently write to the same file?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27194) Job failures when task attempts do not clean up spark-staging parquet files

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27194.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29000
[https://github.com/apache/spark/pull/29000]

> Job failures when task attempts do not clean up spark-staging parquet files
> ---
>
> Key: SPARK-27194
> URL: https://issues.apache.org/jira/browse/SPARK-27194
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.1, 2.3.2, 2.3.3
>Reporter: Reza Safi
>Priority: Major
> Fix For: 3.1.0
>
>
> When a container fails for some reason (for example when killed by yarn for 
> exceeding memory limits), the subsequent task attempts for the tasks that 
> were running on that container all fail with a FileAlreadyExistsException. 
> The original task attempt does not seem to successfully call abortTask (or at 
> least its "best effort" delete is unsuccessful) and clean up the parquet file 
> it was writing to, so when later task attempts try to write to the same 
> spark-staging directory using the same file name, the job fails.
> Here is what transpires in the logs:
> The container where task 200.0 is running is killed and the task is lost:
> {code}
> 19/02/20 09:33:25 ERROR cluster.YarnClusterScheduler: Lost executor y on 
> t.y.z.com: Container killed by YARN for exceeding memory limits. 8.1 GB of 8 
> GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>  19/02/20 09:33:25 WARN scheduler.TaskSetManager: Lost task 200.0 in stage 
> 0.0 (TID xxx, t.y.z.com, executor 93): ExecutorLostFailure (executor 93 
> exited caused by one of the running tasks) Reason: Container killed by YARN 
> for exceeding memory limits. 8.1 GB of 8 GB physical memory used. Consider 
> boosting spark.yarn.executor.memoryOverhead.
> {code}
> The task is re-attempted on a different executor and fails because the 
> part-00200-blah-blah.c000.snappy.parquet file from the first task attempt 
> already exists:
> {code}
> 19/02/20 09:35:01 WARN scheduler.TaskSetManager: Lost task 200.1 in stage 0.0 
> (TID 594, tn.y.z.com, executor 70): org.apache.spark.SparkException: Task 
> failed while writing rows.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet
>  for client a.b.c.d already exists
> {code}
> The job fails when the configured task attempts (spark.task.maxFailures) 
> have failed with the same error:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 200 
> in stage 0.0 failed 20 times, most recent failure: Lost task 284.19 in stage 
> 0.0 (TID yyy, tm.y.z.com, executor 16): org.apache.spark.SparkException: Task 
> failed while writing rows.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
>  ...
>  Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /user/hive/warehouse/tmp_supply_feb1/.spark-staging-blah-blah-blah/dt=2019-02-17/part-00200-blah-blah.c000.snappy.parquet
>  for client i.p.a.d already exists
> {code}
> SPARK-26682 wasn't the root cause here, since there wasn't any stage 
> reattempt.
> This issue seems to happen when 
> spark.sql.sources.partitionOverwriteMode=dynamic. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33509) List partition by names from V2 tables that support partition management

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33509.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30452
[https://github.com/apache/spark/pull/30452]

> List partition by names from V2 tables that support partition management
> 
>
> Key: SPARK-33509
> URL: https://issues.apache.org/jira/browse/SPARK-33509
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, the SupportsPartitionManagement interface exposes only the 
> listPartitionIdentifiers() method, which does not allow listing partitions by 
> name. So, it is hard to implement:
> {code:java}
> SHOW PARTITIONS table PARTITION(month=2)
> {code}
> for a table like:
> {code:java}
> CREATE TABLE $table (price int, qty int, year int, month int)
> USING parquet
> partitioned by (year, month)
> {code}
> because listPartitionIdentifiers() requires specifying a value for *year*.
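
A rough Scala sketch of the name-based listing this asks for; the trait and 
method names below are hypothetical and only illustrate the shape of such an 
API, not necessarily what was merged:

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical sketch only: names and signatures are illustrative.
trait PartitionListingSketch {
  // Identifier-based listing: the caller must supply values for the leading
  // partition columns, so a filter on just `month` cannot be expressed.
  def listPartitionIdentifiers(ident: InternalRow): Array[InternalRow]

  // Name-based listing: the caller names only the partition columns it wants
  // to filter on, e.g. names = Array("month") with ident holding the value 2,
  // and gets back every matching partition identifier regardless of `year`.
  def listPartitionByNames(names: Array[String], ident: InternalRow): Array[InternalRow]
}
{code}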



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33509) List partition by names from V2 tables that support partition management

2020-11-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33509:
---

Assignee: Maxim Gekk

> List partition by names from V2 tables that support partition management
> 
>
> Key: SPARK-33509
> URL: https://issues.apache.org/jira/browse/SPARK-33509
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Currently, the SupportsPartitionManagement interface exposes only the 
> listPartitionIdentifiers() method, which does not allow listing partitions by 
> name. So, it is hard to implement:
> {code:java}
> SHOW PARTITIONS table PARTITION(month=2)
> {code}
> for a table like:
> {code:java}
> CREATE TABLE $table (price int, qty int, year int, month int)
> USING parquet
> partitioned by (year, month)
> {code}
> because listPartitionIdentifiers() requires specifying a value for *year*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238695#comment-17238695
 ] 

Apache Spark commented on SPARK-33558:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30499

> Unify v1 and v2 ALTER TABLE .. PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Extract ALTER TABLE .. PARTITION tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.
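
A rough sketch of how such shared tests are usually organized; the trait and 
suite names below are made up for illustration and may not match what the PR 
actually adds:

{code:scala}
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSparkSession

// Illustrative sketch only; names are invented for this example.
trait AlterTablePartitionSuiteBase extends QueryTest with SharedSparkSession {
  // Concrete suites supply a table-name prefix that routes to a v1 or v2 catalog.
  protected def catalogAndNamespace: String

  test("ALTER TABLE .. ADD/DROP PARTITION round trip") {
    val t = s"$catalogAndNamespace.tbl"
    withTable(t) {
      sql(s"CREATE TABLE $t (id bigint, part int) USING parquet PARTITIONED BY (part)")
      sql(s"ALTER TABLE $t ADD PARTITION (part = 0)")
      sql(s"ALTER TABLE $t DROP PARTITION (part = 0)")
    }
  }
}

// The v1 suite runs against the built-in session catalog.
class AlterTablePartitionV1Suite extends AlterTablePartitionSuiteBase {
  override protected def catalogAndNamespace: String = "spark_catalog.default"
}

// The v2 suite would point at a test catalog registered in the SparkConf
// (catalog registration omitted here for brevity).
class AlterTablePartitionV2Suite extends AlterTablePartitionSuiteBase {
  override protected def catalogAndNamespace: String = "testcat.ns"
}
{code}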



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33558:


Assignee: Apache Spark

> Unify v1 and v2 ALTER TABLE .. PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Extract ALTER TABLE .. PARTITION tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238694#comment-17238694
 ] 

Apache Spark commented on SPARK-33558:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30499

> Unify v1 and v2 ALTER TABLE .. PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Extract ALTER TABLE .. PARTITION tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33558:


Assignee: (was: Apache Spark)

> Unify v1 and v2 ALTER TABLE .. PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Extract ALTER TABLE .. PARTITION tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33558:
--

 Summary: Unify v1 and v2 ALTER TABLE .. PARTITION tests
 Key: SPARK-33558
 URL: https://issues.apache.org/jira/browse/SPARK-33558
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Extract ALTER TABLE .. PARTITION tests to a common place so they run against 
both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


