[jira] [Created] (SPARK-31265) Add -XX:MaxDirectMemorySize jvm options in yarn mode
wangzhun created SPARK-31265: Summary: Add -XX:MaxDirectMemorySize jvm options in yarn mode Key: SPARK-31265 URL: https://issues.apache.org/jira/browse/SPARK-31265 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: wangzhun
The current memory request is composed of `amMemory` + `amMemoryOverhead`:
{code:java}
val capability = Records.newRecord(classOf[Resource])
capability.setMemory(amMemory + amMemoryOverhead)
capability.setVirtualCores(amCores)
if (amResources.nonEmpty) {
  ResourceRequestHelper.setResourceRequests(amResources, capability)
}
logDebug(s"Created resource capability for AM request: $capability")
{code}
{code:java}
// Add Xmx for AM memory
javaOpts += "-Xmx" + amMemory + "m"
{code}
It is possible for the container's physical memory to exceed the limit, in which case the container is killed by YARN. I suggest setting `-XX:MaxDirectMemorySize` here as well:
{code:java}
// Add Xmx for AM memory
javaOpts += "-Xmx" + amMemory + "m"
javaOpts += s"-XX:MaxDirectMemorySize=${amMemoryOverhead}m"
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
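Until a change like this lands, the same cap can be applied from the user side through the existing extraJavaOptions settings; a minimal PySpark sketch (spark.yarn.am.extraJavaOptions applies to the client-mode AM, and the 512m values are assumed placeholders, not recommendations):
{code:python}
from pyspark import SparkConf

# User-side sketch: cap direct (off-heap NIO) memory explicitly so the AM and
# executor JVMs stay within their YARN container limits.
# The "512m" values are assumed placeholders, not recommendations.
conf = (SparkConf()
        .set("spark.yarn.am.extraJavaOptions", "-XX:MaxDirectMemorySize=512m")
        .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=512m"))
{code}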
[jira] [Resolved] (SPARK-31237) Replace 3-letter time zones by zone offsets
[ https://issues.apache.org/jira/browse/SPARK-31237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31237. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28001 [https://github.com/apache/spark/pull/28001]
> Replace 3-letter time zones by zone offsets
> ---
>
> Key: SPARK-31237
> URL: https://issues.apache.org/jira/browse/SPARK-31237
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.0.0
>
> 3-letter time zones are ambiguous, and have already been deprecated in the JDK, see [https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html].
> Also, some short names are mapped to region-based zone IDs, and don't conform to actual definitions. For example, the PST short name is mapped to America/Los_Angeles. It has different zone offsets in the Java 7 and Java 8 APIs:
> {code:scala}
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-05 23:00:00").getTime)/3600000.0
> res11: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 00:00:00").getTime)/3600000.0
> res12: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 01:00:00").getTime)/3600000.0
> res13: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 02:00:00").getTime)/3600000.0
> res14: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 03:00:00").getTime)/3600000.0
> res15: Double = -8.0
> {code}
> and in the Java 8 API: https://github.com/apache/spark/pull/27980#discussion_r396287278
> By definition, PST must be a constant equal to UTC-08:00, see https://www.timeanddate.com/time/zones/pst
> The ticket aims to replace all short time zone names with zone offsets in tests.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31237) Replace 3-letter time zones by zone offsets
[ https://issues.apache.org/jira/browse/SPARK-31237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31237: --- Assignee: Maxim Gekk
> Replace 3-letter time zones by zone offsets
> ---
>
> Key: SPARK-31237
> URL: https://issues.apache.org/jira/browse/SPARK-31237
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
>
> 3-letter time zones are ambiguous, and have already been deprecated in the JDK, see [https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html].
> Also, some short names are mapped to region-based zone IDs, and don't conform to actual definitions. For example, the PST short name is mapped to America/Los_Angeles. It has different zone offsets in the Java 7 and Java 8 APIs:
> {code:scala}
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-05 23:00:00").getTime)/3600000.0
> res11: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 00:00:00").getTime)/3600000.0
> res12: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 01:00:00").getTime)/3600000.0
> res13: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 02:00:00").getTime)/3600000.0
> res14: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 03:00:00").getTime)/3600000.0
> res15: Double = -8.0
> {code}
> and in the Java 8 API: https://github.com/apache/spark/pull/27980#discussion_r396287278
> By definition, PST must be a constant equal to UTC-08:00, see https://www.timeanddate.com/time/zones/pst
> The ticket aims to replace all short time zone names with zone offsets in tests.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31223. -- Resolution: Fixed
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: Huaxin Gao
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
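For the second option, a minimal sketch of what seeding would look like (the seed value 0 is an assumed placeholder):
{code:python}
import numpy as np

# Option 2 from the description: fixing the seed makes the random draw
# reproducible, so the embedded test datasets can be regenerated later.
np.random.seed(0)  # assumed seed value
X = np.random.rand(20, 6)
{code}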
[jira] [Assigned] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31223: Assignee: Huaxin Gao (was: zhengruifeng)
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: Huaxin Gao
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31223: Assignee: zhengruifeng
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: zhengruifeng
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30623: - Assignee: Yuanjian Li
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Assignee: Yuanjian Li
> Priority: Major
> Fix For: 3.0.0
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30623. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27665 [https://github.com/apache/spark/pull/27665]
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
> Fix For: 3.0.0
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31264) Repartition by dynamic partition columns before inserting into a table
Yuming Wang created SPARK-31264: --- Summary: Repartition by dynamic partition columns before inserting into a table Key: SPARK-31264 URL: https://issues.apache.org/jira/browse/SPARK-31264 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30623: -- Affects Version/s: (was: 2.4.4)
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067359#comment-17067359 ] Dongjoon Hyun commented on SPARK-30623: --- Hi, [~tgraves]. Since SPARK-24355 was fixed in 3.0.0, I removed `2.4.4` from the `Affected Version`. Please let me know if I'm wrong.
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31263) Enable yarn shuffle service to close idle connections
feiwang created SPARK-31263: --- Summary: Enable yarn shuffle service to close idle connections Key: SPARK-31263 URL: https://issues.apache.org/jira/browse/SPARK-31263 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.1.0 Reporter: feiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31262) Test case importing another test case that contains bracketed comments can't display bracketed comments in golden files well.
jiaan.geng created SPARK-31262: -- Summary: Test case importing another test case that contains bracketed comments can't display bracketed comments in golden files well. Key: SPARK-31262 URL: https://issues.apache.org/jira/browse/SPARK-31262 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: jiaan.geng
The content of nested-comments.sql is shown below:
{code:java}
-- This test case just used to test imported bracketed comments.
-- the first case of bracketed comment
--QUERY-DELIMITER-START
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first;
--QUERY-DELIMITER-END
{code}
The test case comments.sql imports nested-comments.sql as below:
{code:java}
--IMPORT nested-comments.sql
{code}
The output will be:
{code:java}
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
/* This is the first example of bracketed comment.
^^^
SELECT 'ommented out content' AS first

-- !query
*/
SELECT 'selected content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
*/
^^^
SELECT 'selected content' AS first
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30923) Spark MLlib, GraphX 3.0 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-30923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-30923. -- Resolution: Fixed > Spark MLlib, GraphX 3.0 QA umbrella > --- > > Key: SPARK-30923 > URL: https://issues.apache.org/jira/browse/SPARK-30923 > Project: Spark > Issue Type: Umbrella > Components: Documentation, GraphX, ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > Description > This JIRA lists tasks for the next Spark release's QA period for MLlib and > GraphX. *SparkR is separate. > The list below gives an overview of what is involved, and the corresponding > JIRA issues are linked below that. > h2. API > * Check binary API compatibility for Scala/Java > * Audit new public APIs (from the generated html doc) > ** Scala > ** Java compatibility > ** Python coverage > * Check Experimental, DeveloperApi tags > h2. Algorithms and performance > * Performance tests > h2. Documentation and example code > * For new algorithms, create JIRAs for updating the user guide sections & > examples > * Update Programming Guide > * Update website -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31260) How to speed up WholeStageCodegen in Spark SQL Query?
[ https://issues.apache.org/jira/browse/SPARK-31260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongJin updated SPARK-31260: Description: It took about 2 mins for one 248 MB file; 2 files take ~5 mins. How can I tune or maximize the performance? Spark is initialized as below: {{.setMaster(numCores) .set("spark.driver.host", "localhost") .set("spark.executor.cores","2") .set("spark.num.executors","2") .set("spark.executor.memory", "4g") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","2") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") .set("spark.sql.shuffle.partitions",defaultPartitions)}} {{joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*)}} {{data = joinedDf.take(1000)}} [https://i.stack.imgur.com/oeYww.png] == Parsed Logical Plan == GlobalLimit 5 +- LocalLimit 5 +- Project [COL1#155, CASE WHEN (isnull(COL2#98) && isnull(COL2#114)) THEN [null] WHEN isnull(COL2#98) THEN concat([null]<==>, COL2#114) WHEN isnull(COL2#114) THEN concat(COL2#98, <==>[null]) WHEN ((upper(COL2#98) = upper(COL2#114)) && true) THEN concat(, COL2#98) WHEN (abs((cast(COL2#98 as double) - cast(COL2#114 as double))) <= 0.1) THEN concat(COL2#98, , COL2#114) ELSE concat(COL2#98, <==>, COL2#114) END AS COL2#171, CASE WHEN (isnull(COL3#99) && isnull(COL3#115)) THEN [null] WHEN isnull(COL3#99) THEN concat([null]<==>, COL3#115) WHEN isnull(COL3#115) THEN concat(COL3#99, <==>[null]) WHEN ((upper(COL3#99) = upper(COL3#115)) && true) THEN concat(, COL3#99) WHEN (abs((cast(COL3#99 as double) - cast(COL3#115 as double))) <= 0.1) THEN concat(COL3#99, , COL3#115) ELSE concat(COL3#99, <==>, COL3#115) END AS COL3#172, CASE WHEN (isnull(COL4#100) && isnull(COL4#116)) THEN [null] WHEN isnull(COL4#100) THEN concat([null]<==>, COL4#116) WHEN isnull(COL4#116) THEN concat(COL4#100, <==>[null]) WHEN ((upper(COL4#100) = upper(COL4#116)) && true) THEN concat(, COL4#100) WHEN (abs((cast(COL4#100 as double) - cast(COL4#116 as double))) <= 0.1) THEN concat(COL4#100, , COL4#116) ELSE concat(COL4#100, <==>, COL4#116) END AS COL4#173, CASE WHEN (isnull(COL5#101) && isnull(COL5#117)) THEN [null] WHEN isnull(COL5#101) THEN concat([null]<==>, COL5#117) WHEN isnull(COL5#117) THEN concat(COL5#101, <==>[null]) WHEN ((upper(COL5#101) = upper(COL5#117)) && true) THEN concat(, COL5#101) WHEN (abs((cast(COL5#101 as double) - cast(COL5#117 as double))) <= 0.1) THEN concat(COL5#101, , COL5#117) ELSE concat(COL5#101, <==>, COL5#117) END AS COL5#174, CASE WHEN (isnull(COL6#102) && isnull(COL6#118)) THEN [null] WHEN isnull(COL6#102) THEN concat([null]<==>, COL6#118) WHEN isnull(COL6#118) THEN concat(COL6#102, <==>[null]) WHEN ((upper(COL6#102) = upper(COL6#118)) && true) THEN concat(, COL6#102) WHEN (abs((cast(COL6#102 as double) - cast(COL6#118 as double))) <= 0.1) THEN concat(COL6#102, , COL6#118) ELSE concat(COL6#102, <==>, COL6#118) END AS COL6#175, CASE WHEN (isnull(COL7#103) && isnull(COL7#119)) THEN [null] WHEN isnull(COL7#103) THEN concat([null]<==>, COL7#119) WHEN isnull(COL7#119) THEN concat(COL7#103, <==>[null]) WHEN ((upper(COL7#103) = upper(COL7#119)) && true) THEN concat(, COL7#103) WHEN (abs((cast(COL7#103 as double) - cast(COL7#119 as double))) <= 0.1) THEN concat(COL7#103, , COL7#119) ELSE concat(COL7#103, <==>, COL7#119) END AS COL7#176, CASE WHEN (isnull(COL8#104) &&
isnull(COL8#120)) THEN [null] WHEN isnull(COL8#104) THEN concat([null]<==>, COL8#120) WHEN isnull(COL8#120) THEN concat(COL8#104, <==>[null]) WHEN ((upper(COL8#104) = upper(COL8#120)) && true) THEN concat(, COL8#104) WHEN (abs((cast(COL8#104 as double) - cast(COL8#120 as double))) <= 0.1) THEN concat(COL8#104, , COL8#120) ELSE concat(COL8#104, <==>, COL8#120) END AS COL8#177] +- Project [coalesce(COL1#97, COL1#113) AS COL1#155, COL2#98, COL3#99, COL4#100, COL5#101, COL6#102, COL7#103, COL8#104, COL2#114, COL3#115, COL4#116, COL5#117, COL6#118, COL7#119, COL8#120] +- Join FullOuter, (COL1#97 = COL1#113) :- SubqueryAlias `l` : +- ResolvedHint (broadcast) : +- Project [col1#10 AS COL1#97, col2#11 AS COL2#98, col3#12 AS COL3#99, col4#13 AS COL4#100, col5#14 AS COL5#101, col6#15 AS COL6#102, col7#16 AS COL7#103, col8#17 AS COL8#104] : +- Project [col1#10, col2#11, col3#12, col4#13, col5#14, col6#15, col7#16, col8#17] : +- Relation[col1#10,col2#11,col3#12,col4#13,col5#14,col6#15,col7#16,col8#17] csv +- SubqueryAlias `r` +- ResolvedHint (broadcast) +- Project [col1#36 AS COL1#113, col2#37 AS COL2#114, col3#38 AS COL3#115, col4#39 AS COL4#116, col5#40 AS COL5#117, col6#41 AS COL6#118, col7#42 AS COL7#119, col8#43 AS COL8#120] +- Project [col1#36, col2#37, col3#38, col4#39, col5#40, col6#41, col7#42, co
[jira] [Created] (SPARK-31261) Avoid NPE when reading bad CSV input with `columnNameCorruptRecord` specified
Zhenhua Wang created SPARK-31261: Summary: Avoid NPE when reading bad CSV input with `columnNameCorruptRecord` specified Key: SPARK-31261 URL: https://issues.apache.org/jira/browse/SPARK-31261 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5, 3.0.0, 3.1.0 Reporter: Zhenhua Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31260) How to speed up WholeStageCodegen in Spark SQL Query?
HongJin created SPARK-31260: --- Summary: How to speed up WholeStageCodegen in Spark SQL Query? Key: SPARK-31260 URL: https://issues.apache.org/jira/browse/SPARK-31260 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 2.4.4 Reporter: HongJin
It took about 2 mins for one 248 MB file; 2 files take ~5 mins. How can I tune or maximize the performance? Spark is initialized as below:
{{.setMaster(numCores) .set("spark.driver.host", "localhost") .set("spark.executor.cores","2") .set("spark.num.executors","2") .set("spark.executor.memory", "4g") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","2") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") .set("spark.sql.shuffle.partitions",defaultPartitions)}}
{{joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*)}}
{{data = joinedDf.take(1000)}}
[https://i.stack.imgur.com/oeYww.png]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
[ https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067300#comment-17067300 ] Nicholas Marion commented on SPARK-30466: - It is worth noting that the following dependencies rely on codehaus jackson:
Apache Hadoop, fixed in 3.x versions with [https://github.com/apache/hadoop/commit/67d9f2808efb34b9a7b0b824cb4033b95ad33474#diff-e2c362dd211f462f1f629e34af05f497]
Apache parquet-mr, fixed in 1.11.0 with [https://github.com/apache/parquet-mr/commit/47398be76cfb6634000532e9432430c4676442dd#diff-c6f127eb650758aad91ecf02a2e52add]
Apache Avro, fixed in 1.9.x with [https://github.com/apache/avro/commit/95234db14b7afca9593829f43c41a9851e08dcd7#diff-f5fe6838f0d551a0e3bca3774778b2eb]
Apache Hive, fixed in 3.x with [https://github.com/apache/hive/commit/245c39b4c8f711fbc1c9c00df013e4c7fcbdc0a2]
Apache Hadoop 3.x versions are supported within Spark 2.4.x. Apache parquet-mr appears to be a simple upgrade in pom.xml. Apache Avro required a little more than a simple upgrade in pom.xml, but was still simple. Apache Hive 2.x was recently added to Spark 3.x with [https://github.com/apache/spark/commit/c98e5eb3396a6db92f2420e743afa9ddff319ca2], but upgrading to Hive 3.x was not as straightforward and will likely require a lot more work.
Once these 4 dependencies have been updated, we should no longer be using the vulnerable codehaus-jackson jars.
> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Michael Burgener
> Priority: Major
> Labels: security
>
> These 2 libraries are deprecated and replaced by the jackson-databind libraries which are already included. These two libraries are flagged by our vulnerability scanners as having the following security vulnerabilities.
> I've set the priority to Major due to the Critical nature and hopefully they can be addressed quickly. Please note, I'm not a developer but work in InfoSec and this was flagged when we incorporated spark into our product. If you feel the priority is not set correctly please change accordingly. I'll watch the issue and flag our dev team to update once resolved.
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2018-7489]
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High) [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High) https://nvd.nist.gov/vuln/detail/CVE-2016-7051
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
wuyi created SPARK-31259: Summary: Fix log error of curRequestSize in ShuffleBlockFetcherIterator Key: SPARK-31259 URL: https://issues.apache.org/jira/browse/SPARK-31259 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0, 3.1.0 Reporter: wuyi
The logged curRequestSize is incorrect, because curRequestSize may be the total size of several groups of blocks, but it is logged for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31258) sbt unidoc fail to resolve Avro dependency
[ https://issues.apache.org/jira/browse/SPARK-31258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31258: - Summary: sbt unidoc fail to resolve Avro dependency (was: sbt unidoc fail to resolving Avro dependency)
> sbt unidoc fail to resolve Avro dependency
> --
>
> Key: SPARK-31258
> URL: https://issues.apache.org/jira/browse/SPARK-31258
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Kent Yao
> Priority: Major
>
> {code:java}
> [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
> [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
> [info] Main Scala API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/scala-2.12/unidoc...
> [info] Main Java API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/javaunidoc...
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
> [error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
> [error] ^
> [info] No documentation generated with unsuccessful compiler run
> [error] one error found
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31258) sbt unidoc fail to resolving Avro dependency
Kent Yao created SPARK-31258: Summary: sbt unidoc fail to resolving Avro dependency Key: SPARK-31258 URL: https://issues.apache.org/jira/browse/SPARK-31258 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Kent Yao
{code:java}
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Main Scala API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/scala-2.12/unidoc...
[info] Main Java API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/javaunidoc...
[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error] ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30095) create function syntax has to be enhanced in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067221#comment-17067221 ] Huaxin Gao commented on SPARK-30095: [~abhishek.akg] Any update on this?
> create function syntax has to be enhanced in Doc for multiple dependent jars
> 
>
> Key: SPARK-30095
> URL: https://issues.apache.org/jira/browse/SPARK-30095
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> The CREATE FUNCTION example and syntax have to be enhanced as below:
> 1. Case 1: How to use multiple dependent jars in the path while creating a function is not clear -- the syntax should be given.
> 2. Case 2: The supported URI schemes (like file:///) are not documented -- the supported schemes should be provided.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
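For reference, a sketch of the multi-jar syntax the request is about (the class name and jar paths are hypothetical placeholders; Spark's CREATE FUNCTION accepts a comma-separated USING list, and file:/// URIs are accepted):
{code:python}
from pyspark.sql import SparkSession

# Hive support is assumed here so the function can be registered persistently.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical class and jar names, for illustration only.
spark.sql("""
  CREATE FUNCTION simple_udf AS 'com.example.SimpleUdf'
  USING JAR 'file:///tmp/udf-main.jar',
        JAR 'file:///tmp/udf-dep.jar'
""")
{code}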
[jira] [Created] (SPARK-31257) Fix ambiguity between two different CREATE TABLE syntaxes
Jungtaek Lim created SPARK-31257: Summary: Fix ambiguity between two different CREATE TABLE syntaxes Key: SPARK-31257 URL: https://issues.apache.org/jira/browse/SPARK-31257 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Jungtaek Lim
There's a discussion on the dev@ mailing list pointing out ambiguous syntaxes for the CREATE TABLE DDL. This issue tracks the effort to resolve the problem. https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E
Note that the priority of this issue is set to blocker, as the ambiguity is introduced by SPARK-30098, which will ship in Spark 3.0.0; before shipping SPARK-30098 we should fix the syntax and ensure it is fully deterministic for both devs and end users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31256) Dropna doesn't work for struct columns
Michael Souder created SPARK-31256: -- Summary: Dropna doesn't work for struct columns Key: SPARK-31256 URL: https://issues.apache.org/jira/browse/SPARK-31256 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.5 Environment: Spark 2.4.5 Python 3.7.4 Reporter: Michael Souder
Dropna using a subset with a column from a struct drops the entire data frame.
{code:python}
import pyspark.sql.functions as F

df = spark.createDataFrame([(5, 80, 'Alice'), (10, None, 'Bob'), (15, 80, None)], schema=['age', 'height', 'name'])
df.show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
| 15|    80| null|
+---+------+-----+

# this works just fine
df.dropna(subset=['name']).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
+---+------+-----+

# now add a struct column
df_with_struct = df.withColumn('struct_col', F.struct('age', 'height', 'name'))
df_with_struct.show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
|15 |80    |null |[15, 80,]     |
+---+------+-----+--------------+

# now dropna drops the whole dataframe when you use struct_col
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+----+----------+
|age|height|name|struct_col|
+---+------+----+----------+
+---+------+----+----------+
{code}
I've tested the above code in Spark 2.4.4 with python 3.7.4 and Spark 2.3.1 with python 3.6.8 and in both, the result looks like:
{code:python}
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
+---+------+-----+--------------+
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
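A possible interim workaround, assuming only the nested field's nullness matters, is to filter on the struct field directly instead of going through dropna:
{code:python}
import pyspark.sql.functions as F

# Workaround sketch: filter on the nested field's nullness directly,
# bypassing dropna's subset handling. Uses df_with_struct from the repro above.
df_with_struct.where(F.col('struct_col.name').isNotNull()).show(truncate=False)
{code}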
[jira] [Updated] (SPARK-31255) DataSourceV2: Add metadata columns
[ https://issues.apache.org/jira/browse/SPARK-31255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-31255: -- Issue Type: New Feature (was: Bug) > DataSourceV2: Add metadata columns > -- > > Key: SPARK-31255 > URL: https://issues.apache.org/jira/browse/SPARK-31255 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Priority: Major > > DSv2 should support reading additional metadata columns that are not in a > table's schema. This allows users to project metadata like Kafka's offset, > timestamp, and partition. It also allows other sources to expose metadata > like file and row position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31255) DataSourceV2: Add metadata columns
Ryan Blue created SPARK-31255: - Summary: DataSourceV2: Add metadata columns Key: SPARK-31255 URL: https://issues.apache.org/jira/browse/SPARK-31255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ryan Blue DSv2 should support reading additional metadata columns that are not in a table's schema. This allows users to project metadata like Kafka's offset, timestamp, and partition. It also allows other sources to expose metadata like file and row position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
Maxim Gekk created SPARK-31254: -- Summary: `HiveResult.toHiveString` does not use the current session time zone Key: SPARK-31254 URL: https://issues.apache.org/jira/browse/SPARK-31254 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk
Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the session time zone is changed, the formatters still use the previous one. See the discussion at https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
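A sketch of the session-time-zone behavior at issue, in PySpark for illustration (`HiveResult.toHiveString` itself sits on the spark-sql/Thrift server output path, so `show()` only approximates it):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustration of the expected behavior: timestamp rendering should follow the
# *current* session time zone, not the zone captured when formatters were built.
df = spark.sql("SELECT timestamp'2020-03-25 00:00:00' AS ts")
spark.conf.set("spark.sql.session.timeZone", "UTC")
df.show()
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
df.show()  # same instant, rendered in the new session time zone
{code}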
[jira] [Assigned] (SPARK-31244) Use Minio instead of Ceph in K8S DepsTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-31244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31244: - Assignee: Dongjoon Hyun
> Use Minio instead of Ceph in K8S DepsTestsSuite
> ---
>
> Key: SPARK-31244
> URL: https://issues.apache.org/jira/browse/SPARK-31244
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
>
> `DepsTestsSuite` is using `ceph` for S3 storage. However, it's not robust across `minikube` versions. Also, the image size is almost 1GB.
> {code}
> ceph/daemon   v4.0.3-stable-4.0-nautilus-centos-7-x86_64   a6a05ccdf924   6 months ago   852MB
> ceph/daemon   v4.0.11-stable-4.0-nautilus-centos-7         87f695550d8e   12 hours ago   901MB
> {code}
> {code}
> $ minikube version
> minikube version: v1.8.2
> $ minikube -p minikube docker-env | source
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
> 2020-03-25 04:26:21 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
> 2020-03-25 04:20:30 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31244) Use Minio instead of Ceph in K8S DepsTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-31244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31244. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28015 [https://github.com/apache/spark/pull/28015]
> Use Minio instead of Ceph in K8S DepsTestsSuite
> ---
>
> Key: SPARK-31244
> URL: https://issues.apache.org/jira/browse/SPARK-31244
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.0.0
>
> `DepsTestsSuite` is using `ceph` for S3 storage. However, it's not robust across `minikube` versions. Also, the image size is almost 1GB.
> {code}
> ceph/daemon   v4.0.3-stable-4.0-nautilus-centos-7-x86_64   a6a05ccdf924   6 months ago   852MB
> ceph/daemon   v4.0.11-stable-4.0-nautilus-centos-7         87f695550d8e   12 hours ago   901MB
> {code}
> {code}
> $ minikube version
> minikube version: v1.8.2
> $ minikube -p minikube docker-env | source
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
> 2020-03-25 04:26:21 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
> 2020-03-25 04:20:30 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31249) Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
[ https://issues.apache.org/jira/browse/SPARK-31249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066939#comment-17066939 ] Xingbo Jiang commented on SPARK-31249: -- I can't reproduce this failure, maybe the ExecutorAdded event has been lost? > Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is > applied > - > > Key: SPARK-31249 > URL: https://issues.apache.org/jira/browse/SPARK-31249 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120302/testReport/ > {code} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 did > not equal 3 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.$anonfun$new$11(CoarseGrainedSchedulerBackendSuite.scala:186) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31246) GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
[ https://issues.apache.org/jira/browse/SPARK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-31246: Priority: Major (was: Blocker)
> GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
> -
>
> Key: SPARK-31246
> URL: https://issues.apache.org/jira/browse/SPARK-31246
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.4.3
> Environment: spark-2.4.3
> Reporter: Gajanan Hebbar
> Priority: Major
>
> While starting the Spark application, "*spark.streaming.stopGracefullyOnShutdown*" is set to true. Then try to terminate the application programmatically using the Java API:
> 1. using RestSubmissionClient client = new RestSubmissionClient(masterUrl); SubmitRestProtocolResponse statusResponse = client.killSubmission(submissionId);
> 2. using getYarnClient().killApplication(appId);
> In both cases the application does not stop gracefully.
> But killing the application using kill -SIGTERM will shut down the application gracefully.
> Expected: the application should terminate gracefully in all cases when spark.streaming.stopGracefullyOnShutdown is set.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31133) fix sql ref doc for DML
[ https://issues.apache.org/jira/browse/SPARK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31133. --- Fix Version/s: 3.0.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/27891 > fix sql ref doc for DML > --- > > Key: SPARK-31133 > URL: https://issues.apache.org/jira/browse/SPARK-31133 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31253) add metrics to shuffle reader
Wenchen Fan created SPARK-31253: --- Summary: add metrics to shuffle reader Key: SPARK-31253 URL: https://issues.apache.org/jira/browse/SPARK-31253 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31181) Remove the default value assumption on CREATE TABLE test cases
[ https://issues.apache.org/jira/browse/SPARK-31181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31181: -- Parent: SPARK-31085 Issue Type: Sub-task (was: Improvement) > Remove the default value assumption on CREATE TABLE test cases > -- > > Key: SPARK-31181 > URL: https://issues.apache.org/jira/browse/SPARK-31181 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-31136. -
> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -
>
> Key: SPARK-31136
> URL: https://issues.apache.org/jira/browse/SPARK-31136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098. This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently. The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 2
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31147: -- Parent: SPARK-31085 Issue Type: Sub-task (was: Bug) > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31147. --- Fix Version/s: 3.1.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/27902 in `master` branch. We will backport to `branch-3.0`, too. > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31252) Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire
Gabor Somogyi created SPARK-31252: - Summary: Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire Key: SPARK-31252 URL: https://issues.apache.org/jira/browse/SPARK-31252 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.status.ElementTrackingStoreSuite.eventually(ElementTrackingStoreSuite.scala:31) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$1(ElementTrackingStoreSuite.scala:64) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at 
org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: false did not equal true at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343) at org.scalatest.Matchers$AnyShouldWrapper.shouldEqual(Matchers.scala:679
[jira] [Resolved] (SPARK-31196) Server-side processing of History UI list of applications
[ https://issues.apache.org/jira/browse/SPARK-31196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavol Vidlička resolved SPARK-31196. Resolution: Won't Fix After looking into what would need to be changed to implement server-side processing, I think the cost of implementing and maintaining it outweighs the benefits. > Server-side processing of History UI list of applications > - > > Key: SPARK-31196 > URL: https://issues.apache.org/jira/browse/SPARK-31196 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.0, 2.4.5 >Reporter: Pavol Vidlička >Priority: Minor > > Loading the list of applications in the History UI does not scale well with a > large number of applications. Fetching and rendering the list for 10k+ > applications takes over a minute (much longer for more applications) and > tends to freeze the browser. > Using `spark.history.ui.maxApplications` is not a great solution, because (as > the name implies) it limits the number of applications shown in the UI, > which hinders usability of the History Server. > A solution would be to use [server-side processing of the > DataTable|https://datatables.net/examples/data_sources/server_side]. This > would limit the amount of data sent to the client and processed by the browser. > This proposed change plays nicely with the KVStore abstraction implemented in > SPARK-18085, which was supposed to solve some of the scalability issues. It > could also help solve the History UI scalability issues reported, for > example, in SPARK-21254, SPARK-17243, and SPARK-17671 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31251) Flaky Test: StreamingContextSuite.stop gracefully
Hyukjin Kwon created SPARK-31251: Summary: Flaky Test: StreamingContextSuite.stop gracefully Key: SPARK-31251 URL: https://issues.apache.org/jira/browse/SPARK-31251 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120337/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 532 times over 10.00564787199 seconds. Last failure message: 0 was not greater than 0. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) at org.apache.spark.streaming.StreamingContextSuite.$anonfun$new$33(StreamingContextSuite.scala:312) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.streaming.StreamingContextSuite.$anonfun$new$32(StreamingContextSuite.scala:300) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
[jira] [Updated] (SPARK-31248) Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove
[ https://issues.apache.org/jira/browse/SPARK-31248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31248: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) {code} was: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove > -- > > Key: SPARK-31248 > URL: https://issues.apache.org/jira/browse/SPARK-31248 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ > {code} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did > not equal 8 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at 
org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31248) Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove
[ https://issues.apache.org/jira/browse/SPARK-31248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31248: - Summary: Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove (was: Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove) > Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove > -- > > Key: SPARK-31248 > URL: https://issues.apache.org/jira/browse/SPARK-31248 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did > not equal 8 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31250) Flaky Test: KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector)
Hyukjin Kwon created SPARK-31250: Summary: Flaky Test: KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector) Key: SPARK-31250 URL: https://issues.apache.org/jira/browse/SPARK-31250 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120321/testReport/ {code} sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:451) at org.apache.kafka.clients.admin.Admin.create(Admin.java:59) at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) at org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:158) at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:426) ... 
17 more Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) at org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:62) at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:105) at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:147) ... 21 more Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Client not found in Kerberos database (6) - Client not found in Kerberos database at sun.security.krb5.KrbAsRep.(KrbAsRep.java:82) at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316) at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361) at com.sun.security.auth.module.Krb5Login
[jira] [Created] (SPARK-31249) Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
Hyukjin Kwon created SPARK-31249: Summary: Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied Key: SPARK-31249 URL: https://issues.apache.org/jira/browse/SPARK-31249 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120302/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 did not equal 3 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.$anonfun$new$11(CoarseGrainedSchedulerBackendSuite.scala:186) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31248) Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove
Hyukjin Kwon created SPARK-31248: Summary: Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove Key: SPARK-31248 URL: https://issues.apache.org/jira/browse/SPARK-31248 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31247) Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false)
Gabor Somogyi created SPARK-31247: - Summary: Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false) Key: SPARK-31247 URL: https://issues.apache.org/jira/browse/SPARK-31247 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi Error Message org.scalatest.exceptions.TestFailedException: Error adding data: Timeout after waiting for 1 ms. org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) scala.collection.TraversableLike.map(TraversableLike.scala:238) scala.collection.TraversableLike.map$(TraversableLike.scala:231) scala.collection.AbstractTraversable.map(Traversable.scala:108) == Progress ==AssertOnQuery(, )AddKafkaData(topics = Set(topic-13), data = WrappedArray(1, 2, 3), message = )CheckAnswer: [2],[3],[4]StopStream StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@1f1a9495,Map(),null) CheckAnswer: [2],[3],[4]StopStreamAddKafkaData(topics = Set(topic-13), data = WrappedArray(4, 5, 6), message = ) StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@2b3bec2c,Map(),null) CheckAnswer: [2],[3],[4],[5],[6],[7] => AddKafkaData(topics = Set(topic-13), data = WrappedArray(7, 8), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9]AssertOnQuery(, Add partitions) AddKafkaData(topics = Set(topic-13), data = WrappedArray(9, 10, 11, 12, 13, 14, 15, 16), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17] == Stream == Output Mode: Append Stream state: {KafkaSource[Assign[topic-13-4, topic-13-3, topic-13-2, topic-13-1, topic-13-0]]: {"topic-13":{"2":2,"4":2,"1":1,"3":1,"0":1}}} Thread state: alive Thread stack trace: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:336) org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:746) org.apache.spark.SparkContext.runJob(SparkContext.scala:2104) org.apache.spark.SparkContext.runJob(SparkContext.scala:2125) org.apache.spark.SparkContext.runJob(SparkContext.scala:2144) org.apache.spark.SparkContext.runJob(SparkContext.scala:2169) org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1006) org.apache.spark.rdd.RDD$$Lambda$2999/724038556.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) org.apache.spark.rdd.RDD.withScope(RDD.scala:390) org.apache.spark.rdd.RDD.collect(RDD.scala:1005) org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec.doExecute(WriteToContinuousDataSourceExec.scala:57) org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) org.apache.spark.sql.execution.SparkPlan$$Lambda$2791/4135277.apply(Unknown Source) org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) org.apache.spark.sql.execution.SparkPlan$$Lambda$2823/504830038.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.$anonfun$runContinuous$4(ContinuousExecution.scala:256) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$Lambda$2765/297007729.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.
[jira] [Updated] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-31228: -- Description: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala was: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Priority: Major > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
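For context, the work tracked here amounts to tagging each config entry in those files with the release that introduced it, via the version method on Spark's internal ConfigBuilder. A hedged sketch of the pattern (the key and version number shown are illustrative, not a claim about any specific entry in those files):
{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Recording the first release an entry appeared in lets the documentation
// generator surface a "Since Version" column for each configuration.
private[spark] val CONSUMER_CACHE_ENABLED =
  ConfigBuilder("spark.streaming.kafka.consumer.cache.enabled")
    .version("2.1.0")
    .booleanConf
    .createWithDefault(true)
{code}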
[jira] [Commented] (SPARK-26341) Expose executor memory metrics at the stage level, in the Stages tab
[ https://issues.apache.org/jira/browse/SPARK-26341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066572#comment-17066572 ] angerszhu commented on SPARK-26341: --- I have done this in our own version and will raise a PR in the next few days. > Expose executor memory metrics at the stage level, in the Stages tab > > > Key: SPARK-26341 > URL: https://issues.apache.org/jira/browse/SPARK-26341 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: Edward Lu >Priority: Major > > Sub-task SPARK-23431 will add stage-level executor memory metrics (peak > values for each stage, and peak values for each executor for the stage). This > information should also be exposed in the web UI, so that users can see > which stages are memory intensive. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31246) GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
Gajanan Hebbar created SPARK-31246: -- Summary: GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient Key: SPARK-31246 URL: https://issues.apache.org/jira/browse/SPARK-31246 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 2.4.3 Environment: spark-2.4.3 Reporter: Gajanan Hebbar The Spark application is started with "*spark.streaming.stopGracefullyOnShutdown*" set to true, and is then terminated programmatically through the Java API in one of two ways:
1. using RestSubmissionClient:
{code:java}
RestSubmissionClient client = new RestSubmissionClient(masterUrl);
SubmitRestProtocolResponse statusResponse = client.killSubmission(submissionId);
{code}
2. using getYarnClient().killApplication(appId);
In both cases the application does not stop gracefully, but killing the application with kill -SIGTERM shuts it down gracefully.
Expected: the application should terminate gracefully in all cases when spark.streaming.stopGracefullyOnShutdown is set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
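A graceful stop depends on the driver's JVM shutdown hook actually running, which an external kill through the REST or YARN client does not guarantee. A minimal workaround sketch, assuming the application can observe its own stop signal (for example a marker file or an endpoint it owns) and stop itself from inside the driver:
{code:scala}
import org.apache.spark.streaming.StreamingContext

// Stop from inside the driver: drain received-but-unprocessed data first
// (stopGracefully = true), then tear down the underlying SparkContext.
def shutDownGracefully(ssc: StreamingContext): Unit = {
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}
{code}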
[jira] [Resolved] (SPARK-31232) Specify formats of `spark.sql.session.timeZone`
[ https://issues.apache.org/jira/browse/SPARK-31232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31232. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27999 [https://github.com/apache/spark/pull/27999] > Specify formats of `spark.sql.session.timeZone` > --- > > Key: SPARK-31232 > URL: https://issues.apache.org/jira/browse/SPARK-31232 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > There are two distinct types of ID (see > https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html): > # Fixed offsets - a fully resolved offset from UTC/Greenwich that uses the > same offset for all local date-times > # Geographical regions - an area where a specific set of rules for finding > the offset from UTC/Greenwich apply > For example, three-letter time zone IDs are ambiguous and depend on the > locale. They have already been deprecated in the JDK, see > https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html : > {code} > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {code} > The ticket aims to specify formats of the SQL config > *spark.sql.session.timeZone* in the two forms mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
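A brief illustration of the two accepted forms, assuming a SparkSession named spark (the values are examples, not defaults):
{code:scala}
// Fixed offset: always UTC-08:00, regardless of daylight saving time.
spark.conf.set("spark.sql.session.timeZone", "-08:00")

// Geographical region: the effective offset follows the region's DST rules.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
{code}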
[jira] [Assigned] (SPARK-31232) Specify formats of `spark.sql.session.timeZone`
[ https://issues.apache.org/jira/browse/SPARK-31232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31232: --- Assignee: Maxim Gekk > Specify formats of `spark.sql.session.timeZone` > --- > > Key: SPARK-31232 > URL: https://issues.apache.org/jira/browse/SPARK-31232 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > There are two distinct types of ID (see > https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html): > # Fixed offsets - a fully resolved offset from UTC/Greenwich that uses the > same offset for all local date-times > # Geographical regions - an area where a specific set of rules for finding > the offset from UTC/Greenwich apply > For example, three-letter time zone IDs are ambiguous and depend on the > locale. They have already been deprecated in the JDK, see > https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html : > {code} > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {code} > The ticket aims to specify formats of the SQL config > *spark.sql.session.timeZone* in the two forms mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck edited comment on SPARK-31218 at 3/25/20, 8:40 AM: -- I mean rdd {{counts}} in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. was (Author: spark_cachecheck): I mean rdd {{counts}} below the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
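A sketch of the persist/unpersist pattern being proposed, with illustrative names rather than the actual private fields of BinaryClassificationMetrics:
{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// `counts` feeds two separate actions, so cache it before the first one and
// release it once the data derived from it has been persisted downstream.
def aggregate(counts: RDD[(Double, Long)]): Array[(Double, Long)] = {
  counts.persist(StorageLevel.MEMORY_AND_DISK)
  val numBins = counts.count()   // first action: would otherwise recompute the lineage
  val agg = counts.collect()     // second action: served from the cache instead
  // ... derive and persist cumulativeCounts from `agg` here ...
  counts.unpersist()
  agg
}
{code}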
[jira] [Comment Edited] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck edited comment on SPARK-31218 at 3/25/20, 8:37 AM: -- I mean rdd {{counts}} below the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. was (Author: spark_cachecheck): I mean rdd {{counts}} belong in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck commented on SPARK-31218: I mean rdd {{counts}} in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30822) Pyspark queries fail if terminated with a semicolon
[ https://issues.apache.org/jira/browse/SPARK-30822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30822: --- Assignee: Samuel Setegne > Pyspark queries fail if terminated with a semicolon > --- > > Key: SPARK-30822 > URL: https://issues.apache.org/jira/browse/SPARK-30822 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Samuel Setegne >Assignee: Samuel Setegne >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > When a user submits a directly executable SQL statement terminated with a > semicolon, they receive an > `org.apache.spark.sql.catalyst.parser.ParseException` of `mismatched input > ";"`. SQL-92 describes a direct SQL statement as having the format of > ` ` and the majority of SQL > implementations either require the semicolon as a statement terminator, or > make it optional (meaning not raising an exception when it's included, > seemingly in recognition that it's a common behavior). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
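For illustration, the failing shape (shown in Scala for a SparkSession named spark; the PySpark spark.sql call goes through the same parser):
{code:scala}
// Before the fix, the trailing semicolon raised
// org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';'.
// With the fix, the terminator is accepted and the query runs.
spark.sql("SELECT 1;").show()
{code}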
[jira] [Updated] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31184: -- Fix Version/s: (was: 3.1.0) 3.0.0 > Support getTablesByType API of Hive Client > -- > > Key: SPARK-31184 > URL: https://issues.apache.org/jira/browse/SPARK-31184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xin Wu >Assignee: Xin Wu >Priority: Major > Fix For: 3.0.0 > > > Hive 2.3+ supports the getTablesByType API, which is a precondition for > implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API, > we cannot get Hive tables with type HiveTableType.VIRTUAL_VIEW directly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
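A hedged sketch of the intended call, assuming a Hive 2.3+ client instance named hive of type org.apache.hadoop.hive.ql.metadata.Hive; treat the exact parameter order and pattern syntax as assumptions rather than a confirmed signature:
{code:scala}
import org.apache.hadoop.hive.metastore.TableType

// Ask the metastore for view names only, instead of listing every table
// and filtering on the client side. "*" is an illustrative match-all pattern.
val viewNames = hive.getTablesByType("default", "*", TableType.VIRTUAL_VIEW)
{code}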
[jira] [Updated] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31184: -- Affects Version/s: (was: 3.1.0) 3.0.0 > Support getTablesByType API of Hive Client > -- > > Key: SPARK-31184 > URL: https://issues.apache.org/jira/browse/SPARK-31184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xin Wu >Assignee: Xin Wu >Priority: Major > Fix For: 3.0.0 > > > Hive 2.3+ supports the getTablesByType API, which is a precondition for > implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API, > we cannot get Hive tables with type HiveTableType.VIRTUAL_VIEW directly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30822) Pyspark queries fail if terminated with a semicolon
[ https://issues.apache.org/jira/browse/SPARK-30822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30822. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27567 [https://github.com/apache/spark/pull/27567] > Pyspark queries fail if terminated with a semicolon > --- > > Key: SPARK-30822 > URL: https://issues.apache.org/jira/browse/SPARK-30822 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Samuel Setegne >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > When a user submits a directly executable SQL statement terminated with a > semicolon, they receive an > `org.apache.spark.sql.catalyst.parser.ParseException` of `mismatched input > ";"`. SQL-92 describes a direct SQL statement as having the format of > ` ` and the majority of SQL > implementations either require the semicolon as a statement terminator, or > make it optional (meaning not raising an exception when it's included, > seemingly in recognition that it's a common behavior). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org