[jira] [Created] (SPARK-31265) Add -XX:MaxDirectMemorySize jvm options in yarn mode
wangzhun created SPARK-31265: Summary: Add -XX:MaxDirectMemorySize jvm options in yarn mode Key: SPARK-31265 URL: https://issues.apache.org/jira/browse/SPARK-31265 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: wangzhun
The current memory request is composed of `amMemory` + `amMemoryOverhead`:
{code:java}
val capability = Records.newRecord(classOf[Resource])
capability.setMemory(amMemory + amMemoryOverhead)
capability.setVirtualCores(amCores)
if (amResources.nonEmpty) {
  ResourceRequestHelper.setResourceRequests(amResources, capability)
}
logDebug(s"Created resource capability for AM request: $capability")
{code}
{code:java}
// Add Xmx for AM memory
javaOpts += "-Xmx" + amMemory + "m"
{code}
It is possible for the container's physical memory to exceed the limit, in which case the container is killed by YARN. I suggest setting `-XX:MaxDirectMemorySize` here as well:
{code:java}
// Add Xmx for AM memory
javaOpts += "-Xmx" + amMemory + "m"
javaOpts += s"-XX:MaxDirectMemorySize=${amMemoryOverhead}m"
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
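Until a change like this lands, the same cap can be applied from the user side through the existing extraJavaOptions settings; a minimal PySpark sketch (spark.yarn.am.extraJavaOptions applies to the client-mode AM, and the 512m values are assumed placeholders, not recommendations):
{code:python}
from pyspark import SparkConf

# User-side sketch: cap direct (off-heap NIO) memory explicitly so the AM and
# executor JVMs stay within their YARN container limits.
# The "512m" values are assumed placeholders, not recommendations.
conf = (SparkConf()
        .set("spark.yarn.am.extraJavaOptions", "-XX:MaxDirectMemorySize=512m")
        .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=512m"))
{code}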
[jira] [Resolved] (SPARK-31237) Replace 3-letter time zones by zone offsets
[ https://issues.apache.org/jira/browse/SPARK-31237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31237. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28001 [https://github.com/apache/spark/pull/28001]
> Replace 3-letter time zones by zone offsets
> ---
>
> Key: SPARK-31237
> URL: https://issues.apache.org/jira/browse/SPARK-31237
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.0.0
>
> 3-letter time zones are ambiguous, and have already been deprecated in the JDK, see [https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html].
> Also, some short names are mapped to region-based zone IDs, and don't conform to actual definitions. For example, the PST short name is mapped to America/Los_Angeles. It has different zone offsets in the Java 7 and Java 8 APIs:
> {code:scala}
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-05 23:00:00").getTime)/3600000.0
> res11: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 00:00:00").getTime)/3600000.0
> res12: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 01:00:00").getTime)/3600000.0
> res13: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 02:00:00").getTime)/3600000.0
> res14: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 03:00:00").getTime)/3600000.0
> res15: Double = -8.0
> {code}
> and in the Java 8 API: https://github.com/apache/spark/pull/27980#discussion_r396287278
> By definition, PST must be a constant equal to UTC-08:00, see https://www.timeanddate.com/time/zones/pst
> The ticket aims to replace all short time zone names with zone offsets in tests.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31237) Replace 3-letter time zones by zone offsets
[ https://issues.apache.org/jira/browse/SPARK-31237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31237: --- Assignee: Maxim Gekk
> Replace 3-letter time zones by zone offsets
> ---
>
> Key: SPARK-31237
> URL: https://issues.apache.org/jira/browse/SPARK-31237
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
>
> 3-letter time zones are ambiguous, and have already been deprecated in the JDK, see [https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html].
> Also, some short names are mapped to region-based zone IDs, and don't conform to actual definitions. For example, the PST short name is mapped to America/Los_Angeles. It has different zone offsets in the Java 7 and Java 8 APIs:
> {code:scala}
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-05 23:00:00").getTime)/3600000.0
> res11: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 00:00:00").getTime)/3600000.0
> res12: Double = -7.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 01:00:00").getTime)/3600000.0
> res13: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 02:00:00").getTime)/3600000.0
> res14: Double = -8.0
> scala> TimeZone.getTimeZone("PST").getOffset(Timestamp.valueOf("2016-11-06 03:00:00").getTime)/3600000.0
> res15: Double = -8.0
> {code}
> and in the Java 8 API: https://github.com/apache/spark/pull/27980#discussion_r396287278
> By definition, PST must be a constant equal to UTC-08:00, see https://www.timeanddate.com/time/zones/pst
> The ticket aims to replace all short time zone names with zone offsets in tests.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31223. -- Resolution: Fixed
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: Huaxin Gao
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
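For the second option, a minimal sketch of what seeding would look like (the seed value 0 is an assumed placeholder):
{code:python}
import numpy as np

# Option 2 from the description: fixing the seed makes the random draw
# reproducible, so the embedded test datasets can be regenerated later.
np.random.seed(0)  # assumed seed value
X = np.random.rand(20, 6)
{code}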
[jira] [Assigned] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31223: Assignee: Huaxin Gao (was: zhengruifeng)
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: Huaxin Gao
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31223: Assignee: zhengruifeng
> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: zhengruifeng
> Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
> the test datasets cannot be regenerated with the given Python code (like X = np.random.rand(20, 6)), so either:
> 1. create X directly, like X = np.array(...);
> 2. or set a seed first.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30623: - Assignee: Yuanjian Li
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Assignee: Yuanjian Li
> Priority: Major
> Fix For: 3.0.0
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30623. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27665 [https://github.com/apache/spark/pull/27665]
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
> Fix For: 3.0.0
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31264) Repartition by dynamic partition columns before inserting into a table
Yuming Wang created SPARK-31264: --- Summary: Repartition by dynamic partition columns before inserting into a table Key: SPARK-31264 URL: https://issues.apache.org/jira/browse/SPARK-31264 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30623: -- Affects Version/s: (was: 2.4.4)
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30623) Spark external shuffle allow disable of separate event loop group
[ https://issues.apache.org/jira/browse/SPARK-30623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067359#comment-17067359 ] Dongjoon Hyun commented on SPARK-30623: --- Hi, [~tgraves]. Since SPARK-24355 was fixed in 3.0.0, I removed `2.4.4` from the `Affected Version`. Please let me know if I'm wrong.
> Spark external shuffle allow disable of separate event loop group
> -
>
> Key: SPARK-30623
> URL: https://issues.apache.org/jira/browse/SPARK-30623
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> In SPARK-24355 changes were made to add a separate event loop group for processing ChunkFetchRequests; this allows the other threads to handle regular connection requests when the configuration value is set. However, this seems to have added some latency (see the comments at the end of PR 22173).
> To help with this we could make sure the secondary event loop group isn't used when spark.shuffle.server.chunkFetchHandlerThreadsPercent isn't explicitly set. This should result in the same behavior as before.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31263) Enable yarn shuffle service to close idle connections
feiwang created SPARK-31263: --- Summary: Enable yarn shuffle service to close idle connections Key: SPARK-31263 URL: https://issues.apache.org/jira/browse/SPARK-31263 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.1.0 Reporter: feiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31262) Test case importing another test case that contains bracketed comments can't display bracketed comments in golden files well.
jiaan.geng created SPARK-31262: -- Summary: Test case importing another test case that contains bracketed comments can't display bracketed comments in golden files well. Key: SPARK-31262 URL: https://issues.apache.org/jira/browse/SPARK-31262 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: jiaan.geng
The content of nested-comments.sql is shown below:
{code:java}
-- This test case just used to test imported bracketed comments.
-- the first case of bracketed comment
--QUERY-DELIMITER-START
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first;
--QUERY-DELIMITER-END
{code}
The test case comments.sql imports nested-comments.sql as below:
{code:java}
--IMPORT nested-comments.sql
{code}
The output will be:
{code:java}
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
/* This is the first example of bracketed comment.
^^^
SELECT 'ommented out content' AS first

-- !query
*/
SELECT 'selected content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
*/
^^^
SELECT 'selected content' AS first
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30923) Spark MLlib, GraphX 3.0 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-30923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-30923. -- Resolution: Fixed > Spark MLlib, GraphX 3.0 QA umbrella > --- > > Key: SPARK-30923 > URL: https://issues.apache.org/jira/browse/SPARK-30923 > Project: Spark > Issue Type: Umbrella > Components: Documentation, GraphX, ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > Description > This JIRA lists tasks for the next Spark release's QA period for MLlib and > GraphX. *SparkR is separate. > The list below gives an overview of what is involved, and the corresponding > JIRA issues are linked below that. > h2. API > * Check binary API compatibility for Scala/Java > * Audit new public APIs (from the generated html doc) > ** Scala > ** Java compatibility > ** Python coverage > * Check Experimental, DeveloperApi tags > h2. Algorithms and performance > * Performance tests > h2. Documentation and example code > * For new algorithms, create JIRAs for updating the user guide sections & > examples > * Update Programming Guide > * Update website -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31260) How to speed up WholeStageCodegen in Spark SQL Query?
[ https://issues.apache.org/jira/browse/SPARK-31260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongJin updated SPARK-31260: Description: It took about 2 mins for one 248 MB file; 2 files take ~5 mins. How can I tune or maximize the performance? Spark is initialized as below: {{.setMaster(numCores) .set("spark.driver.host", "localhost") .set("spark.executor.cores","2") .set("spark.num.executors","2") .set("spark.executor.memory", "4g") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","2") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") .set("spark.sql.shuffle.partitions",defaultPartitions)}} {{joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*)}} {{data = joinedDf.take(1000)}} [https://i.stack.imgur.com/oeYww.png] == Parsed Logical Plan == GlobalLimit 5 +- LocalLimit 5 +- Project [COL1#155, CASE WHEN (isnull(COL2#98) && isnull(COL2#114)) THEN [null] WHEN isnull(COL2#98) THEN concat([null]<==>, COL2#114) WHEN isnull(COL2#114) THEN concat(COL2#98, <==>[null]) WHEN ((upper(COL2#98) = upper(COL2#114)) && true) THEN concat(, COL2#98) WHEN (abs((cast(COL2#98 as double) - cast(COL2#114 as double))) <= 0.1) THEN concat(COL2#98, , COL2#114) ELSE concat(COL2#98, <==>, COL2#114) END AS COL2#171, CASE WHEN (isnull(COL3#99) && isnull(COL3#115)) THEN [null] WHEN isnull(COL3#99) THEN concat([null]<==>, COL3#115) WHEN isnull(COL3#115) THEN concat(COL3#99, <==>[null]) WHEN ((upper(COL3#99) = upper(COL3#115)) && true) THEN concat(, COL3#99) WHEN (abs((cast(COL3#99 as double) - cast(COL3#115 as double))) <= 0.1) THEN concat(COL3#99, , COL3#115) ELSE concat(COL3#99, <==>, COL3#115) END AS COL3#172, CASE WHEN (isnull(COL4#100) && isnull(COL4#116)) THEN [null] WHEN isnull(COL4#100) THEN concat([null]<==>, COL4#116) WHEN isnull(COL4#116) THEN concat(COL4#100, <==>[null]) WHEN ((upper(COL4#100) = upper(COL4#116)) && true) THEN concat(, COL4#100) WHEN (abs((cast(COL4#100 as double) - cast(COL4#116 as double))) <= 0.1) THEN concat(COL4#100, , COL4#116) ELSE concat(COL4#100, <==>, COL4#116) END AS COL4#173, CASE WHEN (isnull(COL5#101) && isnull(COL5#117)) THEN [null] WHEN isnull(COL5#101) THEN concat([null]<==>, COL5#117) WHEN isnull(COL5#117) THEN concat(COL5#101, <==>[null]) WHEN ((upper(COL5#101) = upper(COL5#117)) && true) THEN concat(, COL5#101) WHEN (abs((cast(COL5#101 as double) - cast(COL5#117 as double))) <= 0.1) THEN concat(COL5#101, , COL5#117) ELSE concat(COL5#101, <==>, COL5#117) END AS COL5#174, CASE WHEN (isnull(COL6#102) && isnull(COL6#118)) THEN [null] WHEN isnull(COL6#102) THEN concat([null]<==>, COL6#118) WHEN isnull(COL6#118) THEN concat(COL6#102, <==>[null]) WHEN ((upper(COL6#102) = upper(COL6#118)) && true) THEN concat(, COL6#102) WHEN (abs((cast(COL6#102 as double) - cast(COL6#118 as double))) <= 0.1) THEN concat(COL6#102, , COL6#118) ELSE concat(COL6#102, <==>, COL6#118) END AS COL6#175, CASE WHEN (isnull(COL7#103) && isnull(COL7#119)) THEN [null] WHEN isnull(COL7#103) THEN concat([null]<==>, COL7#119) WHEN isnull(COL7#119) THEN concat(COL7#103, <==>[null]) WHEN ((upper(COL7#103) = upper(COL7#119)) && true) THEN concat(, COL7#103) WHEN (abs((cast(COL7#103 as double) - cast(COL7#119 as double))) <= 0.1) THEN concat(COL7#103, , COL7#119) ELSE concat(COL7#103, <==>, COL7#119) END AS COL7#176, CASE WHEN (isnull(COL8#104) &&
isnull(COL8#120)) THEN [null] WHEN isnull(COL8#104) THEN concat([null]<==>, COL8#120) WHEN isnull(COL8#120) THEN concat(COL8#104, <==>[null]) WHEN ((upper(COL8#104) = upper(COL8#120)) && true) THEN concat(, COL8#104) WHEN (abs((cast(COL8#104 as double) - cast(COL8#120 as double))) <= 0.1) THEN concat(COL8#104, , COL8#120) ELSE concat(COL8#104, <==>, COL8#120) END AS COL8#177] +- Project [coalesce(COL1#97, COL1#113) AS COL1#155, COL2#98, COL3#99, COL4#100, COL5#101, COL6#102, COL7#103, COL8#104, COL2#114, COL3#115, COL4#116, COL5#117, COL6#118, COL7#119, COL8#120] +- Join FullOuter, (COL1#97 = COL1#113) :- SubqueryAlias `l` : +- ResolvedHint (broadcast) : +- Project [col1#10 AS COL1#97, col2#11 AS COL2#98, col3#12 AS COL3#99, col4#13 AS COL4#100, col5#14 AS COL5#101, col6#15 AS COL6#102, col7#16 AS COL7#103, col8#17 AS COL8#104] : +- Project [col1#10, col2#11, col3#12, col4#13, col5#14, col6#15, col7#16, col8#17] : +- Relation[col1#10,col2#11,col3#12,col4#13,col5#14,col6#15,col7#16,col8#17] csv +- SubqueryAlias `r` +- ResolvedHint (broadcast) +- Project [col1#36 AS COL1#113, col2#37 AS COL2#114, col3#38 AS COL3#115, col4#39 AS COL4#116, col5#40 AS COL5#117, col6#41 AS COL6#118, col7#42 AS COL7#119, col8#43 AS COL8#120] +- Project [col1#36, col2#37, col3#38, col4#39, col5#40, col6#41, col7#42, co
[jira] [Created] (SPARK-31261) Avoid NPE when reading bad CSV input with `columnNameCorruptRecord` specified
Zhenhua Wang created SPARK-31261: Summary: Avoid NPE when reading bad CSV input with `columnNameCorruptRecord` specified Key: SPARK-31261 URL: https://issues.apache.org/jira/browse/SPARK-31261 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5, 3.0.0, 3.1.0 Reporter: Zhenhua Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31260) How to speed up WholeStageCodegen in Spark SQL Query?
HongJin created SPARK-31260: --- Summary: How to speed up WholeStageCodegen in Spark SQL Query? Key: SPARK-31260 URL: https://issues.apache.org/jira/browse/SPARK-31260 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 2.4.4 Reporter: HongJin
It took about 2 mins for one 248 MB file; 2 files take ~5 mins. How can I tune or maximize the performance? Spark is initialized as below:
{{.setMaster(numCores) .set("spark.driver.host", "localhost") .set("spark.executor.cores","2") .set("spark.num.executors","2") .set("spark.executor.memory", "4g") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","2") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") .set("spark.sql.shuffle.partitions",defaultPartitions)}}
{{joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*)}}
{{data = joinedDf.take(1000)}}
[https://i.stack.imgur.com/oeYww.png]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
[ https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067300#comment-17067300 ] Nicholas Marion commented on SPARK-30466: - It is worth noting that the following dependencies rely on codehaus jackson:
Apache Hadoop, fixed in 3.x versions with [https://github.com/apache/hadoop/commit/67d9f2808efb34b9a7b0b824cb4033b95ad33474#diff-e2c362dd211f462f1f629e34af05f497]
Apache parquet-mr, fixed in 1.11.0 with [https://github.com/apache/parquet-mr/commit/47398be76cfb6634000532e9432430c4676442dd#diff-c6f127eb650758aad91ecf02a2e52add]
Apache Avro, fixed in 1.9.x with [https://github.com/apache/avro/commit/95234db14b7afca9593829f43c41a9851e08dcd7#diff-f5fe6838f0d551a0e3bca3774778b2eb]
Apache Hive, fixed in 3.x with [https://github.com/apache/hive/commit/245c39b4c8f711fbc1c9c00df013e4c7fcbdc0a2]
Apache Hadoop 3.x versions are supported within Spark 2.4.x. Apache parquet-mr appears to be a simple upgrade in pom.xml. Apache Avro required a little more than a simple upgrade in pom.xml, but was still simple. Apache Hive 2.x was recently added to Spark 3.x with [https://github.com/apache/spark/commit/c98e5eb3396a6db92f2420e743afa9ddff319ca2], but upgrading to Hive 3.x was not as straightforward and will likely require a lot more work.
Once these 4 dependencies have been updated, we should no longer be using the vulnerable codehaus-jackson jars.
> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Michael Burgener
> Priority: Major
> Labels: security
>
> These 2 libraries are deprecated and replaced by the jackson-databind libraries which are already included. These two libraries are flagged by our vulnerability scanners as having the following security vulnerabilities.
> I've set the priority to Major due to the Critical nature and hopefully they can be addressed quickly. Please note, I'm not a developer but work in InfoSec and this was flagged when we incorporated spark into our product. If you feel the priority is not set correctly please change accordingly. I'll watch the issue and flag our dev team to update once resolved.
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2018-7489]
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL) [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High) [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High) https://nvd.nist.gov/vuln/detail/CVE-2016-7051
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
wuyi created SPARK-31259: Summary: Fix log error of curRequestSize in ShuffleBlockFetcherIterator Key: SPARK-31259 URL: https://issues.apache.org/jira/browse/SPARK-31259 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0, 3.1.0 Reporter: wuyi
The logged curRequestSize is incorrect, because curRequestSize may be the total size of several groups of blocks, but it is logged for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31258) sbt unidoc fail to resolve Avro dependency
[ https://issues.apache.org/jira/browse/SPARK-31258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31258: - Summary: sbt unidoc fail to resolve Avro dependency (was: sbt unidoc fail to resolving Avro dependency)
> sbt unidoc fail to resolve Avro dependency
> --
>
> Key: SPARK-31258
> URL: https://issues.apache.org/jira/browse/SPARK-31258
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Kent Yao
> Priority: Major
>
> {code:java}
> [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
> [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
> [info] Main Scala API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/scala-2.12/unidoc...
> [info] Main Java API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/javaunidoc...
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
> [error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
> [error] ^
> [info] No documentation generated with unsuccessful compiler run
> [error] one error found
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31258) sbt unidoc fail to resolving Avro dependency
Kent Yao created SPARK-31258: Summary: sbt unidoc fail to resolving Avro dependency Key: SPARK-31258 URL: https://issues.apache.org/jira/browse/SPARK-31258 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Kent Yao
{code:java}
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Main Scala API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/scala-2.12/unidoc...
[info] Main Java API documentation to /home/jenkins/workspace/SparkPullRequestBuilder@6/target/javaunidoc...
[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error] ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30095) create function syntax has to be enhanced in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067221#comment-17067221 ] Huaxin Gao commented on SPARK-30095: [~abhishek.akg] Any update on this?
> create function syntax has to be enhanced in Doc for multiple dependent jars
> 
>
> Key: SPARK-30095
> URL: https://issues.apache.org/jira/browse/SPARK-30095
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> The CREATE FUNCTION example and syntax have to be enhanced as below:
> 1. Case 1: How to use multiple dependent jars in the path while creating a function is not clear -- the syntax should be given.
> 2. Case 2: The supported URI schemes (like file:///) are not documented -- the supported schemes should be provided.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
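For reference, a sketch of the multi-jar syntax the request is about (the class name and jar paths are hypothetical placeholders; Spark's CREATE FUNCTION accepts a comma-separated USING list, and file:/// URIs are accepted):
{code:python}
from pyspark.sql import SparkSession

# Hive support is assumed here so the function can be registered persistently.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical class and jar names, for illustration only.
spark.sql("""
  CREATE FUNCTION simple_udf AS 'com.example.SimpleUdf'
  USING JAR 'file:///tmp/udf-main.jar',
        JAR 'file:///tmp/udf-dep.jar'
""")
{code}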
[jira] [Created] (SPARK-31257) Fix ambiguity between two different CREATE TABLE syntaxes
Jungtaek Lim created SPARK-31257: Summary: Fix ambiguity between two different CREATE TABLE syntaxes Key: SPARK-31257 URL: https://issues.apache.org/jira/browse/SPARK-31257 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Jungtaek Lim
There's a discussion on the dev@ mailing list pointing out ambiguous syntaxes for the CREATE TABLE DDL. This issue tracks the effort to resolve the problem. https://lists.apache.org/thread.html/rf1acfaaa3de2d3129575199c28e7d529d38f2783e7d3c5be2ac8923d%40%3Cdev.spark.apache.org%3E
Note that the priority of this issue is set to blocker, as the ambiguity is introduced by SPARK-30098, which will ship in Spark 3.0.0; before shipping SPARK-30098 we should fix the syntax and ensure it is fully deterministic for both devs and end users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31256) Dropna doesn't work for struct columns
Michael Souder created SPARK-31256: -- Summary: Dropna doesn't work for struct columns Key: SPARK-31256 URL: https://issues.apache.org/jira/browse/SPARK-31256 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.5 Environment: Spark 2.4.5 Python 3.7.4 Reporter: Michael Souder
Dropna using a subset with a column from a struct drops the entire data frame.
{code:python}
import pyspark.sql.functions as F

df = spark.createDataFrame([(5, 80, 'Alice'), (10, None, 'Bob'), (15, 80, None)], schema=['age', 'height', 'name'])
df.show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
| 15|    80| null|
+---+------+-----+

# this works just fine
df.dropna(subset=['name']).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
+---+------+-----+

# now add a struct column
df_with_struct = df.withColumn('struct_col', F.struct('age', 'height', 'name'))
df_with_struct.show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
|15 |80    |null |[15, 80,]     |
+---+------+-----+--------------+

# now dropna drops the whole dataframe when you use struct_col
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+----+----------+
|age|height|name|struct_col|
+---+------+----+----------+
+---+------+----+----------+
{code}
I've tested the above code in Spark 2.4.4 with python 3.7.4 and Spark 2.3.1 with python 3.6.8 and in both, the result looks like:
{code:python}
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
+---+------+-----+--------------+
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
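A possible interim workaround, assuming only the nested field's nullness matters, is to filter on the struct field directly instead of going through dropna:
{code:python}
import pyspark.sql.functions as F

# Workaround sketch: filter on the nested field's nullness directly,
# bypassing dropna's subset handling. Uses df_with_struct from the repro above.
df_with_struct.where(F.col('struct_col.name').isNotNull()).show(truncate=False)
{code}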
[jira] [Updated] (SPARK-31255) DataSourceV2: Add metadata columns
[ https://issues.apache.org/jira/browse/SPARK-31255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-31255: -- Issue Type: New Feature (was: Bug) > DataSourceV2: Add metadata columns > -- > > Key: SPARK-31255 > URL: https://issues.apache.org/jira/browse/SPARK-31255 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Priority: Major > > DSv2 should support reading additional metadata columns that are not in a > table's schema. This allows users to project metadata like Kafka's offset, > timestamp, and partition. It also allows other sources to expose metadata > like file and row position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31255) DataSourceV2: Add metadata columns
Ryan Blue created SPARK-31255: - Summary: DataSourceV2: Add metadata columns Key: SPARK-31255 URL: https://issues.apache.org/jira/browse/SPARK-31255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ryan Blue DSv2 should support reading additional metadata columns that are not in a table's schema. This allows users to project metadata like Kafka's offset, timestamp, and partition. It also allows other sources to expose metadata like file and row position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
Maxim Gekk created SPARK-31254: -- Summary: `HiveResult.toHiveString` does not use the current session time zone Key: SPARK-31254 URL: https://issues.apache.org/jira/browse/SPARK-31254 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk
Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the session time zone is changed, the formatters still use the previous one. See the discussion at https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
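A sketch of the session-time-zone behavior at issue, in PySpark for illustration (`HiveResult.toHiveString` itself sits on the spark-sql/Thrift server output path, so `show()` only approximates it):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustration of the expected behavior: timestamp rendering should follow the
# *current* session time zone, not the zone captured when formatters were built.
df = spark.sql("SELECT timestamp'2020-03-25 00:00:00' AS ts")
spark.conf.set("spark.sql.session.timeZone", "UTC")
df.show()
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
df.show()  # same instant, rendered in the new session time zone
{code}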
[jira] [Assigned] (SPARK-31244) Use Minio instead of Ceph in K8S DepsTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-31244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31244: - Assignee: Dongjoon Hyun
> Use Minio instead of Ceph in K8S DepsTestsSuite
> ---
>
> Key: SPARK-31244
> URL: https://issues.apache.org/jira/browse/SPARK-31244
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
>
> `DepsTestsSuite` is using `ceph` for S3 storage. However, it's not robust across `minikube` versions. Also, the image size is almost 1GB.
> {code}
> ceph/daemon   v4.0.3-stable-4.0-nautilus-centos-7-x86_64   a6a05ccdf924   6 months ago   852MB
> ceph/daemon   v4.0.11-stable-4.0-nautilus-centos-7         87f695550d8e   12 hours ago   901MB
> {code}
> {code}
> $ minikube version
> minikube version: v1.8.2
> $ minikube -p minikube docker-env | source
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
> 2020-03-25 04:26:21 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
> 2020-03-25 04:20:30 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31244) Use Minio instead of Ceph in K8S DepsTestsSuite
[ https://issues.apache.org/jira/browse/SPARK-31244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31244. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28015 [https://github.com/apache/spark/pull/28015]
> Use Minio instead of Ceph in K8S DepsTestsSuite
> ---
>
> Key: SPARK-31244
> URL: https://issues.apache.org/jira/browse/SPARK-31244
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.0.0
>
> `DepsTestsSuite` is using `ceph` for S3 storage. However, it's not robust across `minikube` versions. Also, the image size is almost 1GB.
> {code}
> ceph/daemon   v4.0.3-stable-4.0-nautilus-centos-7-x86_64   a6a05ccdf924   6 months ago   852MB
> ceph/daemon   v4.0.11-stable-4.0-nautilus-centos-7         87f695550d8e   12 hours ago   901MB
> {code}
> {code}
> $ minikube version
> minikube version: v1.8.2
> $ minikube -p minikube docker-env | source
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
> 2020-03-25 04:26:21 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> $ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
> 2020-03-25 04:20:30 /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31249) Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
[ https://issues.apache.org/jira/browse/SPARK-31249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066939#comment-17066939 ] Xingbo Jiang commented on SPARK-31249: -- I can't reproduce this failure, maybe the ExecutorAdded event has been lost? > Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is > applied > - > > Key: SPARK-31249 > URL: https://issues.apache.org/jira/browse/SPARK-31249 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120302/testReport/ > {code} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 did > not equal 3 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.$anonfun$new$11(CoarseGrainedSchedulerBackendSuite.scala:186) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31246) GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
[ https://issues.apache.org/jira/browse/SPARK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-31246: Priority: Major (was: Blocker)
> GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
> -
>
> Key: SPARK-31246
> URL: https://issues.apache.org/jira/browse/SPARK-31246
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.4.3
> Environment: spark-2.4.3
> Reporter: Gajanan Hebbar
> Priority: Major
>
> While starting the Spark application, "*spark.streaming.stopGracefullyOnShutdown*" is set to true. Then try to terminate the application programmatically using the Java API:
> 1. using RestSubmissionClient client = new RestSubmissionClient(masterUrl); SubmitRestProtocolResponse statusResponse = client.killSubmission(submissionId);
> 2. using getYarnClient().killApplication(appId);
> In both cases the application does not stop gracefully.
> But killing the application using kill -SIGTERM will shut down the application gracefully.
> Expected: the application should terminate gracefully in all cases when spark.streaming.stopGracefullyOnShutdown is set.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31133) fix sql ref doc for DML
[ https://issues.apache.org/jira/browse/SPARK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31133. --- Fix Version/s: 3.0.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/27891 > fix sql ref doc for DML > --- > > Key: SPARK-31133 > URL: https://issues.apache.org/jira/browse/SPARK-31133 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31253) add metrics to shuffle reader
Wenchen Fan created SPARK-31253: --- Summary: add metrics to shuffle reader Key: SPARK-31253 URL: https://issues.apache.org/jira/browse/SPARK-31253 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31181) Remove the default value assumption on CREATE TABLE test cases
[ https://issues.apache.org/jira/browse/SPARK-31181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31181: -- Parent: SPARK-31085 Issue Type: Sub-task (was: Improvement) > Remove the default value assumption on CREATE TABLE test cases > -- > > Key: SPARK-31181 > URL: https://issues.apache.org/jira/browse/SPARK-31181 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-31136. -
> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -
>
> Key: SPARK-31136
> URL: https://issues.apache.org/jira/browse/SPARK-31136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098. This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently. The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 2
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31147: -- Parent: SPARK-31085 Issue Type: Sub-task (was: Bug) > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31147. --- Fix Version/s: 3.1.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/27902 in `master` branch. We will backport to `branch-3.0`, too. > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31252) Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire
Gabor Somogyi created SPARK-31252: - Summary: Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire Key: SPARK-31252 URL: https://issues.apache.org/jira/browse/SPARK-31252 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.status.ElementTrackingStoreSuite.eventually(ElementTrackingStoreSuite.scala:31) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$1(ElementTrackingStoreSuite.scala:64) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at 
org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: false did not equal true at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343) at org.scalatest.Matchers$AnyShouldWrapper.shouldEqual(Matchers.scala:679
[jira] [Resolved] (SPARK-31196) Server-side processing of History UI list of applications
[ https://issues.apache.org/jira/browse/SPARK-31196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavol Vidlička resolved SPARK-31196. Resolution: Won't Fix After looking into what would need to be changed to implement server-side processing, I think the cost of implementing and maintaining it outweighs the benefits. > Server-side processing of History UI list of applications > - > > Key: SPARK-31196 > URL: https://issues.apache.org/jira/browse/SPARK-31196 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.0, 2.4.5 >Reporter: Pavol Vidlička >Priority: Minor > > Loading the list of applications in the History UI does not scale well with a > large number of applications. Fetching and rendering the list for 10k+ > applications takes over a minute (much longer for more applications) and > tends to freeze the browser. > Using `spark.history.ui.maxApplications` is not a great solution, because (as > the name implies) it limits the number of applications shown in the UI, > which hinders usability of the History Server. > A solution would be to use [server-side processing of the > DataTable|https://datatables.net/examples/data_sources/server_side]. This > would limit the amount of data sent to the client and processed by the browser. > This proposed change plays nicely with the KVStore abstraction implemented in > SPARK-18085, which was supposed to solve some of the scalability issues. It > could also help solve the History UI scalability issues reported, for > example, in SPARK-21254, SPARK-17243, and SPARK-17671 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31251) Flaky Test: StreamingContextSuite.stop gracefully
Hyukjin Kwon created SPARK-31251: Summary: Flaky Test: StreamingContextSuite.stop gracefully Key: SPARK-31251 URL: https://issues.apache.org/jira/browse/SPARK-31251 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120337/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 532 times over 10.00564787199 seconds. Last failure message: 0 was not greater than 0. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) at org.apache.spark.streaming.StreamingContextSuite.$anonfun$new$33(StreamingContextSuite.scala:312) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.streaming.StreamingContextSuite.$anonfun$new$32(StreamingContextSuite.scala:300) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
[jira] [Updated] (SPARK-31248) Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove
[ https://issues.apache.org/jira/browse/SPARK-31248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31248: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) {code} was: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove > -- > > Key: SPARK-31248 > URL: https://issues.apache.org/jira/browse/SPARK-31248 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ > {code} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did > not equal 8 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at 
org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31248) Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove
[ https://issues.apache.org/jira/browse/SPARK-31248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31248: - Summary: Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove (was: Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove) > Flaky Test: ExecutorAllocationManagerSuite.interleaving add and remove > -- > > Key: SPARK-31248 > URL: https://issues.apache.org/jira/browse/SPARK-31248 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did > not equal 8 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31250) Flaky Test: KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector)
Hyukjin Kwon created SPARK-31250: Summary: Flaky Test: KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector) Key: SPARK-31250 URL: https://issues.apache.org/jira/browse/SPARK-31250 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120321/testReport/ {code} sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:451) at org.apache.kafka.clients.admin.Admin.create(Admin.java:59) at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) at org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:158) at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:426) ... 
17 more Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) at org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:62) at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:105) at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:147) ... 21 more Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Client not found in Kerberos database (6) - Client not found in Kerberos database at sun.security.krb5.KrbAsRep.(KrbAsRep.java:82) at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316) at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361) at com.sun.security.auth.module.Krb5Login
[jira] [Created] (SPARK-31249) Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied
Hyukjin Kwon created SPARK-31249: Summary: Flaky Test: CoarseGrainedSchedulerBackendSuite.custom log url for Spark UI is applied Key: SPARK-31249 URL: https://issues.apache.org/jira/browse/SPARK-31249 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120302/testReport/ {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 did not equal 3 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.$anonfun$new$11(CoarseGrainedSchedulerBackendSuite.scala:186) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31248) Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove
Hyukjin Kwon created SPARK-31248: Summary: Flaky Test: org.apache.spark.ExecutorAllocationManagerSuite.interleaving add and remove Key: SPARK-31248 URL: https://issues.apache.org/jira/browse/SPARK-31248 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120300/testReport/ sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 12 did not equal 8 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$51(ExecutorAllocationManagerSuite.scala:864) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31247) Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false)
Gabor Somogyi created SPARK-31247: - Summary: Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false) Key: SPARK-31247 URL: https://issues.apache.org/jira/browse/SPARK-31247 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi Error Message org.scalatest.exceptions.TestFailedException: Error adding data: Timeout after waiting for 1 ms. org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) scala.collection.TraversableLike.map(TraversableLike.scala:238) scala.collection.TraversableLike.map$(TraversableLike.scala:231) scala.collection.AbstractTraversable.map(Traversable.scala:108) == Progress ==AssertOnQuery(, )AddKafkaData(topics = Set(topic-13), data = WrappedArray(1, 2, 3), message = )CheckAnswer: [2],[3],[4]StopStream StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@1f1a9495,Map(),null) CheckAnswer: [2],[3],[4]StopStreamAddKafkaData(topics = Set(topic-13), data = WrappedArray(4, 5, 6), message = ) StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@2b3bec2c,Map(),null) CheckAnswer: [2],[3],[4],[5],[6],[7] => AddKafkaData(topics = Set(topic-13), data = WrappedArray(7, 8), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9]AssertOnQuery(, Add partitions) AddKafkaData(topics = Set(topic-13), data = WrappedArray(9, 10, 11, 12, 13, 14, 15, 16), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17] == Stream == Output Mode: Append Stream state: {KafkaSource[Assign[topic-13-4, topic-13-3, topic-13-2, topic-13-1, topic-13-0]]: {"topic-13":{"2":2,"4":2,"1":1,"3":1,"0":1}}} Thread state: alive Thread stack trace: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:336) org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:746) org.apache.spark.SparkContext.runJob(SparkContext.scala:2104) org.apache.spark.SparkContext.runJob(SparkContext.scala:2125) org.apache.spark.SparkContext.runJob(SparkContext.scala:2144) org.apache.spark.SparkContext.runJob(SparkContext.scala:2169) org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1006) org.apache.spark.rdd.RDD$$Lambda$2999/724038556.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) org.apache.spark.rdd.RDD.withScope(RDD.scala:390) org.apache.spark.rdd.RDD.collect(RDD.scala:1005) org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec.doExecute(WriteToContinuousDataSourceExec.scala:57) org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) org.apache.spark.sql.execution.SparkPlan$$Lambda$2791/4135277.apply(Unknown Source) org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) org.apache.spark.sql.execution.SparkPlan$$Lambda$2823/504830038.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.$anonfun$runContinuous$4(ContinuousExecution.scala:256) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$Lambda$2765/297007729.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.
[jira] [Updated] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-31228: -- Description: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala was: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Priority: Major > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
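For context, the work tracked here amounts to tagging each config entry in those files with the release that introduced it, via the version method on Spark's internal ConfigBuilder. A hedged sketch of the pattern (the key and version number shown are illustrative, not a claim about any specific entry in those files):
{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Recording the first release an entry appeared in lets the documentation
// generator surface a "Since Version" column for each configuration.
private[spark] val CONSUMER_CACHE_ENABLED =
  ConfigBuilder("spark.streaming.kafka.consumer.cache.enabled")
    .version("2.1.0")
    .booleanConf
    .createWithDefault(true)
{code}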
[jira] [Commented] (SPARK-26341) Expose executor memory metrics at the stage level, in the Stages tab
[ https://issues.apache.org/jira/browse/SPARK-26341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066572#comment-17066572 ] angerszhu commented on SPARK-26341: --- I have done this in our own version and will raise a PR in the next few days. > Expose executor memory metrics at the stage level, in the Stages tab > > > Key: SPARK-26341 > URL: https://issues.apache.org/jira/browse/SPARK-26341 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 2.4.0 >Reporter: Edward Lu >Priority: Major > > Sub-task SPARK-23431 will add stage-level executor memory metrics (peak > values for each stage, and peak values for each executor for the stage). This > information should also be exposed in the web UI, so that users can see > which stages are memory intensive. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31246) GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient
Gajanan Hebbar created SPARK-31246: -- Summary: GracefulShutdown does not work when application is terminated from RestSubmissionClient or YarnClient Key: SPARK-31246 URL: https://issues.apache.org/jira/browse/SPARK-31246 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 2.4.3 Environment: spark-2.4.3 Reporter: Gajanan Hebbar The Spark application is started with "*spark.streaming.stopGracefullyOnShutdown*" set to true, and is then terminated programmatically through the Java API in one of two ways:
1. using RestSubmissionClient:
{code:java}
RestSubmissionClient client = new RestSubmissionClient(masterUrl);
SubmitRestProtocolResponse statusResponse = client.killSubmission(submissionId);
{code}
2. using getYarnClient().killApplication(appId);
In both cases the application does not stop gracefully, but killing the application with kill -SIGTERM shuts it down gracefully.
Expected: the application should terminate gracefully in all cases when spark.streaming.stopGracefullyOnShutdown is set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
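A graceful stop depends on the driver's JVM shutdown hook actually running, which an external kill through the REST or YARN client does not guarantee. A minimal workaround sketch, assuming the application can observe its own stop signal (for example a marker file or an endpoint it owns) and stop itself from inside the driver:
{code:scala}
import org.apache.spark.streaming.StreamingContext

// Stop from inside the driver: drain received-but-unprocessed data first
// (stopGracefully = true), then tear down the underlying SparkContext.
def shutDownGracefully(ssc: StreamingContext): Unit = {
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}
{code}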
[jira] [Resolved] (SPARK-31232) Specify formats of `spark.sql.session.timeZone`
[ https://issues.apache.org/jira/browse/SPARK-31232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31232. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27999 [https://github.com/apache/spark/pull/27999] > Specify formats of `spark.sql.session.timeZone` > --- > > Key: SPARK-31232 > URL: https://issues.apache.org/jira/browse/SPARK-31232 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > There are two distinct types of ID (see > https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html): > # Fixed offsets - a fully resolved offset from UTC/Greenwich that uses the > same offset for all local date-times > # Geographical regions - an area where a specific set of rules for finding > the offset from UTC/Greenwich apply > For example, three-letter time zone IDs are ambiguous and depend on the > locale. They have already been deprecated in the JDK, see > https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html : > {code} > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {code} > The ticket aims to specify formats of the SQL config > *spark.sql.session.timeZone* in the two forms mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
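A brief illustration of the two accepted forms, assuming a SparkSession named spark (the values are examples, not defaults):
{code:scala}
// Fixed offset: always UTC-08:00, regardless of daylight saving time.
spark.conf.set("spark.sql.session.timeZone", "-08:00")

// Geographical region: the effective offset follows the region's DST rules.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
{code}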
[jira] [Assigned] (SPARK-31232) Specify formats of `spark.sql.session.timeZone`
[ https://issues.apache.org/jira/browse/SPARK-31232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31232: --- Assignee: Maxim Gekk > Specify formats of `spark.sql.session.timeZone` > --- > > Key: SPARK-31232 > URL: https://issues.apache.org/jira/browse/SPARK-31232 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > There are two distinct types of ID (see > https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html): > # Fixed offsets - a fully resolved offset from UTC/Greenwich that uses the > same offset for all local date-times > # Geographical regions - an area where a specific set of rules for finding > the offset from UTC/Greenwich apply > For example, three-letter time zone IDs are ambiguous and depend on the > locale. They have already been deprecated in the JDK, see > https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html : > {code} > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {code} > The ticket aims to specify formats of the SQL config > *spark.sql.session.timeZone* in the two forms mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck edited comment on SPARK-31218 at 3/25/20, 8:40 AM: -- I mean rdd {{counts}} in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. was (Author: spark_cachecheck): I mean rdd {{counts}} below the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
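A sketch of the persist/unpersist pattern being proposed, with illustrative names rather than the actual private fields of BinaryClassificationMetrics:
{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// `counts` feeds two separate actions, so cache it before the first one and
// release it once the data derived from it has been persisted downstream.
def aggregate(counts: RDD[(Double, Long)]): Array[(Double, Long)] = {
  counts.persist(StorageLevel.MEMORY_AND_DISK)
  val numBins = counts.count()   // first action: would otherwise recompute the lineage
  val agg = counts.collect()     // second action: served from the cache instead
  // ... derive and persist cumulativeCounts from `agg` here ...
  counts.unpersist()
  agg
}
{code}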
[jira] [Comment Edited] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck edited comment on SPARK-31218 at 3/25/20, 8:37 AM: -- I mean rdd {{counts}} below the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. was (Author: spark_cachecheck): I mean rdd {{counts}} belong in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31218) counts in BinaryClassificationMetrics should be cached
[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ] CacheCheck commented on SPARK-31218: I mean rdd {{counts}} in the lazy val block below {{recallByThreshold}}. It is used by counts.count() when generating {{binnedCounts}}, and again by collect() when {{binnedCounts}} generates {{agg}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted. > counts in BinaryClassificationMetrics should be cached > -- > > Key: SPARK-31218 > URL: https://issues.apache.org/jira/browse/SPARK-31218 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.4, 2.4.5 >Reporter: CacheCheck >Priority: Major > > In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), rdd > _counts_ should be cached since the following multiple actions will use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30822) Pyspark queries fail if terminated with a semicolon
[ https://issues.apache.org/jira/browse/SPARK-30822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30822: --- Assignee: Samuel Setegne > Pyspark queries fail if terminated with a semicolon > --- > > Key: SPARK-30822 > URL: https://issues.apache.org/jira/browse/SPARK-30822 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Samuel Setegne >Assignee: Samuel Setegne >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > When a user submits a directly executable SQL statement terminated with a > semicolon, they receive an > `org.apache.spark.sql.catalyst.parser.ParseException` of `mismatched input > ";"`. SQL-92 describes a direct SQL statement as having the format of > ` ` and the majority of SQL > implementations either require the semicolon as a statement terminator, or > make it optional (meaning not raising an exception when it's included, > seemingly in recognition that it's a common behavior). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
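For illustration, the failing shape (shown in Scala for a SparkSession named spark; the PySpark spark.sql call goes through the same parser):
{code:scala}
// Before the fix, the trailing semicolon raised
// org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';'.
// With the fix, the terminator is accepted and the query runs.
spark.sql("SELECT 1;").show()
{code}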
[jira] [Updated] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31184: -- Fix Version/s: (was: 3.1.0) 3.0.0 > Support getTablesByType API of Hive Client > -- > > Key: SPARK-31184 > URL: https://issues.apache.org/jira/browse/SPARK-31184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xin Wu >Assignee: Xin Wu >Priority: Major > Fix For: 3.0.0 > > > Hive 2.3+ supports the getTablesByType API, which is a precondition for > implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API, > we cannot get Hive tables with type HiveTableType.VIRTUAL_VIEW directly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
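A hedged sketch of the intended call, assuming a Hive 2.3+ client instance named hive of type org.apache.hadoop.hive.ql.metadata.Hive; treat the exact parameter order and pattern syntax as assumptions rather than a confirmed signature:
{code:scala}
import org.apache.hadoop.hive.metastore.TableType

// Ask the metastore for view names only, instead of listing every table
// and filtering on the client side. "*" is an illustrative match-all pattern.
val viewNames = hive.getTablesByType("default", "*", TableType.VIRTUAL_VIEW)
{code}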
[jira] [Updated] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31184: -- Affects Version/s: (was: 3.1.0) 3.0.0 > Support getTablesByType API of Hive Client > -- > > Key: SPARK-31184 > URL: https://issues.apache.org/jira/browse/SPARK-31184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xin Wu >Assignee: Xin Wu >Priority: Major > Fix For: 3.0.0 > > > Hive 2.3+ supports the getTablesByType API, which is a precondition for > implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API, > we cannot get Hive tables with type HiveTableType.VIRTUAL_VIEW directly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30822) Pyspark queries fail if terminated with a semicolon
[ https://issues.apache.org/jira/browse/SPARK-30822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30822. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27567 [https://github.com/apache/spark/pull/27567] > Pyspark queries fail if terminated with a semicolon > --- > > Key: SPARK-30822 > URL: https://issues.apache.org/jira/browse/SPARK-30822 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Samuel Setegne >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > When a user submits a directly executable SQL statement terminated with a > semicolon, they receive an > `org.apache.spark.sql.catalyst.parser.ParseException` of `mismatched input > ";"`. SQL-92 describes a direct SQL statement as having the format of > ` ` and the majority of SQL > implementations either require the semicolon as a statement terminator, or > make it optional (meaning not raising an exception when it's included, > seemingly in recognition that it's a common behavior). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org