[jira] [Commented] (SPARK-36860) Create the external hive table for HBase failed

2021-09-26 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420483#comment-17420483
 ] 

Kousuke Saruta commented on SPARK-36860:


[~yimo_yym]
Spark doesn't support creating Hive tables using storage handlers yet.
Please see also 
http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#specifying-storage-format-for-hive-tables
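
For reference, a minimal sketch of the storage-format syntax Spark does support 
(taken from the linked docs; the table and column names here are illustrative):
{code:java}
// Spark can create Hive tables with an explicit file format via
// USING HIVE + OPTIONS, but not with STORED BY storage handlers.
spark.sql("""
  CREATE TABLE hive_example(key STRING, value STRING)
  USING HIVE
  OPTIONS(fileFormat 'textfile')
""")
{code}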


> Create the external hive table for HBase failed 
> 
>
> Key: SPARK-36860
> URL: https://issues.apache.org/jira/browse/SPARK-36860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: wineternity
>Priority: Major
>
> We use the following SQL to create a Hive external table, which reads from HBase:
> {code:java}
> CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material(
>    rowkey string COMMENT 'HBase row key',
>    content string COMMENT 'article body text')
> USING HIVE   
> ROW FORMAT SERDE
>'org.apache.hadoop.hive.hbase.HBaseSerDe'
>  STORED BY
>'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES (
>'hbase.columns.mapping'=':key, cf1:content'
> )
>  TBLPROPERTIES (
>'hbase.table.name'='spotlight_headline_material'
>  );
> {code}
> But the SQL failed in Spark 3.1.2, which throws this exception:
> {code:java}
> 21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group 
> 26d7459f-7b58-4c18-9939-5f2737525ff2
> 21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState 
> RUNNING,
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: Unexpected combination of ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key,
>  cf1:content')(line 5, pos 0)
> {code}
> This check was introduced by this change: 
> [https://github.com/apache/spark/pull/28026]
>  
> Could anyone give instructions on how to create an external table for HBase 
> in Spark 3 now? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420482#comment-17420482
 ] 

Apache Spark commented on SPARK-36856:
--

User 'copperybean' has created a pull request for this issue:
https://github.com/apache/spark/pull/34111

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Priority: Major
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.
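
A hedged sketch (ours, not the actual build/mvn script) of the symlink-based 
detection logic that goes wrong here:
{code:java}
import java.nio.file.{Files, Path, Paths}

// If the java binary is a symlink, resolving it lands inside a real JDK;
// if it is a regular file (as on this Mac), the derived home collapses to
// "/usr", which is not a valid Java home.
def guessJavaHome(javaBin: Path = Paths.get("/usr/bin/java")): Path = {
  val real = if (Files.isSymbolicLink(javaBin)) javaBin.toRealPath() else javaBin
  real.getParent.getParent // strip "bin/java" -> candidate Java home
}
{code}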



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36856:


Assignee: Apache Spark

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Assignee: Apache Spark
>Priority: Major
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36856:


Assignee: (was: Apache Spark)

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Priority: Major
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-26 Thread copperybean (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

copperybean updated SPARK-36856:

Priority: Major  (was: Minor)

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Priority: Major
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36740) collection operators should handle duplicated NaN

2021-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36740.
-
Fix Version/s: 3.2.0
 Assignee: angerszhu
   Resolution: Fixed

> collection operators should handle duplicated NaN
> -
>
> Key: SPARK-36740
> URL: https://issues.apache.org/jira/browse/SPARK-36740
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: correctness
> Fix For: 3.2.0
>
>
> Collection operators should handle duplicated NaN values; the current 
> OpenHashSet can't handle duplicated NaN.
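
For context, a hedged illustration (ours, not from the issue) of why NaN is 
tricky for hash-based sets:
{code:java}
// NaN has many bit patterns, and NaN != NaN under IEEE-754 comparison.
val nan1 = java.lang.Double.longBitsToDouble(0x7ff8000000000000L) // canonical NaN
val nan2 = java.lang.Double.longBitsToDouble(0x7ff8000000000001L) // different payload
assert(nan1.isNaN && nan2.isNaN)
assert(nan1 != nan2) // a NaN never compares equal to anything, even itself
// A set keyed on raw bits (as a specialized open hash set may be) would keep
// both entries, so operators like array_union could emit duplicated NaNs.
{code}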



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36835) Spark 3.2.0 POMs are no longer "dependency reduced"

2021-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420473#comment-17420473
 ] 

Apache Spark commented on SPARK-36835:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34110

> Spark 3.2.0 POMs are no longer "dependency reduced"
> ---
>
> Key: SPARK-36835
> URL: https://issues.apache.org/jira/browse/SPARK-36835
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Josh Rosen
>Assignee: Chao Sun
>Priority: Blocker
> Fix For: 3.2.0
>
>
> It looks like Spark 3.2.0's POMs are no longer "dependency reduced". As a 
> result, applications may pull in additional unnecessary dependencies when 
> depending on Spark.
> Spark uses the Maven Shade plugin to create effective POMs and to bundle 
> shaded versions of certain libraries with Spark (namely, Jetty, Guava, and 
> JPMML). [By 
> default|https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#createDependencyReducedPom],
>  the Maven Shade plugin generates simplified POMs which remove dependencies 
> on artifacts that have been shaded.
> SPARK-33212 / 
> [b6f46ca29742029efea2790af7fdefbc2fcf52de|https://github.com/apache/spark/commit/b6f46ca29742029efea2790af7fdefbc2fcf52de]
>  changed the configuration of the Maven Shade plugin, setting 
> {{createDependencyReducedPom}} to {{false}}.
> As a result, the generated POMs now include compile-scope dependencies on the 
> shaded libraries. For example, compare the {{org.eclipse.jetty}} dependencies 
> in:
>  * Spark 3.1.2: 
> [https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.2/spark-core_2.12-3.1.2.pom]
>  * Spark 3.2.0 RC2: 
> [https://repository.apache.org/content/repositories/orgapachespark-1390/org/apache/spark/spark-core_2.12/3.2.0/spark-core_2.12-3.2.0.pom]
> I think we should revert back to generating "dependency reduced" POMs to 
> ensure that Spark declares a proper set of dependencies and to avoid "unknown 
> unknown" consequences of changing our generated POM format.
> /cc [~csun]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36860) Create the external hive table for HBase failed

2021-09-26 Thread wineternity (Jira)
wineternity created SPARK-36860:
---

 Summary: Create the external hive table for HBase failed 
 Key: SPARK-36860
 URL: https://issues.apache.org/jira/browse/SPARK-36860
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: wineternity


We use the following SQL to create a Hive external table, which reads from HBase:
{code:java}
CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material(
   rowkey string COMMENT 'HBase row key',
   content string COMMENT 'article body text')
USING HIVE   
ROW FORMAT SERDE
   'org.apache.hadoop.hive.hbase.HBaseSerDe'
 STORED BY
   'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES (
   'hbase.columns.mapping'=':key, cf1:content'
)
 TBLPROPERTIES (
   'hbase.table.name'='spotlight_headline_material'
 );
{code}
But the SQL failed in Spark 3.1.2, which throws this exception:
{code:java}
21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group 
26d7459f-7b58-4c18-9939-5f2737525ff2
21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error 
executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState RUNNING,
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: Unexpected combination of ROW FORMAT SERDE 
'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key,
 cf1:content')(line 5, pos 0)
{code}
This check was introduced by this change: 
[https://github.com/apache/spark/pull/28026]

 

Could anyone give instructions on how to create an external table for HBase 
in Spark 3 now? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36838) Improve InSet NaN check generated code performance

2021-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36838:
-
Priority: Minor  (was: Major)

> Improve InSet NaN check generated code performance
> --
>
> Key: SPARK-36838
> URL: https://issues.apache.org/jira/browse/SPARK-36838
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Minor
>
> Since a Set can't check whether a NaN value is contained in it, with codegen 
> we only need to check whether the value is NaN when the literal value set 
> actually contains NaN; otherwise we just need to check whether the Set 
> contains the value.
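
A minimal sketch of the described optimization (a hypothetical helper, not 
Spark's actual generated code): pay for the NaN check only when the literal 
set is known to contain NaN.
{code:java}
def inSetCheck(value: Double, set: Set[Double]): Boolean = {
  // In Spark this is known at code generation time, so the NaN branch can
  // be left out of the generated code entirely when it is not needed.
  val setHasNaN = set.exists(_.isNaN)
  if (setHasNaN) value.isNaN || set.contains(value)
  else set.contains(value)
}
{code}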



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36838) Improve InSet NaN check generated code performance

2021-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36838.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34097
[https://github.com/apache/spark/pull/34097]

> Improve InSet NaN check generated code performance
> --
>
> Key: SPARK-36838
> URL: https://issues.apache.org/jira/browse/SPARK-36838
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.3.0
>
>
> Since a Set can't check whether a NaN value is contained in it, with codegen 
> we only need to check whether the value is NaN when the literal value set 
> actually contains NaN; otherwise we just need to check whether the Set 
> contains the value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36838) Improve InSet NaN check generated code performance

2021-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36838:
-
Issue Type: Improvement  (was: Bug)

> Improve InSet NaN check generated code performance
> --
>
> Key: SPARK-36838
> URL: https://issues.apache.org/jira/browse/SPARK-36838
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Since a Set can't check whether a NaN value is contained in it, with codegen 
> we only need to check whether the value is NaN when the literal value set 
> actually contains NaN; otherwise we just need to check whether the Set 
> contains the value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36853) Code failing on checkstyle

2021-09-26 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420438#comment-17420438
 ] 

Shockang commented on SPARK-36853:
--

OK

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
>
> There are more - just pasting a sample: 
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[120,25] 
> 

[jira] [Reopened] (SPARK-36853) Code failing on checkstyle

2021-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-36853:
--

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
>
> There are more - just pasting a sample: 
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[120,25] 
> (naming) MethodName: Method name 

[jira] [Commented] (SPARK-36853) Code failing on checkstyle

2021-09-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420417#comment-17420417
 ] 

Hyukjin Kwon commented on SPARK-36853:
--

Okay, they look valid. I don't know why they are not caught in CI, though. Feel 
free to make a PR and go ahead.

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
>
> There are more - just pasting a sample: 
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 
> 

[jira] [Assigned] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36859:
-

Assignee: Dongjoon Hyun

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36859.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34109
[https://github.com/apache/spark/pull/34109]

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36802) Incorrect writing the string, containing symbols like "\" to Hive

2021-09-26 Thread Vladislav (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420397#comment-17420397
 ] 

Vladislav commented on SPARK-36802:
---

[~hyukjin.kwon], 3.x has the same problem.

> Incorrect writing the string, containing symbols like "\" to Hive 
> --
>
> Key: SPARK-36802
> URL: https://issues.apache.org/jira/browse/SPARK-36802
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Vladislav
>Priority: Minor
>
> After writing strings containing a symbol like "\" to Hive, the resulting 
> record in the Hive table doesn't contain that symbol. It happens when using the 
> standard method pyspark.sql.readwriter.DataFrameWriter.saveAsTable as well 
> as insertInto.
> For example, running the query
> spark.sql("select '\d{4}' as code").write.saveAsTable('db.table')
> I get the following result in Hive:
> spark.table('db.table').collect()[0][0]
> >>"d{4}"
> But expected the following:
> >> "\d{4}"
> Spark version : '2.3.0.2.6.5.0-292'
>  
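
A hedged illustration (ours): by default Spark SQL treats "\" as an escape 
character inside string literals, so '\d{4}' parses to 'd{4}' before anything 
is written. Doubling the backslash, or enabling Hive-style literal parsing, 
preserves it:
{code:java}
// The Scala string below contains \\d{4}; the SQL parser unescapes it to \d{4}.
spark.sql("""SELECT '\\d{4}' AS code""").write.saveAsTable("db.table")
// Alternatively, keep backslashes in SQL literals verbatim:
spark.sql("SET spark.sql.parser.escapedStringLiterals=true")
{code}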



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420383#comment-17420383
 ] 

Apache Spark commented on SPARK-36859:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34109

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36859:


Assignee: (was: Apache Spark)

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36859:


Assignee: Apache Spark

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420382#comment-17420382
 ] 

Apache Spark commented on SPARK-36859:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34109

> Upgrade kubernetes-client to 5.8.0
> --
>
> Key: SPARK-36859
> URL: https://issues.apache.org/jira/browse/SPARK-36859
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36859) Upgrade kubernetes-client to 5.8.0

2021-09-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-36859:
-

 Summary: Upgrade kubernetes-client to 5.8.0
 Key: SPARK-36859
 URL: https://issues.apache.org/jira/browse/SPARK-36859
 Project: Spark
  Issue Type: Improvement
  Components: Build, Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun


This issue aims to support Kubernetes Model v1.22.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36858) Spark API to apply same function to multiple columns

2021-09-26 Thread Armand BERGES (Jira)
Armand BERGES created SPARK-36858:
-

 Summary: Spark API to apply same function to multiple columns
 Key: SPARK-36858
 URL: https://issues.apache.org/jira/browse/SPARK-36858
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.1.2, 2.4.8
Reporter: Armand BERGES


Hi,

My team and I regularly need to apply the same function to multiple 
columns at once.

For example, we want to remove all non-alphanumeric characters from each 
column of our dataframes. 

When we first hit this use case, some people on my team were using this kind of 
code: 


{code:java}
val colListToClean = ...       // Generate some list, could be very long.
val dfToClean: DataFrame = ... // This is the dataframe we want to clean.
def cleanFunction(colName: String): Column = ... // Some function that
// manipulates a column based on its name.
val dfCleaned = colListToClean.foldLeft(dfToClean)((df, colName) =>
  df.withColumn(colName, cleanFunction(colName))){code}

This kind of code, when applied to a large set of columns, overloaded our driver 
(because a new DataFrame is generated for each column to clean).

Based on this issue, we developed some code to add two functions : 


 * One to apply the same function to multiple columns (see the sketch below)
 * One to rename multiple columns based on a Map. 
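
A minimal sketch of the first helper (the names are ours; this is not an 
existing Spark API), which builds a single projection instead of one new 
DataFrame per column:
{code:java}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

def withColumnsMapped(df: DataFrame, cols: Seq[String])(f: String => Column): DataFrame = {
  // One select over all columns: targeted columns go through f, others pass through.
  val projection = df.columns.map { c =>
    if (cols.contains(c)) f(c).as(c) else col(c)
  }
  df.select(projection: _*)
}

// Usage: withColumnsMapped(dfToClean, colListToClean)(cleanFunction)
{code}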

 

I wonder if you have ever been asked to add this kind of API. If you did, did 
you run into any issues regarding the implementation? If you didn't, is this an 
idea you could add to Spark? 

Best regards, 

 

LvffY

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36694) Unable to build Spark on Azure DevOps with ubuntu-latest

2021-09-26 Thread Armand BERGES (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420327#comment-17420327
 ] 

Armand BERGES edited comment on SPARK-36694 at 9/26/21, 4:28 PM:
-

Didn't see the last answer from [~sarutak], just answering now.


was (Author: lvffy):
Didn't last answer from [~sarutak], just answer now.

> Unable to build Spark on Azure DevOps with ubuntu-latest
> 
>
> Key: SPARK-36694
> URL: https://issues.apache.org/jira/browse/SPARK-36694
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.8
>Reporter: Armand BERGES
>Priority: Minor
>
> Hello
> With my team we're currently trying to set up a test environment by using 
> Spark on kubernetes.
> For this purpose, we're following your (great) documentation to [build 
> spark|https://spark.apache.org/docs/2.4.8/building-spark.html] and [build 
> spark docker 
> images|https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images].
>  
> To make our build, we're using Azure DevOps.
> I don't know if it's a known bug or a requirement I didn't see, but I found 
> that I couldn't build Spark on the Azure agent *ubuntu-latest*, which I believe 
> to be *ubuntu-20.04*. The exact same build works on *ubuntu-18.04*. 
> My maven build always failed on building *spark-unsafe_2.11* with the 
> following error :
> {code:java}
> [error] warning: [options] bootstrap class path not set in conjunction with 
> -source 8 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:25:
>  error: cannot find symbol [error] import sun.misc.Cleaner; 
> [error] ^ 
> [error] symbol: class Cleaner 
> [error] location: package sun.misc 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: class Cleaner [error] location: class Platform 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: variable Cleaner 
> [error] location: class Platform 
> [error] 3 errors 
> [error] 1 warning 
> [error] Compile failed at Sep 8, 2021 10:37:02 AM [1.126s]{code}
>  
> Please tell me if I miss anything, 
> Best regards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-36694) Unable to build Spark on Azure DevOps with ubuntu-latest

2021-09-26 Thread Armand BERGES (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand BERGES reopened SPARK-36694:
---

Didn't see the last answer from [~sarutak], just answering now.

> Unable to build Spark on Azure DevOps with ubuntu-latest
> 
>
> Key: SPARK-36694
> URL: https://issues.apache.org/jira/browse/SPARK-36694
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.8
>Reporter: Armand BERGES
>Priority: Minor
>
> Hello
> With my team we're currently trying to set up a test environment by using 
> Spark on kubernetes.
> For this purpose, we're following your (great) documentation to [build 
> spark|https://spark.apache.org/docs/2.4.8/building-spark.html] and [build 
> spark docker 
> images|https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images].
>  
> To make our build, we're using Azure DevOps.
> I don't know if it's a known bug or a requirement I didn't see, but I found 
> that I couldn't build Spark on the Azure agent *ubuntu-latest*, which I believe 
> to be *ubuntu-20.04*. The exact same build works on *ubuntu-18.04*. 
> My maven build always failed on building *spark-unsafe_2.11* with the 
> following error :
> {code:java}
> [error] warning: [options] bootstrap class path not set in conjunction with 
> -source 8 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:25:
>  error: cannot find symbol [error] import sun.misc.Cleaner; 
> [error] ^ 
> [error] symbol: class Cleaner 
> [error] location: package sun.misc 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: class Cleaner [error] location: class Platform 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: variable Cleaner 
> [error] location: class Platform 
> [error] 3 errors 
> [error] 1 warning 
> [error] Compile failed at Sep 8, 2021 10:37:02 AM [1.126s]{code}
>  
> Please tell me if I miss anything, 
> Best regards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36694) Unable to build Spark on Azure DevOps with ubuntu-latest

2021-09-26 Thread Armand BERGES (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420326#comment-17420326
 ] 

Armand BERGES commented on SPARK-36694:
---

[~sarutak] is there any command you want me to test to verify this? 

The code used to build Spark is the following snippet: 



 
{noformat}
  - task: Maven@3
displayName: 'Build spark ${{ parameters.spark_checkout_reference }}'
inputs:  
  mavenPomFile: 'spark/pom.xml'  
  goals: 'clean package'  
  options: -DskipTests --activate-profiles ${{ 
parameters.spark_profile_to_activate }} --no-transfer-progress  
  jdkVersionOption: '1.8'  ## This should ensure that maven is running 
under Java 8.   
  mavenOptions: $(MAVEN_OPTS){noformat}
 

> Unable to build Spark on Azure DevOps with ubuntu-latest
> 
>
> Key: SPARK-36694
> URL: https://issues.apache.org/jira/browse/SPARK-36694
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.8
>Reporter: Armand BERGES
>Priority: Minor
>
> Hello
> With my team we're currently trying to set up a test environment by using 
> Spark on kubernetes.
> For this purpose, we're following your (great) documentation to [build 
> spark|https://spark.apache.org/docs/2.4.8/building-spark.html] and [build 
> spark docker 
> images|https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images].
>  
> To make our build, we're using Azure DevOps.
> I don't know if it's a known bug or a requirement I didn't see, but I found 
> that I couldn't build Spark on the Azure agent *ubuntu-latest*, which I believe 
> to be *ubuntu-20.04*. The exact same build works on *ubuntu-18.04*. 
> My maven build always failed on building *spark-unsafe_2.11* with the 
> following error :
> {code:java}
> [error] warning: [options] bootstrap class path not set in conjunction with 
> -source 8 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:25:
>  error: cannot find symbol [error] import sun.misc.Cleaner; 
> [error] ^ 
> [error] symbol: class Cleaner 
> [error] location: package sun.misc 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: class Cleaner [error] location: class Platform 
> [error] 
> /home/vsts/work/1/s/spark/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:172:
>  error: cannot find symbol 
> [error] Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory)); 
> [error] ^ 
> [error] symbol: variable Cleaner 
> [error] location: class Platform 
> [error] 3 errors 
> [error] 1 warning 
> [error] Compile failed at Sep 8, 2021 10:37:02 AM [1.126s]{code}
>  
> Please tell me if I miss anything, 
> Best regards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36826) CVEs in libraries used in bundled jars

2021-09-26 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420296#comment-17420296
 ] 

Sean R. Owen commented on SPARK-36826:
--

A few things here -

Static analysis output like this has value, but it only tells you that some 
vulnerability affects some part of potentially large dependencies. It doesn't 
mean it affects Spark.

That said, it's always safer to just update the dependencies, if it's easy. Most 
of these come from Hadoop, which is somewhat tricky to update in a maintenance 
branch without breaking things. But I believe all of these are updated already, 
directly or indirectly, in Spark 3.2.0. You'll want to check the code in master for 
checks like this. See 
https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2-hive-2.3
 for example.

> CVEs in libraries used in bundled jars
> --
>
> Key: SPARK-36826
> URL: https://issues.apache.org/jira/browse/SPARK-36826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Carlos Rodríguez Hernández
>Priority: Major
>
> Hi, I found several CVEs in dependency libraries bundled in the 
> _aws-java-sdk-bundle_ jar.
> We are using Spark 3.1.2, which bundles _hadoop-*_ jars version 3.2.0:
> {code:bash}
> $ curl -JLO 
> "https://ftp.cixug.es/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz;
> $ tar xzf spark-3.1.2-bin-hadoop3.2.tgz
> $ find spark-3.1.2-bin-hadoop3.2/jars -wholename '*/hadoop-*'
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-core-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-jobclient-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-auth-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-server-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-api-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-registry-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-annotations-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-hdfs-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-server-web-proxy-3.2.0.jar
> {code}
> There is a dependency between the _hadoop-aws_, _hadoop-common_, and 
> _hadoop-project_ versions; likewise, the _aws-java-sdk_ version should match 
> the one required by _hadoop-project_. Due to these dependencies we are 
> including _hadoop-aws-3.2.0_ and _aws-java-sdk-bundle-1.11.375_:
> {code:bash}
> $ find spark-3.1.2-bin-hadoop3.2/jars -wholename 
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-aws-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/aws-java-sdk-bundle-1.11.375.jar
> {code}
> Taking a look at the _hadoop-project_ pom, the _aws-java-sdk_ version is the 
> correct one:
> {code:bash}
> $ curl -JLO 
> "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-project/3.2.0/hadoop-project-3.2.0.pom;
> $ cat hadoop-project-3.2.0.pom | grep aws-java-sdk
> <aws-java-sdk.version>1.11.375</aws-java-sdk.version>
> <artifactId>aws-java-sdk-bundle</artifactId>
> <version>${aws-java-sdk.version}</version>
> {code}
> Do you think it would be possible to update the versions of the jars to solve 
> the vulnerabilities?
> 
> Please see below the CVE report for _jars/aws-java-sdk-bundle-1.11.375.jar_:
> ||LIBRARY||VULNERABILITY ID||SEVERITY||INSTALLED VERSION||FIXED 
> VERSION||TITLE||
> |com.fasterxml.jackson.core:jackson-databind|CVE-2017-15095|CRITICAL|2.6.7.1|2.9.4,
>  2.8.11|jackson-databind: Unsafe|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2017-17485|CRITICAL|2.6.7.1|2.8.11,
>  2.9.4|jackson-databind: Unsafe|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-11307|CRITICAL|2.6.7.1|2.8.11.2,
>  2.7.9.4, 2.9.6|jackson-databind: Potential|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14718|CRITICAL|2.6.7.1|2.7.9.5,
>  2.8.11.3, 2.9.7|jackson-databind: arbitrary code|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14719|CRITICAL|2.6.7.1|2.7.9.5,
>  2.8.11.3, 2.9.7|jackson-databind: arbitrary|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14720|CRITICAL|2.6.7.1|2.6.7.2,
>  2.9.7|jackson-databind: exfiltration/XXE|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14721|CRITICAL|2.6.7.1|2.6.7.2,
>  2.9.7|jackson-databind: server-side request|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19360|CRITICAL|2.6.7.1|2.6.7.3,
>  2.7.9.5, 2.8.11.3|jackson-databind: improper|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19361|CRITICAL|2.6.7.1|2.6.7.3,
>  2.7.9.5, 2.8.11.3|jackson-databind: improper|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19362|CRITICAL|2.6.7.1|2.6.7.3,
>  2.7.9.5, 

[jira] [Resolved] (SPARK-36826) CVEs in libraries used in bundled jars

2021-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-36826.
--
Resolution: Not A Problem

> CVEs in libraries used in bundled jars
> --
>
> Key: SPARK-36826
> URL: https://issues.apache.org/jira/browse/SPARK-36826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Carlos Rodríguez Hernández
>Priority: Major
>
> Hi, I found several CVEs in dependency libraries bundled in the 
> _aws-java-sdk-bundle_ jar.
> We are using Spark 3.1.2, which bundles _hadoop-*_ jars version 3.2.0:
> {code:bash}
> $ curl -JLO 
> "https://ftp.cixug.es/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz;
> $ tar xzf spark-3.1.2-bin-hadoop3.2.tgz
> $ find spark-3.1.2-bin-hadoop3.2/jars -wholename '*/hadoop-*'
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-core-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-jobclient-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-auth-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-server-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-api-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-registry-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-annotations-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-hdfs-client-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-mapreduce-client-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-common-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-yarn-server-web-proxy-3.2.0.jar
> {code}
> There is a dependency between the _hadoop-aws_, _hadoop-common_, and 
> _hadoop-project_ versions; likewise, the _aws-java-sdk_ version should match 
> the one required by _hadoop-project_. Due to these dependencies we are 
> including _hadoop-aws-3.2.0_ and _aws-java-sdk-bundle-1.11.375_:
> {code:bash}
> $ find spark-3.1.2-bin-hadoop3.2/jars -wholename 
> spark-3.1.2-bin-hadoop3.2/jars/hadoop-aws-3.2.0.jar
> spark-3.1.2-bin-hadoop3.2/jars/aws-java-sdk-bundle-1.11.375.jar
> {code}
> Taking a look at the _hadoop-project_ pom, the _aws-java-sdk_ version is the 
> correct one:
> {code:bash}
> $ curl -JLO 
> "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-project/3.2.0/hadoop-project-3.2.0.pom;
> $ cat hadoop-project-3.2.0.pom | grep aws-java-sdk
> <aws-java-sdk.version>1.11.375</aws-java-sdk.version>
> <artifactId>aws-java-sdk-bundle</artifactId>
> <version>${aws-java-sdk.version}</version>
> {code}
> Do you think it would be possible to update the versions of the jars to solve 
> the vulnerabilities?
> 
> Please see below the CVE report for _jars/aws-java-sdk-bundle-1.11.375.jar_:
> ||LIBRARY||VULNERABILITY ID||SEVERITY||INSTALLED VERSION||FIXED VERSION||TITLE||
> |com.fasterxml.jackson.core:jackson-databind|CVE-2017-15095|CRITICAL|2.6.7.1|2.9.4, 2.8.11|jackson-databind: Unsafe|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2017-17485|CRITICAL|2.6.7.1|2.8.11, 2.9.4|jackson-databind: Unsafe|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-11307|CRITICAL|2.6.7.1|2.8.11.2, 2.7.9.4, 2.9.6|jackson-databind: Potential|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14718|CRITICAL|2.6.7.1|2.7.9.5, 2.8.11.3, 2.9.7|jackson-databind: arbitrary code|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14719|CRITICAL|2.6.7.1|2.7.9.5, 2.8.11.3, 2.9.7|jackson-databind: arbitrary|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14720|CRITICAL|2.6.7.1|2.6.7.2, 2.9.7|jackson-databind: exfiltration/XXE|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-14721|CRITICAL|2.6.7.1|2.6.7.2, 2.9.7|jackson-databind: server-side request|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19360|CRITICAL|2.6.7.1|2.6.7.3, 2.7.9.5, 2.8.11.3|jackson-databind: improper|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19361|CRITICAL|2.6.7.1|2.6.7.3, 2.7.9.5, 2.8.11.3|jackson-databind: improper|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-19362|CRITICAL|2.6.7.1|2.6.7.3, 2.7.9.5, 2.8.11.3|jackson-databind: improper|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2018-7489|CRITICAL|2.6.7.1|2.8.11.1, 2.9.5|jackson-databind: incomplete fix|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2019-14379|CRITICAL|2.6.7.1|2.9.9.2|jackson-databind: default|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2019-14540|CRITICAL|2.6.7.1|2.9.10|jackson-databind:|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2019-14892|CRITICAL|2.6.7.1|2.9.10, 2.8.11.5, 2.6.7.3|jackson-databind: Serialization|
> |com.fasterxml.jackson.core:jackson-databind|CVE-2019-14893|CRITICAL|2.6.7.1|2.8.11.5, 2.9.10|jackson-databind:|
> 

[jira] [Commented] (SPARK-31602) memory leak of JobConf

2021-09-26 Thread muhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420286#comment-17420286
 ] 

muhong commented on SPARK-31602:


We hit the same problem on the Spark driver side, but have not found the answer yet.

We found that the leaked JobConf instances are referenced from inside 
DistributedFileSystem objects, which are stored in the FileSystem$Cache; it 
seems the DistributedFileSystem instances were never closed.
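For context, a minimal sketch (not the reporter's code; the HDFS URI is a 
placeholder) of how Hadoop's FileSystem cache can keep a Configuration/JobConf 
reachable, and the standard per-scheme property for bypassing the cache:

{code:scala}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object FsCacheLeakSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // FileSystem.get caches the instance per (scheme, authority, user).
    // The cached instance holds a reference to the Configuration (or
    // JobConf) it was created with, so that object stays reachable for
    // as long as the FileSystem sits in FileSystem$Cache unclosed.
    val cached = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
    println(cached.getUri)

    // One way to avoid the cache: request an uncached instance and close
    // it explicitly when done (the property name follows the per-scheme
    // fs.<scheme>.impl.disable.cache pattern).
    conf.setBoolean("fs.hdfs.impl.disable.cache", true)
    val uncached = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
    uncached.close()
  }
}
{code}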

 

> memory leak of JobConf
> --
>
> Key: SPARK-31602
> URL: https://issues.apache.org/jira/browse/SPARK-31602
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>  Labels: bulk-closed
> Attachments: image-2020-04-29-14-34-39-496.png, 
> image-2020-04-29-14-35-55-986.png
>
>
> !image-2020-04-29-14-34-39-496.png!
> !image-2020-04-29-14-35-55-986.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36851) Incorrect parsing of negative ANSI typed interval literals

2021-09-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36851:
--

Assignee: Peng Lei

> Incorrect parsing of negative ANSI typed interval literals
> --
>
> Key: SPARK-36851
> URL: https://issues.apache.org/jira/browse/SPARK-36851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Peng Lei
>Priority: Major
> Fix For: 3.2.0
>
>
> If the start field and the end field are the same, the parser doesn't take 
> the sign before the interval literal string into account. For example:
> Works fine:
> {code:sql}
> spark-sql> select interval -'1-1' year to month;
> -1-1
> {code}
> Incorrect result:
> {code:sql}
> spark-sql> select interval -'1' year;
> 1-0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36851) Incorrect parsing of negative ANSI typed interval literals

2021-09-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36851.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34107
[https://github.com/apache/spark/pull/34107]

> Incorrect parsing of negative ANSI typed interval literals
> --
>
> Key: SPARK-36851
> URL: https://issues.apache.org/jira/browse/SPARK-36851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> If the start field and the end field are the same, the parser doesn't take 
> the sign before the interval literal string into account. For example:
> Works fine:
> {code:sql}
> spark-sql> select interval -'1-1' year to month;
> -1-1
> {code}
> Incorrect result:
> {code:sql}
> spark-sql> select interval -'1' year;
> 1-0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36857) structured streaming support backpressure for kafka source

2021-09-26 Thread baizhendong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

baizhendong updated SPARK-36857:

Description: Spark Streaming supports backpressure for Kafka, but Structured 
Streaming does not. Can someone explain why it is not supported?  (was: Spark 
streaming support backpressure for kafka, but in structured streaming, not 
support backpressure for kafka.)

> structured streaming support backpressure for kafka source
> --
>
> Key: SPARK-36857
> URL: https://issues.apache.org/jira/browse/SPARK-36857
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.8, 3.1.2
>Reporter: baizhendong
>Priority: Major
>
> Spark Streaming supports backpressure for Kafka, but Structured Streaming 
> does not. Can someone explain why it is not supported?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36857) structured streaming support backpressure for kafka source

2021-09-26 Thread baizhendong (Jira)
baizhendong created SPARK-36857:
---

 Summary: structured streaming support backpressure for kafka source
 Key: SPARK-36857
 URL: https://issues.apache.org/jira/browse/SPARK-36857
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.1.2, 2.4.8
Reporter: baizhendong


Spark Streaming supports backpressure for Kafka, but Structured Streaming does 
not.
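For what it's worth, the closest built-in knob today appears to be the Kafka 
source's static maxOffsetsPerTrigger rate limit rather than the adaptive 
backpressure of spark.streaming.backpressure.enabled. A minimal sketch (broker 
address and topic name are placeholders):

{code:scala}
import org.apache.spark.sql.SparkSession

// Requires the spark-sql-kafka-0-10 artifact on the classpath.
object KafkaRateLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rate-limit").getOrCreate()

    // Structured Streaming only offers a static cap on records read per
    // micro-batch, not feedback-driven backpressure as in DStreams.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder topic
      .option("maxOffsetsPerTrigger", 10000L)           // static rate limit
      .load()

    stream.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
{code}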



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36853) Code failing on checkstyle

2021-09-26 Thread Abhinav Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420258#comment-17420258
 ] 

Abhinav Kumar commented on SPARK-36853:
---

These errors are thrown on Windows during the Maven install phase. The build 
succeeds, but with these errors reported.

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
>
> There are more - just pasting a sample.
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 
> 

[jira] [Created] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-26 Thread copperybean (Jira)
copperybean created SPARK-36856:
---

 Summary: Building by "./build/mvn" may be stuck on MacOS
 Key: SPARK-36856
 URL: https://issues.apache.org/jira/browse/SPARK-36856
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0, 3.3.0
 Environment: MacOS 11.4
Reporter: copperybean


Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using error 
java home. On my mac, "/usr/bin/java" is a real file instead of a symbolic 
link, so the java home is set to path "/usr", and lead the launched maven 
process stuck with this error java home.
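To illustrate the failure mode described above (a sketch of the derivation 
logic, not the build/mvn script itself):

{code:scala}
import java.nio.file.Paths

object JavaHomeSketch {
  def main(args: Array[String]): Unit = {
    // build/mvn-style derivation: locate `java`, resolve symlinks, then
    // strip the trailing /bin/java to obtain a java home.
    val javaBin = Paths.get("/usr/bin/java")
    val resolved = javaBin.toRealPath() // a plain file on macOS 11, so nothing to follow
    val derivedHome = resolved.getParent.getParent
    // On macOS 11 this prints "/usr", which is not a JDK home;
    // `/usr/libexec/java_home` reports the real JDK location instead.
    println(s"derived java home: $derivedHome")
  }
}
{code}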



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36855) Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first

2021-09-26 Thread jinhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jinhai updated SPARK-36855:
---
Description: In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode 
method, both local and host-local execute the 
mergeContinuousShuffleBlockIdsIfNeeded method first, but remote blocks executes 
the method many times in the createFetchRequests method. Can the method of 
merge blocks be executed only once in advance?  (was: In the 
ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both local and 
host-local execute the mergeContinuousShuffleBlockIdsIfNeeded method first, but 
remote blocks executes the method many times in the createFetchRequests method. 
Can merge blocks be executed only once in advance?)

> Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first
> -
>
> Key: SPARK-36855
> URL: https://issues.apache.org/jira/browse/SPARK-36855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: jinhai
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both 
> the local and host-local paths execute the 
> mergeContinuousShuffleBlockIdsIfNeeded method first, but for remote blocks 
> the method is executed many times inside the createFetchRequests method. Can 
> the block-merge step be executed only once, in advance?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36855) Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first

2021-09-26 Thread jinhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jinhai updated SPARK-36855:
---
Description: In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode 
method, both the local and host-local paths execute the 
mergeContinuousShuffleBlockIdsIfNeeded method first, but for remote blocks the 
method is executed many times inside the createFetchRequests method. Can the 
merge be executed only once, in advance?  (was: In the 
ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both local and 
host-local execute the mergeContinuousShuffleBlockIdsIfNeeded method first, but 
remote blocks executes the mergeContinuousShuffleBlockIdsIfNeeded method many 
times in the createFetchRequests method. Can merge blocks be executed only once 
in advance?)

> Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first
> -
>
> Key: SPARK-36855
> URL: https://issues.apache.org/jira/browse/SPARK-36855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: jinhai
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both 
> the local and host-local paths execute the 
> mergeContinuousShuffleBlockIdsIfNeeded method first, but for remote blocks 
> the method is executed many times inside the createFetchRequests method. Can 
> the merge be executed only once, in advance?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36855) Uniformly execute the mergeContinuousShuffleBlockIdsIfNeeded method first

2021-09-26 Thread jinhai (Jira)
jinhai created SPARK-36855:
--

 Summary: Uniformly execute the 
mergeContinuousShuffleBlockIdsIfNeeded method first
 Key: SPARK-36855
 URL: https://issues.apache.org/jira/browse/SPARK-36855
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.2, 3.1.1, 3.1.0
Reporter: jinhai


In the ShuffleBlockFetcherIterator.partitionBlocksByFetchMode method, both the 
local and host-local paths execute the mergeContinuousShuffleBlockIdsIfNeeded 
method first, but for remote blocks the mergeContinuousShuffleBlockIdsIfNeeded 
method is executed many times inside the createFetchRequests method. Can the 
merge be executed only once, in advance? A simplified sketch of the idea 
follows below.
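A self-contained sketch of the proposed restructuring (the Block type and the 
merge signature are placeholders, not Spark's internal ones): merge each run of 
continuous block ids once, up front, then let every fetch path consume the 
already-merged groups.

{code:scala}
// Simplified model: one merge pass before partitioning by fetch mode,
// instead of re-merging inside createFetchRequests for remote blocks.
case class Block(mapId: Int, reduceId: Int)

object MergeOnceSketch {
  // Stand-in for mergeContinuousShuffleBlockIdsIfNeeded: group blocks
  // with consecutive reduceIds that come from the same map output.
  def mergeContinuous(blocks: Seq[Block]): Seq[Seq[Block]] =
    blocks.sortBy(b => (b.mapId, b.reduceId))
      .foldLeft(List.empty[List[Block]]) {
        case ((group @ (last :: _)) :: rest, b)
            if b.mapId == last.mapId && b.reduceId == last.reduceId + 1 =>
          (b :: group) :: rest // extend the current continuous run
        case (acc, b) => List(b) :: acc // start a new run
      }
      .map(_.reverse).reverse

  def main(args: Array[String]): Unit = {
    val blocks = Seq(Block(1, 0), Block(1, 1), Block(1, 3), Block(2, 0))
    // Merge once, in advance, for local, host-local and remote alike...
    val merged = mergeContinuous(blocks)
    // ...then each fetch path consumes the merged groups directly.
    merged.foreach { g =>
      println(g.map(b => s"${b.mapId}:${b.reduceId}").mkString("[", ", ", "]"))
    }
  }
}
{code}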



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36638) Generalize OptimizeSkewedJoin

2021-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420215#comment-17420215
 ] 

Apache Spark commented on SPARK-36638:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/34108

> Generalize OptimizeSkewedJoin
> -
>
> Key: SPARK-36638
> URL: https://issues.apache.org/jira/browse/SPARK-36638
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org