[jira] [Assigned] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37142: Assignee: (was: Apache Spark) > Add __all__ to pyspark/pandas/*/__init__.py > --- > > Key: SPARK-37142 > URL: https://issues.apache.org/jira/browse/SPARK-37142 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37142: Assignee: Apache Spark > Add __all__ to pyspark/pandas/*/__init__.py > --- > > Key: SPARK-37142 > URL: https://issues.apache.org/jira/browse/SPARK-37142 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major >
[jira] [Updated] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dch nguyen updated SPARK-37142: --- Issue Type: Improvement (was: Bug) > Add __all__ to pyspark/pandas/*/__init__.py > --- > > Key: SPARK-37142 > URL: https://issues.apache.org/jira/browse/SPARK-37142 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Created] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py
dch nguyen created SPARK-37142: -- Summary: Add __all__ to pyspark/pandas/*/__init__.py Key: SPARK-37142 URL: https://issues.apache.org/jira/browse/SPARK-37142 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen
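For readers unfamiliar with the mechanism SPARK-37142 asks for: `__all__` in a package's `__init__.py` declares the public API explicitly, controlling what `from pkg import *` exports and what tooling advertises. A minimal self-contained sketch (the module and function names below are invented for illustration, not taken from pyspark.pandas):

```python
import importlib.util
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a toy module standing in for a package __init__.py.
module_source = textwrap.dedent("""
    def load_data():
        return "data"

    def _helper():
        return "internal"

    # __all__ makes the public surface explicit: star-imports (and many
    # linters / IDE completions) expose only the listed names.
    __all__ = ["load_data"]
""")

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "example_pkg.py"
    path.write_text(module_source)
    spec = importlib.util.spec_from_file_location("example_pkg", path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules["example_pkg"] = mod
    spec.loader.exec_module(mod)

    # Simulate `from example_pkg import *` and inspect what arrives.
    namespace = {}
    exec("from example_pkg import *", namespace)

assert "load_data" in namespace      # listed in __all__
assert "_helper" not in namespace    # hidden from star-imports
```

Without `__all__`, star-imports fall back to "every name not starting with an underscore", which silently re-exports imported submodules as well; listing names explicitly avoids that.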
[jira] [Commented] (SPARK-37128) Application has been removed by master but driver still running
[ https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435189#comment-17435189 ] JacobZheng commented on SPARK-37128: My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how to reproduce it; it doesn't always come up. Sometimes it appears when an OOM exception occurs in the executor. [~hyukjin.kwon] > Application has been removed by master but driver still running > --- > > Key: SPARK-37128 > URL: https://issues.apache.org/jira/browse/SPARK-37128 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: JacobZheng >Priority: Major > > {code:java} > 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 > because it is EXITED > 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 > on worker worker-20210826183405-10.39.0.69-37147 > 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:46 WARN Master: Got status update for unknown executor > app-20210827190502-0030/2 > 21/08/30 10:27:48 WARN Master: Got status update for unknown executor > app-20210827190502-0030/0 > 21/08/30 10:27:50 WARN Master: Got status update for unknown executor > app-20210827190502-0030/3{code} > As the logs show, Spark master removed my application. But my driver process > is still running. I would like to know what could be the cause of this and > how I can avoid it. > > My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how > to reproduce it; it doesn't always come up. Sometimes it appears when an > OOM exception occurs in the executor.
[jira] [Updated] (SPARK-37128) Application has been removed by master but driver still running
[ https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JacobZheng updated SPARK-37128: --- Description: {code:java} 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 because it is EXITED 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on worker worker-20210826183405-10.39.0.69-37147 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:46 WARN Master: Got status update for unknown executor app-20210827190502-0030/2 21/08/30 10:27:48 WARN Master: Got status update for unknown executor app-20210827190502-0030/0 21/08/30 10:27:50 WARN Master: Got status update for unknown executor app-20210827190502-0030/3{code} As the logs show, Spark master removed my application. But my driver process is still running. I would like to know what could be the cause of this and how I can avoid it. My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how to reproduce it; it doesn't always come up. Sometimes it appears when an OOM exception occurs in the executor. was: {code:java} 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 because it is EXITED 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on worker worker-20210826183405-10.39.0.69-37147 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it. 
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:46 WARN Master: Got status update for unknown executor app-20210827190502-0030/2 21/08/30 10:27:48 WARN Master: Got status update for unknown executor app-20210827190502-0030/0 21/08/30 10:27:50 WARN Master: Got status update for unknown executor app-20210827190502-0030/3{code} As the logs show, Spark master removed my application. But my driver process is still running. I would like to know what could be the cause of this and how I can avoid it. My Spark version is 3.0.1 and I run Spark standalone on k8s. > Application has been removed by master but driver still running > --- > > Key: SPARK-37128 > URL: https://issues.apache.org/jira/browse/SPARK-37128 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: JacobZheng >Priority: Major > > {code:java} > 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 > because it is EXITED > 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 > on worker worker-20210826183405-10.39.0.69-37147 > 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing > it. 
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:46 WARN Master: Got status update for unknown executor > app-20210827190502-0030/2 > 21/08/30 10:27:48 WARN Master: Got status update for unknown executor > app-20210827190502-0030/0 > 21/08/30 10:27:50 WARN Master: Got status update for unknown executor > app-20210827190502-0030/3{code} > As the logs show, Spark master removed my application. But my driver process > is still running. I would like to know what could be the cause of this and > how I can avoid it. > > My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how > to reproduce it; it doesn't always come up. Sometimes it appears when an > OOM exception occurs in the executor.
[jira] [Updated] (SPARK-37128) Application has been removed by master but driver still running
[ https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JacobZheng updated SPARK-37128: --- Description: {code:java} 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 because it is EXITED 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on worker worker-20210826183405-10.39.0.69-37147 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:46 WARN Master: Got status update for unknown executor app-20210827190502-0030/2 21/08/30 10:27:48 WARN Master: Got status update for unknown executor app-20210827190502-0030/0 21/08/30 10:27:50 WARN Master: Got status update for unknown executor app-20210827190502-0030/3{code} As the logs show, Spark master removed my application. But my driver process is still running. I would like to know what could be the cause of this and how I can avoid it. My Spark version is 3.0.1 and I run Spark standalone on k8s. was: {code:java} 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 because it is EXITED 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on worker worker-20210826183405-10.39.0.69-37147 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it. 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it. 
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:31 WARN Master: Got status update for unknown executor app-20210827190502-0030/4 21/08/30 10:27:46 WARN Master: Got status update for unknown executor app-20210827190502-0030/2 21/08/30 10:27:48 WARN Master: Got status update for unknown executor app-20210827190502-0030/0 21/08/30 10:27:50 WARN Master: Got status update for unknown executor app-20210827190502-0030/3{code} As the logs show, Spark master removed my application. But my driver process is still running. I would like to know what could be the cause of this and how I can avoid it. > Application has been removed by master but driver still running > --- > > Key: SPARK-37128 > URL: https://issues.apache.org/jira/browse/SPARK-37128 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: JacobZheng >Priority: Major > > {code:java} > 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 > because it is EXITED > 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 > on worker worker-20210826183405-10.39.0.69-37147 > 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing > it. 
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:46 WARN Master: Got status update for unknown executor > app-20210827190502-0030/2 > 21/08/30 10:27:48 WARN Master: Got status update for unknown executor > app-20210827190502-0030/0 > 21/08/30 10:27:50 WARN Master: Got status update for unknown executor > app-20210827190502-0030/3{code} > As the logs show, Spark master removed my application. But my driver process > is still running. I would like to know what could be the cause of this and > how I can avoid it. > > My Spark version is 3.0.1 and I run Spark standalone on k8s.
[jira] [Commented] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435187#comment-17435187 ] Apache Spark commented on SPARK-37117: -- User 'ggershinsky' has created a pull request for this issue: https://github.com/apache/spark/pull/34415 > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gidon Gershinsky >Priority: Major > > Parquet encryption has a number of modes. One of them is "external > keymaterial", which keeps encrypted data keys in a separate file (as opposed > to inside the Parquet file). Upon reading, the Spark Parquet connector does not > pass the file path, which causes an NPE.
[jira] [Assigned] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37117: Assignee: Apache Spark > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gidon Gershinsky >Assignee: Apache Spark >Priority: Major > > Parquet encryption has a number of modes. One of them is "external > keymaterial", which keeps encrypted data keys in a separate file (as opposed > to inside the Parquet file). Upon reading, the Spark Parquet connector does not > pass the file path, which causes an NPE.
[jira] [Assigned] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37117: Assignee: (was: Apache Spark) > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gidon Gershinsky >Priority: Major > > Parquet encryption has a number of modes. One of them is "external > keymaterial", which keeps encrypted data keys in a separate file (as opposed > to inside the Parquet file). Upon reading, the Spark Parquet connector does not > pass the file path, which causes an NPE.
[jira] [Commented] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435186#comment-17435186 ] Apache Spark commented on SPARK-37117: -- User 'ggershinsky' has created a pull request for this issue: https://github.com/apache/spark/pull/34415 > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gidon Gershinsky >Priority: Major > > Parquet encryption has a number of modes. One of them is "external > keymaterial", which keeps encrypted data keys in a separate file (as opposed > to inside the Parquet file). Upon reading, the Spark Parquet connector does not > pass the file path, which causes an NPE.
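For context on the failure mode SPARK-37117 describes: Parquet modular encryption can store its "key material" either inside the Parquet file footer or in sidecar files next to the data file; the sidecar mode is what the ticket calls external keymaterial, and reading it requires knowing the data file's path. A hedged sketch of the Hadoop properties involved (property names as I understand the parquet-mr 1.12 keytools documentation; the KMS class, keys, and values below are illustrative dummies and should be checked against the official docs):

```
# Illustrative Parquet modular encryption settings (not a verified config).
parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory
parquet.encryption.kms.client.class=org.apache.parquet.crypto.keytools.mocks.InMemoryKMS
parquet.encryption.key.list=keyA:AAECAwQFBgcICQoLDA0ODw==

# Setting this to false selects the "external key material" mode that the
# ticket reports as broken on read: key material is written to a sidecar
# file next to the Parquet file, so the reader needs the file path to
# locate it -- which is what the NPE suggests is not being passed.
parquet.encryption.key.material.store.internally=false
```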
[jira] [Assigned] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py
[ https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37139: Assignee: Apache Spark > Inline type hints for python/pyspark/taskcontext.py and > python/pyspark/version.py > - > > Key: SPARK-37139 > URL: https://issues.apache.org/jira/browse/SPARK-37139 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py
[ https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37139: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/taskcontext.py and > python/pyspark/version.py > - > > Key: SPARK-37139 > URL: https://issues.apache.org/jira/browse/SPARK-37139 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Commented] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py
[ https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435172#comment-17435172 ] Apache Spark commented on SPARK-37139: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34414 > Inline type hints for python/pyspark/taskcontext.py and > python/pyspark/version.py > - > > Key: SPARK-37139 > URL: https://issues.apache.org/jira/browse/SPARK-37139 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
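For context on this series of sub-tasks: "inline type hints" means moving annotations out of separate `.pyi` stub files and into the `.py` modules themselves, so one file carries both the implementation and its types, and the checker verifies the body against the same signatures users see. A toy sketch of the target style (the class and its methods are invented for illustration, not the real TaskContext API):

```python
from typing import Dict, Optional


class ExampleContext:
    """Stand-in for a context object whose hints used to live in a stub."""

    _attributes: Dict[str, str]

    def __init__(self) -> None:
        self._attributes = {}

    def set(self, key: str, value: str) -> None:
        self._attributes[key] = value

    def get(self, key: str) -> Optional[str]:
        # Optional[str] documents the miss case right in the source,
        # instead of in a parallel .pyi file that can drift out of sync.
        return self._attributes.get(key)


ctx = ExampleContext()
ctx.set("stage", "1")
assert ctx.get("stage") == "1"
assert ctx.get("missing") is None
```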
[jira] [Updated] (SPARK-37141) WorkerSuite cannot run on Mac OS
[ https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37141: - Description: After SPARK-35907, running `org.apache.spark.deploy.worker.WorkerSuite` on macOS (both M1 and Intel) fails {code:java} mvn clean install -DskipTests -pl core -am mvn test -pl core -Dtest=none -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite {code} {code:java} WorkerSuite: - test isUseLocalNodeSSLConfig - test maybeUpdateSSLSettings - test clearing of finishedExecutors (small number of executors) - test clearing of finishedExecutors (more executors) - test clearing of finishedDrivers (small number of drivers) - test clearing of finishedDrivers (more drivers) [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 47.973 s [INFO] Finished at: 2021-10-28T13:46:56+08:00 [INFO] [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project spark-core_2.12: There are test failures -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException {code} {code:java} 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create directory /tmp java.nio.file.FileAlreadyExistsException: /tmp at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384) at java.nio.file.Files.createDirectory(Files.java:674) at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781) at java.nio.file.Files.createDirectories(Files.java:727) at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292) at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221) at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: Run `org.apache.spark.deploy.worker.WorkerSuite` on Mac os(both M1 and Intel) failed {code:java} mvn clean install -DskipTests -pl core -am mvn test -pl core -Dtest=none -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite {code} {code:java} WorkerSuite: - test isUseLocalNodeSSLConfig - test maybeUpdateSSLSettings - test clearing of 
finishedExecutors (small number of executors) - test clearing of finishedExecutors (more executors) - test clearing of finishedDrivers (small number of drivers) - test clearing of finishedDrivers (more drivers) [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 47.973 s [INFO] Finished at: 2021-10-28T13:46:56+08:00 [INFO] [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project spark-core_2.12: There are test failures -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException {code} {code:java} 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create directory /tmp java.nio.file.FileAlreadyExistsException: /tmp at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.f
[jira] [Created] (SPARK-37141) WorkerSuite cannot run on Mac OS
Yang Jie created SPARK-37141: Summary: WorkerSuite cannot run on Mac OS Key: SPARK-37141 URL: https://issues.apache.org/jira/browse/SPARK-37141 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 3.3.0 Reporter: Yang Jie Running `org.apache.spark.deploy.worker.WorkerSuite` on macOS (both M1 and Intel) fails {code:java} mvn clean install -DskipTests -pl core -am mvn test -pl core -Dtest=none -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite {code} {code:java} WorkerSuite: - test isUseLocalNodeSSLConfig - test maybeUpdateSSLSettings - test clearing of finishedExecutors (small number of executors) - test clearing of finishedExecutors (more executors) - test clearing of finishedDrivers (small number of drivers) - test clearing of finishedDrivers (more drivers) [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 47.973 s [INFO] Finished at: 2021-10-28T13:46:56+08:00 [INFO] [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project spark-core_2.12: There are test failures -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException {code} {code:java} 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create directory /tmp java.nio.file.FileAlreadyExistsException: /tmp at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384) at java.nio.file.Files.createDirectory(Files.java:674) at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781) at java.nio.file.Files.createDirectories(Files.java:727) at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292) at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221) at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code}
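The `FileAlreadyExistsException` on `/tmp` in the stack trace is consistent with a known `java.nio` behavior: `Files.createDirectories` rejects a path component that already exists as a symbolic link, even when the link points at a directory, because its existence check does not follow links. On macOS, `/tmp` is a symlink to `/private/tmp`, which would explain the platform-specific failure. A small standalone demonstration of that behavior (my reading of the stack trace, not a confirmed diagnosis of the suite; requires a platform where unprivileged symlink creation works):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkDirDemo {
    static boolean symlinkThrew;
    static boolean realDirThrew;

    // Returns true if Files.createDirectories rejects the path.
    static boolean createDirectoriesThrows(Path p) {
        try {
            Files.createDirectories(p);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("symlink-demo");
        Path real = Files.createDirectory(base.resolve("real"));
        Path link = Files.createSymbolicLink(base.resolve("link"), real);

        // An existing real directory is fine: createDirectories is a no-op.
        realDirThrew = createDirectoriesThrows(real);
        // A symlink to that same directory throws FileAlreadyExistsException,
        // mirroring what /tmp -> /private/tmp would do on macOS.
        symlinkThrew = createDirectoriesThrows(link);

        System.out.println("real dir threw: " + realDirThrew);
        System.out.println("symlink threw: " + symlinkThrew);
    }
}
```

A common workaround in such test setups is to resolve the path first (`path.toRealPath()`) before calling `createDirectories`, so the symlink is replaced by its target.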
[jira] [Resolved] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords
[ https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37133. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34403 [https://github.com/apache/spark/pull/34403] > Add a config to optionally enforce ANSI reserved keywords > - > > Key: SPARK-37133 > URL: https://issues.apache.org/jira/browse/SPARK-37133 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.3.0 > >
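For readers tracking the behavior change: this makes ANSI reserved-keyword enforcement opt-in rather than an automatic consequence of ANSI mode. A hedged sketch of how such a switch would be exercised (the config name below is my best reading of the linked PR and should be verified against the 3.3.0 documentation):

```sql
-- With ANSI mode on and enforcement enabled, ANSI reserved words
-- stop being usable as unquoted identifiers.
SET spark.sql.ansi.enabled = true;
SET spark.sql.ansi.enforceReservedKeywords = true;

-- Parses when enforcement is off; fails to parse when it is on,
-- because `table` is an ANSI reserved keyword:
-- CREATE TABLE table (col INT);
```

Making enforcement a separate flag lets users adopt ANSI runtime semantics without immediately breaking existing queries that use reserved words as names.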
[jira] [Commented] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435146#comment-17435146 ] Apache Spark commented on SPARK-37140: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34413 > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Assigned] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37140: Assignee: Apache Spark > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435145#comment-17435145 ] Apache Spark commented on SPARK-37140: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34413 > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Assigned] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37140: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Commented] (SPARK-37138) Support ANSI Interval in functions that support numeric type
[ https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435144#comment-17435144 ] Apache Spark commented on SPARK-37138: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34412 > Support ANSI Interval in functions that support numeric type > > > Key: SPARK-37138 > URL: https://issues.apache.org/jira/browse/SPARK-37138 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Support ANSI Interval in functions that support numeric type
[jira] [Commented] (SPARK-37138) Support ANSI Interval in functions that support numeric type
[ https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435143#comment-17435143 ] Apache Spark commented on SPARK-37138: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34412 > Support ANSI Interval in functions that support numeric type > > > Key: SPARK-37138 > URL: https://issues.apache.org/jira/browse/SPARK-37138 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Support ANSI Interval in functions that support numeric type
[jira] [Assigned] (SPARK-37138) Support ANSI Interval in functions that support numeric type
[ https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37138: Assignee: (was: Apache Spark) > Support ANSI Interval in functions that support numeric type > > > Key: SPARK-37138 > URL: https://issues.apache.org/jira/browse/SPARK-37138 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Support ANSI Interval in functions that support numeric type
[jira] [Assigned] (SPARK-37138) Support ANSI Interval in functions that support numeric type
[ https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37138: Assignee: Apache Spark > Support ANSI Interval in functions that support numeric type > > > Key: SPARK-37138 > URL: https://issues.apache.org/jira/browse/SPARK-37138 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Support ANSI Interval in functions that support numeric type
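For readers unfamiliar with the feature behind SPARK-37138: ANSI interval types (year-month and day-time intervals) were introduced in Spark 3.2, and this ticket proposes accepting them in SQL functions that previously took only numeric input. A hypothetical sketch of what such support might allow; which functions are actually covered is determined by the pull request linked above:

```sql
-- Hypothetical calls once numeric functions accept ANSI intervals:
SELECT abs(INTERVAL '-3' MONTH);                       -- a year-month interval
SELECT greatest(INTERVAL '1' DAY, INTERVAL '2' DAY);   -- day-time intervals
```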
[jira] [Created] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
dch nguyen created SPARK-37140: -- Summary: Inline type hints for python/pyspark/resultiterable.py Key: SPARK-37140 URL: https://issues.apache.org/jira/browse/SPARK-37140 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen
[jira] [Created] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py
dch nguyen created SPARK-37139: -- Summary: Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py Key: SPARK-37139 URL: https://issues.apache.org/jira/browse/SPARK-37139 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen
[jira] [Commented] (SPARK-37137) Inline type hints for python/pyspark/conf.py
[ https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435142#comment-17435142 ] Apache Spark commented on SPARK-37137: -- User 'ByronHsu' has created a pull request for this issue: https://github.com/apache/spark/pull/34411 > Inline type hints for python/pyspark/conf.py > > > Key: SPARK-37137 > URL: https://issues.apache.org/jira/browse/SPARK-37137 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major >
[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py
[ https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37137: Assignee: Apache Spark > Inline type hints for python/pyspark/conf.py > > > Key: SPARK-37137 > URL: https://issues.apache.org/jira/browse/SPARK-37137 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py
[ https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37137: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/conf.py > > > Key: SPARK-37137 > URL: https://issues.apache.org/jira/browse/SPARK-37137 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major >
[jira] [Created] (SPARK-37138) Support ANSI Interval in functions that support numeric type
angerszhu created SPARK-37138: - Summary: Support ANSI Interval in functions that support numeric type Key: SPARK-37138 URL: https://issues.apache.org/jira/browse/SPARK-37138 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu Support ANSI Interval in functions that support numeric type
[jira] [Created] (SPARK-37137) Inline type hints for python/pyspark/conf.py
Byron Hsu created SPARK-37137: - Summary: Inline type hints for python/pyspark/conf.py Key: SPARK-37137 URL: https://issues.apache.org/jira/browse/SPARK-37137 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Byron Hsu
[jira] [Assigned] (SPARK-37136) Remove code about Hive built-in functions
[ https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37136: Assignee: (was: Apache Spark) > Remove code about Hive built-in functions > - > > Key: SPARK-37136 > URL: https://issues.apache.org/jira/browse/SPARK-37136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Since we have implemented `histogram_numeric`, we can now remove the code for Hive > built-in functions
[jira] [Assigned] (SPARK-37136) Remove code about Hive built-in functions
[ https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37136: Assignee: Apache Spark > Remove code about Hive built-in functions > - > > Key: SPARK-37136 > URL: https://issues.apache.org/jira/browse/SPARK-37136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Since we have implemented `histogram_numeric`, we can now remove the code for Hive > built-in functions
[jira] [Commented] (SPARK-37136) Remove code about Hive built-in functions
[ https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435134#comment-17435134 ] Apache Spark commented on SPARK-37136: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34410 > Remove code about Hive built-in functions > - > > Key: SPARK-37136 > URL: https://issues.apache.org/jira/browse/SPARK-37136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Since we have implemented `histogram_numeric`, we can now remove the code for Hive > built-in functions
[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks run failed
[ https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37135: Assignee: Apache Spark > Fix some mirco-benchmarks run failed > - > > Key: SPARK-37135 > URL: https://issues.apache.org/jira/browse/SPARK-37135 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > 2 mirco-benchmarks run failed: > > org.apache.spark.serializer.KryoSerializerBenchmark > {code:java} > Running org.apache.spark.serializer.KryoSerializerBenchmark:Running > org.apache.spark.serializer.KryoSerializerBenchmark:Running benchmark: > Benchmark KryoPool vs old"pool of 1" implementation Running case: > KryoPool:true21/10/27 16:09:26 ERROR SparkContext: Error initializing > SparkContext.java.lang.AssertionError: assertion failed: spark.test.home is > not set! at scala.Predef$.assert(Predef.scala:223) at > org.apache.spark.deploy.worker.Worker.(Worker.scala:148) at > org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954) > at > org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71) > at > org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65) > at scala.collection.immutable.Range.foreach(Range.scala:158) at > org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65) > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971) > at org.apache.spark.SparkContext.(SparkContext.scala:562) at > org.apache.spark.SparkContext.(SparkContext.scala:138) at > org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86) > at > org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58) > at > 
org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58) > at > org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) at > scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at > scala.util.Success.$anonfun$map$1(Try.scala:255) at > scala.util.Success.map(Try.scala:213) at > scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at > scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at > scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at > java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183){code} > org.apache.spark.sql.execution.benchmark.DateTimeBenchmark > {code:java} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: Exception in thread > "main" org.apache.spark.sql.catalyst.parser.ParseException: Cannot mix > year-month and day-time fields: interval 1 month 2 day(line 1, pos 38) > == SQL ==cast(timestamp_seconds(id) as date) + interval 1 month 2 > day--^^^ > at > org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479) > at > 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681) > {code} > >
[jira] [Commented] (SPARK-37135) Fix some micro-benchmarks run failed
[ https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435133#comment-17435133 ] Apache Spark commented on SPARK-37135: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34409 > Fix some micro-benchmarks run failed > - > > Key: SPARK-37135 > URL: https://issues.apache.org/jira/browse/SPARK-37135 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks run failed
[ https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37135: Assignee: Apache Spark > Fix some micro-benchmarks run failed > - > > Key: SPARK-37135 > URL: https://issues.apache.org/jira/browse/SPARK-37135 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks run failed
[ https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37135: Assignee: (was: Apache Spark) > Fix some micro-benchmarks run failed > - > > Key: SPARK-37135 > URL: https://issues.apache.org/jira/browse/SPARK-37135 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
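The DateTimeBenchmark failure quoted in the SPARK-37135 messages stems from ANSI interval semantics: a single interval literal may no longer combine year-month fields (month) with day-time fields (day). One possible rewrite of the offending expression (a sketch only; the actual fix is in the linked pull request 34409):

```sql
-- Fails under ANSI intervals: one literal mixes year-month and day-time fields
-- cast(timestamp_seconds(id) as date) + interval 1 month 2 day

-- Possible rewrite: add the two interval categories separately
cast(timestamp_seconds(id) as date) + interval 1 month + interval 2 day
```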
[jira] [Created] (SPARK-37136) Remove code about Hive built-in functions
angerszhu created SPARK-37136: - Summary: Remove code about Hive built-in functions Key: SPARK-37136 URL: https://issues.apache.org/jira/browse/SPARK-37136 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu Since we have implemented `histogram_numeric`, we can now remove the code for Hive built-in functions
[jira] [Created] (SPARK-37135) Fix some micro-benchmarks run failed
Yang Jie created SPARK-37135: Summary: Fix some micro-benchmarks run failed Key: SPARK-37135 URL: https://issues.apache.org/jira/browse/SPARK-37135 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.3.0 Reporter: Yang Jie Two micro-benchmarks fail to run: org.apache.spark.serializer.KryoSerializerBenchmark (java.lang.AssertionError: assertion failed: spark.test.home is not set!) and org.apache.spark.sql.execution.benchmark.DateTimeBenchmark (ParseException: Cannot mix year-month and day-time fields: interval 1 month 2 day)
[jira] [Resolved] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.
[ https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37036. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34389 [https://github.com/apache/spark/pull/34389] > Add util function to raise advice warning for pandas API on Spark. > -- > > Key: SPARK-37036 > URL: https://issues.apache.org/jira/browse/SPARK-37036 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > Pandas API on Spark has some features that can potentially cause performance > degradation or unexpected behavior, e.g. `sort_index`, `index_col`, > `to_pandas`, etc. > > We should raise a proper advice warning for those functions so that users > can make their pandas-on-Spark code bases more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
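A utility of the kind resolved above can be sketched in plain Python. The class and function names below are illustrative assumptions, not the actual PySpark API introduced by the pull request:

```python
import warnings

class PandasAPIOnSparkAdviceWarning(Warning):
    """Illustrative category for advice about potentially expensive operations."""

def warn_advice(message: str) -> None:
    # stacklevel=2 attributes the warning to the user's call site,
    # not to this helper.
    warnings.warn(message, PandasAPIOnSparkAdviceWarning, stacklevel=2)

def to_pandas_sketch(rows):
    # Stand-in for an operation like `to_pandas` that collects all
    # data to a single machine.
    warn_advice("`to_pandas` loads all data into the driver's memory; "
                "avoid it on large datasets.")
    return list(rows)
```

The main benefit of a dedicated warning class is that users can silence or escalate exactly this category with `warnings.simplefilter`, without touching other warnings.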
[jira] [Assigned] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.
[ https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37036: Assignee: Haejoon Lee > Add util function to raise advice warning for pandas API on Spark. > -- > > Key: SPARK-37036 > URL: https://issues.apache.org/jira/browse/SPARK-37036 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Pandas API on Spark has some features that can potentially cause performance > degradation or unexpected behavior, e.g. `sort_index`, `index_col`, > `to_pandas`, etc. > > We should raise a proper advice warning for those functions so that users > can make their pandas-on-Spark code bases more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37119) parse_url cannot handle `{` and `}` correctly
[ https://issues.apache.org/jira/browse/SPARK-37119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37119: - Priority: Major (was: Critical) > parse_url cannot handle `{` and `}` correctly > -- > > Key: SPARK-37119 > URL: https://issues.apache.org/jira/browse/SPARK-37119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.2.0, 3.3.0 >Reporter: Liu Shuo >Priority: Major > > When we execute the following SQL command > {code:java} > select parse_url('http://facebook.com/path/p1.php?query={aa}', 'QUERY') > {code} > the expected result: > query=\{aa} > the actual result: > null -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37119) parse_url cannot handle `{` and `}` correctly
[ https://issues.apache.org/jira/browse/SPARK-37119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37119. -- Resolution: Invalid > parse_url cannot handle `{` and `}` correctly > -- > > Key: SPARK-37119 > URL: https://issues.apache.org/jira/browse/SPARK-37119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.2.0, 3.3.0 >Reporter: Liu Shuo >Priority: Critical > > When we execute the following SQL command > {code:java} > select parse_url('http://facebook.com/path/p1.php?query={aa}', 'QUERY') > {code} > the expected result: > query=\{aa} > the actual result: > null -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
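The "Invalid" resolution is consistent with `{` and `}` not being legal URI characters under RFC 3986: a strict parser (such as Java's `java.net.URI`, which `parse_url` likely relies on) rejects the whole URL, hence the null. Percent-encoding the braces sidesteps the problem; a quick illustration using Python's more lenient stdlib parser:

```python
from urllib.parse import urlparse, quote

url = "http://facebook.com/path/p1.php?query={aa}"

# Python's urlparse is lenient and accepts the raw braces...
assert urlparse(url).query == "query={aa}"

# ...but a strict RFC 3986 parser would not. Percent-encoding the
# reserved characters yields a URL every parser accepts:
safe_url = url.replace("{", quote("{")).replace("}", quote("}"))
print(urlparse(safe_url).query)  # query=%7Baa%7D
```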
[jira] [Updated] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37117: - Target Version/s: (was: 3.2.1) > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gidon Gershinsky >Priority: Major > > Parquet encryption has a number of modes. One of them is "external > keymaterial", which keeps encrypted data keys in a separate file (as opposed > to inside Parquet file). Upon reading, the Spark Parquet connector does not > pass the file path, which causes an NPE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37122) java.lang.IllegalArgumentException Related to Prometheus
[ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37122: - Priority: Major (was: Critical) > java.lang.IllegalArgumentException Related to Prometheus > > > Key: SPARK-37122 > URL: https://issues.apache.org/jira/browse/SPARK-37122 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.1 >Reporter: Biswa Singh >Priority: Major > > This issue is similar to > https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723. > We receive the Following warning continuously: > > 21:00:26.277 [rpc-server-4-2] WARN o.a.s.n.s.TransportChannelHandler - > Exception in connection from > /10.198.3.179:51184java.lang.IllegalArgumentException: Too large frame: > 5135603447297303916 at > org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) > at > org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) > > Below are other details related to prometheus and my findings. Please SCROLL > DOWN to see the details: > > {noformat} > Prometheus Scrape Configuration > === > - job_name: 'kubernetes-pods' > kubernetes_sd_configs: > - role: pod > relabel_configs: > - action: labelmap > regex: __meta_kubernetes_pod_label_(.+) > - source_labels: [__meta_kubernetes_namespace] > action: replace > target_label: kubernetes_namespace > - source_labels: [__meta_kubernetes_pod_name] > action: replace > target_label: kubernetes_pod_name > - source_labels: > [__meta_kubernetes_pod_annotation_prometheus_io_scrape] > action: keep > regex: true > - source_labels: > [__meta_kubernetes_pod_annotation_prometheus_io_scheme] > action: replace > target_label: __scheme__ > regex: (https?) 
> - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] > action: replace > target_label: __metrics_path__ > regex: (.+) > - source_labels: [__address__, > __meta_kubernetes_pod_prometheus_io_port] > action: replace > target_label: __address__ > regex: ([^:]+)(?::\d+)?;(\d+) > replacement: $1:$2 > tcptrack command output in spark3 pod > == > 10.198.22.240:51258 10.198.40.143:7079 CLOSED 10s 0 B/s > 10.198.22.240:51258 10.198.40.143:7079 CLOSED 10s 0 B/s > 10.198.22.240:50354 10.198.40.143:7079 CLOSED 40s 0 B/s > 10.198.22.240:33152 10.198.40.143:4040 ESTABLISHED 2s 0 B/s > 10.198.22.240:47726 10.198.40.143:8090 ESTABLISHED 9s 0 B/s > 10.198.22.240 = prometheus pod > ip10.198.40.143 = testpod ip > Issue > == > Though the scrape config is expected to scrape on port 8090. I see prometheus > tries to initiate scrape on ports like 7079, 7078, 4040, etc on > the spark3 pod and hence the exception in spark3 pod. But is this really a > p
[jira] [Issue Comment Deleted] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Hsu updated SPARK-37095: -- Comment: was deleted (was: test) > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37128) Application has been removed by master but driver still running
[ https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435121#comment-17435121 ] Hyukjin Kwon commented on SPARK-37128: -- Can you share the steps to reproduce the issue? which environment did you use? > Application has been removed by master but driver still running > --- > > Key: SPARK-37128 > URL: https://issues.apache.org/jira/browse/SPARK-37128 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: JacobZheng >Priority: Major > > {code:java} > 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 > because it is EXITED > 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 > on worker worker-20210826183405-10.39.0.69-37147 > 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing > it. > 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:31 WARN Master: Got status update for unknown executor > app-20210827190502-0030/4 > 21/08/30 10:27:46 WARN Master: Got status update for unknown executor > app-20210827190502-0030/2 > 21/08/30 10:27:48 WARN Master: Got status update for unknown executor > app-20210827190502-0030/0 > 21/08/30 10:27:50 WARN Master: Got status update for unknown executor > app-20210827190502-0030/3{code} > As the logs show, Spark master removed my application. But my driver process > is still running. I would like to know what could be the cause of this and > how I can avoid it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
[ https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435120#comment-17435120 ] Hyukjin Kwon commented on SPARK-37131: -- cc [~allisonwang-db] and [~cloud_fan] FYI > Support use IN/EXISTS with subquery in Project/Aggregate > > > Key: SPARK-37131 > URL: https://issues.apache.org/jira/browse/SPARK-37131 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tongwei >Priority: Major > > {code:java} > CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl1 SELECT 0,1; > CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl2 SELECT 0,2; > case 1: > select c1 in (select col1 from tbl1) from tbl2 > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Project [] > case 2: > select count(1), case when c1 in (select col1 from tbl1) then "A" else > "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) > then "A" else "B" end > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Aggregate [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435119#comment-17435119 ] Byron Hsu commented on SPARK-37095: --- test > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37132) Incorrect Spark 3.2.0 package names with included Hadoop binaries
[ https://issues.apache.org/jira/browse/SPARK-37132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37132. -- Resolution: Duplicate > Incorrect Spark 3.2.0 package names with included Hadoop binaries > - > > Key: SPARK-37132 > URL: https://issues.apache.org/jira/browse/SPARK-37132 > Project: Spark > Issue Type: Bug > Components: Build, Documentation >Affects Versions: 3.2.0 >Reporter: Denis Krivenko >Priority: Trivial > > *Spark 3.2.0+Hadoop* packages contains Hadoop 3.3 binaries, however file > names still refer to Hadoop 3.2, i.e. _spark-3.2.0-bin-*hadoop3.2*.tgz_ > [https://dlcdn.apache.org/spark/spark-3.2.0/] > [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz] > [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435117#comment-17435117 ] Hyukjin Kwon commented on SPARK-37134: -- They are individual items so OR is correct. Feel free to create a PR to clarify them. > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Major > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
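All three options listed in the quoted documentation are different routes to the same mechanism: the uploaded archive ends up on `sys.path` of the driver and every executor, where Python's zip import machinery picks it up. A self-contained sketch of that underlying mechanism (the `mylib` module name is made up for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny dependency archive, like one you would pass via --py-files
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib.py", "def greet():\n    return 'hello from zip'\n")

# PySpark effectively does this on the driver and on each executor:
sys.path.insert(0, zip_path)

import mylib
print(mylib.greet())  # hello from zip
```

This also shows why the options are alternatives ("OR"), not steps: each one is just a different way of getting the archive onto `sys.path`.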
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37134: - Target Version/s: (was: 1.6.2) > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Critical > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37134: - Shepherd: (was: Tenovip33) > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Major > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37134: - Priority: Major (was: Critical) > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Major > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37134: - Environment: (was: ?) > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Critical > Fix For: 1.6.2 > > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37134: - Fix Version/s: (was: 1.6.2) > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 >Reporter: carl rees >Priority: Critical > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37121) TestUtils.isPythonVersionAtLeast38 returns incorrect results
[ https://issues.apache.org/jira/browse/SPARK-37121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37121. -- Fix Version/s: 3.2.1 3.3.0 Resolution: Fixed Issue resolved by pull request 34395 [https://github.com/apache/spark/pull/34395] > TestUtils.isPythonVersionAtLeast38 returns incorrect results > > > Key: SPARK-37121 > URL: https://issues.apache.org/jira/browse/SPARK-37121 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.2.1 > > > I was working on {{HiveExternalCatalogVersionsSuite}} recently and noticed > that it was never running against the Spark 2.x release lines, only the 3.x > ones. The problem was coming from here, specifically the Python 3.8+ version > check: > {code} > versions > .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38()) > .filter(v => v.startsWith("3") || > !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9)) > {code} > I found that {{TestUtils.isPythonVersionAtLeast38()}} was always returning > true, even when my system installation of Python3 was 3.7. Thinking it was an > environment issue, I pulled up a debugger to check which version of Python > the test JVM was seeing, and it was in fact Python 3.7. 
> Turns out the issue is with the {{isPythonVersionAtLeast38}} method: > {code} > def isPythonVersionAtLeast38(): Boolean = { > val attempt = if (Utils.isWindows) { > Try(Process(Seq("cmd.exe", "/C", "python3 --version")) > .run(ProcessLogger(s => s.startsWith("Python 3.8") || > s.startsWith("Python 3.9"))) > .exitValue()) > } else { > Try(Process(Seq("sh", "-c", "python3 --version")) > .run(ProcessLogger(s => s.startsWith("Python 3.8") || > s.startsWith("Python 3.9"))) > .exitValue()) > } > attempt.isSuccess && attempt.get == 0 > } > {code} > It's trying to evaluate the version of Python using a {{ProcessLogger}}, but > the logger accepts a {{String => Unit}} function, i.e., it does not make use > of the return value in any way (since it's meant for logging). So the result > of the {{startsWith}} checks are thrown away, and {{attempt.isSuccess && > attempt.get == 0}} will always be true as long as your system has a > {{python3}} binary of any version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
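The Scala `ProcessLogger` callback above discards its boolean, so the check degenerates to "a `python3` binary exists". The robust pattern is to capture the output and compare the parsed version afterwards; a minimal Python sketch of that pattern (function names are illustrative, not Spark's actual fix):

```python
import subprocess

def parse_python_version(version_line: str):
    # "Python 3.8.10" -> (3, 8)
    major, minor = version_line.split()[1].split(".")[:2]
    return int(major), int(minor)

def is_python_version_at_least(major: int, minor: int) -> bool:
    # Capture the output and inspect it after the process exits, instead of
    # testing it inside a logger callback whose return value is thrown away
    # (the bug described above).
    result = subprocess.run(["python3", "--version"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return False
    # Some older CPython builds print the version on stderr rather than stdout
    line = (result.stdout or result.stderr).strip()
    return parse_python_version(line) >= (major, minor)
```

Whatever form the actual Spark patch takes, the essential change is the same: the version string must flow back to the caller for comparison.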
[jira] [Assigned] (SPARK-37121) TestUtils.isPythonVersionAtLeast38 returns incorrect results
[ https://issues.apache.org/jira/browse/SPARK-37121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37121: Assignee: Erik Krogen > TestUtils.isPythonVersionAtLeast38 returns incorrect results > > > Key: SPARK-37121 > URL: https://issues.apache.org/jira/browse/SPARK-37121 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > I was working on {{HiveExternalCatalogVersionsSuite}} recently and noticed > that it was never running against the Spark 2.x release lines, only the 3.x > ones. The problem was coming from here, specifically the Python 3.8+ version > check: > {code} > versions > .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38()) > .filter(v => v.startsWith("3") || > !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9)) > {code} > I found that {{TestUtils.isPythonVersionAtLeast38()}} was always returning > true, even when my system installation of Python3 was 3.7. Thinking it was an > environment issue, I pulled up a debugger to check which version of Python > the test JVM was seeing, and it was in fact Python 3.7. > Turns out the issue is with the {{isPythonVersionAtLeast38}} method: > {code} > def isPythonVersionAtLeast38(): Boolean = { > val attempt = if (Utils.isWindows) { > Try(Process(Seq("cmd.exe", "/C", "python3 --version")) > .run(ProcessLogger(s => s.startsWith("Python 3.8") || > s.startsWith("Python 3.9"))) > .exitValue()) > } else { > Try(Process(Seq("sh", "-c", "python3 --version")) > .run(ProcessLogger(s => s.startsWith("Python 3.8") || > s.startsWith("Python 3.9"))) > .exitValue()) > } > attempt.isSuccess && attempt.get == 0 > } > {code} > It's trying to evaluate the version of Python using a {{ProcessLogger}}, but > the logger accepts a {{String => Unit}} function, i.e., it does not make use > of the return value in any way (since it's meant for logging). 
So the result > of the {{startsWith}} checks are thrown away, and {{attempt.isSuccess && > attempt.get == 0}} will always be true as long as your system has a > {{python3}} binary of any version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37047) Add overloads for lpad and rpad for BINARY strings
[ https://issues.apache.org/jira/browse/SPARK-37047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435105#comment-17435105 ] Apache Spark commented on SPARK-37047: -- User 'mkaravel' has created a pull request for this issue: https://github.com/apache/spark/pull/34407 > Add overloads for lpad and rpad for BINARY strings > -- > > Key: SPARK-37047 > URL: https://issues.apache.org/jira/browse/SPARK-37047 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Menelaos Karavelas >Assignee: Menelaos Karavelas >Priority: Major > Fix For: 3.3.0 > > > Currently, `lpad` and `rpad` accept BINARY strings as input (both in terms of > input string to be padded and padding pattern), and these strings get cast to > UTF8 strings. The result of the operation is a UTF8 string which may be > invalid as it can contain non-UTF8 characters. > What we would like to do is to overload `lpad` and `rpad` to accept BINARY > strings as inputs (both for the string to be padded and the padding pattern) > and produce a left or right padded BINARY string as output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
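The requested BINARY semantics can be pinned down with a byte-level sketch in Python. This is illustrative only; Spark's eventual implementation and its edge-case behavior (e.g. an empty pad) may differ:

```python
def binary_lpad(data: bytes, length: int, pad: bytes) -> bytes:
    # Truncate when the input is already long enough, mirroring SQL lpad
    if len(data) >= length:
        return data[:length]
    # Repeat the pad pattern and keep just enough bytes to fill the gap
    fill = (pad * length)[: length - len(data)]
    return fill + data

def binary_rpad(data: bytes, length: int, pad: bytes) -> bytes:
    if len(data) >= length:
        return data[:length]
    fill = (pad * length)[: length - len(data)]
    return data + fill

print(binary_lpad(b"ab", 5, b"xy"))      # b'xyxab'
print(binary_rpad(b"ab", 5, b"\x00"))    # b'ab\x00\x00\x00'
```

Operating on `bytes` end to end avoids the round-trip through UTF8 strings that the issue describes, so no invalid characters can be introduced.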
[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
[ https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] carl rees updated SPARK-37134: -- Description: sorry, no idea on the version it affects or what Shepard is? no explanation on this form so guessed whatever! This page of your documentation is UNCLEAR paragraph "Using PySpark Native Features" QUOTE "PySpark allows to upload Python files ({{.py}}), zipped Python packages ({{.zip}}), and Egg files ({{.egg}}) to the executors by: * Setting the configuration setting {{spark.submit.pyFiles}} * Setting {{--py-files}} option in Spark scripts * Directly calling [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] in applications QUESTION: is this all of the above or each of the above steps? suggest adding "OR" between each bullet point? [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] was: sorry, no idea on the version it affects or what Shepard is? no explanation on this form so guessed whatever! This page of your documentation is UNCLEAR paragraph "Using PySpark Native Features" QUOTE "PySpark allows to upload Python files ({{.py}}), zipped Python packages ({{.zip}}), and Egg files ({{.egg}}) to the executors by: * Setting the configuration setting {{spark.submit.pyFiles}} * Setting {{--py-files}} option in Spark scripts * Directly calling [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] in applications QUESTION: is this all of the above or each of the above steps? 
[https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > documentation - unclear "Using PySpark Native Features" > --- > > Key: SPARK-37134 > URL: https://issues.apache.org/jira/browse/SPARK-37134 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.2 > Environment: ? >Reporter: carl rees >Priority: Critical > Fix For: 1.6.2 > > > sorry, no idea on the version it affects or what Shepard is? no explanation > on this form so guessed whatever! > > This page of your documentation is UNCLEAR > paragraph "Using PySpark Native Features" QUOTE > "PySpark allows to upload Python files ({{.py}}), zipped Python packages > ({{.zip}}), and Egg files ({{.egg}}) to the executors by: > * Setting the configuration setting {{spark.submit.pyFiles}} > * Setting {{--py-files}} option in Spark scripts > * Directly calling > [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] > in applications > > QUESTION: is this all of the above or each of the above steps? > suggest adding "OR" between each bullet point? > > [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37134) documentation - unclear "Using PySpark Native Features"
carl rees created SPARK-37134: - Summary: documentation - unclear "Using PySpark Native Features" Key: SPARK-37134 URL: https://issues.apache.org/jira/browse/SPARK-37134 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.6.2 Environment: ? Reporter: carl rees Fix For: 1.6.2 sorry, no idea on the version it affects or what Shepard is? no explanation on this form so guessed whatever! This page of your documentation is UNCLEAR paragraph "Using PySpark Native Features" QUOTE "PySpark allows to upload Python files ({{.py}}), zipped Python packages ({{.zip}}), and Egg files ({{.egg}}) to the executors by: * Setting the configuration setting {{spark.submit.pyFiles}} * Setting {{--py-files}} option in Spark scripts * Directly calling [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile] in applications QUESTION: is this all of the above or each of the above steps? [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
[ https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434995#comment-17434995 ] Apache Spark commented on SPARK-36646: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34405 > Push down group by partition column for Aggregate (Min/Max/Count) for Parquet > - > > Key: SPARK-36646 > URL: https://issues.apache.org/jira/browse/SPARK-36646 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > If Aggregate (Min/Max/Count) in parquet is group by partition column, push > down group by -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
[ https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36646: Assignee: (was: Apache Spark) > Push down group by partition column for Aggregate (Min/Max/Count) for Parquet > - > > Key: SPARK-36646 > URL: https://issues.apache.org/jira/browse/SPARK-36646 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > If Aggregate (Min/Max/Count) in parquet is group by partition column, push > down group by -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
[ https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36646: Assignee: Apache Spark > Push down group by partition column for Aggregate (Min/Max/Count) for Parquet > - > > Key: SPARK-36646 > URL: https://issues.apache.org/jira/browse/SPARK-36646 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Major > > If Aggregate (Min/Max/Count) in parquet is group by partition column, push > down group by -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
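The idea behind the pushdown can be sketched outside Spark (an illustrative Python stand-in with invented names): when the grouping key is the partition column, a COUNT per group needs only the per-file row counts that Parquet already stores in its footer metadata, so no data pages have to be read.

```python
from collections import defaultdict

def count_by_partition(files):
    """files: iterable of (partition_value, footer_row_count) pairs.

    Sums row counts per partition directory, standing in for a
    group-by-partition-column COUNT answered from metadata alone.
    """
    counts = defaultdict(int)
    for partition_value, row_count in files:
        counts[partition_value] += row_count
    return dict(counts)
```

Min/Max over the partition column itself are even cheaper in this sketch: the partition value is the grouping key, so each group's min and max are the key itself.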
[jira] [Assigned] (SPARK-30220) Support Filter expression uses IN/EXISTS predicate sub-queries
[ https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30220: Assignee: (was: Apache Spark) > Support Filter expression uses IN/EXISTS predicate sub-queries > -- > > Key: SPARK-30220 > URL: https://issues.apache.org/jira/browse/SPARK-30220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL cannot supports a SQL with nested aggregate as below: > > {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code} > > And Spark will throw exception as follows: > > {code:java} > org.apache.spark.sql.AnalysisException > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few > commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL] > : +- Project [unique1#x] > : +- Filter (unique1#x < 100) > : +- SubqueryAlias `onek` > : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data > +- SubqueryAlias `tenk1` > +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code} > > But PostgreSQL supports this syntax. 
> {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1; > sum > -- > 4950 > (1 row){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30220) Support Filter expression uses IN/EXISTS predicate sub-queries
[ https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30220: Assignee: Apache Spark > Support Filter expression uses IN/EXISTS predicate sub-queries > -- > > Key: SPARK-30220 > URL: https://issues.apache.org/jira/browse/SPARK-30220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Spark SQL cannot supports a SQL with nested aggregate as below: > > {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code} > > And Spark will throw exception as follows: > > {code:java} > org.apache.spark.sql.AnalysisException > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few > commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL] > : +- Project [unique1#x] > : +- Filter (unique1#x < 100) > : +- SubqueryAlias `onek` > : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data > +- SubqueryAlias `tenk1` > +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code} > > But PostgreSQL supports this syntax. 
> {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1; > sum > -- > 4950 > (1 row){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30220) Support Filter expression uses IN/EXISTS predicate sub-queries
[ https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434897#comment-17434897 ] Apache Spark commented on SPARK-30220: -- User 'tanelk' has created a pull request for this issue: https://github.com/apache/spark/pull/34402 > Support Filter expression uses IN/EXISTS predicate sub-queries > -- > > Key: SPARK-30220 > URL: https://issues.apache.org/jira/browse/SPARK-30220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL cannot supports a SQL with nested aggregate as below: > > {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code} > > And Spark will throw exception as follows: > > {code:java} > org.apache.spark.sql.AnalysisException > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few > commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL] > : +- Project [unique1#x] > : +- Filter (unique1#x < 100) > : +- SubqueryAlias `onek` > : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data > +- SubqueryAlias `tenk1` > +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, > hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, > stringu1#x, stringu2#x, string4#x] csv > file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code} > > But PostgreSQL supports this syntax. 
> {code:java} > select sum(unique1) FILTER (WHERE > unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1; > sum > -- > 4950 > (1 row){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords
[ https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37133: Assignee: Apache Spark (was: Wenchen Fan) > Add a config to optionally enforce ANSI reserved keywords > - > > Key: SPARK-37133 > URL: https://issues.apache.org/jira/browse/SPARK-37133 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords
[ https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434896#comment-17434896 ] Apache Spark commented on SPARK-37133: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34403 > Add a config to optionally enforce ANSI reserved keywords > - > > Key: SPARK-37133 > URL: https://issues.apache.org/jira/browse/SPARK-37133 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords
[ https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37133: Assignee: Wenchen Fan (was: Apache Spark) > Add a config to optionally enforce ANSI reserved keywords > - > > Key: SPARK-37133 > URL: https://issues.apache.org/jira/browse/SPARK-37133 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37132) Incorrect Spark 3.2.0 package names with included Hadoop binaries
Denis Krivenko created SPARK-37132: -- Summary: Incorrect Spark 3.2.0 package names with included Hadoop binaries Key: SPARK-37132 URL: https://issues.apache.org/jira/browse/SPARK-37132 Project: Spark Issue Type: Bug Components: Build, Documentation Affects Versions: 3.2.0 Reporter: Denis Krivenko *Spark 3.2.0+Hadoop* packages contain Hadoop 3.3 binaries; however, the file names still refer to Hadoop 3.2, i.e. _spark-3.2.0-bin-*hadoop3.2*.tgz_ [https://dlcdn.apache.org/spark/spark-3.2.0/] [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz] [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords
Wenchen Fan created SPARK-37133: --- Summary: Add a config to optionally enforce ANSI reserved keywords Key: SPARK-37133 URL: https://issues.apache.org/jira/browse/SPARK-37133 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37071) OpenHashMap should be serializable without reference tracking
[ https://issues.apache.org/jira/browse/SPARK-37071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-37071. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34351 [https://github.com/apache/spark/pull/34351] > OpenHashMap should be serializable without reference tracking > - > > Key: SPARK-37071 > URL: https://issues.apache.org/jira/browse/SPARK-37071 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Minor > Fix For: 3.3.0 > > > The current implementation of OpenHashMap does not serialize without kryo > reference tracking turned on. This is unexpected from a simple type like > OpenHashMap, and forces the users to turn on reference tracking to use code > where OpenHashMap is used. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37071) OpenHashMap should be serializable without reference tracking
[ https://issues.apache.org/jira/browse/SPARK-37071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-37071: Assignee: Emil Ejbyfeldt > OpenHashMap should be serializable without reference tracking > - > > Key: SPARK-37071 > URL: https://issues.apache.org/jira/browse/SPARK-37071 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Minor > > The current implementation of OpenHashMap does not serialize without kryo > reference tracking turned on. This is unexpected from a simple type like > OpenHashMap, and forces the users to turn on reference tracking to use code > where OpenHashMap is used. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
[ https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-37131: Description: {code:java} CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] {code} was: {code:java} CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] {code} > Support use IN/EXISTS with subquery in Project/Aggregate > > > Key: SPARK-37131 > URL: https://issues.apache.org/jira/browse/SPARK-37131 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tongwei >Priority: Major > > {code:java} > CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl1 SELECT 0,1; > CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl2 SELECT 0,2; > case 1: > select c1 in (select col1 from tbl1) 
from tbl2 > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Project [] > case 2: > select count(1), case when c1 in (select col1 from tbl1) then "A" else > "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) > then "A" else "B" end > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Aggregate [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
[ https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-37131: Description: {code:java} CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] {code} was: ``` CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] ``` > Support use IN/EXISTS with subquery in Project/Aggregate > > > Key: SPARK-37131 > URL: https://issues.apache.org/jira/browse/SPARK-37131 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tongwei >Priority: Major > > > > {code:java} > CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl1 SELECT 0,1; > CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl2 SELECT 0,2; > case 1: > select c1 in (select col1 from tbl1) from tbl2 
> Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Project [] > case 2: > select count(1), case when c1 in (select col1 from tbl1) then "A" else > "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) > then "A" else "B" end > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a > few commands: Aggregate [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
[ https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-37131: Description: CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] was: CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(*), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] > Support use IN/EXISTS with subquery in Project/Aggregate > > > Key: SPARK-37131 > URL: https://issues.apache.org/jira/browse/SPARK-37131 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tongwei >Priority: Major > > CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl1 SELECT 0,1; > CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl2 SELECT 0,2; > case 1: > select c1 in (select col1 from tbl1) from tbl2 > Error msg: > IN/EXISTS predicate 
sub-queries can only be used in Filter/Join and a few > commands: Project [] > case 2: > select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" > end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then > "A" else "B" end > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few > commands: Aggregate [] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
[ https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-37131: Description: ``` CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] ``` was: CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] > Support use IN/EXISTS with subquery in Project/Aggregate > > > Key: SPARK-37131 > URL: https://issues.apache.org/jira/browse/SPARK-37131 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tongwei >Priority: Major > > ``` > CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl1 SELECT 0,1; > CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; > INSERT OVERWRITE TABLE tbl2 SELECT 0,2; > case 1: > select c1 in (select col1 from tbl1) from tbl2 > Error msg: > IN/EXISTS 
predicate sub-queries can only be used in Filter/Join and a few > commands: Project [] > case 2: > select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" > end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then > "A" else "B" end > Error msg: > IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few > commands: Aggregate [] > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate
Tongwei created SPARK-37131: --- Summary: Support use IN/EXISTS with subquery in Project/Aggregate Key: SPARK-37131 URL: https://issues.apache.org/jira/browse/SPARK-37131 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Tongwei CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl1 SELECT 0,1; CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl2 SELECT 0,2; case 1: select c1 in (select col1 from tbl1) from tbl2 Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Project [] case 2: select count(*), case when c1 in (select col1 from tbl1) then "A" else "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else "B" end Error msg: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
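What the two rejected queries ask for can be sketched in plain Python (a hypothetical stand-in, not Spark internals): case 1 projects the membership test itself as a boolean column, and case 2 groups on a tag derived from that test. With the tables above (tbl2.c1 = [0], tbl1.col1 = [0]), case 1 would yield a single `true`.

```python
def project_in(c1_values, subquery_values):
    """Case 1: SELECT c1 IN (subquery) -- one boolean per input row."""
    s = set(subquery_values)
    return [c1 in s for c1 in c1_values]

def count_by_tag(c1_values, subquery_values):
    """Case 2: COUNT(1) grouped by CASE WHEN c1 IN (subquery) THEN 'A' ELSE 'B' END."""
    s = set(subquery_values)
    counts = {}
    for c1 in c1_values:
        tag = "A" if c1 in s else "B"
        counts[tag] = counts.get(tag, 0) + 1
    return counts
```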
[jira] [Commented] (SPARK-37130) why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)
[ https://issues.apache.org/jira/browse/SPARK-37130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434848#comment-17434848 ] Patrice DUROUX commented on SPARK-37130: ps: a diff output

$ diff spark-without.lst spark-with.lst
667a668
> /jars/activation-1.1.1.jar
671a673
> /jars/antlr-runtime-3.5.2.jar
684a687
> /jars/bonecp-0.8.0.RELEASE.jar
689a693
> /jars/commons-cli-1.2.jar
694a699
> /jars/commons-dbcp-1.4.jar
695a701
> /jars/commons-lang-2.6.jar
696a703
> /jars/commons-logging-1.1.3.jar
698a706
> /jars/commons-pool-1.5.4.jar
701a710,717
> /jars/curator-client-2.13.0.jar
> /jars/curator-framework-2.13.0.jar
> /jars/curator-recipes-2.13.0.jar
> /jars/datanucleus-api-jdo-4.2.4.jar
> /jars/datanucleus-core-4.1.17.jar
> /jars/datanucleus-rdbms-4.1.19.jar
> /jars/derby-10.14.2.0.jar
> /jars/dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
704c720,739
< /jars/gson-2.8.6.jar
---
> /jars/gson-2.2.4.jar
> /jars/guava-14.0.1.jar
> /jars/hadoop-client-api-3.3.1.jar
> /jars/hadoop-client-runtime-3.3.1.jar
> /jars/hadoop-shaded-guava-1.1.1.jar
> /jars/hadoop-yarn-server-web-proxy-3.3.1.jar
> /jars/HikariCP-2.5.1.jar
> /jars/hive-beeline-2.3.9.jar
> /jars/hive-cli-2.3.9.jar
> /jars/hive-common-2.3.9.jar
> /jars/hive-exec-2.3.9-core.jar
> /jars/hive-jdbc-2.3.9.jar
> /jars/hive-llap-common-2.3.9.jar
> /jars/hive-metastore-2.3.9.jar
> /jars/hive-serde-2.3.9.jar
> /jars/hive-service-rpc-3.1.2.jar
> /jars/hive-shims-0.23-2.3.9.jar
> /jars/hive-shims-2.3.9.jar
> /jars/hive-shims-common-2.3.9.jar
> /jars/hive-shims-scheduler-2.3.9.jar
705a741
> /jars/hive-vector-code-gen-2.3.9.jar
708a745,747
> /jars/htrace-core4-4.1.0-incubating.jar
> /jars/httpclient-4.5.13.jar
> /jars/httpcore-4.4.14.jar
712a752
> /jars/jackson-core-asl-1.9.13.jar
715a756
> /jars/jackson-mapper-asl-1.9.13.jar
724a766,767
> /jars/javax.jdo-3.2.0-m3.jar
> /jars/javolution-5.5.1.jar
727a771
> /jars/jdo-api-3.0.1.jar
734a779,783
> /jars/jline-2.14.6.jar
> /jars/joda-time-2.10.10.jar
> /jars/jodd-core-3.5.2.jar
> /jars/jpam-1.1.jar
> /jars/json-1.8.jar
739a789
> /jars/jta-1.1.jar
765a816,818
> /jars/libfb303-0.9.3.jar
> /jars/libthrift-0.12.0.jar
> /jars/log4j-1.2.17.jar
792a846
> /jars/protobuf-java-2.5.0.jar
804a859,860
> /jars/slf4j-api-1.7.30.jar
> /jars/slf4j-log4j12-1.7.30.jar
809a866,867
> /jars/spark-hive_2.12-3.2.0.jar
> /jars/spark-hive-thriftserver_2.12-3.2.0.jar
829a888,889
> /jars/ST4-4.0.4.jar
> /jars/stax-api-1.0.1.jar
830a891
> /jars/super-csv-2.2.0.jar
832a894
> /jars/transaction-api-1.1.jar
833a896
> /jars/velocity-1.5.jar
836a900,901
> /jars/zookeeper-3.6.2.jar
> /jars/zookeeper-jute-3.6.2.jar
919a985
> /python/dist/
1015a1082,1087
> /python/pyspark.egg-info/
> /python/pyspark.egg-info/dependency_links.txt
> /python/pyspark.egg-info/PKG-INFO
> /python/pyspark.egg-info/requires.txt
> /python/pyspark.egg-info/SOURCES.txt
> /python/pyspark.egg-info/top_level.txt
1269a1342,1346
> /python/pyspark/__pycache__/
> /python/pyspark/__pycache__/install.cpython-38.pyc
> /python/pyspark/python/
> /python/pyspark/python/pyspark/
> /python/pyspark/python/pyspark/shell.py
1497a1575,1579
> /R/lib/SparkR/doc/
> /R/lib/SparkR/doc/index.html
> /R/lib/SparkR/doc/sparkr-vignettes.html
> /R/lib/SparkR/doc/sparkr-vignettes.R
> /R/lib/SparkR/doc/sparkr-vignettes.Rmd
1514a1597
> /R/lib/SparkR/Meta/vignette.rds

> why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and > spark-hive-thriftserver_X.jar) > > > Key: SPARK-37130 > URL: https://issues.apache.org/jira/browse/SPARK-37130 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 3.1.2, 3.2.0 >Reporter: Patrice DUROUX >Priority: Minor > > Hi, > Since my deployment has its own Hadoop (+Hive) installation, I tried to install Spark from its bundle without Hadoop. I suspect that some jars present in the corresponding spark-X.X.X-bin-hadoop3.2.tgz are missing. After comparing their contents, both spark-hive_2.12-X.X.X.jar and spark-hive-thriftserver_2.12-X.X.X.jar are absent from spark-X.X.X-bin-without-hadoop.tgz, and I don't know whether some others should also be there. > Thanks, > Patrice > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (SPARK-37130) why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)
Patrice DUROUX created SPARK-37130: -- Summary: why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar) Key: SPARK-37130 URL: https://issues.apache.org/jira/browse/SPARK-37130 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 3.2.0, 3.1.2 Reporter: Patrice DUROUX Hi, Since my deployment has its own Hadoop (+Hive) installation, I tried to install Spark from its bundle without Hadoop. I suspect that some jars present in the corresponding spark-X.X.X-bin-hadoop3.2.tgz are missing. After comparing their contents, both spark-hive_2.12-X.X.X.jar and spark-hive-thriftserver_2.12-X.X.X.jar are absent from spark-X.X.X-bin-without-hadoop.tgz, and I don't know whether some others should also be there. Thanks, Patrice
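The jar comparison in this thread was produced by running `diff` over two archive listings. The same check can be scripted; below is a minimal, hypothetical sketch of that technique in Python (the helper names and demo file names are illustrative, not part of Spark's release tooling):

```python
import io
import os
import tarfile
import tempfile


def archive_names(path):
    """Return the sorted member names of a .tgz archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(member.name for member in tar.getmembers())


def only_in_second(first, second):
    """Entries present in `second` but missing from `first` (the '>' diff lines)."""
    return sorted(set(second) - set(first))


def make_demo_archive(path, names):
    """Build a tiny .tgz with one-byte members, standing in for a release tarball."""
    with tarfile.open(path, "w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))


# Demo: the 'with Hadoop' bundle carries extra jars, including spark-hive.
workdir = tempfile.mkdtemp()
without_bundle = os.path.join(workdir, "spark-without-hadoop.tgz")
with_bundle = os.path.join(workdir, "spark-with-hadoop.tgz")
make_demo_archive(without_bundle, ["jars/spark-core.jar"])
make_demo_archive(with_bundle, ["jars/spark-core.jar", "jars/spark-hive_2.12.jar"])
extra = only_in_second(archive_names(without_bundle), archive_names(with_bundle))
```

Run against the real spark-X.X.X-bin-without-hadoop.tgz and spark-X.X.X-bin-hadoop3.2.tgz tarballs, the second listing minus the first reproduces the `>` lines of the diff above.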
[jira] [Commented] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
[ https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434840#comment-17434840 ] Apache Spark commented on SPARK-30537: -- User 'pralabhkumar' has created a pull request for this issue: https://github.com/apache/spark/pull/34401 > toPandas gets wrong dtypes when applied on empty DF when Arrow enabled > -- > > Key: SPARK-30537 > URL: https://issues.apache.org/jira/browse/SPARK-30537 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Same issue with SPARK-29188 persists when Arrow optimization is enabled.
[jira] [Assigned] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
[ https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30537: Assignee: Apache Spark > toPandas gets wrong dtypes when applied on empty DF when Arrow enabled > -- > > Key: SPARK-30537 > URL: https://issues.apache.org/jira/browse/SPARK-30537 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Same issue with SPARK-29188 persists when Arrow optimization is enabled.
[jira] [Assigned] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
[ https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30537: Assignee: (was: Apache Spark) > toPandas gets wrong dtypes when applied on empty DF when Arrow enabled > -- > > Key: SPARK-30537 > URL: https://issues.apache.org/jira/browse/SPARK-30537 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Same issue with SPARK-29188 persists when Arrow optimization is enabled.
[jira] [Commented] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
[ https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434839#comment-17434839 ] Apache Spark commented on SPARK-30537: -- User 'pralabhkumar' has created a pull request for this issue: https://github.com/apache/spark/pull/34401 > toPandas gets wrong dtypes when applied on empty DF when Arrow enabled > -- > > Key: SPARK-30537 > URL: https://issues.apache.org/jira/browse/SPARK-30537 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Same issue with SPARK-29188 persists when Arrow optimization is enabled.
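The dtype problem tracked here can be seen without Spark: when a result set is empty there are no values to infer dtypes from, so a correct toPandas has to derive them from the schema alone. The sketch below illustrates that idea in pure Python; `SPARK_TO_NUMPY` and `empty_frame_dtypes` are hypothetical names, not PySpark's actual code, and the mapping table is an illustrative subset.

```python
# Illustrative mapping from Spark SQL type names to NumPy dtype strings.
# It sketches the kind of lookup an Arrow-enabled toPandas needs when the
# DataFrame is empty and there are no values to infer dtypes from.
SPARK_TO_NUMPY = {
    "int": "int32",
    "bigint": "int64",
    "float": "float32",
    "double": "float64",
    "boolean": "bool",
    "timestamp": "datetime64[ns]",
}


def empty_frame_dtypes(schema):
    """Resolve column dtypes for an empty result from the schema alone.

    `schema` is a list of (column, spark_type) pairs. Unmapped types fall
    back to 'object', which is what the buggy path returned for every
    column regardless of the schema.
    """
    return {col: SPARK_TO_NUMPY.get(spark_type, "object")
            for col, spark_type in schema}


dtypes = empty_frame_dtypes([("id", "bigint"), ("score", "double"), ("name", "string")])
```

With schema-driven resolution, an empty `bigint` column comes back as `int64` instead of silently degrading to `object`.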
[jira] [Commented] (SPARK-16280) Implement histogram_numeric SQL function
[ https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434821#comment-17434821 ] Apache Spark commented on SPARK-16280: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34380 > Implement histogram_numeric SQL function > > > Key: SPARK-16280 > URL: https://issues.apache.org/jira/browse/SPARK-16280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: angerszhu >Priority: Major > Labels: bulk-closed
[jira] [Assigned] (SPARK-16280) Implement histogram_numeric SQL function
[ https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-16280: --- Assignee: angerszhu > Implement histogram_numeric SQL function > > > Key: SPARK-16280 > URL: https://issues.apache.org/jira/browse/SPARK-16280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: angerszhu >Priority: Major > Labels: bulk-closed
[jira] [Commented] (SPARK-16280) Implement histogram_numeric SQL function
[ https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434812#comment-17434812 ] Wenchen Fan commented on SPARK-16280: - resolved by https://github.com/apache/spark/pull/34380 > Implement histogram_numeric SQL function > > > Key: SPARK-16280 > URL: https://issues.apache.org/jira/browse/SPARK-16280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: angerszhu >Priority: Major > Labels: bulk-closed
[jira] [Resolved] (SPARK-37082) Implement histogram_numeric aggregate function in spark
[ https://issues.apache.org/jira/browse/SPARK-37082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37082. - Resolution: Duplicate > Implement histogram_numeric aggregate function in spark > --- > > Key: SPARK-37082 > URL: https://issues.apache.org/jira/browse/SPARK-37082 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Implement histogram_numeric function in spark
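For context, histogram_numeric (as in Hive) builds an approximate histogram over a column by keeping at most `nb` bins and repeatedly merging the closest adjacent pair, in the style of the Ben-Haim and Tom-Tov streaming scheme. The following is a pure-Python sketch of that algorithm, not Spark's actual implementation:

```python
def histogram_numeric(values, nb):
    """Approximate histogram with at most `nb` bins.

    A sketch of the centroid-merging scheme behind Hive-style
    histogram_numeric (Ben-Haim and Tom-Tov style). Returns a list of
    (center, height) pairs, where height is the number of values merged
    into that bin.
    """
    bins = []  # list of [center, count], kept sorted by center
    for v in values:
        bins.append([float(v), 1.0])
        bins.sort(key=lambda b: b[0])
        while len(bins) > nb:
            # merge the adjacent pair of bins with the smallest gap,
            # replacing them with their count-weighted average
            i = min(range(len(bins) - 1),
                    key=lambda j: bins[j + 1][0] - bins[j][0])
            (x1, c1), (x2, c2) = bins[i], bins[i + 1]
            bins[i:i + 2] = [[(x1 * c1 + x2 * c2) / (c1 + c2), c1 + c2]]
    return [(x, c) for x, c in bins]


hist = histogram_numeric([1, 1, 2, 10, 10, 11], nb=2)
```

Each input contributes exactly one unit of height, so the bin heights always sum to the number of values seen; the centers drift toward the clusters in the data.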
[jira] [Commented] (SPARK-36975) Refactor HiveClientImpl collect hive client call logic
[ https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434790#comment-17434790 ] Apache Spark commented on SPARK-36975: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34400 > Refactor HiveClientImpl collect hive client call logic > -- > > Key: SPARK-36975 > URL: https://issues.apache.org/jira/browse/SPARK-36975 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Currently, we treat one withHiveState call as one Hive client call, which is > odd. This needs to be refactored.
[jira] [Commented] (SPARK-36975) Refactor HiveClientImpl collect hive client call logic
[ https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434789#comment-17434789 ] Apache Spark commented on SPARK-36975: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34400 > Refactor HiveClientImpl collect hive client call logic > -- > > Key: SPARK-36975 > URL: https://issues.apache.org/jira/browse/SPARK-36975 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Currently, we treat one withHiveState call as one Hive client call, which is > odd. This needs to be refactored.
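The accounting problem described in SPARK-36975 is that one withHiveState block is counted as a single client call no matter how many metastore operations it wraps. A hypothetical sketch of counting at the per-call level instead (all names here are illustrative, not HiveClientImpl's actual API):

```python
import functools


class CallCounter:
    """Counts individual client calls rather than wrapper invocations.

    If only the outer `with_state` block were counted, N calls made
    inside it would register as 1.
    """

    def __init__(self):
        self.calls = 0

    def counted(self, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            self.calls += 1  # one increment per actual client call
            return fn(*args, **kwargs)
        return wrapper


counter = CallCounter()


@counter.counted
def get_table(name):
    return name  # stand-in for a real metastore RPC


def with_state(body):
    # The outer wrapper sets up state but does not increment the counter.
    return body()


# Three metastore calls inside one state block: the counter records 3, not 1.
with_state(lambda: [get_table("t1"), get_table("t2"), get_table("t3")])
```

The design point is simply where the increment lives: on the wrapper, batches collapse to one; on the call itself, the metric reflects actual client traffic.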
[jira] [Commented] (SPARK-37129) Supplement all micro benchmark results use to Java 17
[ https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434762#comment-17434762 ] Yang Jie commented on SPARK-37129: -- I am trying to run all the benchmarks on GA now > Supplement all micro benchmark results use to Java 17 > - > > Key: SPARK-37129 > URL: https://issues.apache.org/jira/browse/SPARK-37129 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Commented] (SPARK-35260) DataSourceV2 Function Catalog implementation
[ https://issues.apache.org/jira/browse/SPARK-35260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434754#comment-17434754 ] Dongjoon Hyun commented on SPARK-35260: --- I assigned this umbrella issue to [~csun]. > DataSourceV2 Function Catalog implementation > > > Key: SPARK-35260 > URL: https://issues.apache.org/jira/browse/SPARK-35260 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > This tracks the implementation and follow-up work for V2 Function Catalog > introduced in SPARK-27658
[jira] [Assigned] (SPARK-35260) DataSourceV2 Function Catalog implementation
[ https://issues.apache.org/jira/browse/SPARK-35260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35260: - Assignee: Chao Sun > DataSourceV2 Function Catalog implementation > > > Key: SPARK-35260 > URL: https://issues.apache.org/jira/browse/SPARK-35260 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > This tracks the implementation and follow-up work for V2 Function Catalog > introduced in SPARK-27658
[jira] [Issue Comment Deleted] (SPARK-37129) Supplement all micro benchmark results use to Java 17
[ https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37129: - Comment: was deleted (was: working on this) > Supplement all micro benchmark results use to Java 17 > - > > Key: SPARK-37129 > URL: https://issues.apache.org/jira/browse/SPARK-37129 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Commented] (SPARK-37129) Supplement all micro benchmark results use to Java 17
[ https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434711#comment-17434711 ] Yang Jie commented on SPARK-37129: -- working on this > Supplement all micro benchmark results use to Java 17 > - > > Key: SPARK-37129 > URL: https://issues.apache.org/jira/browse/SPARK-37129 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Created] (SPARK-37129) Supplement all micro benchmark results use to Java 17
Yang Jie created SPARK-37129: Summary: Supplement all micro benchmark results use to Java 17 Key: SPARK-37129 URL: https://issues.apache.org/jira/browse/SPARK-37129 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 3.3.0 Reporter: Yang Jie
[jira] [Assigned] (SPARK-37115) Replace HiveClient call with hive shim
[ https://issues.apache.org/jira/browse/SPARK-37115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37115: --- Assignee: angerszhu > Replace HiveClient call with hive shim > -- > > Key: SPARK-37115 > URL: https://issues.apache.org/jira/browse/SPARK-37115 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Replace HiveClient call with hive shim