[jira] [Commented] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Yazhi Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435199#comment-17435199
 ] 

Yazhi Wang commented on SPARK-37141:


I'm working on it

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> After SPARK-35907, running `org.apache.spark.deploy.worker.WorkerSuite` on macOS
> (both M1 and Intel) fails:
> {code:java}
> mvn clean install -DskipTests -pl core -am
> mvn test -pl core -Dtest=none -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
> {code}
> {code:java}
> WorkerSuite:
> - test isUseLocalNodeSSLConfig
> - test maybeUpdateSSLSettings
> - test clearing of finishedExecutors (small number of executors)
> - test clearing of finishedExecutors (more executors)
> - test clearing of finishedDrivers (small number of drivers)
> - test clearing of finishedDrivers (more drivers)
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time:  47.973 s
> [INFO] Finished at: 2021-10-28T13:46:56+08:00
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project spark-core_2.12: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> {code}
> {code:java}
> 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create directory /tmp
> java.nio.file.FileAlreadyExistsException: /tmp
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>         at java.nio.file.Files.createDirectory(Files.java:674)
>         at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>         at java.nio.file.Files.createDirectories(Files.java:727)
>         at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
>         at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
>         at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
>         at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>         at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>         at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>  
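The `FileAlreadyExistsException` above comes from `Files.createDirectories`: judging by the `createAndCheckIsDirectory` frame, its existing-path check does not follow symlinks, and on macOS `/tmp` is a symlink to `/private/tmp`. The sketch below (plain Python, not Spark code; a scratch symlink stands in for `/tmp`) reproduces the shape of the failure and shows that resolving the real path first sidesteps it.

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "real_dir")
link = os.path.join(base, "tmp_link")   # stand-in for /tmp -> /private/tmp
os.mkdir(target)
os.symlink(target, link)

# A strict create on the symlink fails even though it resolves to a real
# directory -- the same shape as the FileAlreadyExistsException above.
try:
    os.mkdir(link)
    strict = "created"
except FileExistsError:
    strict = "FileExistsError"

# One common workaround: resolve the real path before creating.
os.makedirs(os.path.realpath(link), exist_ok=True)

print(strict)                 # FileExistsError
print(os.path.isdir(link))    # True: the link points at a real directory
```

A test-side fix along these lines (resolving the work directory before creation, or not pointing it at `/tmp`) would make the suite symlink-tolerant.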



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435200#comment-17435200
 ] 

Apache Spark commented on SPARK-37129:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34418

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37129:


Assignee: (was: Apache Spark)

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37129:


Assignee: Apache Spark

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Resolved] (SPARK-37135) Fix some micro-benchmarks run failed

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37135.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34409
[https://github.com/apache/spark/pull/34409]

> Fix some micro-benchmarks run failed
> -
>
> Key: SPARK-37135
> URL: https://issues.apache.org/jira/browse/SPARK-37135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.3.0
>
>
> Two micro-benchmarks fail to run:
>  
> org.apache.spark.serializer.KryoSerializerBenchmark
> {code:java}
> Running org.apache.spark.serializer.KryoSerializerBenchmark:
> Running benchmark: Benchmark KryoPool vs old "pool of 1" implementation
>   Running case: KryoPool:true
> 21/10/27 16:09:26 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: spark.test.home is not set!
>         at scala.Predef$.assert(Predef.scala:223)
>         at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
>         at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
>         at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
>         at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
>         at scala.collection.immutable.Range.foreach(Range.scala:158)
>         at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
>         at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:562)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:138)
>         at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
>         at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
>         at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
>         at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
>         at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>         at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>         at scala.util.Success.$anonfun$map$1(Try.scala:255)
>         at scala.util.Success.map(Try.scala:213)
>         at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>         at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>         at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>         at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>         at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>         at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>         at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>         at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>         at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
> {code}
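The assertion fires because the benchmark spins up a local worker without `spark.test.home` being set. A minimal Python sketch of that kind of fail-fast guard (names mirror the assertion message but are illustrative, not Spark internals):

```python
import os

def resolve_test_home(props):
    """Resolve a test home from explicit properties, falling back to an
    environment variable, and refuse to continue without one -- in the
    spirit of "assertion failed: spark.test.home is not set!"."""
    home = props.get("spark.test.home") or os.environ.get("SPARK_HOME")
    assert home is not None, "spark.test.home is not set!"
    return home

print(resolve_test_home({"spark.test.home": "/opt/spark"}))  # /opt/spark
```

Such a guard fails at startup with a clear message instead of failing later in an obscure way, which is why the benchmark aborts immediately when the property is absent.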
> org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
> {code:java}
> Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
> Cannot mix year-month and day-time fields: interval 1 month 2 day(line 1, pos 38)
>
> == SQL ==
> cast(timestamp_seconds(id) as date) + interval 1 month 2 day
> --------------------------------------^^^
>         at org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
>         at org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
>         at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
>         at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>         at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
>         at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>         at org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
> {code}
>  
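The `ParseException` reflects the ANSI interval rule that a single interval literal may use year-month units or day-time units, but not both. A small Python sketch of that validation (the unit sets are assumed from the error message, not taken from Spark's parser):

```python
# Unit classification assumed from the error text: an interval literal is
# either year-month or day-time, never a mix of the two families.
YEAR_MONTH = {"year", "month"}
DAY_TIME = {"day", "hour", "minute", "second"}

def check_interval_units(units):
    """Classify an interval literal's units, rejecting mixed families."""
    used_ym = [u for u in units if u in YEAR_MONTH]
    used_dt = [u for u in units if u in DAY_TIME]
    if used_ym and used_dt:
        raise ValueError(
            f"Cannot mix year-month and day-time fields: {used_ym + used_dt}")
    return "year-month" if used_ym else "day-time"

print(check_interval_units(["year", "month"]))  # year-month
# check_interval_units(["month", "day"]) raises, like `interval 1 month 2 day`
```

This is why the benchmark's `+ interval 1 month 2 day` expression has to be rewritten as two separate single-family intervals.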






[jira] [Commented] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435211#comment-17435211
 ] 

Apache Spark commented on SPARK-37142:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34416

> Add __all__ to pyspark/pandas/*/__init__.py
> ---
>
> Key: SPARK-37142
> URL: https://issues.apache.org/jira/browse/SPARK-37142
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
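For context on what the ticket asks for: listing names in a module's `__all__` controls what `from module import *` re-exports. A self-contained sketch with a throwaway module (all names here are hypothetical):

```python
import pathlib
import sys
import tempfile
import textwrap

# Write a tiny module whose __all__ exposes only one of its functions.
src = textwrap.dedent("""
    __all__ = ["public_fn"]

    def public_fn():
        return "visible"

    def _private_fn():
        return "hidden by underscore convention"

    def undeclared_fn():
        return "hidden because it is not in __all__"
""")

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "demo_mod.py").write_text(src)
sys.path.insert(0, tmp)

ns = {}
exec("from demo_mod import *", ns)

print("public_fn" in ns)       # True
print("undeclared_fn" in ns)   # False: not listed in __all__
```

Without `__all__`, a star import would pull in every public name, so adding it to each `__init__.py` makes the package's exported surface explicit.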







[jira] [Commented] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435235#comment-17435235
 ] 

Apache Spark commented on SPARK-37141:
--

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/34420

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>






[jira] [Assigned] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37141:


Assignee: (was: Apache Spark)

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>






[jira] [Assigned] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37141:


Assignee: Apache Spark

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>






[jira] [Assigned] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36928:


Assignee: (was: Apache Spark)

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.
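Background for the ticket: Spark represents ANSI year-month intervals physically as 32-bit month counts and day-time intervals as 64-bit microsecond counts, so columnar accessors can serve them from plain int/long columns. A Python sketch of that idea (the class below is illustrative, not Spark's API):

```python
class ColumnarIntervalColumn:
    """Toy columnar store for ANSI intervals: values are plain integers --
    months for year-month intervals, microseconds for day-time intervals."""

    def __init__(self, kind):
        assert kind in ("year-month", "day-time")
        self.kind = kind
        self.values = []

    def append(self, v):
        self.values.append(v)

    def get(self, i):
        return self.values[i]

ym = ColumnarIntervalColumn("year-month")
ym.append(14)                        # 1 year 2 months == 14 months
print(ym.get(0))                     # 14

dt = ColumnarIntervalColumn("day-time")
dt.append(2 * 86_400 * 1_000_000)    # 2 days, stored as microseconds
print(dt.get(0))                     # 172800000000
```

Because the physical encoding is just an int or a long, the `Columnar*` work is largely about routing the new type tags to the existing integer accessors and adding tests.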






[jira] [Commented] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435247#comment-17435247
 ] 

Apache Spark commented on SPARK-36928:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34421

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.






[jira] [Assigned] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36928:


Assignee: Apache Spark

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.






[jira] [Created] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-37143:


 Summary: Supplement the missing Java 11 benchmark result files
 Key: SPARK-37143
 URL: https://issues.apache.org/jira/browse/SPARK-37143
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.3.0
Reporter: Yang Jie


CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist in 
the project, but CharVarcharBenchmark-jdk11-results.txt and 
UpdateFieldsBenchmark-jdk11-results.txt are missing.

 






[jira] [Commented] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435266#comment-17435266
 ] 

Apache Spark commented on SPARK-37143:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34423

> Supplement the missing Java 11 benchmark result files
> -
>
> Key: SPARK-37143
> URL: https://issues.apache.org/jira/browse/SPARK-37143
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist 
> in the project, but CharVarcharBenchmark-jdk11-results.txt and 
> UpdateFieldsBenchmark-jdk11-results.txt are missing
>  






[jira] [Assigned] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37143:


Assignee: Apache Spark

> Supplement the missing Java 11 benchmark result files
> -
>
> Key: SPARK-37143
> URL: https://issues.apache.org/jira/browse/SPARK-37143
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist 
> in the project, but CharVarcharBenchmark-jdk11-results.txt and 
> UpdateFieldsBenchmark-jdk11-results.txt are missing
>  






[jira] [Assigned] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37143:


Assignee: (was: Apache Spark)

> Supplement the missing Java 11 benchmark result files
> -
>
> Key: SPARK-37143
> URL: https://issues.apache.org/jira/browse/SPARK-37143
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist 
> in the project, but CharVarcharBenchmark-jdk11-results.txt and 
> UpdateFieldsBenchmark-jdk11-results.txt are missing
>  






[jira] [Created] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread dch nguyen (Jira)
dch nguyen created SPARK-37144:
--

 Summary: Inline type hints for python/pyspark/file.py
 Key: SPARK-37144
 URL: https://issues.apache.org/jira/browse/SPARK-37144
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen









[jira] [Assigned] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37144:


Assignee: Apache Spark

> Inline type hints for python/pyspark/file.py
> 
>
> Key: SPARK-37144
> URL: https://issues.apache.org/jira/browse/SPARK-37144
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435311#comment-17435311
 ] 

Apache Spark commented on SPARK-37144:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34424

> Inline type hints for python/pyspark/file.py
> 
>
> Key: SPARK-37144
> URL: https://issues.apache.org/jira/browse/SPARK-37144
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37144:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/file.py
> 
>
> Key: SPARK-37144
> URL: https://issues.apache.org/jira/browse/SPARK-37144
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Created] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf

2021-10-28 Thread wangxin (Jira)
wangxin created SPARK-37145:
---

 Summary: Improvement for extending pod feature steps with 
KubernetesConf
 Key: SPARK-37145
 URL: https://issues.apache.org/jira/browse/SPARK-37145
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.2.0
Reporter: wangxin


SPARK-33261 provides us with great convenience, but it only constructs a 
`KubernetesFeatureConfigStep` through an empty (no-arg) constructor.

It would be better to also support a constructor that takes a `KubernetesConf` 
(or, more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`).
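A minimal reflection sketch of one way this could work, using simplified stand-ins for Spark's Kubernetes types (`Conf`, `FeatureStep`, and the step classes below are illustrative, not the real API): prefer a conf-taking constructor and fall back to the no-arg one, so existing steps keep working.

```java
import java.lang.reflect.Constructor;

public class FeatureStepLoader {
    // Simplified stand-ins for KubernetesConf / KubernetesFeatureConfigStep.
    public static class Conf { final String appId; public Conf(String id) { appId = id; } }
    public interface FeatureStep { String describe(); }

    public static class NoArgStep implements FeatureStep {
        public String describe() { return "no-arg"; }
    }
    public static class ConfAwareStep implements FeatureStep {
        private final Conf conf;
        public ConfAwareStep(Conf conf) { this.conf = conf; }
        public String describe() { return "conf:" + conf.appId; }
    }

    // Prefer a constructor taking the conf, as the ticket proposes; otherwise
    // fall back to the empty constructor SPARK-33261 already supports.
    static FeatureStep instantiate(Class<? extends FeatureStep> cls, Conf conf)
            throws Exception {
        try {
            Constructor<? extends FeatureStep> c = cls.getConstructor(Conf.class);
            return c.newInstance(conf);
        } catch (NoSuchMethodException e) {
            return cls.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        Conf conf = new Conf("app-1");
        System.out.println(instantiate(NoArgStep.class, conf).describe());     // no-arg
        System.out.println(instantiate(ConfAwareStep.class, conf).describe()); // conf:app-1
    }
}
```

This keeps existing no-arg feature steps working while letting new steps read driver/executor configuration.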






[jira] [Created] (SPARK-37146) Inline type hints for python/pyspark/__init__.py

2021-10-28 Thread dch nguyen (Jira)
dch nguyen created SPARK-37146:
--

 Summary: Inline type hints for python/pyspark/__init__.py
 Key: SPARK-37146
 URL: https://issues.apache.org/jira/browse/SPARK-37146
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen









[jira] [Commented] (SPARK-37146) Inline type hints for python/pyspark/__init__.py

2021-10-28 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435341#comment-17435341
 ] 

dch nguyen commented on SPARK-37146:


I am working on this

> Inline type hints for python/pyspark/__init__.py
> 
>
> Key: SPARK-37146
> URL: https://issues.apache.org/jira/browse/SPARK-37146
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-36928:


Assignee: PengLei

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.






[jira] [Resolved] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-36928.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34421
[https://github.com/apache/spark/pull/34421]

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.






[jira] [Assigned] (SPARK-37136) Remove code about hive build in functions

2021-10-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37136:
---

Assignee: angerszhu

> Remove code about hive build in functions
> -
>
> Key: SPARK-37136
> URL: https://issues.apache.org/jira/browse/SPARK-37136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Since we have implemented `histogram_numeric`, we can now remove the code for 
> Hive built-in functions.






[jira] [Resolved] (SPARK-37136) Remove code about hive build in functions

2021-10-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37136.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34410
[https://github.com/apache/spark/pull/34410]

> Remove code about hive build in functions
> -
>
> Key: SPARK-37136
> URL: https://issues.apache.org/jira/browse/SPARK-37136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> Since we have implemented `histogram_numeric`, we can now remove the code for 
> Hive built-in functions.






[jira] [Commented] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435397#comment-17435397
 ] 

Apache Spark commented on SPARK-37105:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34425

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.3.0
>
>
> Add `extraJavaTestArgs` to sql/hive pom.xml and run 
> {code:java}
> build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
> {code}
> there are 22 failed tests 
> {code:java}
> Run completed in 1 hour, 2 minutes, 3 seconds.
> Total number of tests run: 3547
> Suites: completed 117, aborted 0
> Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
> *** 22 TESTS FAILED ***
> {code}






[jira] [Commented] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435399#comment-17435399
 ] 

Apache Spark commented on SPARK-37105:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34425

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.3.0
>
>
> Add `extraJavaTestArgs` to sql/hive pom.xml and run 
> {code:java}
> build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
> {code}
> there are 22 failed tests 
> {code:java}
> Run completed in 1 hour, 2 minutes, 3 seconds.
> Total number of tests run: 3547
> Suites: completed 117, aborted 0
> Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
> *** 22 TESTS FAILED ***
> {code}






[jira] [Updated] (SPARK-37118) Add KMeans distanceMeasure param to PythonMLLibAPI

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-37118:
-
Affects Version/s: (was: 3.2.1)
   3.2.0

> Add KMeans distanceMeasure param to PythonMLLibAPI
> --
>
> Key: SPARK-37118
> URL: https://issues.apache.org/jira/browse/SPARK-37118
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Raimi bin Karim
>Priority: Trivial
>
> SPARK-22119 added KMeans {{distanceMeasure}} to the Python API.
> We should also include this parameter in the 
> {{PythonMLLibAPI.trainKMeansModel}} method.






[jira] [Updated] (SPARK-37118) Add KMeans distanceMeasure param to PythonMLLibAPI

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-37118:
-
Fix Version/s: (was: 3.2.1)

> Add KMeans distanceMeasure param to PythonMLLibAPI
> --
>
> Key: SPARK-37118
> URL: https://issues.apache.org/jira/browse/SPARK-37118
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.2.1
>Reporter: Raimi bin Karim
>Priority: Trivial
>
> SPARK-22119 added KMeans {{distanceMeasure}} to the Python API.
> We should also include this parameter in the 
> {{PythonMLLibAPI.trainKMeansModel}} method.






[jira] [Assigned] (SPARK-37118) Add KMeans distanceMeasure param to PythonMLLibAPI

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-37118:


Assignee: Raimi bin Karim

> Add KMeans distanceMeasure param to PythonMLLibAPI
> --
>
> Key: SPARK-37118
> URL: https://issues.apache.org/jira/browse/SPARK-37118
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Raimi bin Karim
>Assignee: Raimi bin Karim
>Priority: Trivial
>
> SPARK-22119 added KMeans {{distanceMeasure}} to the Python API.
> We should also include this parameter in the 
> {{PythonMLLibAPI.trainKMeansModel}} method.






[jira] [Resolved] (SPARK-37118) Add KMeans distanceMeasure param to PythonMLLibAPI

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-37118.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34394
[https://github.com/apache/spark/pull/34394]

> Add KMeans distanceMeasure param to PythonMLLibAPI
> --
>
> Key: SPARK-37118
> URL: https://issues.apache.org/jira/browse/SPARK-37118
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Raimi bin Karim
>Assignee: Raimi bin Karim
>Priority: Trivial
> Fix For: 3.3.0
>
>
> SPARK-22119 added KMeans {{distanceMeasure}} to the Python API.
> We should also include this parameter in the 
> {{PythonMLLibAPI.trainKMeansModel}} method.






[jira] [Created] (SPARK-37147) MetricsReporter producing NullPointerException when element 'triggerExecution' not present in Map[]

2021-10-28 Thread Radoslaw Busz (Jira)
Radoslaw Busz created SPARK-37147:
-

 Summary: MetricsReporter producing NullPointerException when 
element 'triggerExecution' not present in Map[]
 Key: SPARK-37147
 URL: https://issues.apache.org/jira/browse/SPARK-37147
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Radoslaw Busz


The exception occurs in MetricsReporter when it tries to register gauges using 
lastProgress of each stream. This problem was partially fixed in 
https://issues.apache.org/jira/browse/SPARK-22975, but it introduced an NPE when 
the 'triggerExecution' element is not present in the durationMs map:

 
{code:java}
registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)
{code}
 

I find it difficult to reproduce this every time, but it happens every few 
restarts when Structured Streaming uses a slow event source (very few or no 
events). In my case it breaks metric reporting via Codahale/Dropwizard and 
generates multiple stack traces such as:
{code:java}
21/09/16 09:51:36 ERROR ScheduledReporter: Exception thrown from 
GangliaReporter#report. Exception was suppressed.
java.lang.NullPointerException
at 
org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3(MetricsReporter.scala:43)
at 
org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3$adapted(MetricsReporter.scala:43)
at scala.Option.map(Option.scala:230)
at 
org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:68)
at 
com.codahale.metrics.ganglia.GangliaReporter.reportGauge(GangliaReporter.java:353)
at 
com.codahale.metrics.ganglia.GangliaReporter.report(GangliaReporter.java:240)
at 
com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237)
at 
com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
I'm happy to implement a fix.
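The unbox-on-missing-key failure and a null-safe alternative can be sketched in plain Java over a `Map<String, Long>` mirroring `durationMs` (an illustration of the failure mode, not the actual Spark patch):

```java
import java.util.HashMap;
import java.util.Map;

public class GaugeDemo {
    // Unsafe: Map.get returns null for a missing key, and unboxing that
    // null Long via longValue() is exactly the NPE described above.
    static long latencyUnsafe(Map<String, Long> durationMs) {
        return durationMs.get("triggerExecution").longValue();
    }

    // Safe: fall back to the gauge's default (0L) when the key is absent.
    static long latencySafe(Map<String, Long> durationMs) {
        Long v = durationMs.get("triggerExecution");
        return v != null ? v.longValue() : 0L;
    }

    public static void main(String[] args) {
        Map<String, Long> progress = new HashMap<>();  // no triggerExecution yet
        System.out.println(latencySafe(progress));     // prints 0
        progress.put("triggerExecution", 42L);
        System.out.println(latencySafe(progress));     // prints 42
    }
}
```

The same fallback can also be written as `durationMs.getOrDefault("triggerExecution", 0L)`.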






[jira] [Commented] (SPARK-37147) MetricsReporter producing NullPointerException when element 'triggerExecution' not present in Map[]

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435567#comment-17435567
 ] 

Apache Spark commented on SPARK-37147:
--

User 'gitplaneta' has created a pull request for this issue:
https://github.com/apache/spark/pull/34426

> MetricsReporter producing NullPointerException when element 
> 'triggerExecution' not present in Map[]
> ---
>
> Key: SPARK-37147
> URL: https://issues.apache.org/jira/browse/SPARK-37147
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Radoslaw Busz
>Priority: Major
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream. This problem was partially fixed in 
> https://issues.apache.org/jira/browse/SPARK-22975, but it introduced an NPE 
> when the 'triggerExecution' element is not present in the durationMs map:
>  
> {code:java}
> registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)
> {code}
>  
> I find it difficult to reproduce this every time, but it happens every few 
> restarts when Structured Streaming uses a slow event source (very few or 
> no events). In my case it breaks metric reporting via Codahale/Dropwizard and 
> generates multiple stack traces such as:
> {code:java}
> 21/09/16 09:51:36 ERROR ScheduledReporter: Exception thrown from 
> GangliaReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3(MetricsReporter.scala:43)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3$adapted(MetricsReporter.scala:43)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:68)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.reportGauge(GangliaReporter.java:353)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.report(GangliaReporter.java:240)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237)
>   at 
> com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> I'm happy to implement a fix.






[jira] [Assigned] (SPARK-37147) MetricsReporter producing NullPointerException when element 'triggerExecution' not present in Map[]

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37147:


Assignee: (was: Apache Spark)

> MetricsReporter producing NullPointerException when element 
> 'triggerExecution' not present in Map[]
> ---
>
> Key: SPARK-37147
> URL: https://issues.apache.org/jira/browse/SPARK-37147
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Radoslaw Busz
>Priority: Major
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream. This problem was partially fixed in 
> https://issues.apache.org/jira/browse/SPARK-22975, but it introduced an NPE 
> when the 'triggerExecution' element is not present in the durationMs map:
>  
> {code:java}
> registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)
> {code}
>  
> I find it difficult to reproduce this every time, but it happens every few 
> restarts when Structured Streaming uses a slow event source (very few or 
> no events). In my case it breaks metric reporting via Codahale/Dropwizard and 
> generates multiple stack traces such as:
> {code:java}
> 21/09/16 09:51:36 ERROR ScheduledReporter: Exception thrown from 
> GangliaReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3(MetricsReporter.scala:43)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3$adapted(MetricsReporter.scala:43)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:68)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.reportGauge(GangliaReporter.java:353)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.report(GangliaReporter.java:240)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237)
>   at 
> com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> I'm happy to implement a fix.






[jira] [Assigned] (SPARK-37147) MetricsReporter producing NullPointerException when element 'triggerExecution' not present in Map[]

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37147:


Assignee: Apache Spark

> MetricsReporter producing NullPointerException when element 
> 'triggerExecution' not present in Map[]
> ---
>
> Key: SPARK-37147
> URL: https://issues.apache.org/jira/browse/SPARK-37147
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Radoslaw Busz
>Assignee: Apache Spark
>Priority: Major
>
> The exception occurs in MetricsReporter when it tries to register gauges 
> using lastProgress of each stream. This problem was partially fixed in 
> https://issues.apache.org/jira/browse/SPARK-22975, but it introduced an NPE 
> when the 'triggerExecution' element is not present in the durationMs map:
>  
> {code:java}
> registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)
> {code}
>  
> I find it difficult to reproduce this every time, but it happens every few 
> restarts when Structured Streaming uses a slow event source (very few or 
> no events). In my case it breaks metric reporting via Codahale/Dropwizard and 
> generates multiple stack traces such as:
> {code:java}
> 21/09/16 09:51:36 ERROR ScheduledReporter: Exception thrown from 
> GangliaReporter#report. Exception was suppressed.
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3(MetricsReporter.scala:43)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter.$anonfun$new$3$adapted(MetricsReporter.scala:43)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.sql.execution.streaming.MetricsReporter$$anon$1.getValue(MetricsReporter.scala:68)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.reportGauge(GangliaReporter.java:353)
>   at 
> com.codahale.metrics.ganglia.GangliaReporter.report(GangliaReporter.java:240)
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237)
>   at 
> com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> I'm happy to implement a fix.






[jira] [Created] (SPARK-37148) Improve error messages under ANSI mode

2021-10-28 Thread Allison Wang (Jira)
Allison Wang created SPARK-37148:


 Summary: Improve error messages under ANSI mode
 Key: SPARK-37148
 URL: https://issues.apache.org/jira/browse/SPARK-37148
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Allison Wang


Improve error messages when ANSI mode is enabled. Many exceptions thrown under 
ANSI mode can be disruptive to users. We should provide clear error messages 
with workarounds.






[jira] [Created] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Allison Wang (Jira)
Allison Wang created SPARK-37149:


 Summary: Improve error messages for arithmetic overflow under ANSI 
mode
 Key: SPARK-37149
 URL: https://issues.apache.org/jira/browse/SPARK-37149
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Allison Wang









[jira] [Updated] (SPARK-37149) Improve error messages for arithmetic overflow exception under ANSI mode

2021-10-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-37149:
-
Summary: Improve error messages for arithmetic overflow exception under 
ANSI mode  (was: Improve error messages for arithmetic overflow under ANSI mode)

> Improve error messages for arithmetic overflow exception under ANSI mode
> 
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>







[jira] [Updated] (SPARK-37149) Improve error messages for arithmetic overflow errors under ANSI mode

2021-10-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-37149:
-
Summary: Improve error messages for arithmetic overflow errors under ANSI 
mode  (was: Improve error messages for arithmetic overflow exception under ANSI 
mode)

> Improve error messages for arithmetic overflow errors under ANSI mode
> -
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>







[jira] [Updated] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-37149:
-
Summary: Improve error messages for arithmetic overflow under ANSI mode  
(was: Improve error messages for arithmetic overflow errors under ANSI mode)

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>







[jira] [Updated] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-37149:
-
Description: Improve error messages for arithmetic overflow exceptions. We 
can instruct users to 1) turn off ANSI mode or 2) use `try_` functions if 
applicable.
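A minimal sketch of the intended shape in plain Java: wrap the raw overflow and rethrow with a message naming both workarounds. The exact wording, and the `try_add` / `spark.sql.ansi.enabled` hints, are illustrative here; the real message text is decided in the eventual patch.

```java
public class OverflowMessageDemo {
    // Rethrow the raw JDK overflow with a message that names the workarounds:
    // 1) disable ANSI mode, or 2) use a try_ function such as try_add.
    static int ansiAdd(int a, int b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            throw new ArithmeticException(
                a + " + " + b + " caused integer overflow. "
                + "Use try_add, or set spark.sql.ansi.enabled=false.");
        }
    }

    public static void main(String[] args) {
        System.out.println(ansiAdd(1, 2));  // 3
        try {
            ansiAdd(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());  // overflow message with hints
        }
    }
}
```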

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Improve error messages for arithmetic overflow exceptions. We can instruct 
> users to 1) turn off ANSI mode or 2) use `try_` functions if applicable.






[jira] [Assigned] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37141:
-

Assignee: Yazhi Wang
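For context on the `FileAlreadyExistsException: /tmp` in the quoted log: on macOS, /tmp is a symbolic link to /private/tmp, and the JDK's `Files.createDirectories` re-checks an existing path without following links, so a symlink that points at a real directory is still rejected. A self-contained sketch of that behavior, using a temp directory rather than /tmp itself (the "resolve first" fallback is only the conceptual shape of a fix, not the actual patch):

```java
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkDirDemo {
    // Returns true if Files.createDirectories rejects a symlink that points
    // at an existing directory -- the same failure mode as /tmp on macOS.
    static boolean failsOnSymlink() throws Exception {
        Path real = Files.createTempDirectory("real");
        Path link = real.getParent().resolve(real.getFileName() + "-link");
        Files.createSymbolicLink(link, real);   // link -> existing directory
        try {
            Files.createDirectories(link);      // existing path re-checked with NOFOLLOW_LINKS
            return false;
        } catch (FileAlreadyExistsException e) {
            // Resolving the symlink first succeeds: create via the real path.
            Files.createDirectories(link.toRealPath());
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rejected symlink: " + failsOnSymlink());
    }
}
```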

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yazhi Wang
>Priority: Minor
>
> After SPARK-35907, running `org.apache.spark.deploy.worker.WorkerSuite` on Mac 
> OS (both M1 and Intel) fails:
> {code:java}
> mvn clean install -DskipTests -pl core -am
> mvn test -pl core -Dtest=none 
> -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
> {code}
> {code:java}
> WorkerSuite:
> - test isUseLocalNodeSSLConfig
> - test maybeUpdateSSLSettings
> - test clearing of finishedExecutors (small number of executors)
> - test clearing of finishedExecutors (more executors)
> - test clearing of finishedDrivers (small number of drivers)
> - test clearing of finishedDrivers (more drivers)
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  47.973 s
> [INFO] Finished at: 2021-10-28T13:46:56+08:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project 
> spark-core_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> {code}
> {code:java}
> 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create 
> directory /tmp
> java.nio.file.FileAlreadyExistsException: /tmp
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>         at java.nio.file.Files.createDirectory(Files.java:674)
>         at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>         at java.nio.file.Files.createDirectories(Files.java:727)
>         at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
>         at 
> org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
>         at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
>         at 
> org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>         at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>         at 
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>  
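A plausible reading of the failure above: on macOS, `/tmp` is a symlink to `/private/tmp`, and `java.nio.file.Files.createDirectories` throws `FileAlreadyExistsException` for an existing path component that is a symlink rather than a plain directory. The following Python sketch (an assumption-laden simulation, not the actual fix; `strict_makedirs`/`safe_makedirs` are hypothetical helpers) mimics that strict behavior and shows that resolving the path first avoids it:

```python
# Mimic Files.createDirectories' strictness toward symlinked components and
# the resolve-first workaround, using a temp dir standing in for /tmp.
import os
import tempfile

def strict_makedirs(path):
    """Fail if `path` already exists as a symlink (like FileAlreadyExistsException)."""
    if os.path.islink(path):
        raise FileExistsError(path)
    os.makedirs(path, exist_ok=True)

def safe_makedirs(path):
    """Workaround: create the resolved (real) path instead of the symlink."""
    strict_makedirs(os.path.realpath(path))

base = tempfile.mkdtemp()
real_dir = os.path.join(base, "private_tmp")  # plays /private/tmp
os.makedirs(real_dir)
link = os.path.join(base, "tmp")              # plays /tmp on macOS
os.symlink(real_dir, link)
```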






[jira] [Resolved] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37141.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34420
[https://github.com/apache/spark/pull/34420]

> WorkerSuite cannot run on Mac OS
> 
>
> Key: SPARK-37141
> URL: https://issues.apache.org/jira/browse/SPARK-37141
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yazhi Wang
>Priority: Minor
> Fix For: 3.3.0
>
>
> After SPARK-35907, running `org.apache.spark.deploy.worker.WorkerSuite` on 
> macOS (both M1 and Intel) fails:
> {code:java}
> mvn clean install -DskipTests -pl core -am
> mvn test -pl core -Dtest=none 
> -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
> {code}
> {code:java}
> WorkerSuite:
> - test isUseLocalNodeSSLConfig
> - test maybeUpdateSSLSettings
> - test clearing of finishedExecutors (small number of executors)
> - test clearing of finishedExecutors (more executors)
> - test clearing of finishedDrivers (small number of drivers)
> - test clearing of finishedDrivers (more drivers)
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  47.973 s
> [INFO] Finished at: 2021-10-28T13:46:56+08:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project 
> spark-core_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> {code}
> {code:java}
> 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create 
> directory /tmp
> java.nio.file.FileAlreadyExistsException: /tmp
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>         at java.nio.file.Files.createDirectory(Files.java:674)
>         at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>         at java.nio.file.Files.createDirectories(Files.java:727)
>         at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
>         at 
> org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
>         at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
>         at 
> org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>         at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>         at 
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>  






[jira] [Resolved] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37143.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34423
[https://github.com/apache/spark/pull/34423]

> Supplement the missing Java 11 benchmark result files
> -
>
> Key: SPARK-37143
> URL: https://issues.apache.org/jira/browse/SPARK-37143
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.0
>
>
> CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist 
> in the project, but CharVarcharBenchmark-jdk11-results.txt and 
> UpdateFieldsBenchmark-jdk11-results.txt are missing
>  






[jira] [Assigned] (SPARK-37143) Supplement the missing Java 11 benchmark result files

2021-10-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37143:
-

Assignee: Yang Jie

> Supplement the missing Java 11 benchmark result files
> -
>
> Key: SPARK-37143
> URL: https://issues.apache.org/jira/browse/SPARK-37143
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> CharVarcharBenchmark-results.txt and UpdateFieldsBenchmark-results.txt exist 
> in the project, but CharVarcharBenchmark-jdk11-results.txt and 
> UpdateFieldsBenchmark-jdk11-results.txt are missing
>  






[jira] [Commented] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435607#comment-17435607
 ] 

Apache Spark commented on SPARK-37149:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/34427

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Improve error messages for arithmetic overflow exceptions. We can instruct 
> users to 1) turn off ANSI mode or 2) use `try_` functions if applicable.






[jira] [Assigned] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37149:


Assignee: Apache Spark

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Improve error messages for arithmetic overflow exceptions. We can instruct 
> users to 1) turn off ANSI mode or 2) use `try_` functions if applicable.






[jira] [Commented] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435608#comment-17435608
 ] 

Apache Spark commented on SPARK-37149:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/34427

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Improve error messages for arithmetic overflow exceptions. We can instruct 
> users to 1) turn off ANSI mode or 2) use `try_` functions if applicable.






[jira] [Assigned] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37149:


Assignee: (was: Apache Spark)

> Improve error messages for arithmetic overflow under ANSI mode
> --
>
> Key: SPARK-37149
> URL: https://issues.apache.org/jira/browse/SPARK-37149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Improve error messages for arithmetic overflow exceptions. We can instruct 
> users to 1) turn off ANSI mode or 2) use `try_` functions if applicable.






[jira] [Commented] (SPARK-32165) SessionState leaks SparkListener with multiple SparkSession

2021-10-28 Thread Arvin Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435637#comment-17435637
 ] 

Arvin Zheng commented on SPARK-32165:
-

Hi [~advancedxy], any updates on this?

> SessionState leaks SparkListener with multiple SparkSession
> ---
>
> Key: SPARK-32165
> URL: https://issues.apache.org/jira/browse/SPARK-32165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xianjin YE
>Priority: Major
>
> Copied from 
> [https://github.com/apache/spark/pull/28128#issuecomment-653102770]
> I'd like to point out that this PR 
> (https://github.com/apache/spark/pull/28128) doesn't fix the memory leak 
> completely. Once {{SessionState}} is touched, it will add two more listeners 
> into the SparkContext, namely {{SQLAppStatusListener}} and 
> {{ExecutionListenerBus}}.
> It can be reproduced easily as
> {code:java}
>   test("SPARK-31354: SparkContext only register one SparkSession 
> ApplicationEnd listener") {
> val conf = new SparkConf()
>   .setMaster("local")
>   .setAppName("test-app-SPARK-31354-1")
> val context = new SparkContext(conf)
> SparkSession
>   .builder()
>   .sparkContext(context)
>   .master("local")
>   .getOrCreate()
>   .sessionState // this touches the sessionState
> val postFirstCreation = context.listenerBus.listeners.size()
> SparkSession.clearActiveSession()
> SparkSession.clearDefaultSession()
> SparkSession
>   .builder()
>   .sparkContext(context)
>   .master("local")
>   .getOrCreate()
>   .sessionState // this touches the sessionState
> val postSecondCreation = context.listenerBus.listeners.size()
> SparkSession.clearActiveSession()
> SparkSession.clearDefaultSession()
> assert(postFirstCreation == postSecondCreation)
>   }
> {code}
> The problem can be reproduced by the above code.
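The leak pattern in the Scala test above can be reduced to a few lines of Python (an illustrative model, not Spark code; `Context` and the session helpers are hypothetical): each new session registers fresh listeners on the shared context, so the listener count grows, while a once-per-context guard keeps it stable.

```python
# Model of the leak: per-session registration of context-wide listeners
# grows the shared listener list; registering once per context fixes it.
class Context:
    def __init__(self):
        self.listeners = []

def create_session_leaky(ctx):
    # Every session blindly appends the shared listeners (the bug's shape).
    ctx.listeners.append("SQLAppStatusListener")
    ctx.listeners.append("ExecutionListenerBus")

_registered = set()

def create_session_fixed(ctx):
    # Register the context-wide listeners at most once per context.
    if id(ctx) not in _registered:
        ctx.listeners.append("SQLAppStatusListener")
        ctx.listeners.append("ExecutionListenerBus")
        _registered.add(id(ctx))
```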






[jira] [Created] (SPARK-37150) Migrate DESCRIBE NAMESPACE to use V2 command by default

2021-10-28 Thread Terry Kim (Jira)
Terry Kim created SPARK-37150:
-

 Summary: Migrate DESCRIBE NAMESPACE to use V2 command by default
 Key: SPARK-37150
 URL: https://issues.apache.org/jira/browse/SPARK-37150
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Terry Kim


Migrate DESCRIBE NAMESPACE to use V2 command by default.






[jira] [Assigned] (SPARK-36627) Tasks with Java proxy objects fail to deserialize

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-36627:


Assignee: Samuel Souza

> Tasks with Java proxy objects fail to deserialize
> -
>
> Key: SPARK-36627
> URL: https://issues.apache.org/jira/browse/SPARK-36627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3
>Reporter: Samuel Souza
>Assignee: Samuel Souza
>Priority: Minor
>
> In JavaSerializer.JavaDeserializationStream we override resolveClass of 
> ObjectInputStream to use the thread's contextClassLoader. However, we do not 
> override resolveProxyClass, which is used when deserializing Java proxy 
> objects. As a result, Spark uses the wrong classloader when deserializing 
> such objects, and the job fails with the following exception:
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
> 
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at 
> java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
>   at 
> java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
>   ...
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
> {code}
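The bug's shape has a close Python analogy (not Spark code): pickle's `Unpickler.find_class` hook plays the role of `ObjectInputStream.resolveClass`, deciding how class names map to classes during deserialization. Overriding `resolveClass` but not its sibling `resolveProxyClass` left proxy classes on the default loader; the sketch below shows the override pattern with a hypothetical module-rename map:

```python
# Custom Unpickler whose find_class routes lookups through an explicit
# mapping, analogous to resolving classes via a chosen classloader.
import io
import pickle

class RenamingUnpickler(pickle.Unpickler):
    """Resolve classes through a module mapping instead of the default path."""
    def __init__(self, data, module_map):
        super().__init__(io.BytesIO(data))
        self.module_map = module_map

    def find_class(self, module, name):
        # Redirect lookups for remapped modules (e.g. a relocated package).
        module = self.module_map.get(module, module)
        return super().find_class(module, name)

payload = pickle.dumps({"ints": [1, 2, 3]})
obj = RenamingUnpickler(payload, {"old_pkg.models": "new_pkg.models"}).load()
```

The fix for SPARK-36627 is the Java-side equivalent: make the proxy-class hook honor the same classloader the plain-class hook already uses.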






[jira] [Resolved] (SPARK-36627) Tasks with Java proxy objects fail to deserialize

2021-10-28 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-36627.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33879
[https://github.com/apache/spark/pull/33879]

> Tasks with Java proxy objects fail to deserialize
> -
>
> Key: SPARK-36627
> URL: https://issues.apache.org/jira/browse/SPARK-36627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3
>Reporter: Samuel Souza
>Assignee: Samuel Souza
>Priority: Minor
> Fix For: 3.3.0
>
>
> In JavaSerializer.JavaDeserializationStream we override resolveClass of 
> ObjectInputStream to use the thread's contextClassLoader. However, we do not 
> override resolveProxyClass, which is used when deserializing Java proxy 
> objects. As a result, Spark uses the wrong classloader when deserializing 
> such objects, and the job fails with the following exception:
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
> 
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at 
> java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
>   at 
> java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
>   ...
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
> {code}






[jira] [Resolved] (SPARK-37020) Limit push down in DS V2

2021-10-28 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37020.

  Assignee: Huaxin Gao
Resolution: Fixed

> Limit push down in DS V2
> 
>
> Key: SPARK-37020
> URL: https://issues.apache.org/jira/browse/SPARK-37020
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>







[jira] [Assigned] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2021-10-28 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-34960:
---

Assignee: Cheng Su

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), which Spark can utilize for aggregate push down.
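The idea can be sketched without any ORC machinery (an illustrative simulation with made-up numbers, not Spark's implementation): Min/Max/Count over a column are answered by combining per-stripe column statistics, never reading row data.

```python
# Combine hypothetical per-stripe column statistics to answer
# MIN/MAX/COUNT without scanning rows, as stats-based push down does.
stripes = [
    {"min": 3, "max": 90, "count": 1000},
    {"min": 1, "max": 42, "count": 500},
]

def pushed_down_agg(stripes):
    """Fold stripe-level stats into file-level MIN/MAX/COUNT results."""
    return {
        "min": min(s["min"] for s in stripes),
        "max": max(s["max"] for s in stripes),
        "count": sum(s["count"] for s in stripes),
    }
```

A real implementation must also handle cases where stats are absent or unreliable (e.g. columns with nulls for COUNT semantics) by falling back to a normal scan.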






[jira] [Resolved] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2021-10-28 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-34960.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34298
[https://github.com/apache/spark/pull/34298]

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), which Spark can utilize for aggregate push down.






[jira] [Created] (SPARK-37151) Avoid executor state sync attempt fail continuously in a short timeframe

2021-10-28 Thread Xingbo Jiang (Jira)
Xingbo Jiang created SPARK-37151:


 Summary: Avoid executor state sync attempt fail continuously in a 
short timeframe
 Key: SPARK-37151
 URL: https://issues.apache.org/jira/browse/SPARK-37151
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Xingbo Jiang
Assignee: Xingbo Jiang


An executor would retry sending the ExecutorStateChanged message when the 
previous attempt failed. This would not be an issue when the attempt failed 
with TimeoutException. But if the connection between the executor and the 
Master is broken, the attempt would fail immediately, causing the retry 
attempts to fail as well and quickly reach the max attempt limit.
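One standard remedy for the failure mode described above is a delay between attempts, so instant connection failures cannot burn the whole attempt budget in milliseconds. A hedged sketch, with illustrative names and limits (not the actual Spark change):

```python
# Retry with exponential backoff: spacing out attempts after instant
# failures instead of exhausting MAX_ATTEMPTS back to back.
import time

MAX_ATTEMPTS = 5

def sync_state_with_backoff(send, base_delay=0.01):
    """Retry `send` up to MAX_ATTEMPTS times, sleeping between failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return send()
        except ConnectionError:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # grow the wait each time
```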






[jira] [Updated] (SPARK-37151) Avoid executor state sync attempt fail continuously in a short timeframe

2021-10-28 Thread Xingbo Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingbo Jiang updated SPARK-37151:
-
Description: A worker would retry sending the ExecutorStateChanged message 
when the previous attempt failed. This would not be an issue when the attempt 
failed with TimeoutException. But if the connection between the worker and the 
master is broken, the attempt would fail immediately, causing the retry 
attempts to fail as well and quickly reach the max attempt limit.  (was: An 
executor would retry sending the ExecutorStateChanged message when the previous 
attempt failed. This would not be an issue when the attempt failed with 
TimeoutException. But if the connection between the executor and the Master is 
broken, the attempt would fail immediately, leading to the retry attempt also 
fail, and quickly reaches the max attempt limitation.)

> Avoid executor state sync attempt fail continuously in a short timeframe
> 
>
> Key: SPARK-37151
> URL: https://issues.apache.org/jira/browse/SPARK-37151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
>
> A worker would retry sending the ExecutorStateChanged message when the 
> previous attempt failed. This would not be an issue when the attempt failed 
> with TimeoutException. But if the connection between the worker and the 
> master is broken, the attempt would fail immediately, causing the retry 
> attempts to fail as well and quickly reach the max attempt limit.






[jira] [Commented] (SPARK-37151) Avoid executor state sync attempt fail continuously in a short timeframe

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435732#comment-17435732
 ] 

Apache Spark commented on SPARK-37151:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/34428

> Avoid executor state sync attempt fail continuously in a short timeframe
> 
>
> Key: SPARK-37151
> URL: https://issues.apache.org/jira/browse/SPARK-37151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
>
> A worker would retry sending the ExecutorStateChanged message when the 
> previous attempt failed. This would not be an issue when the attempt failed 
> with TimeoutException. But if the connection between the worker and the 
> master is broken, the attempt would fail immediately, causing the retry 
> attempts to fail as well and quickly reach the max attempt limit.






[jira] [Assigned] (SPARK-37151) Avoid executor state sync attempt fail continuously in a short timeframe

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37151:


Assignee: Apache Spark  (was: Xingbo Jiang)

> Avoid executor state sync attempt fail continuously in a short timeframe
> 
>
> Key: SPARK-37151
> URL: https://issues.apache.org/jira/browse/SPARK-37151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Xingbo Jiang
>Assignee: Apache Spark
>Priority: Major
>
> A worker would retry sending the ExecutorStateChanged message when the 
> previous attempt failed. This would not be an issue when the attempt failed 
> with TimeoutException. But if the connection between the worker and the 
> master is broken, the attempt would fail immediately, causing the retry 
> attempts to fail as well and quickly reach the max attempt limit.






[jira] [Assigned] (SPARK-37151) Avoid executor state sync attempt fail continuously in a short timeframe

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37151:


Assignee: Xingbo Jiang  (was: Apache Spark)

> Avoid executor state sync attempt fail continuously in a short timeframe
> 
>
> Key: SPARK-37151
> URL: https://issues.apache.org/jira/browse/SPARK-37151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
>
> A worker would retry sending the ExecutorStateChanged message when the 
> previous attempt failed. This would not be an issue when the attempt failed 
> with TimeoutException. But if the connection between the worker and the 
> master is broken, the attempt would fail immediately, causing the retry 
> attempts to fail as well and quickly reach the max attempt limit.






[jira] [Assigned] (SPARK-37150) Migrate DESCRIBE NAMESPACE to use V2 command by default

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37150:


Assignee: Apache Spark

> Migrate DESCRIBE NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37150
> URL: https://issues.apache.org/jira/browse/SPARK-37150
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Migrate DESCRIBE NAMESPACE to use V2 command by default.






[jira] [Commented] (SPARK-37150) Migrate DESCRIBE NAMESPACE to use V2 command by default

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435734#comment-17435734
 ] 

Apache Spark commented on SPARK-37150:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/34429

> Migrate DESCRIBE NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37150
> URL: https://issues.apache.org/jira/browse/SPARK-37150
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate DESCRIBE NAMESPACE to use V2 command by default.






[jira] [Assigned] (SPARK-37150) Migrate DESCRIBE NAMESPACE to use V2 command by default

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37150:


Assignee: (was: Apache Spark)

> Migrate DESCRIBE NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37150
> URL: https://issues.apache.org/jira/browse/SPARK-37150
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate DESCRIBE NAMESPACE to use V2 command by default.






[jira] [Commented] (SPARK-37150) Migrate DESCRIBE NAMESPACE to use V2 command by default

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435736#comment-17435736
 ] 

Apache Spark commented on SPARK-37150:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/34429

> Migrate DESCRIBE NAMESPACE to use V2 command by default
> ---
>
> Key: SPARK-37150
> URL: https://issues.apache.org/jira/browse/SPARK-37150
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate DESCRIBE NAMESPACE to use V2 command by default.






[jira] [Updated] (SPARK-36525) DS V2 Index Support

2021-10-28 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36525:
---
Description: 
Many data sources support indexes to improve query performance. To take 
advantage of index support in data sources, the following APIs will be added 
for working with indexes:

{code:java}
 public interface SupportsIndex extends Table {

  /**
   * Creates an index.
   *
   * @param indexName the name of the index to be created
   * @param indexType the type of the index to be created. If not specified, Spark
   *   will use an empty String.
   * @param columns the columns on which the index is to be created
   * @param columnsProperties the properties of the columns on which the index is
   *   to be created
   * @param properties the properties of the index to be created
   * @throws IndexAlreadyExistsException If the index already exists.
   */
  void createIndex(String indexName,
      String indexType,
      NamedReference[] columns,
      Map<NamedReference, Map<String, String>> columnsProperties,
      Map<String, String> properties)
      throws IndexAlreadyExistsException;

  /**
   * Drops the index with the given name.
   *
   * @param indexName the name of the index to be dropped.
   * @throws NoSuchIndexException If the index does not exist.
   */
  void dropIndex(String indexName) throws NoSuchIndexException;

  /**
   * Checks whether an index exists in this table.
   *
   * @param indexName the name of the index
   * @return true if the index exists, false otherwise
   */
  boolean indexExists(String indexName);

  /**
   * Lists all the indexes in this table.
   */
  TableIndex[] listIndexes();
}

{code}
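As a rough illustration of the proposed surface, here is a minimal, self-contained sketch of a table with index support. Everything here is a hypothetical stand-in for the sketch to compile on its own (the `NamedReference`, exception, and `InMemoryIndexedTable` classes), not Spark's actual implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Spark's types, so this sketch compiles standalone.
class NamedReference { final String name; NamedReference(String n) { name = n; } }
class IndexAlreadyExistsException extends Exception { IndexAlreadyExistsException(String m) { super(m); } }
class NoSuchIndexException extends Exception { NoSuchIndexException(String m) { super(m); } }

// Minimal in-memory table that only tracks index names and their columns.
class InMemoryIndexedTable {
  private final Map<String, List<NamedReference>> indexes = new HashMap<>();

  void createIndex(String indexName, String indexType, NamedReference[] columns,
                   Map<NamedReference, Map<String, String>> columnsProperties,
                   Map<String, String> properties) throws IndexAlreadyExistsException {
    if (indexes.containsKey(indexName)) {
      throw new IndexAlreadyExistsException("Index already exists: " + indexName);
    }
    indexes.put(indexName, List.of(columns));
  }

  void dropIndex(String indexName) throws NoSuchIndexException {
    if (indexes.remove(indexName) == null) {
      throw new NoSuchIndexException("No such index: " + indexName);
    }
  }

  boolean indexExists(String indexName) { return indexes.containsKey(indexName); }

  String[] listIndexes() { return indexes.keySet().toArray(new String[0]); }
}

public class Main {
  public static void main(String[] args) throws Exception {
    InMemoryIndexedTable t = new InMemoryIndexedTable();
    t.createIndex("idx1", "", new NamedReference[]{new NamedReference("col1")},
        new HashMap<>(), new HashMap<>());
    System.out.println(t.indexExists("idx1"));  // true after create
    t.dropIndex("idx1");
    System.out.println(t.indexExists("idx1"));  // false after drop
  }
}
```

A real connector would persist the index metadata and use it during scan planning; the in-memory map above only demonstrates the create/drop/exists lifecycle.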



  was:
Many data sources support indexes to improve query performance. To take 
advantage of index support in data sources, the following APIs will be added 
for working with indexes:

{code:java}
  /**
   * Creates an index.
   *
   * @param indexName the name of the index to be created
   * @param indexType the IndexType of the index to be created
   * @param table the table on which index to be created
   * @param columns the columns on which index to be created
   * @param properties the properties of the index to be created
   * @throws IndexAlreadyExistsException If the index already exists (optional)
   * @throws UnsupportedOperationException If create index is not a supported 
operation
   */
  void createIndex(String indexName,
      String indexType,
      Identifier table,
      FieldReference[] columns,
      Map<String, String> properties)
      throws IndexAlreadyExistsException, UnsupportedOperationException;

  /**
   * Soft deletes the index with the given name.
   * Deleted index can be restored by calling restoreIndex.
   *
   * @param indexName the name of the index to be deleted
   * @return true if the index is deleted
   * @throws NoSuchIndexException If the index does not exist (optional)
   * @throws UnsupportedOperationException If delete index is not a supported 
operation
   */
  default boolean deleteIndex(String indexName)
  throws NoSuchIndexException, UnsupportedOperationException

  /**
   * Checks whether an index exists.
   *
   * @param indexName the name of the index
   * @return true if the index exists, false otherwise
   */
  boolean indexExists(String indexName);

  /**
   * Lists all the indexes in a table.
   *
   * @param table the table to be checked on for indexes
   * @throws NoSuchTableException
   */
  Index[] listIndexes(Identifier table) throws NoSuchTableException;

  /**
   * Hard deletes the index with the given name.
   * The Index can't be restored once dropped.
   *
   * @param indexName the name of the index to be dropped.
   * @return true if the index is dropped
   * @throws NoSuchIndexException If the index does not exist (optional)
   * @throws UnsupportedOperationException If drop index is not a supported 
operation
   */
  boolean dropIndex(String indexName) throws NoSuchIndexException, 
UnsupportedOperationException;

  /**
   * Restores the index with the given name.
   * Deleted index can be restored by calling restoreIndex, but dropped index 
can't be restored.
   *
   * @param indexName the name of the index to be restored
   * @return true if the index is restored
   * @throws NoSuchIndexException If the index does not exist (optional)
   * @throws UnsupportedOperationException
   */
  default boolean restoreIndex(String indexName)
  throws NoSuchIndexException, UnsupportedOperationException

  /**
   * Refreshes index using the latest data. This causes the index to be rebuilt.
   *
   * @param indexName the name of the index to be rebuilt
   * @return true if the index is rebuilt
   * @throws NoSuchIndexException If the index does not exist (optional)
   * @throws UnsupportedOperationException
   */
  default boolean refreshIndex(String indexName)
  throws NoSuchIndexException, UnsupportedOperationException

  /**
   * Alter Index usi

[jira] [Assigned] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37131:


Assignee: (was: Apache Spark)

> Support use IN/EXISTS with subquery in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}






[jira] [Commented] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435738#comment-17435738
 ] 

Apache Spark commented on SPARK-37131:
--

User 'TongWeii' has created a pull request for this issue:
https://github.com/apache/spark/pull/34430

> Support use IN/EXISTS with subquery in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}






[jira] [Assigned] (SPARK-37131) Support use IN/EXISTS with subquery in Project/Aggregate

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37131:


Assignee: Apache Spark

> Support use IN/EXISTS with subquery in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}






[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2021-10-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-21187:
-
Attachment: (was: 0--1172099527-254246775-1412485878)

> Complete support for remaining Spark data types in Arrow Converters
> ---
>
> Key: SPARK-21187
> URL: https://issues.apache.org/jira/browse/SPARK-21187
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.1.0
>
>
> This is to track adding the remaining type support in Arrow Converters. 
> Currently, only primitive data types are supported.
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: -Struct-, -Array-, -Map-
>  * -*Decimal*-
>  * -*Binary*-
>  * -*Categorical*- when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
> values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif]; should 
> we support multi-indexing?






[jira] [Commented] (SPARK-35437) Use expressions to filter Hive partitions at client side

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435753#comment-17435753
 ] 

Apache Spark commented on SPARK-35437:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/34431

> Use expressions to filter Hive partitions at client side
> 
>
> Key: SPARK-35437
> URL: https://issues.apache.org/jira/browse/SPARK-35437
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: dzcxzl
>Priority: Minor
> Fix For: 3.3.0
>
>
> When we have a table with many partitions and there is no way to filter them 
> on the MetaStore server, we currently fetch all the partition details and 
> filter them on the client side. This is slow and puts a lot of pressure on 
> the MetaStore server.
> We can instead first pull all the partition names, filter them by 
> expressions, and then fetch detailed information only for the matching 
> partitions from the MetaStore server.
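The name-first pruning strategy described above can be sketched as follows. The `listPartitionNames` and `fetchPartitionDetails` helpers are hypothetical stand-ins for the metastore calls; the point is that the expensive per-partition fetch only runs on names that survive the cheap client-side filter:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class Main {
  // Stand-in for the cheap metastore call that returns only partition names.
  static List<String> listPartitionNames() {
    return List.of("dt=2021-10-26", "dt=2021-10-27", "dt=2021-10-28");
  }

  // Stand-in for the expensive per-partition metadata fetch.
  static String fetchPartitionDetails(String name) {
    return "details(" + name + ")";
  }

  public static void main(String[] args) {
    // Example pruning predicate evaluated client-side against partition names.
    Predicate<String> pruner = name -> name.compareTo("dt=2021-10-27") >= 0;

    List<String> survivors = listPartitionNames().stream()
        .filter(pruner)                     // cheap pruning on names only
        .collect(Collectors.toList());

    List<String> details = survivors.stream()
        .map(Main::fetchPartitionDetails)   // fetch details only for survivors
        .collect(Collectors.toList());
    System.out.println(details);
  }
}
```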






[jira] [Commented] (SPARK-32268) Bloom Filter Join

2021-10-28 Thread Penglei Shi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435755#comment-17435755
 ] 

Penglei Shi commented on SPARK-32268:
-

Hi [~yumwang], I hit two problems with the Bloom filter in my test (TPC-DS q37):
 # When computing the left and right row counts of the join, the estimated size 
of the join(inventory, item, date_dim) is huge, because 
SizeInBytesOnlyStatsPlanVisitor.visitJoin computes it as the product of the 
children's sizes; this picks the wrong small side.
 # For the configuration 
`spark.sql.optimizer.dynamicPartitionPruning.pruningSideExtraFilterRatio` 
(default 0.04), it looks like `pruningHasBenefit` requires the filtering side's 
row count to be less than the pruning side's row count * 0.04 * 0.04, which 
also keeps q37 from using the Bloom filter.

Could you help me?

> Bloom Filter Join
> -
>
> Key: SPARK-32268
> URL: https://issues.apache.org/jira/browse/SPARK-32268
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Attachments: q16-bloom-filter.jpg, q16-default.jpg
>
>
> We can improve the performance of some joins by pre-filtering one side of a 
> join using a Bloom filter and an IN predicate generated from the values on 
> the other side of the join.
>  For 
> example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql].
>  [Before this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg].
>  [After this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg].
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
>  Our setup for running TPC-DS benchmark was as follows: TPC-DS 5T and 
> Partitioned Parquet table
>  
> |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|
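A toy sketch of the pre-filtering idea above. The Bloom filter here is a deliberately simplified stand-in (one `BitSet`, two hash seeds), not Spark's implementation; it shows how probe-side rows that cannot match any build-side key are dropped before the expensive join. False positives may slip through, but false negatives cannot, so no matching row is lost:

```java
import java.util.BitSet;
import java.util.List;
import java.util.stream.Collectors;

// Toy Bloom filter: fixed 1024-bit array with two hash functions.
class Bloom {
  private final BitSet bits = new BitSet(1024);
  private int h(int key, int seed) {
    return Math.floorMod(key * 0x9E3779B1 + seed * 0x85EBCA77, 1024);
  }
  void add(int key) { bits.set(h(key, 1)); bits.set(h(key, 2)); }
  boolean mightContain(int key) { return bits.get(h(key, 1)) && bits.get(h(key, 2)); }
}

public class Main {
  public static void main(String[] args) {
    List<Integer> buildKeys = List.of(3, 7, 42);                 // small build side
    List<Integer> probeRows = List.of(1, 2, 3, 7, 9, 42, 100);   // large probe side

    Bloom bloom = new Bloom();
    buildKeys.forEach(bloom::add);

    // Drop probe rows that definitely cannot join; survivors may include
    // false positives, which the actual join still filters out later.
    List<Integer> candidates = probeRows.stream()
        .filter(bloom::mightContain)
        .collect(Collectors.toList());

    // Every real match survives the pre-filter (no false negatives).
    System.out.println(candidates.containsAll(List.of(3, 7, 42)));
  }
}
```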






[jira] [Created] (SPARK-37152) Inline type hints for python/pyspark/context.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37152:
-

 Summary: Inline type hints for python/pyspark/context.py
 Key: SPARK-37152
 URL: https://issues.apache.org/jira/browse/SPARK-37152
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Created] (SPARK-37153) Inline type hints for python/pyspark/profiler.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37153:
-

 Summary: Inline type hints for python/pyspark/profiler.py
 Key: SPARK-37153
 URL: https://issues.apache.org/jira/browse/SPARK-37153
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Commented] (SPARK-37152) Inline type hints for python/pyspark/context.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435757#comment-17435757
 ] 

Byron Hsu commented on SPARK-37152:
---

I am working on this.

> Inline type hints for python/pyspark/context.py
> ---
>
> Key: SPARK-37152
> URL: https://issues.apache.org/jira/browse/SPARK-37152
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Commented] (SPARK-37153) Inline type hints for python/pyspark/profiler.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435758#comment-17435758
 ] 

Byron Hsu commented on SPARK-37153:
---

I am working on this.

> Inline type hints for python/pyspark/profiler.py
> 
>
> Key: SPARK-37153
> URL: https://issues.apache.org/jira/browse/SPARK-37153
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Created] (SPARK-37154) Inline type hints for python/pyspark/rdd.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37154:
-

 Summary: Inline type hints for python/pyspark/rdd.py
 Key: SPARK-37154
 URL: https://issues.apache.org/jira/browse/SPARK-37154
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Created] (SPARK-37157) Inline type hints for python/pyspark/util.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37157:
-

 Summary: Inline type hints for python/pyspark/util.py
 Key: SPARK-37157
 URL: https://issues.apache.org/jira/browse/SPARK-37157
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Created] (SPARK-37156) Inline type hints for python/pyspark/storagelevel.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37156:
-

 Summary: Inline type hints for python/pyspark/storagelevel.py
 Key: SPARK-37156
 URL: https://issues.apache.org/jira/browse/SPARK-37156
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Created] (SPARK-37155) Inline type hints for python/pyspark/statcounter.py

2021-10-28 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37155:
-

 Summary: Inline type hints for python/pyspark/statcounter.py
 Key: SPARK-37155
 URL: https://issues.apache.org/jira/browse/SPARK-37155
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu









[jira] [Commented] (SPARK-37154) Inline type hints for python/pyspark/rdd.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435760#comment-17435760
 ] 

Byron Hsu commented on SPARK-37154:
---

I am working on this.

> Inline type hints for python/pyspark/rdd.py
> ---
>
> Key: SPARK-37154
> URL: https://issues.apache.org/jira/browse/SPARK-37154
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Commented] (SPARK-37156) Inline type hints for python/pyspark/storagelevel.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435762#comment-17435762
 ] 

Byron Hsu commented on SPARK-37156:
---

I am working on this.

> Inline type hints for python/pyspark/storagelevel.py
> 
>
> Key: SPARK-37156
> URL: https://issues.apache.org/jira/browse/SPARK-37156
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Commented] (SPARK-37155) Inline type hints for python/pyspark/statcounter.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435763#comment-17435763
 ] 

Byron Hsu commented on SPARK-37155:
---

I am working on this.

> Inline type hints for python/pyspark/statcounter.py
> ---
>
> Key: SPARK-37155
> URL: https://issues.apache.org/jira/browse/SPARK-37155
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Commented] (SPARK-37157) Inline type hints for python/pyspark/util.py

2021-10-28 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435761#comment-17435761
 ] 

Byron Hsu commented on SPARK-37157:
---

I am working on this.

> Inline type hints for python/pyspark/util.py
> 
>
> Key: SPARK-37157
> URL: https://issues.apache.org/jira/browse/SPARK-37157
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>







[jira] [Assigned] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37139:


Assignee: dch nguyen

> Inline type hints for python/pyspark/taskcontext.py and 
> python/pyspark/version.py
> -
>
> Key: SPARK-37139
> URL: https://issues.apache.org/jira/browse/SPARK-37139
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>







[jira] [Resolved] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37139.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34414
[https://github.com/apache/spark/pull/34414]

> Inline type hints for python/pyspark/taskcontext.py and 
> python/pyspark/version.py
> -
>
> Key: SPARK-37139
> URL: https://issues.apache.org/jira/browse/SPARK-37139
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Resolved] (SPARK-37042) Inline type hints for kinesis.py and listener.py in python/pyspark/streaming

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37042.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34323
[https://github.com/apache/spark/pull/34323]

> Inline type hints for kinesis.py and listener.py in python/pyspark/streaming
> 
>
> Key: SPARK-37042
> URL: https://issues.apache.org/jira/browse/SPARK-37042
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37042) Inline type hints for kinesis.py and listener.py in python/pyspark/streaming

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37042:


Assignee: dch nguyen

> Inline type hints for kinesis.py and listener.py in python/pyspark/streaming
> 
>
> Key: SPARK-37042
> URL: https://issues.apache.org/jira/browse/SPARK-37042
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37144:


Assignee: dch nguyen

> Inline type hints for python/pyspark/file.py
> 
>
> Key: SPARK-37144
> URL: https://issues.apache.org/jira/browse/SPARK-37144
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>







[jira] [Resolved] (SPARK-37144) Inline type hints for python/pyspark/file.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37144.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34424
[https://github.com/apache/spark/pull/34424]

> Inline type hints for python/pyspark/file.py
> 
>
> Key: SPARK-37144
> URL: https://issues.apache.org/jira/browse/SPARK-37144
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Resolved] (SPARK-37107) Inline type hints for files in python/pyspark/status.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37107.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34375
[https://github.com/apache/spark/pull/34375]

> Inline type hints for files in python/pyspark/status.py
> ---
>
> Key: SPARK-37107
> URL: https://issues.apache.org/jira/browse/SPARK-37107
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37107) Inline type hints for files in python/pyspark/status.py

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37107:


Assignee: dch nguyen

> Inline type hints for files in python/pyspark/status.py
> ---
>
> Key: SPARK-37107
> URL: https://issues.apache.org/jira/browse/SPARK-37107
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37134:


Assignee: (was: Apache Spark)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Major
>
> Sorry, I have no idea which version this affects or what "Shepherd" means; 
> there is no explanation on this form, so I guessed.
>  
> This page of your documentation is unclear in the paragraph "Using PySpark 
> Native Features", which says:
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: must all of the above be done, or is each step sufficient on its own?
> Suggestion: add "OR" between the bullet points.
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  
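To the reporter's question: the three mechanisms are alternatives, and any one of them suffices. A small sketch of the packaging step they all share, with the three delivery options shown as comments (file names here are illustrative, not from the original report):

```python
# Sketch: bundle local Python modules into a zip that any ONE of the
# mechanisms below can ship to the executors. File names are illustrative.
import os
import zipfile


def package_deps(src_files, out_path):
    """Zip Python source files into an archive suitable for --py-files."""
    with zipfile.ZipFile(out_path, "w") as zf:
        for path in src_files:
            zf.write(path, arcname=os.path.basename(path))
    return out_path


# Each line below is an ALTERNATIVE, not a sequence of required steps:
#   spark-submit --py-files deps.zip app.py                    # CLI option
#   spark-submit --conf spark.submit.pyFiles=deps.zip app.py   # configuration
#   sc.addPyFile("deps.zip")                                   # inside the app
```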






[jira] [Commented] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435778#comment-17435778
 ] 

Apache Spark commented on SPARK-37134:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34432

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Major
>






[jira] [Assigned] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37134:


Assignee: Apache Spark

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Assignee: Apache Spark
>Priority: Major
>






[jira] [Commented] (SPARK-32268) Bloom Filter Join

2021-10-28 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435782#comment-17435782
 ] 

Yuming Wang commented on SPARK-32268:
-

[~Penglei Shi] Please add my WeChat: yumwang666. We can discuss offline.

> Bloom Filter Join
> -
>
> Key: SPARK-32268
> URL: https://issues.apache.org/jira/browse/SPARK-32268
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Attachments: q16-bloom-filter.jpg, q16-default.jpg
>
>
> We can improve the performance of some joins by pre-filtering one side of a 
> join using a Bloom filter and an IN predicate generated from the values on 
> the other side of the join.
>  For 
> example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql].
>  [Before this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg].
>  [After this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg].
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
>  We ran the TPC-DS benchmark at the 5 TB scale factor against partitioned 
> Parquet tables:
>  
> |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|
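The mechanism described above can be illustrated outside Spark with a toy Bloom filter. This is a plain-Python sketch of the idea, not Spark's actual implementation: build the filter from the small side's join keys, discard non-matching rows on the large side, then run the real join on what remains.

```python
# Toy Bloom filter sketch (not Spark internals): pre-filter the probe side
# of a join using the build side's keys.
import hashlib


class BloomFilter:
    def __init__(self, size: int = 2048, hashes: int = 3) -> None:
        self.size = size
        self.hashes = hashes
        self.bits = 0  # bitset packed into one big integer

    def _positions(self, key):
        # Derive k independent bit positions from SHA-256 of "<i>:<key>".
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key) -> bool:
        # No false negatives; false positives are harmless here because
        # the real join still checks key equality afterwards.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


small = [(1, "a"), (2, "b")]                      # build side: keys + values
large = [(1, "x"), (3, "y"), (2, "z"), (9, "w")]  # probe side

bloom = BloomFilter()
for key, _ in small:
    bloom.add(key)

# Pre-filter the large side, then perform the actual equality join.
prefiltered = [row for row in large if bloom.might_contain(row[0])]
joined = [(k, v, w) for k, v in small for k2, w in prefiltered if k == k2]
```

The benchmark wins in the table above come from exactly this effect at scale: rows that cannot possibly match are dropped before the expensive join and shuffle.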






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Affects Version/s: (was: 1.6.2)
   3.2.0

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: carl rees
>Priority: Major
>






[jira] [Resolved] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37134.
--
Fix Version/s: 3.3.0
   3.2.1
 Assignee: Hyukjin Kwon
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/34432

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: carl rees
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>





