[jira] [Commented] (SPARK-30043) Add built-in Array Functions: array_fill
[ https://issues.apache.org/jira/browse/SPARK-30043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005181#comment-17005181 ] jiaan.geng commented on SPARK-30043: OK. > Add built-in Array Functions: array_fill > > > Key: SPARK-30043 > URL: https://issues.apache.org/jira/browse/SPARK-30043 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_fill}}{{(}}{{anyelement}}{{, }}{{int[]}}{{ [, > {{int[]}}])}}|{{anyarray}}|returns an array initialized with supplied value > and dimensions, optionally with lower bounds other than 1|{{array_fill(7, > ARRAY[3], ARRAY[2])}}|{{[2:4]=\{7,7,7}}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
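The PostgreSQL behaviour quoted above can be illustrated with a small pure-Python sketch (a hypothetical helper for the one-dimensional case only — this is neither Spark nor PostgreSQL code):

```python
def array_fill(value, dims, lower_bounds=None):
    # Mimics PostgreSQL array_fill for the 1-D case: an array of
    # dims[0] copies of value, with an optional lower bound (default 1).
    if lower_bounds is None:
        lower_bounds = [1] * len(dims)
    if len(dims) != 1:
        raise NotImplementedError("sketch covers 1-D arrays only")
    lb, n = lower_bounds[0], dims[0]
    values = [value] * n
    # PostgreSQL prints a non-default lower bound as e.g. [2:4]={7,7,7}
    text = "{%s}" % ",".join(str(v) for v in values)
    if lb != 1:
        text = "[%d:%d]=%s" % (lb, lb + n - 1, text)
    return text

print(array_fill(7, [3], [2]))  # → [2:4]={7,7,7}, matching the example above
```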
[jira] [Commented] (SPARK-30042) Add built-in Array Functions: array_dims
[ https://issues.apache.org/jira/browse/SPARK-30042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005180#comment-17005180 ] jiaan.geng commented on SPARK-30042: OK. I doubt its usefulness too. > Add built-in Array Functions: array_dims > > > Key: SPARK-30042 > URL: https://issues.apache.org/jira/browse/SPARK-30042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_dims}}{{(}}{{anyarray}}{{)}}|{{text}}|returns a text representation > of array's dimensions|{{array_dims(ARRAY[[1,2,3], [4,5,6]])}}|{{[1:2][1:3]}}| > [https://www.postgresql.org/docs/11/functions-array.html]
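For reference, the `array_dims` semantics quoted from the PostgreSQL docs can be sketched in a few lines of pure Python (a hypothetical helper that assumes rectangular nested lists with lower bounds fixed at 1):

```python
def array_dims(arr):
    # Returns a PostgreSQL-style text representation of dimensions,
    # e.g. [[1,2,3],[4,5,6]] → "[1:2][1:3]".
    dims = []
    cur = arr
    while isinstance(cur, list):
        dims.append(len(cur))
        cur = cur[0] if cur else None
    return "".join("[1:%d]" % n for n in dims)

print(array_dims([[1, 2, 3], [4, 5, 6]]))  # → [1:2][1:3], as in the example above
```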
[jira] [Assigned] (SPARK-30370) Update SqlBase.g4 to combine namespace and database tokens.
[ https://issues.apache.org/jira/browse/SPARK-30370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30370: --- Assignee: Terry Kim > Update SqlBase.g4 to combine namespace and database tokens. > --- > > Key: SPARK-30370 > URL: https://issues.apache.org/jira/browse/SPARK-30370 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > > Instead of using `(database | NAMESPACE)` in the grammar, create > namespace : NAMESPACE | DATABASE | SCHEMA; > and use it instead. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30370) Update SqlBase.g4 to combine namespace and database tokens.
[ https://issues.apache.org/jira/browse/SPARK-30370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30370. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27027 [https://github.com/apache/spark/pull/27027] > Update SqlBase.g4 to combine namespace and database tokens. > --- > > Key: SPARK-30370 > URL: https://issues.apache.org/jira/browse/SPARK-30370 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.0.0 > > > Instead of using `(database | NAMESPACE)` in the grammar, create > namespace : NAMESPACE | DATABASE | SCHEMA; > and use it instead. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30348) Flaky test: org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master should avoid dead loop while launching executor failed in Worker
[ https://issues.apache.org/jira/browse/SPARK-30348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30348. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27004 [https://github.com/apache/spark/pull/27004] > Flaky test: org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master > should avoid dead loop while launching executor failed in Worker > > > Key: SPARK-30348 > URL: https://issues.apache.org/jira/browse/SPARK-30348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115664/testReport/] > > {code:java} > org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master should avoid > dead loop while launching executor failed in Worker > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 656 times over 10.002408616 > seconds. Last failure message: Map() did not contain key > "app-20191223154506-". > sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 656 times over 10.002408616 > seconds. Last failure message: Map() did not contain key > "app-20191223154506-". 
> at > org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) > at > org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336) > at > org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111) > at > org.apache.spark.deploy.master.MasterSuite.$anonfun$new$40(MasterSuite.scala:681) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56) > at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) > at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) > at > org.apache.spark.deploy.master.MasterSuite.org$scalatest$BeforeAndAfter$$super$runTest(MasterSuite.scala:111) > at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) > at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) > at > 
org.apache.spark.deploy.master.MasterSuite.runTest(MasterSuite.scala:111) > at > org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) > at scala.collection.immutable.List.foreach(List.scala:392) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) > at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) > at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) > at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) > at org.scalatest.Suite.run(Suite.scala:1124) > at org.scalatest.Suite.run$(Suite.scala:1106) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) > at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) > at org.scalatest.SuperEngine.runImpl(Engine.scala:518) > at org.scalatest.FunSuiteLike.run(FunSuiteLike.scal
[jira] [Assigned] (SPARK-30348) Flaky test: org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master should avoid dead loop while launching executor failed in Worker
[ https://issues.apache.org/jira/browse/SPARK-30348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30348: --- Assignee: Jungtaek Lim > Flaky test: org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master > should avoid dead loop while launching executor failed in Worker > > > Key: SPARK-30348 > URL: https://issues.apache.org/jira/browse/SPARK-30348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115664/testReport/] > > {code:java} > org.apache.spark.deploy.master.MasterSuite.SPARK-27510: Master should avoid > dead loop while launching executor failed in Worker > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 656 times over 10.002408616 > seconds. Last failure message: Map() did not contain key > "app-20191223154506-". > sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 656 times over 10.002408616 > seconds. Last failure message: Map() did not contain key > "app-20191223154506-". 
> at > org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) > at > org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336) > at > org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111) > at > org.apache.spark.deploy.master.MasterSuite.$anonfun$new$40(MasterSuite.scala:681) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56) > at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) > at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) > at > org.apache.spark.deploy.master.MasterSuite.org$scalatest$BeforeAndAfter$$super$runTest(MasterSuite.scala:111) > at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) > at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) > at > 
org.apache.spark.deploy.master.MasterSuite.runTest(MasterSuite.scala:111) > at > org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) > at scala.collection.immutable.List.foreach(List.scala:392) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) > at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) > at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) > at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) > at org.scalatest.Suite.run(Suite.scala:1124) > at org.scalatest.Suite.run$(Suite.scala:1106) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) > at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) > at org.scalatest.SuperEngine.runImpl(Engine.scala:518) > at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) > at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) > at > org.apache.spark.SparkFunSuite.org$scalatest
[jira] [Commented] (SPARK-29098) Test both ANSI mode and Spark mode
[ https://issues.apache.org/jira/browse/SPARK-29098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005145#comment-17005145 ] Aman Omer commented on SPARK-29098: --- I will work on this > Test both ANSI mode and Spark mode > -- > > Key: SPARK-29098 > URL: https://issues.apache.org/jira/browse/SPARK-29098 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > > The PostgreSQL test case improves the test coverage of Spark SQL. > There are SQL files that have different results with/without ANSI > flags(spark.sql.failOnIntegralTypeOverflow, spark.sql.parser.ansi.enabled, > etc) enabled. > We should run tests against these SQL files with both ANSI mode and Spark > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27148) Support CURRENT_TIME and LOCALTIME when ANSI mode enabled
[ https://issues.apache.org/jira/browse/SPARK-27148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-27148: -- Comment: was deleted (was: I will work on this.) > Support CURRENT_TIME and LOCALTIME when ANSI mode enabled > - > > Key: SPARK-27148 > URL: https://issues.apache.org/jira/browse/SPARK-27148 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Major > > CURRENT_TIME and LOCALTIME should be supported in the ANSI standard; > {code:java} > postgres=# select CURRENT_TIME; > timetz > > 16:45:43.398109+09 > (1 row) > postgres=# select LOCALTIME; > time > > 16:45:48.60969 > (1 row){code} > Before this, we need to support TIME types (java.sql.Time). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-29098) Test both ANSI mode and Spark mode
[ https://issues.apache.org/jira/browse/SPARK-29098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-29098: -- Comment: was deleted (was: I will work on this.) > Test both ANSI mode and Spark mode > -- > > Key: SPARK-29098 > URL: https://issues.apache.org/jira/browse/SPARK-29098 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > > The PostgreSQL test case improves the test coverage of Spark SQL. > There are SQL files that have different results with/without ANSI > flags(spark.sql.failOnIntegralTypeOverflow, spark.sql.parser.ansi.enabled, > etc) enabled. > We should run tests against these SQL files with both ANSI mode and Spark > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27348: --- Assignee: wuyi > HeartbeatReceiver doesn't remove lost executors from > CoarseGrainedSchedulerBackend > -- > > Key: SPARK-27348 > URL: https://issues.apache.org/jira/browse/SPARK-27348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost > executors from CoarseGrainedSchedulerBackend. When a connection of an > executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not > receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still > thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask > TaskScheduler to run tasks on this lost executor. This task will never finish > and the job will hang forever. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27348) HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27348. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26980 [https://github.com/apache/spark/pull/26980] > HeartbeatReceiver doesn't remove lost executors from > CoarseGrainedSchedulerBackend > -- > > Key: SPARK-27348 > URL: https://issues.apache.org/jira/browse/SPARK-27348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost > executors from CoarseGrainedSchedulerBackend. When a connection of an > executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not > receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still > thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask > TaskScheduler to run tasks on this lost executor. This task will never finish > and the job will hang forever. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
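The failure mode described above — two bookkeeping structures drifting apart when a heartbeat times out — can be sketched in pure Python (hypothetical class and method names; this is a simplified illustration, not the actual Spark fix):

```python
class TaskScheduler:
    """Stand-in for the scheduler's view of live executors."""
    def __init__(self):
        self.executors = {"exec-1": "alive"}

class SchedulerBackend:
    """Stand-in for CoarseGrainedSchedulerBackend's own registry."""
    def __init__(self):
        self.executors = {"exec-1": "alive"}

def on_heartbeat_timeout(exec_id, scheduler, backend):
    # The essence of the fix: a timed-out executor must be removed from
    # BOTH registries; removing it from only one leaves the backend
    # believing a dead executor is still schedulable.
    scheduler.executors.pop(exec_id, None)
    backend.executors.pop(exec_id, None)

scheduler, backend = TaskScheduler(), SchedulerBackend()
on_heartbeat_timeout("exec-1", scheduler, backend)
```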
[jira] [Commented] (SPARK-29098) Test both ANSI mode and Spark mode
[ https://issues.apache.org/jira/browse/SPARK-29098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005136#comment-17005136 ] Aman Omer commented on SPARK-29098: --- I will work on this. > Test both ANSI mode and Spark mode > -- > > Key: SPARK-29098 > URL: https://issues.apache.org/jira/browse/SPARK-29098 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > > The PostgreSQL test case improves the test coverage of Spark SQL. > There are SQL files that have different results with/without ANSI > flags(spark.sql.failOnIntegralTypeOverflow, spark.sql.parser.ansi.enabled, > etc) enabled. > We should run tests against these SQL files with both ANSI mode and Spark > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27148) Support CURRENT_TIME and LOCALTIME when ANSI mode enabled
[ https://issues.apache.org/jira/browse/SPARK-27148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005135#comment-17005135 ] Aman Omer commented on SPARK-27148: --- I will work on this. > Support CURRENT_TIME and LOCALTIME when ANSI mode enabled > - > > Key: SPARK-27148 > URL: https://issues.apache.org/jira/browse/SPARK-27148 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Major > > CURRENT_TIME and LOCALTIME should be supported in the ANSI standard; > {code:java} > postgres=# select CURRENT_TIME; > timetz > > 16:45:43.398109+09 > (1 row) > postgres=# select LOCALTIME; > time > > 16:45:48.60969 > (1 row){code} > Before this, we need to support TIME types (java.sql.Time). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
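The key distinction in the ANSI functions quoted above — CURRENT_TIME carries a time zone while LOCALTIME does not — has a direct analogue in Python's standard library (shown here only as an illustration of the tz-aware vs. naive split, not as a Spark implementation):

```python
from datetime import datetime, timezone

# CURRENT_TIME: a time *with* time zone (tz-aware).
current_time = datetime.now(timezone.utc).timetz()

# LOCALTIME: a time *without* time zone (naive).
localtime = datetime.now().time()

# The tzinfo attribute is exactly the difference between the two types.
print(current_time.tzinfo)  # a timezone object
print(localtime.tzinfo)     # None
```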
[jira] [Commented] (SPARK-28122) Binary String Functions: SHA functions
[ https://issues.apache.org/jira/browse/SPARK-28122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005131#comment-17005131 ] Aman Omer commented on SPARK-28122: --- I will work on this. > Binary String Functions: SHA functions > --- > > Key: SPARK-28122 > URL: https://issues.apache.org/jira/browse/SPARK-28122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{sha224(}}{{bytea}}{{)}}|{{bytea}}|SHA-224 > hash|{{sha224('abc')}}|{{\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7}}| > |{{sha256(}}{{bytea}}{{)}}|{{bytea}}|SHA-256 > hash|{{sha256('abc')}}|{{\xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad}}| > |{{sha384(}}{{bytea}}{{)}}|{{bytea}}|SHA-384 > hash|{{sha384('abc')}}|{{\xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7}}| > |{{sha512(}}{{bytea}}{{)}}|{{bytea}}|SHA-512 > hash|{{sha512('abc')}}|{{\xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a964b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f}}| > More details: https://www.postgresql.org/docs/11/functions-binarystring.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
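The expected digests in the table above are the standard SHA test vectors for the input 'abc', which can be checked directly with Python's hashlib (this only verifies the reference values; it is not the proposed Spark implementation):

```python
import hashlib

# SHA digests of b'abc', matching the PostgreSQL examples quoted above.
print(hashlib.sha224(b'abc').hexdigest())
# → 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
print(hashlib.sha256(b'abc').hexdigest())
# → ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```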
[jira] [Commented] (SPARK-30377) Make Regressors extend abstract class Regressor
[ https://issues.apache.org/jira/browse/SPARK-30377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005099#comment-17005099 ] zhengruifeng commented on SPARK-30377: -- [~huaxingao] As to making OVR extend Classifier, there is a previous ticket about it: https://issues.apache.org/jira/browse/SPARK-8799 > Make Regressors extend abstract class Regressor > --- > > Key: SPARK-30377 > URL: https://issues.apache.org/jira/browse/SPARK-30377 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > Just found that {{AFTSurvivalRegression}} , {{DecisionTreeRegressor}}, > {{FMRegressor}}, {{GBTRegressor}}, {{RandomForestRegressor}} directly extend > {{Predictor}} > > Only {{GeneralizedLinearRegression}} and {{LinearRegression}} now extend > {{Regressor}}. > >
[jira] [Commented] (SPARK-30377) Make Regressors extend abstract class Regressor
[ https://issues.apache.org/jira/browse/SPARK-30377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005098#comment-17005098 ] zhengruifeng commented on SPARK-30377: -- There is no API difference now. > Make Regressors extend abstract class Regressor > --- > > Key: SPARK-30377 > URL: https://issues.apache.org/jira/browse/SPARK-30377 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > Just found that {{AFTSurvivalRegression}} , {{DecisionTreeRegressor}}, > {{FMRegressor}}, {{GBTRegressor}}, {{RandomForestRegressor}} directly extend > {{Predictor}} > > Only {{GeneralizedLinearRegression}} and {{LinearRegression}} now extend > {{Regressor}}. > >
[jira] [Resolved] (SPARK-30376) Unify the computation of numFeatures
[ https://issues.apache.org/jira/browse/SPARK-30376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-30376. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27037 [https://github.com/apache/spark/pull/27037] > Unify the computation of numFeatures > > > Key: SPARK-30376 > URL: https://issues.apache.org/jira/browse/SPARK-30376 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 3.0.0 > > > Try to extract numFeatures from the metadata first; if it does not exist, then > extract it from the dataset.
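The metadata-first fallback described in the issue can be sketched in a few lines (hypothetical names; the actual change lives in Spark ML's Scala code, not here):

```python
def num_features(metadata, dataset):
    # Prefer the value recorded in the column metadata; only fall back
    # to inspecting a row of the dataset when metadata has no answer.
    n = metadata.get("numFeatures")
    if n is not None:
        return n
    return len(dataset[0])

# Metadata wins when present; otherwise the first row is measured.
print(num_features({"numFeatures": 4}, []))      # → 4
print(num_features({}, [[0.1, 0.2, 0.3]]))       # → 3
```

The design point is that reading metadata is O(1), while inferring from the dataset may trigger a scan, so the cheap path is tried first.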
[jira] [Assigned] (SPARK-30376) Unify the computation of numFeatures
[ https://issues.apache.org/jira/browse/SPARK-30376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-30376: Assignee: zhengruifeng > Unify the computation of numFeatures > > > Key: SPARK-30376 > URL: https://issues.apache.org/jira/browse/SPARK-30376 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > > Try to extract numFeatures from the metadata first; if it does not exist, then > extract it from the dataset.
[jira] [Commented] (SPARK-30377) Make Regressors extend abstract class Regressor
[ https://issues.apache.org/jira/browse/SPARK-30377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005086#comment-17005086 ] Huaxin Gao commented on SPARK-30377: Seems there is no API difference, but it would be more logical to make every Regressor extend ml.Regressor. I quickly checked the Classifiers. Everything extends Classifier except OneVsRest. I guess I will open a separate Jira to make OneVsRest extend Classifier. > Make Regressors extend abstract class Regressor > --- > > Key: SPARK-30377 > URL: https://issues.apache.org/jira/browse/SPARK-30377 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > Just found that {{AFTSurvivalRegression}} , {{DecisionTreeRegressor}}, > {{FMRegressor}}, {{GBTRegressor}}, {{RandomForestRegressor}} directly extend > {{Predictor}} > > Only {{GeneralizedLinearRegression}} and {{LinearRegression}} now extend > {{Regressor}}. > >
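The class-hierarchy point being discussed — regressors extending a shared Regressor base rather than Predictor directly — can be sketched in Python (hypothetical stubs mirroring the Spark ML names, not the real Scala classes):

```python
from abc import ABC, abstractmethod

class Predictor(ABC):
    """Stand-in for ml.Predictor: anything that can predict."""
    @abstractmethod
    def predict(self, features): ...

class Regressor(Predictor):
    """Stand-in for ml.Regressor: a marker base for the regression family."""

class DecisionTreeRegressor(Regressor):
    def predict(self, features):
        return 0.0  # stub prediction

# With a common base, callers can dispatch on the whole family,
# which is the consistency the ticket asks for.
model = DecisionTreeRegressor()
print(isinstance(model, Regressor))  # → True
```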
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-30082: Labels: correctness (was: ) > Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: John Ayad >Assignee: John Ayad >Priority: Major > Labels: correctness > Fix For: 2.4.5, 3.0.0 > > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. > >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
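A correctness bug like the one above is easy to hit because NaN has unusual comparison semantics. The sketch below (pure Python, not the Spark fix) shows the safe pattern: replacement logic must test for NaN explicitly rather than rely on equality, and must leave other values — including zeros — untouched:

```python
import math

nan = float('nan')

# NaN never compares equal, not even to itself:
print(nan == nan)  # → False

def replace_nan(values, replacement):
    # Explicit isnan check; integers and zeros pass through unchanged.
    return [replacement if isinstance(v, float) and math.isnan(v) else v
            for v in values]

print(replace_nan([1.0, 0.0, nan], 5))  # → [1.0, 0.0, 5]
```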
[jira] [Commented] (SPARK-30196) Bump lz4-java version to 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005071#comment-17005071 ] Takeshi Yamamuro commented on SPARK-30196: -- Thanks for the report and I've filed an issue in lz4-java: [https://github.com/lz4/lz4-java/issues/156] > Bump lz4-java version to 1.7.0 > -- > > Key: SPARK-30196 > URL: https://issues.apache.org/jira/browse/SPARK-30196 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions
[ https://issues.apache.org/jira/browse/SPARK-29390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-29390: - > Add the justify_days(), justify_hours() and justify_interval() functions > --- > > Key: SPARK-29390 > URL: https://issues.apache.org/jira/browse/SPARK-29390 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > See *Table 9.31. Date/Time Functions* > ([https://www.postgresql.org/docs/12/functions-datetime.html)] > |{{justify_days(}}{{interval}}{{)}}|{{interval}}|Adjust interval so 30-day > time periods are represented as months|{{justify_days(interval '35 > days')}}|{{1 mon 5 days}}| > | {{justify_hours(}}{{interval}}{{)}}|{{interval}}|Adjust interval so 24-hour > time periods are represented as days|{{justify_hours(interval '27 > hours')}}|{{1 day 03:00:00}}| > | {{justify_interval(}}{{interval}}{{)}}|{{interval}}|Adjust interval using > {{justify_days}} and {{justify_hours}}, with additional sign > adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days > 23:00:00}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions
[ https://issues.apache.org/jira/browse/SPARK-29390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-29390. - Resolution: Later > Add the justify_days(), justify_hours() and justify_interval() functions > --- > > Key: SPARK-29390 > URL: https://issues.apache.org/jira/browse/SPARK-29390 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > See *Table 9.31. Date/Time Functions* > ([https://www.postgresql.org/docs/12/functions-datetime.html)] > |{{justify_days(}}{{interval}}{{)}}|{{interval}}|Adjust interval so 30-day > time periods are represented as months|{{justify_days(interval '35 > days')}}|{{1 mon 5 days}}| > | {{justify_hours(}}{{interval}}{{)}}|{{interval}}|Adjust interval so 24-hour > time periods are represented as days|{{justify_hours(interval '27 > hours')}}|{{1 day 03:00:00}}| > | {{justify_interval(}}{{interval}}{{)}}|{{interval}}|Adjust interval using > {{justify_days}} and {{justify_hours}}, with additional sign > adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days > 23:00:00}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
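The core arithmetic of the justify functions quoted from the PostgreSQL table reduces to fixed-radix carrying, sketched here in pure Python (hypothetical helpers covering only the simple positive cases, not the sign adjustments of justify_interval):

```python
def justify_hours(hours):
    # 24-hour periods become days: 27 hours → (1 day, 3 hours)
    return divmod(hours, 24)

def justify_days(days):
    # 30-day periods become months: 35 days → (1 mon, 5 days)
    return divmod(days, 30)

print(justify_days(35))   # → (1, 5), i.e. "1 mon 5 days"
print(justify_hours(27))  # → (1, 3), i.e. "1 day 03:00:00"
```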
[jira] [Commented] (SPARK-30196) Bump lz4-java version to 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005066#comment-17005066 ] Lars Francke commented on SPARK-30196: -- FYI: This seems to have broken Spark 3 on Mac OS for me due to {code:java} dyld: lazy symbol binding failed: Symbol not found: chkstk_darwin Referenced from: /private/var/folders/1v/ckh8py712_n_5r628_16w0l4gn/T/liblz4-java-820584040681098780.dylib (which was built for Mac OS X 10.15) Expected in: /usr/lib/libSystem.B.dylibdyld: Symbol not found: chkstk_darwin Referenced from: /private/var/folders/1v/ckh8py712_n_5r628_16w0l4gn/T/liblz4-java-820584040681098780.dylib (which was built for Mac OS X 10.15) Expected in: /usr/lib/libSystem.B.dylib {code} I did a bit of googling but I'm not sure what's going on. Reverting to 1.6 works for me. I'm on MacOS 10.13. Any hints are appreciated. If the lz4 stuff really only works with MacOS 10.15 that'd be sad but I can't really believe that. Has anyone tried Spark 3 Preview 2 on a Mac with 10.15/10.14/10.13? > Bump lz4-java version to 1.7.0 > -- > > Key: SPARK-30196 > URL: https://issues.apache.org/jira/browse/SPARK-30196 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30342) Update LIST JAR/FILE command
[ https://issues.apache.org/jira/browse/SPARK-30342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-30342. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26996 [https://github.com/apache/spark/pull/26996] > Update LIST JAR/FILE command > > > Key: SPARK-30342 > URL: https://issues.apache.org/jira/browse/SPARK-30342 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Rakesh Raushan >Assignee: Rakesh Raushan >Priority: Minor > Fix For: 3.0.0 > > > LIST FILE/JAR command is not documented properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30342) Update LIST JAR/FILE command
[ https://issues.apache.org/jira/browse/SPARK-30342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-30342: Assignee: Rakesh Raushan > Update LIST JAR/FILE command > > > Key: SPARK-30342 > URL: https://issues.apache.org/jira/browse/SPARK-30342 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Rakesh Raushan >Assignee: Rakesh Raushan >Priority: Minor > > LIST FILE/JAR command is not documented properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29437) CSV Writer should escape 'escapechar' when it exists in the data
[ https://issues.apache.org/jira/browse/SPARK-29437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Raj Boudh resolved SPARK-29437. - Resolution: Not A Problem > CSV Writer should escape 'escapechar' when it exists in the data > > > Key: SPARK-29437 > URL: https://issues.apache.org/jira/browse/SPARK-29437 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 >Reporter: Tomasz Bartczak >Priority: Trivial > > When the data contains the escape character (default '\') it should either be > escaped or quoted. > Steps to reproduce: > [https://gist.github.com/kretes/58f7f66a0780681a44c175a2ac3c0da2] > > The effect can be either a bad data read or sometimes even an inability to properly > read the csv, e.g. when the escape character is the last character in the column > - it breaks the column reading for that row and effectively breaks e.g. type > inference for a dataframe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
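The fix the ticket asks for — escaping the escape character itself on write — can be sketched in a few lines of plain Python. This is a minimal illustration, not Spark's CSV writer; `escape_field` is a hypothetical helper:

```python
def escape_field(field, delimiter=',', escapechar='\\'):
    """Escape a CSV field so the escape character itself is also escaped."""
    out = []
    for ch in field:
        if ch in (escapechar, delimiter):
            out.append(escapechar)  # prefix special characters with the escape char
        out.append(ch)
    return ''.join(out)
```

With this rule, a field whose last character is the escape character no longer swallows the following delimiter on read, which is exactly the failure mode described above.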
[jira] [Commented] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004824#comment-17004824 ] Ankit Raj Boudh commented on SPARK-30130: - [~hyukjin.kwon], please confirm: does this require a fix? If so, I will start working on this Jira. > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. > {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30383) Remove meaning less tooltip from Executor Tab
[ https://issues.apache.org/jira/browse/SPARK-30383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004794#comment-17004794 ] Ankit Raj Boudh commented on SPARK-30383: - I think not only the Executor Tab but all pages need to be checked; I will check all the pages and submit a PR today. > Remove meaning less tooltip from Executor Tab > -- > > Key: SPARK-30383 > URL: https://issues.apache.org/jira/browse/SPARK-30383 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > There are tooltips display as it is Like Disk Used, Total Tasks in Executor > Table under Executor Tab. > Should improve and remove meaning less Tool Tips. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30377) Make Regressors extend abstract class Regressor
[ https://issues.apache.org/jira/browse/SPARK-30377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004793#comment-17004793 ] Sean R. Owen commented on SPARK-30377: -- It seems logical; what would the API difference be? > Make Regressors extend abstract class Regressor > --- > > Key: SPARK-30377 > URL: https://issues.apache.org/jira/browse/SPARK-30377 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > Just found that {{AFTSurvivalRegression}}, {{DecisionTreeRegressor}}, > {{FMRegressor}}, {{GBTRegressor}}, {{RandomForestRegressor}} directly extend > {{Predictor}} > > Only {{GeneralizedLinearRegression}} and {{LinearRegression}} now extend > {{Regressor}}. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30383) Remove meaning less tooltip from Executor Tab
[ https://issues.apache.org/jira/browse/SPARK-30383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004792#comment-17004792 ] Ankit Raj Boudh commented on SPARK-30383: - I will submit the PR. > Remove meaning less tooltip from Executor Tab > -- > > Key: SPARK-30383 > URL: https://issues.apache.org/jira/browse/SPARK-30383 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > There are tooltips display as it is Like Disk Used, Total Tasks in Executor > Table under Executor Tab. > Should improve and remove meaning less Tool Tips. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30383) Remove meaning less tooltip from Executor Tab
ABHISHEK KUMAR GUPTA created SPARK-30383: Summary: Remove meaning less tooltip from Executor Tab Key: SPARK-30383 URL: https://issues.apache.org/jira/browse/SPARK-30383 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0 Reporter: ABHISHEK KUMAR GUPTA There are tooltips display as it is Like Disk Used, Total Tasks in Executor Table under Executor Tab. Should improve and remove meaning less Tool Tips. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27764) Feature Parity between PostgreSQL and Spark
[ https://issues.apache.org/jira/browse/SPARK-27764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004786#comment-17004786 ] Takeshi Yamamuro commented on SPARK-27764: -- I've created two new umbrella tickets, SPARK-30374 and SPARK-30375, for ANSI-related issues and implementation-dependent issues, then moved some tickets there (see the description above for more details). If you have any problem, please let me know. > Feature Parity between PostgreSQL and Spark > --- > > Key: SPARK-27764 > URL: https://issues.apache.org/jira/browse/SPARK-27764 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > PostgreSQL is one of the most advanced open source databases. This umbrella > Jira is trying to track the missing features and bugs. > UPDATED: This umbrella ticket basically intends to include bug reports and > general issues for the feature parity. For implementation-dependent > behaviours and ANSI/SQL standard topics, you need to check the two umbrellas > below; > - SPARK-30374 Feature Parity between PostgreSQL and Spark (ANSI/SQL) > - SPARK-30375 Feature Parity between PostgreSQL and Spark > (implementation-dependent behaviours) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30382) start-thriftserver throws ClassNotFoundException
Ajith S created SPARK-30382: --- Summary: start-thriftserver throws ClassNotFoundException Key: SPARK-30382 URL: https://issues.apache.org/jira/browse/SPARK-30382 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ajith S start-thriftserver.sh --help throws {code} . Thrift server options: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextFactory at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:167) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:82) at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala) Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.spi.LoggerContextFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 3 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28429) SQL Datetime util function being casted to double instead of timestamp
[ https://issues.apache.org/jira/browse/SPARK-28429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28429: - Parent Issue: SPARK-27764 (was: SPARK-30375) > SQL Datetime util function being casted to double instead of timestamp > -- > > Key: SPARK-28429 > URL: https://issues.apache.org/jira/browse/SPARK-28429 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > In the code below, 'now()+'100 days' are casted to double and then an error > is thrown: > {code:sql} > CREATE TEMP VIEW v_window AS > SELECT i, min(i) over (order by i range between '1 day' preceding and '10 > days' following) as min_i > FROM range(now(), now()+'100 days', '1 hour') i; > {code} > Error: > {code:sql} > cannot resolve '(current_timestamp() + CAST('100 days' AS DOUBLE))' due to > data type mismatch: differing types in '(current_timestamp() + CAST('100 > days' AS DOUBLE))' (timestamp and double).;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29659) Support COMMENT ON syntax
[ https://issues.apache.org/jira/browse/SPARK-29659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29659: - Parent Issue: SPARK-27764 (was: SPARK-30375) > Support COMMENT ON syntax > - > > Key: SPARK-29659 > URL: https://issues.apache.org/jira/browse/SPARK-29659 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Major > > [https://www.postgresql.org/docs/current/sql-comment.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27987) Support POSIX Regular Expressions
[ https://issues.apache.org/jira/browse/SPARK-27987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27987: - Parent Issue: SPARK-27764 (was: SPARK-30375) > Support POSIX Regular Expressions > - > > Key: SPARK-27987 > URL: https://issues.apache.org/jira/browse/SPARK-27987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > POSIX regular expressions provide a more powerful means for pattern matching > than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, > or awk use a pattern matching language that is similar to the one described > here. > ||Operator||Description||Example|| > |{{~}}|Matches regular expression, case sensitive|{{'thomas' ~ '.*thomas.*'}}| > |{{~*}}|Matches regular expression, case insensitive|{{'thomas' ~* > '.*Thomas.*'}}| > |{{!~}}|Does not match regular expression, case sensitive|{{'thomas' !~ > '.*Thomas.*'}}| > |{{!~*}}|Does not match regular expression, case insensitive|{{'thomas' !~* > '.*vadim.*'}}| > https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27764) Feature Parity between PostgreSQL and Spark
[ https://issues.apache.org/jira/browse/SPARK-27764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27764: - Description: PostgreSQL is one of the most advanced open source databases. This umbrella Jira is trying to track the missing features and bugs. UPDATED: This umbrella tickets basically intend to include bug reports and general issues for the feature parity. For implementation-dependent behaviours and ANS/SQL standard topics, you need to check the two umbrella below; - SPARK-30374 Feature Parity between PostgreSQL and Spark (ANSI/SQL) - SPARK-30375 Feature Parity between PostgreSQL and Spark (implementation-dependent behaviours) was: PostgreSQL is one of the most advanced open source databases. This umbrella Jira is trying to track the missing features and bugs. UPDATE: This umbrella tickets basically intend to include bug reports and general issues for the feature parity. For implementation-dependent behaviours and ANS/SQL standard topics, you need to check the two umbrella below; - SPARK-30374 Feature Parity between PostgreSQL and Spark (ANSI/SQL) - SPARK-30375 Feature Parity between PostgreSQL and Spark (implementation-dependent behaviours) > Feature Parity between PostgreSQL and Spark > --- > > Key: SPARK-27764 > URL: https://issues.apache.org/jira/browse/SPARK-27764 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > PostgreSQL is one of the most advanced open source databases. This umbrella > Jira is trying to track the missing features and bugs. > UPDATED: This umbrella tickets basically intend to include bug reports and > general issues for the feature parity. 
For implementation-dependent > behaviours and ANSI/SQL standard topics, you need to check the two umbrellas > below; > - SPARK-30374 Feature Parity between PostgreSQL and Spark (ANSI/SQL) > - SPARK-30375 Feature Parity between PostgreSQL and Spark > (implementation-dependent behaviours) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28402) Array indexing is 1-based
[ https://issues.apache.org/jira/browse/SPARK-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28402. -- Resolution: Won't Fix > Array indexing is 1-based > - > > Key: SPARK-28402 > URL: https://issues.apache.org/jira/browse/SPARK-28402 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Peter Toth >Priority: Major > > Array indexing is 1-based in PostgreSQL: > [https://www.postgresql.org/docs/12/arrays.html] > > {quote}The array subscript numbers are written within square brackets. By > default PostgreSQL uses a one-based numbering convention for arrays, that is, > an array of _{{n}}_ elements starts with {{array[1]}} and ends with > {{array[_{{n}}_]}}.{quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
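For reference, the one-based convention quoted from the PostgreSQL docs can be mimicked over a Python list. A hedged sketch; `element_at_1based` is an illustrative name, not a Spark function:

```python
def element_at_1based(arr, i):
    """Return the i-th element using PostgreSQL-style 1-based subscripts."""
    if i < 1 or i > len(arr):
        return None  # PostgreSQL yields NULL for out-of-range subscripts
    return arr[i - 1]
```

For comparison, Spark's `element_at` function is also one-based, while the `[]` subscript operator stays zero-based — which is the behaviour this ticket leaves unchanged.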
[jira] [Resolved] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28451. -- Resolution: Won't Fix > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004771#comment-17004771 ] Takeshi Yamamuro commented on SPARK-28451: -- I'll close this for now based on the discussion above. Thanks, all. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
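The PostgreSQL behaviour shown in the ticket can be reproduced with a small Python sketch (illustrative only; `pg_substr` is not a Spark API). PostgreSQL treats the start position as a 1-based coordinate that may be zero or negative, and returns the overlap of the requested range with the string:

```python
def pg_substr(s, start, length=None):
    """PostgreSQL-style substr: 1-based start (may be zero or negative);
    the result is the overlap of [start, start + length) with the string."""
    if length is not None and length < 0:
        raise ValueError('negative substring length not allowed')
    end = len(s) + 1 if length is None else start + length  # one past last position
    begin = max(start, 1)
    end = min(end, len(s) + 1)
    return s[begin - 1:end - 1] if end > begin else ''

# pg_substr('1234567890', -1, 5) -> '123'  (positions -1..3 clipped to 1..3)
```

Spark instead interprets a negative start as counting from the end of the string, which is why the same call returns '0' (the last character) in spark-sql.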
[jira] [Updated] (SPARK-28296) Improved VALUES support
[ https://issues.apache.org/jira/browse/SPARK-28296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28296: - Parent Issue: SPARK-27764 (was: SPARK-30375) > Improved VALUES support > --- > > Key: SPARK-28296 > URL: https://issues.apache.org/jira/browse/SPARK-28296 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Peter Toth >Priority: Major > > These are valid queries in PostgreSQL, but they don't work in Spark SQL: > {noformat} > values ((select 1)); > values ((select c from test1)); > select (values(c)) from test10; > with cte(foo) as ( values(42) ) values((select foo from cte)); > {noformat} > where test1 and test10: > {noformat} > CREATE TABLE test1 (c INTEGER); > INSERT INTO test1 VALUES(1); > CREATE TABLE test10 (c INTEGER); > INSERT INTO test10 SELECT generate_sequence(1, 10); > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27978) Add built-in Aggregate Functions: string_agg
[ https://issues.apache.org/jira/browse/SPARK-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004762#comment-17004762 ] Takeshi Yamamuro commented on SPARK-27978: -- I'll close this for now because I think the workaround above is enough. If necessary, please reopen this. > Add built-in Aggregate Functions: string_agg > > > Key: SPARK-27978 > URL: https://issues.apache.org/jira/browse/SPARK-27978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Argument Type(s)||Return Type||Partial Mode||Description|| > |string_agg(_{{expression}}_,_{{delimiter}}_)|({{text}}, {{text}}) or > ({{bytea}}, {{bytea}})|same as argument types|No|input values concatenated > into a string, separated by delimiter| > https://www.postgresql.org/docs/current/functions-aggregate.html > We can workaround it by concat_ws(_{{delimiter}}_, > collect_list(_{{expression}}_)) currently. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27978) Add built-in Aggregate Functions: string_agg
[ https://issues.apache.org/jira/browse/SPARK-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-27978. -- Resolution: Won't Fix > Add built-in Aggregate Functions: string_agg > > > Key: SPARK-27978 > URL: https://issues.apache.org/jira/browse/SPARK-27978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Argument Type(s)||Return Type||Partial Mode||Description|| > |string_agg(_{{expression}}_,_{{delimiter}}_)|({{text}}, {{text}}) or > ({{bytea}}, {{bytea}})|same as argument types|No|input values concatenated > into a string, separated by delimiter| > https://www.postgresql.org/docs/current/functions-aggregate.html > We can workaround it by concat_ws(_{{delimiter}}_, > collect_list(_{{expression}}_)) currently. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
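The workaround cited in the ticket, `concat_ws(delimiter, collect_list(expression))`, behaves like a plain join over the grouped, non-null values. A hedged pure-Python sketch of that semantics:

```python
def string_agg(values, delimiter):
    """Emulate string_agg via the collect_list + concat_ws workaround:
    collect_list gathers the non-null values, concat_ws joins them."""
    return delimiter.join(str(v) for v in values if v is not None)
```

One caveat the sketch makes visible: neither `collect_list` nor this emulation guarantees a deterministic ordering of the aggregated values, whereas PostgreSQL's `string_agg` supports an explicit ORDER BY inside the aggregate.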
[jira] [Commented] (SPARK-29891) Add built-in Array Functions: array_length
[ https://issues.apache.org/jira/browse/SPARK-29891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004760#comment-17004760 ] Takeshi Yamamuro commented on SPARK-29891: -- I'll close this for now because our length is enough for this use case. If necessary, please reopen this. > Add built-in Array Functions: array_length > -- > > Key: SPARK-29891 > URL: https://issues.apache.org/jira/browse/SPARK-29891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_length}}{{(}}{{anyarray}}{{, }}{{int}}{{)}}|{{int}}|returns the > length of the requested array dimension|{{array_length(array[1,2,3], > 1)}}|{{3}}| > | | | | | | > Other DBs: > [https://phoenix.apache.org/language/functions.html#array_length] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29891) Add built-in Array Functions: array_length
[ https://issues.apache.org/jira/browse/SPARK-29891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29891. -- Resolution: Won't Fix > Add built-in Array Functions: array_length > -- > > Key: SPARK-29891 > URL: https://issues.apache.org/jira/browse/SPARK-29891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_length}}{{(}}{{anyarray}}{{, }}{{int}}{{)}}|{{int}}|returns the > length of the requested array dimension|{{array_length(array[1,2,3], > 1)}}|{{3}}| > | | | | | | > Other DBs: > [https://phoenix.apache.org/language/functions.html#array_length] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29984) Add built-in Array Functions: array_ndims
[ https://issues.apache.org/jira/browse/SPARK-29984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004759#comment-17004759 ] Takeshi Yamamuro commented on SPARK-29984: -- I'll close this for now because I cannot find a strong reason to support this. If necessary, please reopen this. > Add built-in Array Functions: array_ndims > - > > Key: SPARK-29984 > URL: https://issues.apache.org/jira/browse/SPARK-29984 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_ndims}}{{(}}{{anyarray}}{{)}}|{{int}}|returns the number of > dimensions of the array|{{array_ndims(ARRAY[[1,2,3], [4,5,6]])}}|{{2}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29984) Add built-in Array Functions: array_ndims
[ https://issues.apache.org/jira/browse/SPARK-29984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29984. -- Resolution: Won't Fix > Add built-in Array Functions: array_ndims > - > > Key: SPARK-29984 > URL: https://issues.apache.org/jira/browse/SPARK-29984 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_ndims}}{{(}}{{anyarray}}{{)}}|{{int}}|returns the number of > dimensions of the array|{{array_ndims(ARRAY[[1,2,3], [4,5,6]])}}|{{2}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
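For illustration, the array_ndims semantics from the table above amount to counting nesting depth. A minimal sketch, assuming a rectangular nested-list representation:

```python
def array_ndims(arr):
    """Count the number of dimensions of a rectangular nested list."""
    ndims = 0
    while isinstance(arr, list):
        ndims += 1
        arr = arr[0] if arr else None  # descend along the first element
    return ndims

# array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2, matching the PostgreSQL example
```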
[jira] [Commented] (SPARK-28037) Add built-in String Functions: quote_literal
[ https://issues.apache.org/jira/browse/SPARK-28037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004756#comment-17004756 ] Takeshi Yamamuro commented on SPARK-28037: -- I'll close this for now because I'm not sure this feature is useful for Spark. If necessary, please reopen this. > Add built-in String Functions: quote_literal > > > Key: SPARK-28037 > URL: https://issues.apache.org/jira/browse/SPARK-28037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{quote_literal(_{{string}}_ }}{{text}}{{)}}|{{text}}|Return the given > string suitably quoted to be used as a string literal in an SQL statement > string. Embedded single-quotes and backslashes are properly doubled. Note > that {{quote_literal}} returns null on null input; if the argument might be > null, {{quote_nullable}} is often more suitable. See also [Example > 43.1|https://www.postgresql.org/docs/11/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE].|{{quote_literal(E'O\'Reilly')}}|{{'O''Reilly'}}| > |{{quote_literal(_{{value}}_ }}{{anyelement}}{{)}}|{{text}}|Coerce the given > value to text and then quote it as a literal. Embedded single-quotes and > backslashes are properly doubled.|{{quote_literal(42.5)}}|{{'42.5'}}| > https://www.postgresql.org/docs/11/functions-string.html > https://docs.aws.amazon.com/redshift/latest/dg/r_QUOTE_LITERAL.html > https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/QUOTE_LITERAL.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_38 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28037) Add built-in String Functions: quote_literal
[ https://issues.apache.org/jira/browse/SPARK-28037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28037. -- Resolution: Won't Fix > Add built-in String Functions: quote_literal > > > Key: SPARK-28037 > URL: https://issues.apache.org/jira/browse/SPARK-28037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{quote_literal(_{{string}}_ }}{{text}}{{)}}|{{text}}|Return the given > string suitably quoted to be used as a string literal in an SQL statement > string. Embedded single-quotes and backslashes are properly doubled. Note > that {{quote_literal}} returns null on null input; if the argument might be > null, {{quote_nullable}} is often more suitable. See also [Example > 43.1|https://www.postgresql.org/docs/11/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE].|{{quote_literal(E'O\'Reilly')}}|{{'O''Reilly'}}| > |{{quote_literal(_{{value}}_ }}{{anyelement}}{{)}}|{{text}}|Coerce the given > value to text and then quote it as a literal. Embedded single-quotes and > backslashes are properly doubled.|{{quote_literal(42.5)}}|{{'42.5'}}| > https://www.postgresql.org/docs/11/functions-string.html > https://docs.aws.amazon.com/redshift/latest/dg/r_QUOTE_LITERAL.html > https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/QUOTE_LITERAL.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_38 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
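The quoting rule described in the table — double embedded single quotes and backslashes, then wrap the result in single quotes — is easy to sketch. A simplified Python illustration, not a Spark built-in (PostgreSQL additionally switches to the E'...' escape-string form when backslashes are present, which is omitted here):

```python
def quote_literal(value):
    """Quote a value as a SQL string literal: double embedded single quotes
    and backslashes, then wrap the result in single quotes (simplified)."""
    s = str(value).replace('\\', '\\\\').replace("'", "''")
    return "'" + s + "'"

# quote_literal("O'Reilly") -> 'O''Reilly'
# quote_literal(42.5)       -> '42.5'
```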
[jira] [Resolved] (SPARK-30043) Add built-in Array Functions: array_fill
[ https://issues.apache.org/jira/browse/SPARK-30043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-30043. -- Resolution: Won't Fix > Add built-in Array Functions: array_fill > > > Key: SPARK-30043 > URL: https://issues.apache.org/jira/browse/SPARK-30043 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_fill}}{{(}}{{anyelement}}{{, }}{{int[]}}{{ [, > {{int[]}}])}}|{{anyarray}}|returns an array initialized with supplied value > and dimensions, optionally with lower bounds other than 1|{{array_fill(7, > ARRAY[3], ARRAY[2])}}|{{[2:4]=\{7,7,7}}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30043) Add built-in Array Functions: array_fill
[ https://issues.apache.org/jira/browse/SPARK-30043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004755#comment-17004755 ] Takeshi Yamamuro commented on SPARK-30043: -- I'll close this for now because I cannot find a strong reason to support this. If necessary, please reopen this. Thanks. > Add built-in Array Functions: array_fill > > > Key: SPARK-30043 > URL: https://issues.apache.org/jira/browse/SPARK-30043 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_fill}}{{(}}{{anyelement}}{{, }}{{int[]}}{{ [, > {{int[]}}])}}|{{anyarray}}|returns an array initialized with supplied value > and dimensions, optionally with lower bounds other than 1|{{array_fill(7, > ARRAY[3], ARRAY[2])}}|{{[2:4]=\{7,7,7}}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
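For reference, the one-dimensional case of `array_fill` can be sketched in Python (a simplified model: PostgreSQL supports multi-dimensional arrays, which this sketch ignores, and the `(lower_bound, list)` return shape is an assumption made to carry the non-default lower bound):

```python
def array_fill(value, dims, lower_bounds=None):
    """Sketch of PostgreSQL's array_fill, 1-D only: an array of dims[0]
    copies of value, with an optional lower bound other than 1."""
    lb = lower_bounds[0] if lower_bounds is not None else 1
    return lb, [value] * dims[0]

lb, arr = array_fill(7, [3], [2])
# Render in PostgreSQL's notation: [2:4]={7,7,7}
print(f"[{lb}:{lb + len(arr) - 1}]={{{','.join(map(str, arr))}}}")
```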
[jira] [Updated] (SPARK-30182) Support nested aggregates
[ https://issues.apache.org/jira/browse/SPARK-30182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-30182: - Parent Issue: SPARK-27764 (was: SPARK-30375) > Support nested aggregates > - > > Key: SPARK-30182 > URL: https://issues.apache.org/jira/browse/SPARK-30182 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL cannot support SQL with a nested aggregate, as below: > {code:java} > SELECT sum(salary), row_number() OVER (ORDER BY depname), sum( > sum(salary) FILTER (WHERE enroll_date > '2007-01-01') > ) FILTER (WHERE depname <> 'sales') OVER (ORDER BY depname DESC) AS > "filtered_sum", > depname > FROM empsalary GROUP BY depname;{code} > Spark will throw an exception as follows: > {code:java} > org.apache.spark.sql.AnalysisException > It is not allowed to use an aggregate function in the argument of another > aggregate function. Please use the inner aggregate function in a > sub-query.{code} > But PostgreSQL supports this syntax. > {code:java} > SELECT sum(salary), row_number() OVER (ORDER BY depname), sum( > sum(salary) FILTER (WHERE enroll_date > '2007-01-01') > ) FILTER (WHERE depname <> 'sales') OVER (ORDER BY depname DESC) AS > "filtered_sum", > depname > FROM empsalary GROUP BY depname; > sum | row_number | filtered_sum | depname > -------+------------+--------------+----------- > 25100 | 1 | 22600 | develop > 7400 | 2 | 3500 | personnel > 14600 | 3 | | sales > (3 rows){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
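The "use the inner aggregate function in a sub-query" workaround from the error message can be sketched in plain Python over hypothetical toy rows (the data below is invented for illustration and does not reproduce the PostgreSQL numbers in the ticket):

```python
# Hypothetical toy rows: (depname, salary, enroll_date as an ISO string).
empsalary = [
    ("develop", 6000, "2006-10-01"), ("develop", 5000, "2007-08-01"),
    ("personnel", 3500, "2007-12-10"), ("sales", 4800, "2006-09-01"),
]

# Step 1 -- what the sub-query would compute: per-department sum(salary)
# under the inner FILTER (WHERE enroll_date > '2007-01-01').
inner = {}
for dep, salary, enrolled in empsalary:
    if enrolled > "2007-01-01":
        inner[dep] = inner.get(dep, 0) + salary

# Step 2 -- the outer aggregate runs over the per-group sums, with its own
# FILTER (WHERE depname <> 'sales').
filtered_sum = sum(s for dep, s in inner.items() if dep != "sales")
print(inner, filtered_sum)
```

The nesting is legal in PostgreSQL because the inner `sum` is evaluated per group first and the outer window `sum` then aggregates those results; the two explicit steps above make that ordering visible.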
[jira] [Resolved] (SPARK-28036) Support negative length at LEFT/RIGHT SQL functions
[ https://issues.apache.org/jira/browse/SPARK-28036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28036. -- Resolution: Won't Fix > Support negative length at LEFT/RIGHT SQL functions > --- > > Key: SPARK-28036 > URL: https://issues.apache.org/jira/browse/SPARK-28036 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {code:sql} > postgres=# select left('ahoj', -2), right('ahoj', -2); > left | right > --+--- > ah | oj > (1 row) > {code} > Spark SQL: > {code:sql} > spark-sql> select left('ahoj', -2), right('ahoj', -2); > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
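The PostgreSQL semantics requested here ("a negative length drops characters from the other end") happen to line up with Python slicing, which makes for a compact sketch (function names are ad hoc, not a Spark API):

```python
def pg_left(s, n):
    """Sketch of PostgreSQL's left(): positive n keeps the first n
    characters; negative n keeps all but the last |n|."""
    return s[:n]  # n == 0 naturally yields ''

def pg_right(s, n):
    """Sketch of PostgreSQL's right(): positive n keeps the last n
    characters; negative n keeps all but the first |n|."""
    return s[-n:] if n != 0 else ""

print(pg_left("ahoj", -2), pg_right("ahoj", -2))  # ah oj
```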
[jira] [Commented] (SPARK-28036) Support negative length at LEFT/RIGHT SQL functions
[ https://issues.apache.org/jira/browse/SPARK-28036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004748#comment-17004748 ] Takeshi Yamamuro commented on SPARK-28036: -- I'll close this for now because I cannot find a strong reason to support this. If necessary, please reopen this. > Support negative length at LEFT/RIGHT SQL functions > --- > > Key: SPARK-28036 > URL: https://issues.apache.org/jira/browse/SPARK-28036 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {code:sql} > postgres=# select left('ahoj', -2), right('ahoj', -2); > left | right > --+--- > ah | oj > (1 row) > {code} > Spark SQL: > {code:sql} > spark-sql> select left('ahoj', -2), right('ahoj', -2); > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30042) Add built-in Array Functions: array_dims
[ https://issues.apache.org/jira/browse/SPARK-30042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004747#comment-17004747 ] Takeshi Yamamuro commented on SPARK-30042: -- I'll close this for now based on the discussion above. If necessary, please reopen this. > Add built-in Array Functions: array_dims > > > Key: SPARK-30042 > URL: https://issues.apache.org/jira/browse/SPARK-30042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_dims}}{{(}}{{anyarray}}{{)}}|{{text}}|returns a text representation > of array's dimensions|{{array_dims(ARRAY[[1,2,3], [4,5,6]])}}|{{[1:2][1:3]}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30042) Add built-in Array Functions: array_dims
[ https://issues.apache.org/jira/browse/SPARK-30042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-30042. -- Resolution: Won't Fix > Add built-in Array Functions: array_dims > > > Key: SPARK-30042 > URL: https://issues.apache.org/jira/browse/SPARK-30042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > |{{array_dims}}{{(}}{{anyarray}}{{)}}|{{text}}|returns a text representation > of array's dimensions|{{array_dims(ARRAY[[1,2,3], [4,5,6]])}}|{{[1:2][1:3]}}| > [https://www.postgresql.org/docs/11/functions-array.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28516. -- Resolution: Won't Fix > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|[https://www.postgresql.org/docs/12/functions-formatting.html]]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004746#comment-17004746 ] Takeshi Yamamuro edited comment on SPARK-28516 at 12/29/19 11:20 AM: - I'll close this for now because I'm not sure this feature is useful for Spark. If necessary, please reopen this. Thanks. was (Author: maropu): I'll close this because I'm not sure this feature is useful for Spark. If necessary, please reopen this. Thanks. > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|[https://www.postgresql.org/docs/12/functions-formatting.html]]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004746#comment-17004746 ] Takeshi Yamamuro commented on SPARK-28516: -- I'll close this because I'm not sure this feature is useful for Spark. If necessary, please reopen this. Thanks. > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|[https://www.postgresql.org/docs/12/functions-formatting.html]]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28490) Support `TIME` type in Spark
[ https://issues.apache.org/jira/browse/SPARK-28490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28490. -- Resolution: Duplicate > Support `TIME` type in Spark > > > Key: SPARK-28490 > URL: https://issues.apache.org/jira/browse/SPARK-28490 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Zhu, Lipeng >Priority: Major > > Support the TIME type and related time operators in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28865) Table inheritance
[ https://issues.apache.org/jira/browse/SPARK-28865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28865: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Table inheritance > - > > Key: SPARK-28865 > URL: https://issues.apache.org/jira/browse/SPARK-28865 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL implements table inheritance, which can be a useful tool for > database designers. (SQL:1999 and later define a type inheritance feature, > which differs in many respects from the features described here.) > > [https://www.postgresql.org/docs/11/ddl-inherit.html|https://www.postgresql.org/docs/9.5/ddl-inherit.html] > [https://www.postgresql.org/docs/11/tutorial-inheritance.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28687) Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
[ https://issues.apache.org/jira/browse/SPARK-28687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28687: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` > > > Key: SPARK-28687 > URL: https://issues.apache.org/jira/browse/SPARK-28687 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, we support these field for EXTRACT: CENTURY, MILLENNIUM, DECADE, > YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFWEEK, HOUR, MINUTE, SECOND, DOW, > ISODOW, DOY, > We also need support: EPOCH, MICROSECONDS, MILLISECONDS, TIMEZONE, > TIMEZONE_M, TIMEZONE_H, ISOYEAR. > https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
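The four fields named in this ticket can be sketched with Python's datetime, following the PostgreSQL definitions (MILLISECONDS/MICROSECONDS include the seconds field with its fractional part; the dispatch function here is illustrative, not Spark's implementation):

```python
from datetime import datetime, timezone

def extract(field, ts):
    """Sketch of extract() for EPOCH, ISOYEAR, MILLISECONDS, MICROSECONDS."""
    if field == "EPOCH":
        # Seconds since 1970-01-01 00:00:00 UTC (naive ts assumed to be UTC).
        return ts.replace(tzinfo=timezone.utc).timestamp()
    if field == "ISOYEAR":
        return ts.isocalendar()[0]
    if field == "MILLISECONDS":
        return ts.second * 1000 + ts.microsecond / 1000
    if field == "MICROSECONDS":
        return ts.second * 1_000_000 + ts.microsecond
    raise ValueError(field)

ts = datetime(2019, 12, 30, 0, 0, 1, 500000)  # 2019-12-30 is in ISO year 2020
print(extract("ISOYEAR", ts), extract("MILLISECONDS", ts))
```

The ISOYEAR case shows why the field is useful: the last days of December can belong to the following ISO week-numbering year.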
[jira] [Updated] (SPARK-28623) Support `dow`, `isodow` and `doy` at `extract()`
[ https://issues.apache.org/jira/browse/SPARK-28623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28623: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support `dow`, `isodow` and `doy` at `extract()` > > > Key: SPARK-28623 > URL: https://issues.apache.org/jira/browse/SPARK-28623 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, we support these field for EXTRACT: YEAR, QUARTER, MONTH, WEEK, > DAY, DAYOFWEEK, HOUR, MINUTE, SECOND. > We also need support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, > MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, > ISOYEAR. > https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30381) GBT reuse splits for all trees
zhengruifeng created SPARK-30381: Summary: GBT reuse splits for all trees Key: SPARK-30381 URL: https://issues.apache.org/jira/browse/SPARK-30381 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.0.0 Reporter: zhengruifeng Assignee: zhengruifeng In the existing GBT, each tree first computes the available splits of each feature (via RandomForest.findSplits), based on the dataset sampled at that iteration. It then uses these splits to discretize vectors into BaggedPoint[TreePoint]s. The BaggedPoints (of the same size as the input vectors) are then cached and used at that iteration. Note that the splits used for discretization differ from tree to tree (if subsamplingRate < 1) only because the sampled vectors differ. However, the splits at different iterations should be similar if the sampled dataset is big enough, and exactly the same if subsamplingRate = 1. In other well-known GBT implementations with binned features (like XGBoost/LightGBM), the splits used for discretization are the same across iterations:
{code:python}
import xgboost as xgb
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file('/data0/Dev/Opensource/spark/data/mllib/sample_linear_regression_data.txt')
dtrain = xgb.DMatrix(X[:, :2], label=y)
num_round = 3
param = {'max_depth': 2, 'objective': 'reg:squarederror', 'tree_method': 'hist',
         'max_bin': 2, 'eta': 0.01, 'subsample': 0.5}
bst = xgb.train(param, dtrain, num_round)
bst.trees_to_dataframe('/tmp/bst')

Out[61]:
    Tree  Node   ID Feature     Split  Yes   No Missing        Gain  Cover
0      0     0  0-0      f1  0.000408  0-1  0-2     0-1  170.337143  256.0
1      0     1  0-1      f0  0.003531  0-3  0-4     0-3   44.865482  121.0
2      0     2  0-2      f0  0.003531  0-5  0-6     0-5  125.615570  135.0
3      0     3  0-3    Leaf       NaN  NaN  NaN     NaN   -0.010050   67.0
4      0     4  0-4    Leaf       NaN  NaN  NaN     NaN    0.002126   54.0
5      0     5  0-5    Leaf       NaN  NaN  NaN     NaN    0.020972   69.0
6      0     6  0-6    Leaf       NaN  NaN  NaN     NaN    0.001714   66.0
7      1     0  1-0      f0  0.003531  1-1  1-2     1-1   50.417793  263.0
8      1     1  1-1      f1  0.000408  1-3  1-4     1-3   48.732742  124.0
9      1     2  1-2      f1  0.000408  1-5  1-6     1-5   52.832161  139.0
10     1     3  1-3    Leaf       NaN  NaN  NaN     NaN   -0.012784   63.0
11     1     4  1-4    Leaf       NaN  NaN  NaN     NaN   -0.000287   61.0
12     1     5  1-5    Leaf       NaN  NaN  NaN     NaN    0.008661   64.0
13     1     6  1-6    Leaf       NaN  NaN  NaN     NaN   -0.003624   75.0
14     2     0  2-0      f1  0.000408  2-1  2-2     2-1   62.136013  242.0
15     2     1  2-1      f0  0.003531  2-3  2-4     2-3  150.537781  118.0
16     2     2  2-2      f0  0.003531  2-5  2-6     2-5    3.829046  124.0
17     2     3  2-3    Leaf       NaN  NaN  NaN     NaN   -0.016737   65.0
18     2     4  2-4    Leaf       NaN  NaN  NaN     NaN    0.005809   53.0
19     2     5  2-5    Leaf       NaN  NaN  NaN     NaN    0.005251   60.0
20     2     6  2-6    Leaf       NaN  NaN  NaN     NaN    0.001709   64.0
{code}
We can see that even with subsample=0.5, the three trees share the same splits. So I think we could reuse the splits and treePoints at all iterations: at iteration 0, compute the splits on the whole training dataset and use them to generate the treePoints; at each subsequent iteration, directly generate baggedPoints based on the treePoints. This way we do not need to persist/unpersist the internal training dataset for each tree. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
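The scheme proposed in SPARK-30381 can be sketched in plain Python (hypothetical helper names; Spark's actual findSplits uses weighted quantile sketches and handles categorical features, which this ignores):

```python
import bisect
import random

def find_splits(values, max_bins):
    """Sketch of split-finding for one continuous feature: pick
    (max_bins - 1) equally spaced quantile thresholds from the data."""
    s = sorted(values)
    return [s[len(s) * i // max_bins] for i in range(1, max_bins)]

def to_bins(values, splits):
    """Discretize raw values into bin ids using the shared thresholds."""
    return [bisect.bisect_right(splits, v) for v in values]

random.seed(0)
feature = [random.gauss(0, 1) for _ in range(1000)]

# Proposed: compute splits ONCE on the whole training data set...
splits = find_splits(feature, max_bins=32)
tree_points = to_bins(feature, splits)

# ...then each boosting iteration only subsamples the pre-binned points,
# instead of re-binning (and re-caching) per tree.
for _ in range(3):
    bagged = [p for p in tree_points if random.random() < 0.5]
```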
[jira] [Resolved] (SPARK-28033) String concatenation should low priority than other operators
[ https://issues.apache.org/jira/browse/SPARK-28033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28033. -- Resolution: Won't Fix > String concatenation should low priority than other operators > - > > Key: SPARK-28033 > URL: https://issues.apache.org/jira/browse/SPARK-28033 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.3 >Reporter: Yuming Wang >Priority: Major > > Spark SQL: > {code:sql} > spark-sql> explain select 'four: ' || 2 + 2; > == Physical Plan == > *(1) Project [null AS (CAST(concat(four: , CAST(2 AS STRING)) AS DOUBLE) + > CAST(2 AS DOUBLE))#2] > +- Scan OneRowRelation[] > spark-sql> select 'four: ' || 2 + 2; > NULL > {code} > Hive: > {code:sql} > hive> select 'four: ' || 2 + 2; > OK > four: 4 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28033) String concatenation should low priority than other operators
[ https://issues.apache.org/jira/browse/SPARK-28033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004744#comment-17004744 ] Takeshi Yamamuro commented on SPARK-28033: -- I'll close this because the corresponding pr has been closed. If necessary, please reopen this. > String concatenation should low priority than other operators > - > > Key: SPARK-28033 > URL: https://issues.apache.org/jira/browse/SPARK-28033 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.3 >Reporter: Yuming Wang >Priority: Major > > Spark SQL: > {code:sql} > spark-sql> explain select 'four: ' || 2 + 2; > == Physical Plan == > *(1) Project [null AS (CAST(concat(four: , CAST(2 AS STRING)) AS DOUBLE) + > CAST(2 AS DOUBLE))#2] > +- Scan OneRowRelation[] > spark-sql> select 'four: ' || 2 + 2; > NULL > {code} > Hive: > {code:sql} > hive> select 'four: ' || 2 + 2; > OK > four: 4 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
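The precedence difference behind SPARK-28033 can be sketched in Python (`spark_plus` is an ad hoc model of Spark's `+` after its implicit cast to DOUBLE, not a real API):

```python
def spark_plus(a, b):
    """Sketch of Spark's '+': the string side is cast to DOUBLE; a
    non-numeric string casts to NULL (None), and NULL propagates."""
    try:
        a = float(a)
    except (TypeError, ValueError):
        return None
    return a + b

# Spark parses 'four: ' || 2 + 2 as ('four: ' || 2) + 2: the concat result
# 'four: 2' fails the numeric cast, so the whole expression is NULL.
print(spark_plus("four: " + str(2), 2))   # None
# Hive gives '||' lower precedence: 'four: ' || (2 + 2).
print("four: " + str(2 + 2))              # four: 4
```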
[jira] [Updated] (SPARK-28028) Cast numeric to integral type need round
[ https://issues.apache.org/jira/browse/SPARK-28028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28028: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Cast numeric to integral type need round > > > Key: SPARK-28028 > URL: https://issues.apache.org/jira/browse/SPARK-28028 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > For example > Case 1: > {code:sql} > select cast(-1.5 as smallint); > {code} > Spark SQL returns {{-1}}, but PostgreSQL returns {{-2}}. > > Case 2: > {code:sql} > SELECT smallint(float('32767.6')) > {code} > Spark SQL returns {{32767}}, but PostgreSQL throws {{ERROR: smallint out of > range}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
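Both cases in SPARK-28028 can be sketched in Python: PostgreSQL rounds half away from zero and range-checks the target type, while Spark truncates toward zero (the function below is an illustrative model, not either engine's code):

```python
from decimal import Decimal, ROUND_HALF_UP

SMALLINT_MIN, SMALLINT_MAX = -32768, 32767

def pg_cast_smallint(x):
    """Sketch of PostgreSQL's cast to smallint: round half away from zero
    (decimal's ROUND_HALF_UP), then range-check instead of wrapping."""
    n = int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    if not SMALLINT_MIN <= n <= SMALLINT_MAX:
        raise OverflowError("smallint out of range")
    return n

print(pg_cast_smallint(-1.5))  # -2; Spark returns -1 (truncation, like int())
print(int(-1.5))               # -1
try:
    pg_cast_smallint(32767.6)  # rounds to 32768, outside smallint
except OverflowError as e:
    print(e)                   # smallint out of range
```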
[jira] [Updated] (SPARK-27987) Support POSIX Regular Expressions
[ https://issues.apache.org/jira/browse/SPARK-27987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27987: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support POSIX Regular Expressions > - > > Key: SPARK-27987 > URL: https://issues.apache.org/jira/browse/SPARK-27987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > POSIX regular expressions provide a more powerful means for pattern matching > than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, > or awk use a pattern matching language that is similar to the one described > here. > ||Operator||Description||Example|| > |{{~}}|Matches regular expression, case sensitive|{{'thomas' ~ '.*thomas.*'}}| > |{{~*}}|Matches regular expression, case insensitive|{{'thomas' ~* > '.*Thomas.*'}}| > |{{!~}}|Does not match regular expression, case sensitive|{{'thomas' !~ > '.*Thomas.*'}}| > |{{!~*}}|Does not match regular expression, case insensitive|{{'thomas' !~* > '.*vadim.*'}}| > https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
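The four operators map naturally onto Python's `re` module, which is close to (though not identical with) POSIX EREs; the table's examples can be checked with a small dispatcher (an illustrative sketch, not a proposed Spark API):

```python
import re

def regex_op(op, text, pattern):
    """Sketch of the PostgreSQL POSIX operators: '*' in the operator means
    case-insensitive, a leading '!' negates the match."""
    flags = re.IGNORECASE if "*" in op else 0
    matched = re.search(pattern, text, flags) is not None
    return not matched if op.startswith("!") else matched

print(regex_op("~", "thomas", ".*thomas.*"))    # True
print(regex_op("~*", "thomas", ".*Thomas.*"))   # True
print(regex_op("!~", "thomas", ".*Thomas.*"))   # True
print(regex_op("!~*", "thomas", ".*vadim.*"))   # True
```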
[jira] [Commented] (SPARK-25061) Spark SQL Thrift Server fails to not pick up hiveconf passing parameter
[ https://issues.apache.org/jira/browse/SPARK-25061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004736#comment-17004736 ] Ajith S commented on SPARK-25061: - I could reproduce this and as per documentation, https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html --hiveconf can be used to pass hive properties to thrift server. Raising PR for fixing the same. > Spark SQL Thrift Server fails to not pick up hiveconf passing parameter > > > Key: SPARK-25061 > URL: https://issues.apache.org/jira/browse/SPARK-25061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zineng Yuan >Priority: Major > > Spark thrift server should use passing parameter value and overwrites the > same conf from hive-site.xml. For example, the server should overwrite what > exists in hive-site.xml. > ./sbin/start-thriftserver.sh --master yarn-client ... > --hiveconf > "hive.server2.authentication.kerberos.principal=" ... > > hive.server2.authentication.kerberos.principal > hive/_HOST@ > > However, the server takes what in hive-site.xml. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28027) Missing some mathematical operators
[ https://issues.apache.org/jira/browse/SPARK-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28027: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Missing some mathematical operators > --- > > Key: SPARK-28027 > URL: https://issues.apache.org/jira/browse/SPARK-28027 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Operator||Description||Example||Result|| > |{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}| > |{{\|/}}|square root|{{\|/ 25.0}}|{{5}}| > |{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}| > |{{\!}}|factorial|{{5 !}}|{{120}}| > |{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}| > |{{@}}|absolute value|{{@ -5.0}}|{{5}}| > |{{#}}|bitwise XOR|{{17 # 5}}|{{20}}| > |{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}| > |{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}| > > Please note that we have {{^}}, {{\!}} and {{\!!\}}, but it has different > meanings. > [https://www.postgresql.org/docs/11/functions-math.html] > > [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm] > [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
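For reference, the operators listed in the table all have straightforward Python counterparts, which pins down the expected results (Spark would expose them as SQL operators or functions, not this syntax):

```python
import math

print(2.0 ** 3.0)            # 8.0  exponentiation        (^)
print(math.sqrt(25.0))       # 5.0  square root           (|/)
print(round(27.0 ** (1/3)))  # 3    cube root             (||/), via float power
print(math.factorial(5))     # 120  factorial             (! and !!)
print(abs(-5.0))             # 5.0  absolute value        (@)
print(17 ^ 5)                # 20   bitwise XOR           (# in PostgreSQL)
print(1 << 4, 8 >> 2)        # 16 2 bitwise shifts        (<< and >>)
```

Note the clash the ticket mentions: Python's `^` is XOR while PostgreSQL's `^` is exponentiation, the same kind of conflict Spark faces with its existing `^`, `!`, and `!!`.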
[jira] [Updated] (SPARK-27931) Accept 'on' and 'off' as input for boolean data type
[ https://issues.apache.org/jira/browse/SPARK-27931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27931: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Accept 'on' and 'off' as input for boolean data type > > > Key: SPARK-27931 > URL: https://issues.apache.org/jira/browse/SPARK-27931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: YoungGyu Chun >Priority: Major > Fix For: 3.0.0 > > > This ticket contains three things: > 1. Accept 'on' and 'off' as input for boolean data type > {code:sql} > SELECT cast('no' as boolean) AS false; > SELECT cast('off' as boolean) AS false; > {code} > 2. Accept unique prefixes thereof: > {code:sql} > SELECT cast('of' as boolean) AS false; > SELECT cast('fal' as boolean) AS false; > {code} > 3. Trim the string when cast to boolean type > {code:sql} > SELECT cast('true ' as boolean) AS true; > SELECT cast(' FALSE' as boolean) AS true; > {code} > More details: > [https://www.postgresql.org/docs/devel/datatype-boolean.html] > > [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/backend/utils/adt/bool.c#L25] > > [https://github.com/postgres/postgres/commit/05a7db05826c5eb68173b6d7ef1553c19322ef48] > > [https://github.com/postgres/postgres/commit/9729c9360886bee7feddc6a1124b0742de4b9f3d] > Other DBs: > [http://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html] > [https://my.vertica.com/docs/5.0/HTML/Master/2983.htm] > > [https://github.com/prestosql/presto/blob/b845cd66da3eb1fcece50efba83ea12bc40afbaa/presto-main/src/main/java/com/facebook/presto/type/VarcharOperators.java#L108-L138] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
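All three requested behaviors (extra keywords, unique prefixes, trimming) can be sketched in one small Python parser modeled on PostgreSQL's rules; the exact keyword set and ambiguity handling here are assumptions based on the linked bool.c:

```python
TRUE_WORDS = ("true", "yes", "on", "1")
FALSE_WORDS = ("false", "no", "off", "0")

def parse_bool(s):
    """Sketch of the requested cast: trim, lower-case, then accept any
    unique prefix of a boolean keyword ('1'/'0' must match exactly).
    Ambiguous input such as 'o' is invalid -> None."""
    v = s.strip().lower()
    if not v:
        return None
    t = any(w.startswith(v) for w in TRUE_WORDS if len(w) > 1 or w == v)
    f = any(w.startswith(v) for w in FALSE_WORDS if len(w) > 1 or w == v)
    if t == f:  # both (ambiguous) or neither (no match)
        return None
    return t

print(parse_bool("off"), parse_bool("of"), parse_bool("fal"), parse_bool(" FALSE"))
```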
[jira] [Updated] (SPARK-28656) Support `millennium`, `century` and `decade` at `extract()`
[ https://issues.apache.org/jira/browse/SPARK-28656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28656: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support `millennium`, `century` and `decade` at `extract()` > --- > > Key: SPARK-28656 > URL: https://issues.apache.org/jira/browse/SPARK-28656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, we support these field for EXTRACT: YEAR, QUARTER, MONTH, WEEK, > DAY, DAYOFWEEK, HOUR, MINUTE, SECOND. > We also need support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, > MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, > ISOYEAR. > https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
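The arithmetic behind these three fields is worth spelling out, since centuries and millennia are 1-based in PostgreSQL (the 21st century runs 2001-2100) while decades are plain integer division; a sketch for positive years:

```python
def extract_parts(year):
    """Sketch of PostgreSQL's CENTURY, MILLENNIUM, and DECADE for
    positive years (negative years follow different rules, omitted here)."""
    century = (year - 1) // 100 + 1
    millennium = (year - 1) // 1000 + 1
    decade = year // 10
    return century, millennium, decade

print(extract_parts(2019))  # (21, 3, 201)
print(extract_parts(2000))  # (20, 2, 200) -- 2000 still belongs to the 20th century
```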
[jira] [Updated] (SPARK-28674) Spark should support select into from where as PostgreSQL supports
[ https://issues.apache.org/jira/browse/SPARK-28674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28674: - Parent Issue: SPARK-30374 (was: SPARK-27764) > Spark should support select into from where > as PostgreSQL supports > > > Key: SPARK-28674 > URL: https://issues.apache.org/jira/browse/SPARK-28674 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Spark should support {{select into from where > }} as PostgreSQL supports > {code} > create table dup(id int); > insert into dup values(1); > insert into dup values(2); > select id into test_dup from dup where id=1; > select * from test_dup; > {code} > *Result: Success in PostgreSQL* > But select id into test_dup from dup where id=1; in Spark gives ParseException > {code} > scala> sql("show tables").show(); > ++-+---+ > |database|tableName|isTemporary| > ++-+---+ > |func| dup| false| > ++-+---+ > {code} > {code} > scala> sql("select id into test_dup from dup where id=1").show() > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'test_dup' expecting (line 1, pos 15) > == SQL == > select id into test_dup from dup where id=1 > ---^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28718) Support field synonyms at `extract`
[ https://issues.apache.org/jira/browse/SPARK-28718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28718: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support field synonyms at `extract` > --- > > Key: SPARK-28718 > URL: https://issues.apache.org/jira/browse/SPARK-28718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > Here is the list of field synonyms supported by PostgreSQL at extract: > https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/datetime.c#L171-L234
[jira] [Resolved] (SPARK-28768) Implement more text pattern operators
[ https://issues.apache.org/jira/browse/SPARK-28768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28768. -- Resolution: Won't Fix > Implement more text pattern operators > - > > Key: SPARK-28768 > URL: https://issues.apache.org/jira/browse/SPARK-28768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > postgres=# \do ~*~ > List of operators >Schema | Name | Left arg type | Right arg type | Result type | > Description > +--+---++-+- > pg_catalog | ~<=~ | character | character | boolean | less than > or equal > pg_catalog | ~<=~ | text | text | boolean | less than > or equal > pg_catalog | ~<~ | character | character | boolean | less than > pg_catalog | ~<~ | text | text | boolean | less than > pg_catalog | ~>=~ | character | character | boolean | greater > than or equal > pg_catalog | ~>=~ | text | text | boolean | greater > than or equal > pg_catalog | ~>~ | character | character | boolean | greater > than > pg_catalog | ~>~ | text | text | boolean | greater > than > pg_catalog | ~~ | bytea | bytea | boolean | matches > LIKE expression > pg_catalog | ~~ | character | text | boolean | matches > LIKE expression > pg_catalog | ~~ | name | text | boolean | matches > LIKE expression > pg_catalog | ~~ | text | text | boolean | matches > LIKE expression > (12 rows) > {code} > {noformat} > postgres=# select '1' ~<~ '2'; > ?column? > -- > t > (1 row) > {noformat} > https://stackoverflow.com/questions/35807872/operator-in-postgres -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28768) Implement more text pattern operators
[ https://issues.apache.org/jira/browse/SPARK-28768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004723#comment-17004723 ] Takeshi Yamamuro commented on SPARK-28768: -- I'll close for now because I'm not sure that this feature is useful for Spark. If necessary, please reopen this. > Implement more text pattern operators > - > > Key: SPARK-28768 > URL: https://issues.apache.org/jira/browse/SPARK-28768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > postgres=# \do ~*~ > List of operators >Schema | Name | Left arg type | Right arg type | Result type | > Description > +--+---++-+- > pg_catalog | ~<=~ | character | character | boolean | less than > or equal > pg_catalog | ~<=~ | text | text | boolean | less than > or equal > pg_catalog | ~<~ | character | character | boolean | less than > pg_catalog | ~<~ | text | text | boolean | less than > pg_catalog | ~>=~ | character | character | boolean | greater > than or equal > pg_catalog | ~>=~ | text | text | boolean | greater > than or equal > pg_catalog | ~>~ | character | character | boolean | greater > than > pg_catalog | ~>~ | text | text | boolean | greater > than > pg_catalog | ~~ | bytea | bytea | boolean | matches > LIKE expression > pg_catalog | ~~ | character | text | boolean | matches > LIKE expression > pg_catalog | ~~ | name | text | boolean | matches > LIKE expression > pg_catalog | ~~ | text | text | boolean | matches > LIKE expression > (12 rows) > {code} > {noformat} > postgres=# select '1' ~<~ '2'; > ?column? > -- > t > (1 row) > {noformat} > https://stackoverflow.com/questions/35807872/operator-in-postgres -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28768) Implement more text pattern operators
[ https://issues.apache.org/jira/browse/SPARK-28768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-28768: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Implement more text pattern operators > - > > Key: SPARK-28768 > URL: https://issues.apache.org/jira/browse/SPARK-28768 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > postgres=# \do ~*~ > List of operators >Schema | Name | Left arg type | Right arg type | Result type | > Description > +--+---++-+- > pg_catalog | ~<=~ | character | character | boolean | less than > or equal > pg_catalog | ~<=~ | text | text | boolean | less than > or equal > pg_catalog | ~<~ | character | character | boolean | less than > pg_catalog | ~<~ | text | text | boolean | less than > pg_catalog | ~>=~ | character | character | boolean | greater > than or equal > pg_catalog | ~>=~ | text | text | boolean | greater > than or equal > pg_catalog | ~>~ | character | character | boolean | greater > than > pg_catalog | ~>~ | text | text | boolean | greater > than > pg_catalog | ~~ | bytea | bytea | boolean | matches > LIKE expression > pg_catalog | ~~ | character | text | boolean | matches > LIKE expression > pg_catalog | ~~ | name | text | boolean | matches > LIKE expression > pg_catalog | ~~ | text | text | boolean | matches > LIKE expression > (12 rows) > {code} > {noformat} > postgres=# select '1' ~<~ '2'; > ?column? > -- > t > (1 row) > {noformat} > https://stackoverflow.com/questions/35807872/operator-in-postgres -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
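The `~<~` / `~>~` operator family listed above compares text using raw C-locale byte order, ignoring collation (PostgreSQL uses them internally for `LIKE` index support). A small Python sketch of that byte-wise comparison, purely to illustrate the semantics:

```python
def text_pattern_lt(a: str, b: str) -> bool:
    """Rough analogue of PostgreSQL's ~<~: compare raw UTF-8 bytes,
    ignoring locale-aware collation."""
    return a.encode("utf-8") < b.encode("utf-8")

print(text_pattern_lt("1", "2"))  # True, matching `select '1' ~<~ '2'` -> t
```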
[jira] [Updated] (SPARK-29187) Return null from `date_part()` for the null `field`
[ https://issues.apache.org/jira/browse/SPARK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29187: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Return null from `date_part()` for the null `field` > --- > > Key: SPARK-29187 > URL: https://issues.apache.org/jira/browse/SPARK-29187 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > PostgreSQL return NULL for the NULL field from the date_part() function: > {code} > maxim=# select date_part(null, date'2019-09-20'); > date_part > --- > > (1 row) > {code} > but Spark fails with the error: > {code} > spark-sql> select date_part(null, date'2019-09-20'); > Error in query: null; line 1 pos 7 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29311) Return seconds with fraction from `date_part`/`extract`
[ https://issues.apache.org/jira/browse/SPARK-29311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29311: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Return seconds with fraction from `date_part`/`extract` > --- > > Key: SPARK-29311 > URL: https://issues.apache.org/jira/browse/SPARK-29311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > The `date_part()` and `extract` should return seconds with fractional part > for the `SECOND` field as PostgreSQL does: > {code} > # SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.01'); > date_part > --- > 1.01 > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
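The requested behaviour, the SECOND field carrying its fractional part, amounts to combining the whole seconds with the sub-second component. A minimal Python sketch of that extraction (illustrative only, not Spark's `date_part` code):

```python
from datetime import datetime

ts = datetime(2019, 10, 1, 0, 0, 1, 10000)  # 2019-10-01 00:00:01.01
seconds = ts.second + ts.microsecond / 1_000_000
print(seconds)  # 1.01, matching PostgreSQL's date_part('SECONDS', ...)
```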
[jira] [Updated] (SPARK-23179) Support option to throw exception if overflow occurs during Decimal arithmetic
[ https://issues.apache.org/jira/browse/SPARK-23179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-23179: - Parent Issue: SPARK-30374 (was: SPARK-27764) > Support option to throw exception if overflow occurs during Decimal arithmetic > -- > > Key: SPARK-23179 > URL: https://issues.apache.org/jira/browse/SPARK-23179 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > SQL ANSI 2011 states that in case of overflow during arithmetic operations, > an exception should be thrown. This is what most of the SQL DBs do (eg. > SQLServer, DB2). Hive currently returns NULL (as Spark does) but HIVE-18291 > is open to be SQL compliant. > I propose to have a config option which allows to decide whether Spark should > behave according to SQL standards or in the current way (ie. returning NULL). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29096) The exact math method should be called only when there is a corresponding function in Math
[ https://issues.apache.org/jira/browse/SPARK-29096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29096: - Parent Issue: SPARK-30374 (was: SPARK-27764) > The exact math method should be called only when there is a corresponding > function in Math > -- > > Key: SPARK-29096 > URL: https://issues.apache.org/jira/browse/SPARK-29096 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > > After https://github.com/apache/spark/pull/21599, if the option > "spark.sql.failOnIntegralTypeOverflow" is enabled, all the Binary Arithmetic > operator will used the exact version function. > However, only `Add`/`Substract`/`Multiply` has a corresponding exact function > in java.lang.Math . When the option "spark.sql.failOnIntegralTypeOverflow" is > enabled, a runtime exception "BinaryArithmetics must override either > exactMathMethod or genCode" is thrown if the other Binary Arithmetic > operators are used, such as "Divide", "Remainder". > The exact math method should be called only when there is a corresponding > function in Math -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26218) Throw exception on overflow for integers
[ https://issues.apache.org/jira/browse/SPARK-26218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-26218: - Parent Issue: SPARK-30374 (was: SPARK-27764) > Throw exception on overflow for integers > > > Key: SPARK-26218 > URL: https://issues.apache.org/jira/browse/SPARK-26218 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > SPARK-24598 just updated the documentation in order to state that our > addition is a Java style one and not a SQL style. But in order to follow the > SQL standard we should instead throw an exception if an overflow occurs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
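The two overflow tickets above describe the same pattern: check the mathematically exact result against the type's range and raise instead of wrapping, as `java.lang.Math.addExact` does. A hedged Python sketch of checked 32-bit addition (Python ints are arbitrary precision, so the range check is explicit):

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def add_exact(a: int, b: int) -> int:
    """Checked 32-bit addition: raise on overflow instead of wrapping,
    in the spirit of java.lang.Math.addExact."""
    result = a + b
    if not (INT_MIN <= result <= INT_MAX):
        raise OverflowError(f"integer overflow: {a} + {b}")
    return result

print(add_exact(1, 2))  # 3
```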
[jira] [Commented] (SPARK-29386) Copy data between a file and a table
[ https://issues.apache.org/jira/browse/SPARK-29386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004722#comment-17004722 ] Takeshi Yamamuro commented on SPARK-29386: -- I'll close this for now because this feature is pg-specific. If necessary, please reopen this. > Copy data between a file and a table > - > > Key: SPARK-29386 > URL: https://issues.apache.org/jira/browse/SPARK-29386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > https://www.postgresql.org/docs/12/sql-copy.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29386) Copy data between a file and a table
[ https://issues.apache.org/jira/browse/SPARK-29386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29386. -- Resolution: Won't Fix > Copy data between a file and a table > - > > Key: SPARK-29386 > URL: https://issues.apache.org/jira/browse/SPARK-29386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > https://www.postgresql.org/docs/12/sql-copy.html
[jira] [Updated] (SPARK-29386) Copy data between a file and a table
[ https://issues.apache.org/jira/browse/SPARK-29386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29386: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Copy data between a file and a table > - > > Key: SPARK-29386 > URL: https://issues.apache.org/jira/browse/SPARK-29386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > https://www.postgresql.org/docs/12/sql-copy.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql
[ https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29587: - Parent Issue: SPARK-30374 (was: SPARK-27764) > Real data type is not supported in Spark SQL which is supporting in postgresql > -- > > Key: SPARK-29587 > URL: https://issues.apache.org/jira/browse/SPARK-29587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Assignee: Kent Yao >Priority: Minor > Fix For: 3.0.0 > > > The real data type is not supported in Spark SQL, although it is supported in > PostgreSQL. > +*In postgresql query success*+ > CREATE TABLE weather2(prcp real); > insert into weather2 values(2.5); > select * from weather2; > > || ||prcp|| > |1|2,5| > +*In spark sql getting error*+ > spark-sql> CREATE TABLE weather2(prcp real); > Error in query: > DataType real is not supported.(line 1, pos 27) > == SQL == > CREATE TABLE weather2(prcp real) > --- > It would be better to add support for the "real" data type in Spark SQL as well. >
[jira] [Updated] (SPARK-29583) extract support interval type
[ https://issues.apache.org/jira/browse/SPARK-29583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29583: - Parent Issue: SPARK-30375 (was: SPARK-27764) > extract support interval type > - > > Key: SPARK-29583 > URL: https://issues.apache.org/jira/browse/SPARK-29583 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > postgres=# select extract(minute from INTERVAL '1 YEAR 10 DAYS 50 MINUTES'); > date_part > --- > 50 > (1 row) > postgres=# select extract(minute from cast('2019-07-01 17:12:33.068' as > timestamp) - cast('2019-07-01 15:57:07.912' as timestamp)); > date_part > --- > 15 > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
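Extracting a field such as MINUTE from an interval means taking that component of the normalized duration, not the total minutes. A small Python sketch reproducing the second PostgreSQL example above with `timedelta` (illustrative, not Spark's interval code):

```python
from datetime import datetime

# Same timestamps as the PostgreSQL example: the difference is 1h 15m 25.156s.
delta = (datetime(2019, 7, 1, 17, 12, 33, 68000)
         - datetime(2019, 7, 1, 15, 57, 7, 912000))
minutes_field = (delta.seconds // 60) % 60  # the minute *component*, not total minutes
print(minutes_field)  # 15
```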
[jira] [Commented] (SPARK-29584) NOT NULL is not supported in Spark
[ https://issues.apache.org/jira/browse/SPARK-29584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004721#comment-17004721 ] Takeshi Yamamuro commented on SPARK-29584: -- This issue concerns integrity constraints, which are tracked in SPARK-19842. > NOT NULL is not supported in Spark > -- > > Key: SPARK-29584 > URL: https://issues.apache.org/jira/browse/SPARK-29584 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Spark does not support restricting a column to non-NULL values while creating a table. > As shown below: > PostgreSQL: SUCCESS No Exception > CREATE TABLE Persons (ID int *NOT NULL*, LastName varchar(255) *NOT > NULL*,FirstName varchar(255) NOT NULL, Age int); > insert into Persons values(1,'GUPTA','Abhi',NULL); > select * from persons; > > Spark: Parse Exception > jdbc:hive2://10.18.19.208:23040/default> CREATE TABLE Persons (ID int NOT > NULL, LastName varchar(255) NOT NULL,FirstName varchar(255) NOT NULL, Age > int); > Error: org.apache.spark.sql.catalyst.parser.ParseException: > no viable alternative at input 'CREATE TABLE Persons (ID int NOT'(line 1, pos > 29) > Parse Exception
[jira] [Resolved] (SPARK-29584) NOT NULL is not supported in Spark
[ https://issues.apache.org/jira/browse/SPARK-29584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29584. -- Resolution: Duplicate > NOT NULL is not supported in Spark > -- > > Key: SPARK-29584 > URL: https://issues.apache.org/jira/browse/SPARK-29584 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Spark while creating table restricting column for NULL value is not supported. > As below > PostgreSQL: SUCCESS No Exception > CREATE TABLE Persons (ID int *NOT NULL*, LastName varchar(255) *NOT > NULL*,FirstName varchar(255) NOT NULL, Age int); > insert into Persons values(1,'GUPTA','Abhi',NULL); > select * from persons; > > Spark: Parse Exception > jdbc:hive2://10.18.19.208:23040/default> CREATE TABLE Persons (ID int NOT > NULL, LastName varchar(255) NOT NULL,FirstName varchar(255) NOT NULL, Age > int); > Error: org.apache.spark.sql.catalyst.parser.ParseException: > no viable alternative at input 'CREATE TABLE Persons (ID int NOT'(line 1, pos > 29) > Parse Exception -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
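For reference, the behaviour the reporter expects is what most SQL engines already do: a `NOT NULL` column constraint is accepted at DDL time and enforced on write. A sketch using SQLite as a stand-in engine (not Spark), showing the constraint rejecting a NULL insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INT NOT NULL, last_name TEXT NOT NULL)")
conn.execute("INSERT INTO persons VALUES (1, 'GUPTA')")

rejected = False
try:
    conn.execute("INSERT INTO persons VALUES (2, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```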
[jira] [Resolved] (SPARK-29616) Bankers' rounding for double types
[ https://issues.apache.org/jira/browse/SPARK-29616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29616. -- Resolution: Won't Fix > Bankers' rounding for double types > -- > > Key: SPARK-29616 > URL: https://issues.apache.org/jira/browse/SPARK-29616 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > PostgreSQL uses banker's rounding mode for double types; > {code} > postgres=# select * from t; > a | b > -+- > 0.5 | 0.5 > 1.5 | 1.5 > 2.5 | 2.5 > 3.5 | 3.5 > 4.5 | 4.5 > (5 rows) > postgres=# \d t > Table "public.t" > Column | Type | Collation | Nullable | Default > +--+---+--+- > a | double precision | | | > b | numeric(2,1) | | | > postgres=# select round(a), round(b) from t; > round | round > ---+--- > 0 | 1 > 2 | 2 > 2 | 3 > 4 | 4 > 4 | 5 > (5 rows) > {code} > > In the master; > {code} > scala> sql("select * from t").show > +---+---+ > | a| b| > +---+---+ > |0.5|0.5| > |1.5|1.5| > |2.5|2.5| > |3.5|3.5| > |4.5|4.5| > +---+---+ > scala> sql("select * from t").printSchema > root > |-- a: double (nullable = true) > |-- b: decimal(2,1) (nullable = true) > scala> sql("select round(a), round(b) from t").show() > +---+---+ > |round(a, 0)|round(b, 0)| > +---+---+ > | 1.0| 1| > | 2.0| 2| > | 3.0| 3| > | 4.0| 4| > | 5.0| 5| > +---+---+ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29616) Bankers' rounding for double types
[ https://issues.apache.org/jira/browse/SPARK-29616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29616: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Bankers' rounding for double types > -- > > Key: SPARK-29616 > URL: https://issues.apache.org/jira/browse/SPARK-29616 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > PostgreSQL uses banker's rounding mode for double types; > {code} > postgres=# select * from t; > a | b > -+- > 0.5 | 0.5 > 1.5 | 1.5 > 2.5 | 2.5 > 3.5 | 3.5 > 4.5 | 4.5 > (5 rows) > postgres=# \d t > Table "public.t" > Column | Type | Collation | Nullable | Default > +--+---+--+- > a | double precision | | | > b | numeric(2,1) | | | > postgres=# select round(a), round(b) from t; > round | round > ---+--- > 0 | 1 > 2 | 2 > 2 | 3 > 4 | 4 > 4 | 5 > (5 rows) > {code} > > In the master; > {code} > scala> sql("select * from t").show > +---+---+ > | a| b| > +---+---+ > |0.5|0.5| > |1.5|1.5| > |2.5|2.5| > |3.5|3.5| > |4.5|4.5| > +---+---+ > scala> sql("select * from t").printSchema > root > |-- a: double (nullable = true) > |-- b: decimal(2,1) (nullable = true) > scala> sql("select round(a), round(b) from t").show() > +---+---+ > |round(a, 0)|round(b, 0)| > +---+---+ > | 1.0| 1| > | 2.0| 2| > | 3.0| 3| > | 4.0| 4| > | 5.0| 5| > +---+---+ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
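The two rounding modes in the ticket above can be demonstrated directly in Python, whose built-in `round()` on floats happens to use the same round-half-to-even ("banker's") rule PostgreSQL applies to double precision, while `decimal` can express the half-up rule PostgreSQL applies to numeric; this is an illustration of the two modes, not Spark code:

```python
from decimal import Decimal, ROUND_HALF_UP

# Round-half-to-even, as PostgreSQL does for double precision:
evens = [round(x) for x in (0.5, 1.5, 2.5, 3.5, 4.5)]
print(evens)  # [0, 2, 2, 4, 4]

# Round-half-up, as PostgreSQL does for numeric (and Spark for both columns):
half_up = [int(Decimal(s).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
           for s in ("0.5", "1.5", "2.5", "3.5", "4.5")]
print(half_up)  # [1, 2, 3, 4, 5]
```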
[jira] [Updated] (SPARK-29659) Support COMMENT ON syntax
[ https://issues.apache.org/jira/browse/SPARK-29659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-29659: - Parent Issue: SPARK-30375 (was: SPARK-27764) > Support COMMENT ON syntax > - > > Key: SPARK-29659 > URL: https://issues.apache.org/jira/browse/SPARK-29659 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Major > > [https://www.postgresql.org/docs/current/sql-comment.html]
[jira] [Created] (SPARK-30380) Refactor RandomForest.findSplits
zhengruifeng created SPARK-30380: Summary: Refactor RandomForest.findSplits Key: SPARK-30380 URL: https://issues.apache.org/jira/browse/SPARK-30380 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.0.0 Reporter: zhengruifeng The current implementation of {{RandomForest.findSplits}} uses {{groupByKey}} to collect the non-zero values for each feature; this is risky because all of a feature's values are buffered in memory for a single key, which can cause OOM when a feature has many non-zero values.
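The usual remedy for a risky `groupByKey` is an `aggregateByKey`-style fold: combine records into a bounded per-key accumulator within each partition, then merge the accumulators, so no key's full value list is ever materialized at once. A hedged Python sketch of that pattern (the partition data, `seq_op`, and `comb_op` names are hypothetical, not Spark's actual refactoring):

```python
from functools import reduce

# Simulated partitions of (feature_index, value) records.
partitions = [
    [(0, 1.0), (1, 2.0), (0, 1.0)],
    [(0, 3.0), (1, 2.0)],
]

def seq_op(acc, record):
    # Fold one record into the per-feature set of distinct values.
    feature, value = record
    acc.setdefault(feature, set()).add(value)
    return acc

def comb_op(a, b):
    # Merge two per-partition accumulators.
    for feature, values in b.items():
        a.setdefault(feature, set()).update(values)
    return a

per_partition = [reduce(seq_op, part, {}) for part in partitions]
merged = reduce(comb_op, per_partition, {})
print(merged)  # {0: {1.0, 3.0}, 1: {2.0}}
```

Because duplicates collapse inside each partition's accumulator, the per-key state stays proportional to the number of distinct values rather than the number of records.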