[jira] [Updated] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect
[ https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wyp updated SPARK-28587:
------------------------
Description:
When we use the JDBC data source to read data from Phoenix, with a timestamp column as the partitionColumn, e.g.

{code:java}
val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
val driver = "org.apache.phoenix.queryserver.client.Driver"
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("fetchsize", "1000")
  .option("numPartitions", "6")
  .option("partitionColumn", "times")
  .option("lowerBound", "2019-07-31 00:00:00")
  .option("upperBound", "2019-08-01 00:00:00")
  .option("dbtable", "test")
  .load().select("id")
println(df.count())
{code}

Phoenix throws an AvaticaSqlException:

{code:java}
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null
    at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
    at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
    at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
    at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00'
    at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
    at org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
    at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
    at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
    at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
    at org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
    at org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
    at org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:834)
{code}

The reason is that the JDBC data source's partition whereClause does not go through the JDBC dialect. We should use the JDBC dialect to compile '2019-07-31 04:00:00' into to_timestamp('2019-07-31 04:00:00').

was: When we use JDBC data source to search
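The suggested fix, routing partition boundary literals through the JDBC dialect, can be sketched outside Spark. The following is a hypothetical Python model of the partition whereClause generation, not Spark's actual JDBCRelation code; `compile_timestamp` stands in for the proposed dialect hook, and the "phoenix" branch is the behavior the reporter is asking for:

```python
from datetime import datetime

def compile_timestamp(dialect: str, ts: str) -> str:
    # Hypothetical dialect hook: how a timestamp literal is rendered in SQL.
    # The default rendering is a bare quoted string (which Phoenix rejects
    # with a TIMESTAMP/VARCHAR type mismatch); a Phoenix-aware dialect would
    # wrap it in to_timestamp(...) so it compares as a TIMESTAMP.
    if dialect == "phoenix":
        return f"to_timestamp('{ts}')"
    return f"'{ts}'"

def partition_where_clauses(column, lower, upper, num_partitions, dialect="default"):
    """Model of how the JDBC source splits [lowerBound, upperBound) into
    per-partition WHERE clauses; every boundary literal goes through the
    dialect hook above."""
    fmt = "%Y-%m-%d %H:%M:%S"
    lo = datetime.strptime(lower, fmt)
    hi = datetime.strptime(upper, fmt)
    stride = (hi - lo) / num_partitions
    clauses = []
    for i in range(num_partitions):
        lo_lit = compile_timestamp(dialect, (lo + i * stride).strftime(fmt))
        hi_lit = compile_timestamp(dialect, (lo + (i + 1) * stride).strftime(fmt))
        if i == 0:
            # First partition also picks up NULL partition-column values.
            clauses.append(f'"{column}" < {hi_lit} or "{column}" is null')
        elif i == num_partitions - 1:
            clauses.append(f'"{column}" >= {lo_lit}')
        else:
            clauses.append(f'"{column}" >= {lo_lit} AND "{column}" < {hi_lit}')
    return clauses

clauses = partition_where_clauses("TIMES", "2019-07-31 00:00:00",
                                  "2019-08-01 00:00:00", 6, dialect="phoenix")
print(clauses[0])
```

With the default rendering, the first clause reproduces the failing predicate from the stack trace ("TIMES" < '2019-07-31 04:00:00' or "TIMES" is null); with the "phoenix" rendering the literal is wrapped in to_timestamp(...) instead.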
[jira] [Created] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect
wyp created SPARK-28587:
------------------------

Summary: JDBC data source's partition whereClause should support jdbc dialect
Key: SPARK-28587
URL: https://issues.apache.org/jira/browse/SPARK-28587
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.3
Reporter: wyp

When we use the JDBC data source to read data from Phoenix, with a timestamp column as the partitionColumn, e.g.

{code:java}
val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
val driver = "org.apache.phoenix.queryserver.client.Driver"
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("fetchsize", "1000")
  .option("numPartitions", "6")
  .option("partitionColumn", "times")
  .option("lowerBound", "2019-07-31 00:00:00")
  .option("upperBound", "2019-08-01 00:00:00")
  .option("dbtable", "test")
  .load().select("id")
println(df.count())
{code}

Phoenix throws an AvaticaSqlException:

{code:java}
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null
    at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
    at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
    at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
    at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00'
    at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
    at org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
    at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
    at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
    at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
    at org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
    at org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
    at org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:834)
{code}

The reason is that the JDBC data source's partition whereClause does not go through the JDBC dialect.
[jira] [Resolved] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-28153.
----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25286
[https://github.com/apache/spark/pull/25286]

> Use AtomicReference at InputFileBlockHolder (to support input_file_name with
> Python UDF)
> ----------------------------------------------------------------------------
>
>              Key: SPARK-28153
>              URL: https://issues.apache.org/jira/browse/SPARK-28153
>          Project: Spark
>       Issue Type: Bug
>       Components: PySpark
> Affects Versions: 2.3.3, 3.0.0, 2.4.3
>         Reporter: Hyukjin Kwon
>         Assignee: Hyukjin Kwon
>        Priority: Major
>         Fix For: 3.0.0
>
> {code}
> from pyspark.sql.functions import udf, input_file_name
> spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
> spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"),
>     input_file_name()).show()
> {code}
> {code}
> +----+-----------------+
> |(id)|input_file_name()|
> +----+-----------------+
> |   8|                 |
> |   5|                 |
> |   0|                 |
> |   9|                 |
> |   6|                 |
> |   2|                 |
> |   3|                 |
> |   4|                 |
> |   7|                 |
> |   1|                 |
> +----+-----------------+
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
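Why switching InputFileBlockHolder to an AtomicReference helps can be illustrated with a simplified Python analogy (this is not the Scala fix itself, and the holder classes below are hypothetical): with Python UDFs, the thread that writes rows out is not the thread that reads the file, so a value snapshotted per-thread at creation time (like an inherited thread-local) goes stale and input_file_name() comes back empty, while a shared mutable reference stays visible to every thread:

```python
import threading

class CopiedHolder:
    """Akin to an inheritable thread-local: the child thread gets a snapshot
    of the value taken when it was created, and never sees later updates."""
    def __init__(self, value):
        self.value = value

class SharedHolder:
    """Akin to java.util.concurrent.atomic.AtomicReference: one mutable
    cell observed by every thread."""
    def __init__(self):
        self._ref = [""]
    def set(self, v):
        self._ref[0] = v
    def get(self):
        return self._ref[0]

shared = SharedHolder()
snapshot = CopiedHolder(shared.get())  # snapshot taken before the file name is known

seen = {}
def writer_thread():
    # Mimics the thread that serves rows to the Python worker.
    seen["copied"] = snapshot.value    # stale snapshot: empty string
    seen["shared"] = shared.get()      # live reference: actual file name

# The reader thread only learns the file name after the writer thread's
# snapshot was taken.
shared.set("/tmp/foo/part-00000.parquet")
t = threading.Thread(target=writer_thread)
t.start()
t.join()
print(seen)  # "copied" is empty, "shared" holds the file name
```

The empty input_file_name() column in the reproduction above corresponds to the "copied" case; the fix corresponds to the "shared" case.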
[jira] [Issue Comment Deleted] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Comment: was deleted

(was: Issue resolved by pull request 25321
[https://github.com/apache/spark/pull/25321])
[jira] [Issue Comment Deleted] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Comment: was deleted

(was: Issue resolved by pull request 25286
[https://github.com/apache/spark/pull/25286])
[jira] [Commented] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897779#comment-16897779 ]

Hyukjin Kwon commented on SPARK-28153:
--------------------------------------
Fixed in https://github.com/apache/spark/pull/24958
[jira] [Assigned] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-28153:
------------------------------------
Assignee: Hyukjin Kwon
[jira] [Reopened] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-28153:
----------------------------------
Assignee: (was: Hyukjin Kwon)
[jira] [Updated] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Fix Version/s: (was: 3.0.0)
[jira] [Assigned] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-28153:
------------------------------------
Assignee: Hyukjin Kwon
[jira] [Updated] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Fix Version/s: (was: 3.0.0)
[jira] [Resolved] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-28153.
----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25321
[https://github.com/apache/spark/pull/25321]
[jira] [Reopened] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-28153:
----------------------------------
Assignee: (was: Hyukjin Kwon)
[jira] [Resolved] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-28153.
----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25321
[https://github.com/apache/spark/pull/25321]
[jira] [Issue Comment Deleted] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Comment: was deleted

(was: Issue resolved by pull request 25321
[https://github.com/apache/spark/pull/25321])
[jira] [Created] (SPARK-28586) Make merge-spark-pr script compatible with Python 3
Hyukjin Kwon created SPARK-28586:
---------------------------------

Summary: Make merge-spark-pr script compatible with Python 3
Key: SPARK-28586
URL: https://issues.apache.org/jira/browse/SPARK-28586
Project: Spark
Issue Type: Sub-task
Components: Project Infra
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon

merge-spark-pr.py is not Python 3 compatible. Committers are the ones most familiar with this script, so I will handle it separately.
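For context, a hypothetical sketch of the kind of change such a Python 2 to 3 port involves (illustrative only; this is not the actual merge-spark-pr.py diff, and `get_json`/`ensure_str` are made-up helper names):

```python
# Typical Python 2 -> 3 changes in a script like this:
#   print statement      -> print() function
#   raw_input()          -> input()
#   urllib2.urlopen()    -> urllib.request.urlopen()
#   d.has_key(k)         -> k in d
#   HTTP bodies (bytes)  -> decode to str before string/json handling
import json
from urllib.request import Request, urlopen  # Python 3 home of urllib2's urlopen

def ensure_str(data):
    """Python 3: HTTP responses are bytes; decode before json.loads."""
    return data.decode("utf-8") if isinstance(data, bytes) else data

def get_json(url: str) -> dict:
    # Hypothetical helper mirroring how a merge script would call an HTTP API.
    with urlopen(Request(url)) as resp:
        return json.loads(ensure_str(resp.read()))

print(ensure_str(b'{"state": "open"}'))
```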
[jira] [Resolved] (SPARK-27888) Python 2->3 migration guide for PySpark users
[ https://issues.apache.org/jira/browse/SPARK-27888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-27888.
----------------------------------
Resolution: Won't Fix

> Python 2->3 migration guide for PySpark users
> ---------------------------------------------
>
>              Key: SPARK-27888
>              URL: https://issues.apache.org/jira/browse/SPARK-27888
>          Project: Spark
>       Issue Type: Sub-task
>       Components: PySpark
> Affects Versions: 3.0.0
>         Reporter: Xiangrui Meng
>        Priority: Major
>
> We might need a short Python 2->3 migration guide for PySpark users. It
> doesn't need to be comprehensive given the many Python 2->3 migration guides
> around. We just need some pointers and list items that are specific to
> PySpark.
[jira] [Reopened] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-28153:
----------------------------------
Assignee: (was: Hyukjin Kwon)
[jira] [Updated] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-28153:
---------------------------------
Fix Version/s: (was: 3.0.0)
[jira] [Assigned] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-28153:
------------------------------------
Assignee: Hyukjin Kwon
[jira] [Commented] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897758#comment-16897758 ]

Hyukjin Kwon commented on SPARK-28153:
--------------------------------------
I am testing the Python 3 compatibility of the merging script. Let me open and resolve this; please ignore the noise.
[jira] [Updated] (SPARK-28471) Formatting dates with negative years
[ https://issues.apache.org/jira/browse/SPARK-28471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28471: Labels: (was: correctness) > Formatting dates with negative years > > > Key: SPARK-28471 > URL: https://issues.apache.org/jira/browse/SPARK-28471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > While converting dates with negative years to strings, Spark skips the era > sub-field by default. That can confuse users, since years from the BC era are > mirrored into the current era. For example: > {code} > spark-sql> select make_date(-44, 3, 15); > 0045-03-15 > {code} > Even though negative years are outside the range supported by the DATE type, it would be > nice to indicate the era for such dates. > PostgreSQL outputs the era for such inputs: > {code} > # select make_date(-44, 3, 15); >make_date > --- > 0044-03-15 BC > (1 row) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28471) Formatting dates with negative years
[ https://issues.apache.org/jira/browse/SPARK-28471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28471: Labels: correctness (was: ) > Formatting dates with negative years > > > Key: SPARK-28471 > URL: https://issues.apache.org/jira/browse/SPARK-28471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Labels: correctness > Fix For: 3.0.0 > > > While converting dates with negative years to strings, Spark skips the era > sub-field by default. That can confuse users, since years from the BC era are > mirrored into the current era. For example: > {code} > spark-sql> select make_date(-44, 3, 15); > 0045-03-15 > {code} > Even though negative years are outside the range supported by the DATE type, it would be > nice to indicate the era for such dates. > PostgreSQL outputs the era for such inputs: > {code} > # select make_date(-44, 3, 15); >make_date > --- > 0044-03-15 BC > (1 row) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
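The digit mismatch between the two outputs comes from year numbering: under ISO-8601 astronomical numbering (year 0 = 1 BC), proleptic year -44 is 45 BC, which is why Spark prints 0045; PostgreSQL has no year zero, so it maps -44 straight to 44 BC. A hypothetical formatter (the function name is illustrative, not Spark's API) sketching the era suffix the ticket asks for, assuming ISO astronomical numbering:

```python
def format_date_with_era(year, month, day):
    """Format a proleptic ISO date, appending ' BC' for non-positive years.

    ISO-8601 astronomical numbering: year 0 is 1 BC, year -44 is 45 BC,
    so -44 renders as '0045-03-15 BC' instead of the confusing '0045-03-15'.
    """
    if year <= 0:
        # Year-of-era for BC under ISO numbering is 1 - year.
        return f"{1 - year:04d}-{month:02d}-{day:02d} BC"
    return f"{year:04d}-{month:02d}-{day:02d}"

print(format_date_with_era(-44, 3, 15))   # 0045-03-15 BC
print(format_date_with_era(2019, 7, 31))  # 2019-07-31
```

Note that matching PostgreSQL's exact digits (0044 BC) would additionally require shifting by one to account for its missing year zero.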
[jira] [Updated] (SPARK-28153) Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28153: -- Summary: Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) (was: input_file_name doesn't work with Python UDF in the same project) > Use AtomicReference at InputFileBlockHolder (to support input_file_name with > Python UDF) > > > Key: SPARK-28153 > URL: https://issues.apache.org/jira/browse/SPARK-28153 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.3, 3.0.0, 2.4.3 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > {code} > from pyspark.sql.functions import udf, input_file_name > spark.range(10).write.mode("overwrite").parquet("/tmp/foo") > spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), > input_file_name()).show() > {code} > {code} > ++-+ > |(id)|input_file_name()| > ++-+ > | 8| | > | 5| | > | 0| | > | 9| | > | 6| | > | 2| | > | 3| | > | 4| | > | 7| | > | 1| | > ++-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
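The root cause behind the empty `input_file_name()` column is that the file name is stored per-thread, while the Python UDF path reads it from a different thread; per the new issue title, the fix shares the value through an `AtomicReference`. A minimal Python sketch (illustrative only, not Spark's actual code) of why a plain thread-local is lost across threads while a shared mutable container survives:

```python
import threading

# Plain thread-local: a value set on the main thread is invisible
# to a freshly started worker thread.
local = threading.local()
local.file_name = "part-00000.parquet"

# Shared container: one mutable cell visible from every thread,
# analogous in spirit to sharing a single AtomicReference.
shared = {"file_name": "part-00000.parquet"}

seen = {}

def worker():
    seen["thread_local"] = getattr(local, "file_name", "")  # empty: not inherited
    seen["shared"] = shared["file_name"]                    # visible: same object

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["thread_local"])  # ""
print(seen["shared"])        # "part-00000.parquet"
```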
[jira] [Updated] (SPARK-28153) input_file_name doesn't work with Python UDF in the same project
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28153: -- Affects Version/s: 2.4.3 > input_file_name doesn't work with Python UDF in the same project > > > Key: SPARK-28153 > URL: https://issues.apache.org/jira/browse/SPARK-28153 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.0, 2.4.3 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > {code} > from pyspark.sql.functions import udf, input_file_name > spark.range(10).write.mode("overwrite").parquet("/tmp/foo") > spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), > input_file_name()).show() > {code} > {code} > ++-+ > |(id)|input_file_name()| > ++-+ > | 8| | > | 5| | > | 0| | > | 9| | > | 6| | > | 2| | > | 3| | > | 4| | > | 7| | > | 1| | > ++-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28153) input_file_name doesn't work with Python UDF in the same project
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28153: -- Affects Version/s: 2.3.3 > input_file_name doesn't work with Python UDF in the same project > > > Key: SPARK-28153 > URL: https://issues.apache.org/jira/browse/SPARK-28153 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.3, 3.0.0, 2.4.3 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > {code} > from pyspark.sql.functions import udf, input_file_name > spark.range(10).write.mode("overwrite").parquet("/tmp/foo") > spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), > input_file_name()).show() > {code} > {code} > ++-+ > |(id)|input_file_name()| > ++-+ > | 8| | > | 5| | > | 0| | > | 9| | > | 6| | > | 2| | > | 3| | > | 4| | > | 7| | > | 1| | > ++-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24352) Flaky test: StandaloneDynamicAllocationSuite
[ https://issues.apache.org/jira/browse/SPARK-24352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-24352. --- Resolution: Fixed Fix Version/s: 2.4.4 2.3.4 3.0.0 Issue resolved by pull request 25318 [https://github.com/apache/spark/pull/25318] > Flaky test: StandaloneDynamicAllocationSuite > > > Key: SPARK-24352 > URL: https://issues.apache.org/jira/browse/SPARK-24352 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > Fix For: 3.0.0, 2.3.4, 2.4.4 > > > From jenkins: > [https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/job/spark-branch-2.3-test-maven-hadoop-2.6/384/testReport/junit/org.apache.spark.deploy/StandaloneDynamicAllocationSuite/executor_registration_on_a_blacklisted_host_must_fail/] > > {noformat} > Error Message > There is already an RpcEndpoint called CoarseGrainedScheduler > Stacktrace > java.lang.IllegalArgumentException: There is already an RpcEndpoint > called CoarseGrainedScheduler > at > org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:71) > at > org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:130) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.createDriverEndpointRef(CoarseGrainedSchedulerBackend.scala:396) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:391) > at > org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:61) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:512) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at > 
org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > {noformat} > This actually looks like a previous test is leaving some stuff running and > making this one fail. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24352) Flaky test: StandaloneDynamicAllocationSuite
[ https://issues.apache.org/jira/browse/SPARK-24352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-24352: - Assignee: Marcelo Vanzin > Flaky test: StandaloneDynamicAllocationSuite > > > Key: SPARK-24352 > URL: https://issues.apache.org/jira/browse/SPARK-24352 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > > From jenkins: > [https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/job/spark-branch-2.3-test-maven-hadoop-2.6/384/testReport/junit/org.apache.spark.deploy/StandaloneDynamicAllocationSuite/executor_registration_on_a_blacklisted_host_must_fail/] > > {noformat} > Error Message > There is already an RpcEndpoint called CoarseGrainedScheduler > Stacktrace > java.lang.IllegalArgumentException: There is already an RpcEndpoint > called CoarseGrainedScheduler > at > org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:71) > at > org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:130) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.createDriverEndpointRef(CoarseGrainedSchedulerBackend.scala:396) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:391) > at > org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:61) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:512) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 
at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > {noformat} > This actually looks like a previous test is leaving some stuff running and > making this one fail. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28564) Access history application defaults to the last attempt id
[ https://issues.apache.org/jira/browse/SPARK-28564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-28564. Resolution: Fixed Fix Version/s: 2.4.4 3.0.0 Issue resolved by pull request 25301 [https://github.com/apache/spark/pull/25301] > Access history application defaults to the last attempt id > -- > > Key: SPARK-28564 > URL: https://issues.apache.org/jira/browse/SPARK-28564 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > Fix For: 3.0.0, 2.4.4 > > > When we set spark.history.ui.maxApplications to a small value, we can't find > some apps through the page search. > If the URL is constructed by hand (http://localhost:18080/history/local-xxx), it can be > accessed if the app has no attempt id. > But for apps with multiple attempts, such a URL cannot be accessed, > and the page displays Not Found. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28564) Access history application defaults to the last attempt id
[ https://issues.apache.org/jira/browse/SPARK-28564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-28564: -- Assignee: dzcxzl > Access history application defaults to the last attempt id > -- > > Key: SPARK-28564 > URL: https://issues.apache.org/jira/browse/SPARK-28564 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > > When we set spark.history.ui.maxApplications to a small value, we can't find > some apps through the page search. > If the URL is constructed by hand (http://localhost:18080/history/local-xxx), it can be > accessed if the app has no attempt id. > But for apps with multiple attempts, such a URL cannot be accessed, > and the page displays Not Found. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28585) Improve WebUI DAG information: Add extra info to rdd from spark plan
Pablo Langa Blanco created SPARK-28585: -- Summary: Improve WebUI DAG information: Add extra info to rdd from spark plan Key: SPARK-28585 URL: https://issues.apache.org/jira/browse/SPARK-28585 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Affects Versions: 3.0.0 Reporter: Pablo Langa Blanco The main improvement I want to achieve is to help developers explore the DAG information in the Web UI for complex flows. Sometimes it is very difficult to know which part of your Spark plan corresponds to the DAG you are looking at. This is an initial improvement for only one simple Spark plan type (UnionExec). If you consider it a good idea, I want to extend it to other Spark plans to improve the visualization iteratively. More info in the pull request -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
[ https://issues.apache.org/jira/browse/SPARK-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28581: - Assignee: Yuming Wang > Replace _FUNC_ in UDF ExpressionInfo > > > Key: SPARK-28581 > URL: https://issues.apache.org/jira/browse/SPARK-28581 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > This issue aims to move {{replaceFunctionName(usage: String, functionName: > String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}} in order to > make {{ExpressionInfo}} return the actual name instead of a placeholder. We can > get {{ExpressionInfo}}s directly through > {{SessionCatalog.lookupFunctionInfo}} API and get the real names. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
[ https://issues.apache.org/jira/browse/SPARK-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28581. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25314 [https://github.com/apache/spark/pull/25314] > Replace _FUNC_ in UDF ExpressionInfo > > > Key: SPARK-28581 > URL: https://issues.apache.org/jira/browse/SPARK-28581 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > This issue aims to move {{replaceFunctionName(usage: String, functionName: > String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}} in order to > make {{ExpressionInfo}} return the actual name instead of a placeholder. We can > get {{ExpressionInfo}}s directly through > {{SessionCatalog.lookupFunctionInfo}} API and get the real names. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
[ https://issues.apache.org/jira/browse/SPARK-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28581: -- Description: This issue aims to move {{replaceFunctionName(usage: String, functionName: String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}} in order to make {{ExpressionInfo}} return the actual name instead of a placeholder. We can get {{ExpressionInfo}}s directly through {{SessionCatalog.lookupFunctionInfo}} API and get the real names. (was: Moves {{replaceFunctionName(usage: String, functionName: String)}}from {{DescribeFunctionCommand}} to {{ExpressionInfo}}.) > Replace _FUNC_ in UDF ExpressionInfo > > > Key: SPARK-28581 > URL: https://issues.apache.org/jira/browse/SPARK-28581 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > This issue aims to move {{replaceFunctionName(usage: String, functionName: > String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}} in order to > make {{ExpressionInfo}} return the actual name instead of a placeholder. We can > get {{ExpressionInfo}}s directly through > {{SessionCatalog.lookupFunctionInfo}} API and get the real names. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28333) NULLS FIRST for DESC and NULLS LAST for ASC
[ https://issues.apache.org/jira/browse/SPARK-28333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28333. --- Resolution: Won't Do Please see the discussion and review comments on the PR. > NULLS FIRST for DESC and NULLS LAST for ASC > --- > > Key: SPARK-28333 > URL: https://issues.apache.org/jira/browse/SPARK-28333 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > spark-sql> create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > spark-sql> select * from t1 order by val asc; > NULL > NULL > 1 > 2 > 3 > spark-sql> select * from t1 order by val desc; > 3 > 2 > 1 > NULL > NULL > {code} > {code:sql} > postgres=# create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > CREATE VIEW > postgres=# select * from t1 order by val asc; > val > - >1 >2 >3 > (5 rows) > postgres=# select * from t1 order by val desc; > val > - >3 >2 >1 > (5 rows) > {code} > https://www.postgresql.org/docs/11/queries-order.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
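The ticket proposes PostgreSQL's defaults (NULLS LAST for ASC, NULLS FIRST for DESC) in place of Spark's (NULLS FIRST for ASC, NULLS LAST for DESC). A small Python sketch of the proposed semantics, illustrative only:

```python
def order_by(vals, desc=False):
    """PostgreSQL-style ordering: NULLS LAST for ASC, NULLS FIRST for DESC."""
    non_null = sorted(v for v in vals if v is not None)
    nulls = [v for v in vals if v is None]
    if desc:
        return nulls + non_null[::-1]   # NULLS FIRST, then values descending
    return non_null + nulls             # values ascending, then NULLS LAST

vals = [1, 2, None, 3, None]
print(order_by(vals))             # [1, 2, 3, None, None]
print(order_by(vals, desc=True))  # [None, None, 3, 2, 1]
```

Compare with the Spark output quoted in the ticket, where ASC places the two NULLs first.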
[jira] [Created] (SPARK-28584) Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite
Marcelo Vanzin created SPARK-28584: -- Summary: Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite Key: SPARK-28584 URL: https://issues.apache.org/jira/browse/SPARK-28584 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 3.0.0 Reporter: Marcelo Vanzin This is another of those tests that don't seem to fail in PRs here, but fail more often than we'd like in our build machines. In this case it fails in several different ways, e.g.: {noformat} org.scalatest.exceptions.TestFailedException: Map(org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@412f9d43 -> 1550579875956) did not contain key org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@1945f15f at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) at org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:635) at org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:591) {noformat} Or: {noformat} The code passed to eventually never returned normally. Attempted 40 times over 503.217543 milliseconds. Last failure message: tsm.isZombie was false. Error message: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 40 times over 503.217543 milliseconds. Last failure message: tsm.isZombie was false. 
at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439) at org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337) at org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44) at org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:543) at org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:511) {noformat} There's a race condition in the test that can cause these different failures. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
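The second failure goes through ScalaTest's `eventually`, which retries an assertion until a timeout elapses and reports the last failure. A minimal Python analogue of that retry loop (a sketch for illustration, not Spark test code):

```python
import threading
import time

def eventually(condition, timeout=0.5, interval=0.01):
    """Retry condition() until it is truthy or the timeout elapses,
    mirroring the shape of ScalaTest's eventually { ... } block."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # final attempt; its result is the "last failure"

# Simulate an asynchronous state change, like a TaskSetManager turning zombie.
flag = {"zombie": False}
threading.Timer(0.05, lambda: flag.update(zombie=True)).start()

print(eventually(lambda: flag["zombie"]))  # True once the timer has fired
```

If the state never flips within the timeout, the loop gives up, which is exactly the "tsm.isZombie was false" message in the report.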
[jira] [Commented] (SPARK-25185) CBO rowcount statistics doesn't work for partitioned parquet external table
[ https://issues.apache.org/jira/browse/SPARK-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897305#comment-16897305 ] Amit commented on SPARK-25185: -- Yes, it seems to work, but with a lot of caveats, like the one above, up until 2.3; not sure about 2.4 though > CBO rowcount statistics doesn't work for partitioned parquet external table > --- > > Key: SPARK-25185 > URL: https://issues.apache.org/jira/browse/SPARK-25185 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.2.1, 2.3.0 > Environment: > Tried on Ubuntu, FreeBSD and Windows, running spark-shell in local mode > reading data from the local file system >Reporter: Amit >Priority: Major > > Created dummy partitioned data with a partition column of string type (col1=a > and col1=b): > added csv data -> read through spark -> created partitioned external table -> > msck repair table to load partitions. Ran analyze on all columns and the partition > column as well. > ~println(spark.sql("select * from test_p where > e='1a'").queryExecution.toStringWithStats)~ > ~val op = spark.sql("select * from test_p where > e='1a'").queryExecution.optimizedPlan~ > // e is the partitioned column > ~val stat = op.stats(spark.sessionState.conf)~ > ~print(stat.rowCount)~ > > Created the same way in parquet: the rowcount comes up correctly in the case of > csv, but in parquet it shows as None. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28583) Subqueries should not call `onUpdatePlan` in Adaptive Query Execution
Maryann Xue created SPARK-28583: --- Summary: Subqueries should not call `onUpdatePlan` in Adaptive Query Execution Key: SPARK-28583 URL: https://issues.apache.org/jira/browse/SPARK-28583 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maryann Xue Subqueries do not have their own execution id, thus when calling {{AdaptiveSparkPlanExec.onUpdatePlan}}, it will actually get the {{QueryExecution}} instance of the main query, which is wasteful and problematic. It could cause issues like stack overflows or deadlocks in some circumstances. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28582) Pyspark daemon exit failed when receive SIGTERM on py3.7
Weichen Xu created SPARK-28582: -- Summary: Pyspark daemon exit failed when receive SIGTERM on py3.7 Key: SPARK-28582 URL: https://issues.apache.org/jira/browse/SPARK-28582 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.3 Reporter: Weichen Xu The PySpark daemon fails to exit when it receives SIGTERM on py3.7. We can run the test on py3.7 like {code} python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests" {code} It will fail on the test "test_termination_sigterm", and we can see the daemon process does not exit. This issue happens on py3.7, but lower Python versions work fine. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
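For background on why signal handling in a daemon loop is version-sensitive (this is a generic POSIX sketch, not the daemon's actual code or its eventual fix): since PEP 475, a blocking call such as `select.select` is automatically retried after a signal handler returns normally, so a handler that only sets a flag never breaks the process out of its blocking loop; the handler must raise (for example via `sys.exit`) to interrupt it.

```python
import os
import select
import signal
import threading

r, w = os.pipe()  # a pipe nobody writes to, so select blocks indefinitely

def handle_sigterm(signum, frame):
    # Raising here interrupts the blocking select below; a handler that
    # merely returned would let select be retried (PEP 475) and block forever.
    raise SystemExit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# Deliver SIGTERM to ourselves shortly, from a helper thread.
threading.Timer(0.1, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()

try:
    select.select([r], [], [])  # blocks until the signal arrives
    interrupted = False
except SystemExit:
    interrupted = True
finally:
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # restore default handler
    os.close(r)
    os.close(w)

print(interrupted)  # True
```

POSIX only: on Windows, `os.kill` and SIGTERM behave differently.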
[jira] [Resolved] (SPARK-28153) input_file_name doesn't work with Python UDF in the same project
[ https://issues.apache.org/jira/browse/SPARK-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28153. - Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 3.0.0 > input_file_name doesn't work with Python UDF in the same project > > > Key: SPARK-28153 > URL: https://issues.apache.org/jira/browse/SPARK-28153 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > {code} > from pyspark.sql.functions import udf, input_file_name > spark.range(10).write.mode("overwrite").parquet("/tmp/foo") > spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), > input_file_name()).show() > {code} > {code} > ++-+ > |(id)|input_file_name()| > ++-+ > | 8| | > | 5| | > | 0| | > | 9| | > | 6| | > | 2| | > | 3| | > | 4| | > | 7| | > | 1| | > ++-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
[ https://issues.apache.org/jira/browse/SPARK-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28581: Description: Moves {{replaceFunctionName(usage: String, functionName: String)}}from {{DescribeFunctionCommand}} to {{ExpressionInfo}}. (was: This PR moves {{replaceFunctionName(usage: String, functionName: String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}}.) > Replace _FUNC_ in UDF ExpressionInfo > > > Key: SPARK-28581 > URL: https://issues.apache.org/jira/browse/SPARK-28581 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Moves {{replaceFunctionName(usage: String, functionName: String)}}from > {{DescribeFunctionCommand}} to {{ExpressionInfo}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
[ https://issues.apache.org/jira/browse/SPARK-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28581: Description: This PR moves {{replaceFunctionName(usage: String, functionName: String)}} from {{DescribeFunctionCommand}} to {{ExpressionInfo}}. > Replace _FUNC_ in UDF ExpressionInfo > > > Key: SPARK-28581 > URL: https://issues.apache.org/jira/browse/SPARK-28581 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > This PR moves {{replaceFunctionName(usage: String, functionName: String)}} > from {{DescribeFunctionCommand}} to {{ExpressionInfo}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28581) Replace _FUNC_ in UDF ExpressionInfo
Yuming Wang created SPARK-28581: --- Summary: Replace _FUNC_ in UDF ExpressionInfo Key: SPARK-28581 URL: https://issues.apache.org/jira/browse/SPARK-28581 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-28547) Make it work for wide (> 10K columns data)
[ https://issues.apache.org/jira/browse/SPARK-28547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] antonkulaga reopened SPARK-28547: - I did not see any solutions. > Make it work for wide (> 10K columns data) > -- > > Key: SPARK-28547 > URL: https://issues.apache.org/jira/browse/SPARK-28547 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.4, 2.4.3 > Environment: Ubuntu server, Spark 2.4.3 Scala with >64GB RAM per > node, 32 cores (tried different configurations of executors) >Reporter: antonkulaga >Priority: Critical > > Spark is super slow for all wide data (when there are >15K columns and >15K > rows). Most genomics/transcriptomics data is wide, because the number of > genes is usually >20K and the number of samples as well. The very popular GTEx > dataset is a good example (see for instance the RNA-Seq data at > https://storage.googleapis.com/gtex_analysis_v7/rna_seq_data where gct is > just a .tsv file with two comment lines at the beginning). Everything done on wide > tables (even simple "describe" functions applied to all the gene columns) > either takes hours or gets frozen (because of lost executors) irrespective of > memory and number of cores, while the same operations work fast (minutes) > and well with pure pandas (without any Spark involved). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28547) Make it work for wide (> 10K columns data)
[ https://issues.apache.org/jira/browse/SPARK-28547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897194#comment-16897194 ] antonkulaga commented on SPARK-28547: - [~maropu] I think I was quite clear: even describe runs slow as hell. So the easiest way to reproduce it is just to run describe on all numeric columns in GTEx. > Make it work for wide (> 10K columns data) > -- > > Key: SPARK-28547 > URL: https://issues.apache.org/jira/browse/SPARK-28547 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.4, 2.4.3 > Environment: Ubuntu server, Spark 2.4.3 Scala with >64GB RAM per > node, 32 cores (tried different configurations of executors) >Reporter: antonkulaga >Priority: Critical > > Spark is super slow for all wide data (when there are >15K columns and >15K > rows). Most genomics/transcriptomics data is wide, because the number of > genes is usually >20K and the number of samples as well. The very popular GTEx > dataset is a good example (see for instance the RNA-Seq data at > https://storage.googleapis.com/gtex_analysis_v7/rna_seq_data where gct is > just a .tsv file with two comment lines at the beginning). Everything done on wide > tables (even simple "describe" functions applied to all the gene columns) > either takes hours or gets frozen (because of lost executors) irrespective of > memory and number of cores, while the same operations work fast (minutes) > and well with pure pandas (without any Spark involved). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27689) Error to execute hive views with spark
[ https://issues.apache.org/jira/browse/SPARK-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-27689: Comment: was deleted (was: It seemed that this failure is caused by PR-SPARK-18801, https://github.com/apache/spark/pull/16233.) > Error to execute hive views with spark > -- > > Key: SPARK-27689 > URL: https://issues.apache.org/jira/browse/SPARK-27689 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.3, 2.4.3 >Reporter: Lambda >Priority: Major > > I have a python error when I execute the following code using hive views but > it works correctly when I run it with hive tables. > *Hive databases:* > {code:java} > CREATE DATABASE schema_p LOCATION "hdfs:///tmp/schema_p"; > {code} > *Hive tables:* > {code:java} > CREATE TABLE schema_p.product( > id_product string, > name string, > country string, > city string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/product'; > {code} > {code:java} > CREATE TABLE schema_p.person_product( > id_person string, > id_product string, > country string, > city string, > price string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/person_product'; > {code} > *Hive views:* > {code:java} > CREATE VIEW schema_p.product_v AS SELECT CAST(id_product AS INT) AS > id_product, name AS name, country AS country, city AS city, CAST(start_date > AS DATE) AS start_date, CAST(end_date AS DATE) AS end_date FROM > schema_p.product; > > CREATE VIEW schema_p.person_product_v AS SELECT 
CAST(id_person AS INT) AS > id_person, CAST(id_product AS INT) AS id_product, country AS country, city AS > city, CAST(price AS DECIMAL(38,8)) AS price, CAST(start_date AS DATE) AS > start_date, CAST(end_date AS DATE) AS end_date FROM schema_p.person_product; > {code} > *Code*: > {code:java} > def read_tables(sc): > in_dict = { 'product': 'product_v', 'person_product': 'person_product_v' } > data_dict = {} > for n, d in in_dict.iteritems(): > data_dict[n] = sc.read.table(d) > return data_dict > def get_population(tables, ref_date_str): > product = tables['product'] > person_product = tables['person_product'] > count_prod > =person_product.groupBy('id_product').agg(F.count('id_product').alias('count_prod')) > person_product_join = person_product.join(product,'id_product') > person_count = person_product_join.join(count_prod,'id_product') > final = person_product_join.join(person_count, 'id_person', 'left') > return final > import pyspark.sql.functions as F > import functools > from pyspark.sql.functions import col > from pyspark.sql.functions import add_months, lit, count, coalesce > spark.sql('use schema_p') > data_dict = read_tables(spark) > data_dict > population = get_population(data_dict, '2019-04-30') > population.show() > {code} > *Error:* > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "", line 10, in get_population > File "/usr/hdp/current/spark2-client/python/pyspark/sql/dataframe.py", line > 931, in join > jdf = self._jdf.join(other._jdf, on, how) > File > "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", > line 1160, in __call__ > File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, > in deco > raise AnalysisException(s.split(': ', 1)[1], stackTrace) > pyspark.sql.utils.AnalysisException: u'Resolved attribute(s) > id_person#103,start_date#108,id_product#104,end_date#109,price#107,country#105,city#106 > missing from > 
price#4,id_product#1,start_date#5,end_date#6,id_person#0,city#3,country#2 in > operator !Project [cast(id_person#103 as int) AS id_person#76, > cast(id_product#104 as int) AS id_product#77, cast(country#105 as string) AS > country#78, cast(city#106 as string) AS city#79, cast(price#107 as > decimal(38,8)) AS price#80, cast(start_date#108 as date) AS start_date#81, > cast(end_date#109 as date) AS end_date#82]. Attribute(s) with the same name > appear in the operation: > id_person,start_date,id_product,end_date,price,country,city. Please check if > the right attribute(s) are used.;; > Project [id_person#0, id_product#1, country#2, city#3, price#4, start_date#5, > end_date#6, name#29, country#30, city#31, start_date#32, end_date#33, > id_product#104, country#105, city#106, price#107,
[jira] [Issue Comment Deleted] (SPARK-27689) Error to execute hive views with spark
[ https://issues.apache.org/jira/browse/SPARK-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-27689: Comment: was deleted (was: You can add a unit test to HiveSQLViewSuite.scala to reproduce it with the code below.
```
withTable("ta") {
  withView("va") {
    withView("vb") {
      withView("vc") {
        sql("CREATE TABLE ta (c1 STRING)")
        sql("CREATE VIEW va(c1) AS SELECT * FROM ta")
        sql("CREATE TEMPORARY VIEW vb AS SELECT a.c1 FROM va AS a")
        sql("CREATE TEMPORARY VIEW vc AS SELECT a.c1 FROM vb AS a JOIN vb AS b ON a.c1 = b.c1")
        sql("SELECT a.c1 FROM vb AS a JOIN vc AS b ON a.c1 = b.c1")
      }
    }
  }
}
```) > Error to execute hive views with spark > -- > > Key: SPARK-27689 > URL: https://issues.apache.org/jira/browse/SPARK-27689 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.3, 2.4.3 >Reporter: Lambda >Priority: Major > > I have a Python error when I execute the following code using Hive views, but > it works correctly when I run it with Hive tables. 
> *Hive databases:* > {code:java} > CREATE DATABASE schema_p LOCATION "hdfs:///tmp/schema_p"; > {code} > *Hive tables:* > {code:java} > CREATE TABLE schema_p.product( > id_product string, > name string, > country string, > city string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/product'; > {code} > {code:java} > CREATE TABLE schema_p.person_product( > id_person string, > id_product string, > country string, > city string, > price string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/person_product'; > {code} > *Hive views:* > {code:java} > CREATE VIEW schema_p.product_v AS SELECT CAST(id_product AS INT) AS > id_product, name AS name, country AS country, city AS city, CAST(start_date > AS DATE) AS start_date, CAST(end_date AS DATE) AS end_date FROM > schema_p.product; > > CREATE VIEW schema_p.person_product_v AS SELECT CAST(id_person AS INT) AS > id_person, CAST(id_product AS INT) AS id_product, country AS country, city AS > city, CAST(price AS DECIMAL(38,8)) AS price, CAST(start_date AS DATE) AS > start_date, CAST(end_date AS DATE) AS end_date FROM schema_p.person_product; > {code} > *Code*: > {code:java} > def read_tables(sc): > in_dict = { 'product': 'product_v', 'person_product': 'person_product_v' } > data_dict = {} > for n, d in in_dict.iteritems(): > data_dict[n] = sc.read.table(d) > return data_dict > def get_population(tables, ref_date_str): > product = tables['product'] > person_product = tables['person_product'] > count_prod > 
=person_product.groupBy('id_product').agg(F.count('id_product').alias('count_prod')) > person_product_join = person_product.join(product,'id_product') > person_count = person_product_join.join(count_prod,'id_product') > final = person_product_join.join(person_count, 'id_person', 'left') > return final > import pyspark.sql.functions as F > import functools > from pyspark.sql.functions import col > from pyspark.sql.functions import add_months, lit, count, coalesce > spark.sql('use schema_p') > data_dict = read_tables(spark) > data_dict > population = get_population(data_dict, '2019-04-30') > population.show() > {code} > *Error:* > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "", line 10, in get_population > File "/usr/hdp/current/spark2-client/python/pyspark/sql/dataframe.py", line > 931, in join > jdf = self._jdf.join(other._jdf, on, how) > File > "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", > line 1160, in __call__ > File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, > in deco > raise AnalysisException(s.split(': ', 1)[1], stackTrace) > pyspark.sql.utils.AnalysisException: u'Resolved attribute(s) > id_person#103,start_date#108,id_product#104,end_date#109,price#107,country#105,city#106 > missing from > price#4,id_product#1,start_date#5,end_date#6,id_person#0,city#3,country#2 in > operator !Project [cast(id_person#103 as int) AS id_person#76, > cast(id_product#104 as int) AS id_product#77, cast(country#105 as string) AS > country#78, cast(city#106 as string) AS city#79, cast(price#107 as > decimal(38,8)) AS price#80,
[jira] [Commented] (SPARK-27689) Error to execute hive views with spark
[ https://issues.apache.org/jira/browse/SPARK-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897120#comment-16897120 ] feiwang commented on SPARK-27689: - You can add a unit test to HiveSQLViewSuite.scala to reproduce it with the code below.
```
withTable("ta") {
  withView("va") {
    withView("vb") {
      withView("vc") {
        sql("CREATE TABLE ta (c1 STRING)")
        sql("CREATE VIEW va(c1) AS SELECT * FROM ta")
        sql("CREATE TEMPORARY VIEW vb AS SELECT a.c1 FROM va AS a")
        sql("CREATE TEMPORARY VIEW vc AS SELECT a.c1 FROM vb AS a JOIN vb AS b ON a.c1 = b.c1")
        sql("SELECT a.c1 FROM vb AS a JOIN vc AS b ON a.c1 = b.c1")
      }
    }
  }
}
```
> Error to execute hive views with spark > -- > > Key: SPARK-27689 > URL: https://issues.apache.org/jira/browse/SPARK-27689 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.3, 2.4.3 >Reporter: Lambda >Priority: Major > > I have a Python error when I execute the following code using Hive views, but > it works correctly when I run it with Hive tables. 
> *Hive databases:* > {code:java} > CREATE DATABASE schema_p LOCATION "hdfs:///tmp/schema_p"; > {code} > *Hive tables:* > {code:java} > CREATE TABLE schema_p.product( > id_product string, > name string, > country string, > city string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/product'; > {code} > {code:java} > CREATE TABLE schema_p.person_product( > id_person string, > id_product string, > country string, > city string, > price string, > start_date string, > end_date string > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'hdfs:///tmp/schema_p/person_product'; > {code} > *Hive views:* > {code:java} > CREATE VIEW schema_p.product_v AS SELECT CAST(id_product AS INT) AS > id_product, name AS name, country AS country, city AS city, CAST(start_date > AS DATE) AS start_date, CAST(end_date AS DATE) AS end_date FROM > schema_p.product; > > CREATE VIEW schema_p.person_product_v AS SELECT CAST(id_person AS INT) AS > id_person, CAST(id_product AS INT) AS id_product, country AS country, city AS > city, CAST(price AS DECIMAL(38,8)) AS price, CAST(start_date AS DATE) AS > start_date, CAST(end_date AS DATE) AS end_date FROM schema_p.person_product; > {code} > *Code*: > {code:java} > def read_tables(sc): > in_dict = { 'product': 'product_v', 'person_product': 'person_product_v' } > data_dict = {} > for n, d in in_dict.iteritems(): > data_dict[n] = sc.read.table(d) > return data_dict > def get_population(tables, ref_date_str): > product = tables['product'] > person_product = tables['person_product'] > count_prod > 
=person_product.groupBy('id_product').agg(F.count('id_product').alias('count_prod')) > person_product_join = person_product.join(product,'id_product') > person_count = person_product_join.join(count_prod,'id_product') > final = person_product_join.join(person_count, 'id_person', 'left') > return final > import pyspark.sql.functions as F > import functools > from pyspark.sql.functions import col > from pyspark.sql.functions import add_months, lit, count, coalesce > spark.sql('use schema_p') > data_dict = read_tables(spark) > data_dict > population = get_population(data_dict, '2019-04-30') > population.show() > {code} > *Error:* > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "", line 10, in get_population > File "/usr/hdp/current/spark2-client/python/pyspark/sql/dataframe.py", line > 931, in join > jdf = self._jdf.join(other._jdf, on, how) > File > "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", > line 1160, in __call__ > File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, > in deco > raise AnalysisException(s.split(': ', 1)[1], stackTrace) > pyspark.sql.utils.AnalysisException: u'Resolved attribute(s) > id_person#103,start_date#108,id_product#104,end_date#109,price#107,country#105,city#106 > missing from > price#4,id_product#1,start_date#5,end_date#6,id_person#0,city#3,country#2 in > operator !Project [cast(id_person#103 as int) AS id_person#76, > cast(id_product#104 as int) AS id_product#77, cast(country#105 as string) AS > country#78, cast(city#106 as string) AS city#79, cast(price#107 as > decimal(38,8)) AS price#80,
[jira] [Updated] (SPARK-28580) ANSI SQL: unique predicate
[ https://issues.apache.org/jira/browse/SPARK-28580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-28580: --- Description: Format {code:java} <unique predicate> ::= UNIQUE <table subquery> {code} was: Format <unique predicate> ::= UNIQUE <table subquery> > ANSI SQL: unique predicate > -- > > Key: SPARK-28580 > URL: https://issues.apache.org/jira/browse/SPARK-28580 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Format > {code:java} > <unique predicate> ::= > UNIQUE <table subquery> {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28580) ANSI SQL: unique predicate
[ https://issues.apache.org/jira/browse/SPARK-28580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-28580: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-27764 > ANSI SQL: unique predicate > -- > > Key: SPARK-28580 > URL: https://issues.apache.org/jira/browse/SPARK-28580 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Format > <unique predicate> ::= > UNIQUE <table subquery> -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28580) ANSI SQL: unique predicate
jiaan.geng created SPARK-28580: -- Summary: ANSI SQL: unique predicate Key: SPARK-28580 URL: https://issues.apache.org/jira/browse/SPARK-28580 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: jiaan.geng Format <unique predicate> ::= UNIQUE <table subquery> -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28580) ANSI SQL: unique predicate
[ https://issues.apache.org/jira/browse/SPARK-28580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896931#comment-16896931 ] jiaan.geng commented on SPARK-28580: I'm working on it. > ANSI SQL: unique predicate > -- > > Key: SPARK-28580 > URL: https://issues.apache.org/jira/browse/SPARK-28580 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Format > <unique predicate> ::= > UNIQUE <table subquery> -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
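For context, the UNIQUE predicate proposed in this ticket evaluates to true iff the subquery result contains no two equal rows, where comparisons involving NULL never produce a match. A minimal Python model of those semantics (the helper name is hypothetical, not a Spark API):

```python
def unique_predicate(rows):
    """Model of the ANSI SQL UNIQUE predicate (illustrative helper only).

    UNIQUE <table subquery> is true iff no two distinct rows of the
    subquery result are equal; per the standard, comparisons involving
    NULL do not count as equal, so rows containing NULL never match.
    """
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            # Rows are duplicates only if every column pair is non-NULL and equal.
            if all(x is not None and x == y for x, y in zip(a, b)):
                return False
    return True

print(unique_predicate([(1, "a"), (2, "b")]))        # True: no duplicates
print(unique_predicate([(1, "a"), (1, "a")]))        # False: duplicate row
print(unique_predicate([(None, "a"), (None, "a")]))  # True: NULL rows never match
```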
[jira] [Created] (SPARK-28579) MaxAbsScaler avoids conversion to breeze.vector
zhengruifeng created SPARK-28579: Summary: MaxAbsScaler avoids conversion to breeze.vector Key: SPARK-28579 URL: https://issues.apache.org/jira/browse/SPARK-28579 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.0.0 Reporter: zhengruifeng In the current implementation, MaxAbsScaler converts each vector to a breeze.vector during transformation. This conversion should be skipped. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
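The transformation itself is simple enough to apply directly to the raw values, which is the point of skipping the conversion. A numpy sketch of the MaxAbsScaler math (illustrative only, not Spark ML's actual code path):

```python
import numpy as np

def max_abs_scale(X):
    """Scale each column to [-1, 1] by its maximum absolute value.

    Sketch of the MaxAbsScaler math operating directly on a plain array,
    i.e. the analogue of transforming without first converting each row
    to a separate (breeze-style) vector representation.
    """
    max_abs = np.abs(X).max(axis=0)
    max_abs[max_abs == 0.0] = 1.0  # avoid division by zero for all-zero columns
    return X / max_abs

X = np.array([[1.0, -2.0, 0.0],
              [2.0,  4.0, 0.0]])
print(max_abs_scale(X))
# column max-abs values are [2, 4, 1] -> [[0.5, -0.5, 0.0], [1.0, 1.0, 0.0]]
```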
[jira] [Assigned] (SPARK-27924) ANSI SQL: Boolean Test
[ https://issues.apache.org/jira/browse/SPARK-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-27924: - Assignee: jiaan.geng > ANSI SQL: Boolean Test > -- > > Key: SPARK-27924 > URL: https://issues.apache.org/jira/browse/SPARK-27924 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > > {quote}<boolean test> ::= > <boolean primary> [ IS [ NOT ] <truth value> ] > <truth value> ::= > TRUE > | FALSE > | UNKNOWN{quote} > > Currently, the following DBMSs support the syntax: > * PostgreSQL: [https://www.postgresql.org/docs/9.1/functions-comparison.html] > * Hive: https://issues.apache.org/jira/browse/HIVE-13583 > * Redshift: > [https://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html] > * Vertica: > [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/Boolean-predicate.htm] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27924) Support ANSI SQL Boolean-Predicate syntax
[ https://issues.apache.org/jira/browse/SPARK-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27924: -- Summary: Support ANSI SQL Boolean-Predicate syntax (was: ANSI SQL: Boolean Test) > Support ANSI SQL Boolean-Predicate syntax > - > > Key: SPARK-27924 > URL: https://issues.apache.org/jira/browse/SPARK-27924 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > {quote}<boolean test> ::= > <boolean primary> [ IS [ NOT ] <truth value> ] > <truth value> ::= > TRUE > | FALSE > | UNKNOWN{quote} > > Currently, the following DBMSs support the syntax: > * PostgreSQL: [https://www.postgresql.org/docs/9.1/functions-comparison.html] > * Hive: https://issues.apache.org/jira/browse/HIVE-13583 > * Redshift: > [https://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html] > * Vertica: > [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/Boolean-predicate.htm] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27924) ANSI SQL: Boolean Test
[ https://issues.apache.org/jira/browse/SPARK-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27924. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25074 [https://github.com/apache/spark/pull/25074] > ANSI SQL: Boolean Test > -- > > Key: SPARK-27924 > URL: https://issues.apache.org/jira/browse/SPARK-27924 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > {quote}<boolean test> ::= > <boolean primary> [ IS [ NOT ] <truth value> ] > <truth value> ::= > TRUE > | FALSE > | UNKNOWN{quote} > > Currently, the following DBMSs support the syntax: > * PostgreSQL: [https://www.postgresql.org/docs/9.1/functions-comparison.html] > * Hive: https://issues.apache.org/jira/browse/HIVE-13583 > * Redshift: > [https://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html] > * Vertica: > [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/Boolean-predicate.htm] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
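The grammar quoted in the ticket describes SQL's three-valued boolean test. A sketch of the intended semantics, using Python's None to stand in for UNKNOWN (the helper is made up for illustration, not part of Spark):

```python
def boolean_test(value, truth_value, negated=False):
    """Evaluate `value IS [NOT] truth_value` per the ANSI boolean test.

    Unlike an ordinary comparison, IS never yields UNKNOWN: it checks
    whether `value` *is* the stated truth value (TRUE, FALSE, or UNKNOWN,
    with None modeling UNKNOWN here) and always returns a definite boolean.
    """
    result = value is truth_value
    return not result if negated else result

print(boolean_test(True, True))                # True:  TRUE IS TRUE
print(boolean_test(None, True))                # False: UNKNOWN IS TRUE (not unknown!)
print(boolean_test(None, None))                # True:  UNKNOWN IS UNKNOWN
print(boolean_test(False, True, negated=True)) # True:  FALSE IS NOT TRUE
```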
[jira] [Created] (SPARK-28578) Improve Github pull request template
Hyukjin Kwon created SPARK-28578: Summary: Improve Github pull request template Key: SPARK-28578 URL: https://issues.apache.org/jira/browse/SPARK-28578 Project: Spark Issue Type: Test Components: Project Infra Affects Versions: 3.0.0 Reporter: Hyukjin Kwon See http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-New-sections-in-Github-Pull-Request-description-template-td27527.html We should improve our PR template for better review iterations -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28577) Ensure executorMemoryOverhead requested value is not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true
[ https://issues.apache.org/jira/browse/SPARK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-28577: - Description: If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorOverheadMemory is not less than MEMORY_OFFHEAP_SIZE; otherwise the memory requested for the executor may not be enough. (was: If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorOverheadMemory not less than MEMORY_OFFHEAP_SIZE, otherwise the memory resource requested for executor is not enough.) > Ensure executorMemoryOverhead requested value is not less than MEMORY_OFFHEAP_SIZE > when MEMORY_OFFHEAP_ENABLED is true > --- > > Key: SPARK-28577 > URL: https://issues.apache.org/jira/browse/SPARK-28577 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.3, 2.3.3, 2.4.3 >Reporter: Yang Jie >Priority: Major > > If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorOverheadMemory > is not less than MEMORY_OFFHEAP_SIZE; otherwise the memory requested > for the executor may not be enough. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28577) Ensure executorMemoryOverhead requested value is not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true
Yang Jie created SPARK-28577: Summary: Ensure executorMemoryOverhead requested value is not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true Key: SPARK-28577 URL: https://issues.apache.org/jira/browse/SPARK-28577 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.4.3, 2.3.3, 2.2.3 Reporter: Yang Jie If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorOverheadMemory is not less than MEMORY_OFFHEAP_SIZE; otherwise the memory resource requested for the executor is not enough. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
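The proposed check boils down to taking the larger of the configured overhead and the off-heap size whenever off-heap memory is enabled. A minimal sketch of that logic (function and parameter names are hypothetical, not Spark configuration keys):

```python
def effective_overhead_mb(overhead_mb, offheap_enabled, offheap_mb):
    """Sketch of the proposed validation (illustrative helper, not Spark's API).

    When off-heap memory is enabled, the non-heap memory requested on top of
    the executor heap must cover at least the configured off-heap size;
    otherwise YARN may grant less physical memory than the executor will use.
    """
    if offheap_enabled and overhead_mb < offheap_mb:
        return offheap_mb
    return overhead_mb

print(effective_overhead_mb(384, True, 2048))   # 2048: bumped up to the off-heap size
print(effective_overhead_mb(4096, True, 2048))  # 4096: already sufficient
print(effective_overhead_mb(384, False, 2048))  # 384:  off-heap disabled, no change
```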
[jira] [Resolved] (SPARK-28529) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence
[ https://issues.apache.org/jira/browse/SPARK-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28529. - Resolution: Duplicate > Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence > > > Key: SPARK-28529 > URL: https://issues.apache.org/jira/browse/SPARK-28529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28529) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence
[ https://issues.apache.org/jira/browse/SPARK-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-28529: --- Assignee: Dilip Biswal > Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence > > > Key: SPARK-28529 > URL: https://issues.apache.org/jira/browse/SPARK-28529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org