[jira] [Comment Edited] (SPARK-32778) Accidental Data Deletion on calling saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196721#comment-17196721 ] Aman Rastogi edited comment on SPARK-32778 at 9/16/20, 6:46 AM:

I have reproduced the issue with v2.4.4. The code is still essentially the same as it was in v2.2.0, line 176: [https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala]

was (Author: amanr): I have reproduced the issue with v2.4.4. The code is still essentially the same as it was in v2.2.0: https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala

> Accidental Data Deletion on calling saveAsTable
> ---
>
> Key: SPARK-32778
> URL: https://issues.apache.org/jira/browse/SPARK-32778
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Aman Rastogi
> Priority: Major
>
> {code:java}
> df.write.option("path", "/already/existing/path").mode(SaveMode.Append).format("json").saveAsTable("db.table")
> {code}
> The code above deleted the data present at path "/already/existing/path". This happened because the table was not yet in the Hive metastore, while the given path already contained data. If the table is not present in the Hive metastore, the SaveMode is internally changed to SaveMode.Overwrite regardless of what the user provided, which leads to data deletion. This change was introduced as part of https://issues.apache.org/jira/browse/SPARK-19583.
> Now suppose the user is not using an external Hive metastore (the metastore is tied to a cluster) and the cluster goes down, or the user has to migrate to a new cluster for some other reason. When the user tries to save data with the code above on the new cluster, it will first delete the existing data. That could be production data, and the user would be completely unaware of the deletion because they specified SaveMode.Append or ErrorIfExists. This is an accidental data deletion.
>
> Repro Steps:
> # Save data through a Hive table as in the code above
> # Create another cluster and save data into a new table on the new cluster, giving the same path
>
> Proposed Fix:
> Instead of modifying SaveMode to Overwrite, we should modify it to ErrorIfExists in class CreateDataSourceTableAsSelectCommand.
> Change (line 154)
> {code:java}
> val result = saveDataIntoTable(
>   sparkSession, table, tableLocation, child, SaveMode.Overwrite, tableExists = false)
> {code}
> to
> {code:java}
> val result = saveDataIntoTable(
>   sparkSession, table, tableLocation, child, SaveMode.ErrorIfExists, tableExists = false)
> {code}
> This should not break CTAS. Even in the CTAS case, the user may not want to delete data that already exists, as the overwrite could be accidental.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
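The mode rewrite described above can be modeled outside Spark. The pure-Python sketch below (all names hypothetical; the real logic is Scala code in CreateDataSourceTableAsSelectCommand) shows why forcing Overwrite when the table is missing from the metastore destroys pre-existing data at the path, while the proposed ErrorIfExists would fail fast instead:

```python
from enum import Enum

class SaveMode(Enum):
    APPEND = "Append"
    OVERWRITE = "Overwrite"
    ERROR_IF_EXISTS = "ErrorIfExists"

def save_as_table(storage, path, rows, user_mode, table_exists, missing_table_mode):
    """Toy model of the save path. `storage` maps a path to its rows.
    When the table is absent from the metastore, the user's SaveMode is
    discarded and replaced with `missing_table_mode` (SaveMode.Overwrite
    today; SaveMode.ErrorIfExists under the proposed fix)."""
    mode = user_mode if table_exists else missing_table_mode
    if mode is SaveMode.OVERWRITE:
        storage[path] = list(rows)            # silently clobbers old data
    elif mode is SaveMode.ERROR_IF_EXISTS:
        if path in storage:
            raise FileExistsError(path)       # fail fast, keep old data
        storage[path] = list(rows)
    else:                                     # APPEND
        storage.setdefault(path, []).extend(rows)
    return storage[path]

# Existing production data at the path, but no metastore entry (new cluster):
storage = {"/already/existing/path": ["old1", "old2"]}

# Current behavior: the user asked for Append, yet the data is overwritten.
save_as_table(storage, "/already/existing/path", ["new"],
              SaveMode.APPEND, table_exists=False,
              missing_table_mode=SaveMode.OVERWRITE)
assert storage["/already/existing/path"] == ["new"]   # old data is gone
```

With missing_table_mode set to ERROR_IF_EXISTS, the same call raises instead of clobbering the path, which is the behavior the reporter argues for.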
[jira] [Reopened] (SPARK-32778) Accidental Data Deletion on calling saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Rastogi reopened SPARK-32778:
--
I have reproduced the issue with v2.4.4. The code is still essentially the same as it was in v2.2.0: https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
[jira] [Updated] (SPARK-32778) Accidental Data Deletion on calling saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Rastogi updated SPARK-32778:
-
Affects Version/s: (was: 2.2.0) 2.4.4
[jira] [Created] (SPARK-32898) totalExecutorRunTimeMs is too big
Linhong Liu created SPARK-32898:
---
Summary: totalExecutorRunTimeMs is too big
Key: SPARK-32898
URL: https://issues.apache.org/jira/browse/SPARK-32898
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Linhong Liu

This might be caused by incorrectly calculating executorRunTimeMs in Executor.scala. The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called when taskStartTimeNs has not been set yet (it is still 0). As of now, on the master branch, here is the problematic code: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]
An exception is thrown before this line, and the catch branch still updates the metric. However, the query shows as SUCCESSful in QPL; perhaps the task is speculative, but that is not certain.
submissionTime in LiveExecutionData may have a similar problem: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]
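The failure mode can be sketched in a few lines of pure Python (hypothetical names; the real code is Scala in Executor.scala): if the failure path runs before the start timestamp is assigned, a naive `now - start` subtracts 0 and reports the raw monotonic-clock reading, an enormous and meaningless value that then inflates totalExecutorRunTimeMs. A guard on the unset sentinel keeps the aggregate sane:

```python
import time

_UNSET = 0  # the start-time field defaults to 0 before the task body runs

def run_time_ms(task_start_ns, now_ns):
    """Elapsed-time computation with a guard for an unset start time.
    Without the guard, run_time_ms(0, now_ns) returns now_ns // 1e6,
    which is the clock reading itself rather than any real duration."""
    if task_start_ns == _UNSET:
        return 0  # metric not meaningful yet; report nothing
    return (now_ns - task_start_ns) // 1_000_000

now = time.monotonic_ns()
assert run_time_ms(_UNSET, now) == 0              # guarded: no huge value
assert run_time_ms(now - 5_000_000, now) == 5     # normal case: 5 ms
```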
[jira] [Commented] (SPARK-32804) run-example failed in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-32804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196660#comment-17196660 ] Apache Spark commented on SPARK-32804:
--
User 'KevinSmile' has created a pull request for this issue: https://github.com/apache/spark/pull/29769

> run-example failed in standalone cluster mode
> -
>
> Key: SPARK-32804
> URL: https://issues.apache.org/jira/browse/SPARK-32804
> Project: Spark
> Issue Type: Bug
> Components: Deploy, Examples
> Affects Versions: 2.4.0, 3.0.0
> Environment: Spark 3.0
> Reporter: Kevin Wang
> Assignee: Kevin Wang
> Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-09-05-21-55-00-227.png
>
> run-example failed in standalone cluster mode (something seems wrong in the SparkSubmit command build):
> !image-2020-09-05-21-55-00-227.png!
[jira] [Commented] (SPARK-32804) run-example failed in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-32804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196658#comment-17196658 ] Apache Spark commented on SPARK-32804:
--
User 'KevinSmile' has created a pull request for this issue: https://github.com/apache/spark/pull/29769
[jira] [Commented] (SPARK-32894) Timestamp cast in external orc table
[ https://issues.apache.org/jira/browse/SPARK-32894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196651#comment-17196651 ] Hyukjin Kwon commented on SPARK-32894:
--
How did you create the Hive table?

> Timestamp cast in external orc table
> ---
>
> Key: SPARK-32894
> URL: https://issues.apache.org/jira/browse/SPARK-32894
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 3.0.0
> Environment: Spark 3.0.0
> Java 1.8
> Hadoop 3.3.0
> Hive 3.1.2
> Python 3.7 (from pyspark)
> Reporter: Grigory Skvortsov
> Priority: Major
>
> I have an external Hive table stored as ORC, and I want to work with its timestamp column using pyspark.
> For example, I try this:
> spark.sql("select id, time_ from mydb.table1").show()
>
> Py4JJavaError: An error occurred while calling o2877.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 19, 172.29.14.241, executor 1): java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long
> at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> at org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148)
> at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228)
> at org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730)
> at org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730)
> at org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:127)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
> at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
> at scala.Option.foreach(Option.scala:407)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
> at org.apache.spark.sche
[jira] [Assigned] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32897:
Assignee: Apache Spark

> SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
> --
>
> Key: SPARK-32897
> URL: https://issues.apache.org/jira/browse/SPARK-32897
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.7, 3.0.1, 3.1.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
> In PySpark shell:
> {code}
> import warnings
> from pyspark.sql import SparkSession, SQLContext
> warnings.simplefilter('always', DeprecationWarning)
> spark.stop()
> SparkSession.builder.getOrCreate()
> {code}
> shows a deprecation warning from {{SQLContext}}:
> {code}
> /.../spark/python/pyspark/sql/context.py:72: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
>   DeprecationWarning)
> {code}
[jira] [Assigned] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32897:
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196648#comment-17196648 ] Apache Spark commented on SPARK-32897:
--
User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29768
[jira] [Resolved] (SPARK-32688) LiteralGenerator for float and double does not generate special values
[ https://issues.apache.org/jira/browse/SPARK-32688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-32688.
--
Fix Version/s: 3.1.0, 3.0.2
Assignee: Tanel Kiis
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29515

> LiteralGenerator for float and double does not generate special values
> --
>
> Key: SPARK-32688
> URL: https://issues.apache.org/jira/browse/SPARK-32688
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Tanel Kiis
> Assignee: Tanel Kiis
> Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
> Values like Double.NaN, Double.PositiveInfinity, and Double.NegativeInfinity are never returned.
> The main use of LiteralGenerator is in the checkConsistencyBetweenInterpretedAndCodegen method.
> This would have detected SPARK-32640.
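The gap can be illustrated with a toy generator. This Python sketch (the real LiteralGenerator is Scala/ScalaCheck; all names here are hypothetical) deliberately biases toward the special values so that even short property-test runs exercise NaN/Infinity handling:

```python
import math
import random
import sys

# Special float/double values a literal generator should emit so that
# checkConsistencyBetweenInterpretedAndCodegen-style property tests can
# catch NaN/Infinity handling bugs (e.g. SPARK-32640).
SPECIAL_DOUBLES = [float("nan"), float("inf"), float("-inf"),
                   0.0, -0.0, sys.float_info.max, sys.float_info.min]

def double_literal(rng):
    # Bias heavily toward the special values so they appear even in short runs.
    if rng.random() < 0.5:
        return rng.choice(SPECIAL_DOUBLES)
    return rng.uniform(-1e9, 1e9)

rng = random.Random(0)
samples = [double_literal(rng) for _ in range(1000)]
assert any(math.isnan(x) for x in samples)
assert any(math.isinf(x) and x > 0 for x in samples)
assert any(math.isinf(x) and x < 0 for x in samples)
```

A purely uniform generator, by contrast, would never produce NaN or the infinities, which is exactly the reported problem.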
[jira] [Assigned] (SPARK-32896) Add DataStreamWriter.table API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32896:
Assignee: Apache Spark (was: Jungtaek Lim)

> Add DataStreamWriter.table API
> --
>
> Key: SPARK-32896
> URL: https://issues.apache.org/jira/browse/SPARK-32896
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.1.0
> Reporter: Jungtaek Lim
> Assignee: Apache Spark
> Priority: Major
>
> For now, there is no way to write to a table (especially a catalog table) even when the table is capable of handling streaming writes.
> We can add a DataStreamWriter.table API to let end users specify a table as the provider and let the streaming query write into that table. This only specifies the table; the overall usage of DataStreamWriter is unchanged.
[jira] [Commented] (SPARK-32896) Add DataStreamWriter.table API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196644#comment-17196644 ] Apache Spark commented on SPARK-32896:
--
User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/29767
[jira] [Assigned] (SPARK-32896) Add DataStreamWriter.table API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32896:
Assignee: Jungtaek Lim (was: Apache Spark)
[jira] [Created] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
Hyukjin Kwon created SPARK-32897:
Summary: SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext
Key: SPARK-32897
URL: https://issues.apache.org/jira/browse/SPARK-32897
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.0.1, 2.4.7, 3.1.0
Reporter: Hyukjin Kwon
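One common way to fix this class of problem, sketched below in pure Python with hypothetical names (this is not PySpark's actual implementation), is to emit the DeprecationWarning only on direct user construction and stay silent when an internal code path, such as a session builder, creates the object:

```python
import warnings

class SQLContextLike:
    """Sketch: warn only for direct user construction, not when an internal
    caller (a session builder, say) creates the object. All names here are
    hypothetical, not PySpark's actual internals."""

    _internal_create = False  # flipped while the internal factory runs

    def __init__(self):
        if not SQLContextLike._internal_create:
            warnings.warn(
                "Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.",
                DeprecationWarning)

    @classmethod
    def _for_session(cls):
        # Internal construction path: suppress the user-facing warning.
        cls._internal_create = True
        try:
            return cls()
        finally:
            cls._internal_create = False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    SQLContextLike._for_session()      # internal path: no warning
    assert caught == []
    SQLContextLike()                   # direct use: warns
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```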
[jira] [Commented] (SPARK-32187) User Guide - Shipping Python Package
[ https://issues.apache.org/jira/browse/SPARK-32187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196634#comment-17196634 ] Hyukjin Kwon commented on SPARK-32187:
--
Thank you so much [~fhoering]!

> User Guide - Shipping Python Package
>
> Key: SPARK-32187
> URL: https://issues.apache.org/jira/browse/SPARK-32187
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 3.1.0
> Reporter: Hyukjin Kwon
> Assignee: Fabian Höring
> Priority: Major
>
> - Zipped file
> - Python files
> - Virtualenv with Yarn
> - PEX \(?\) (see also SPARK-25433)
[jira] [Created] (SPARK-32896) Add DataStreamWriter.table API
Jungtaek Lim created SPARK-32896:
Summary: Add DataStreamWriter.table API
Key: SPARK-32896
URL: https://issues.apache.org/jira/browse/SPARK-32896
Project: Spark
Issue Type: New Feature
Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim
[jira] [Commented] (SPARK-32704) Logging plan changes for execution
[ https://issues.apache.org/jira/browse/SPARK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196613#comment-17196613 ] Apache Spark commented on SPARK-32704: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29766 > Logging plan changes for execution > -- > > Key: SPARK-32704 > URL: https://issues.apache.org/jira/browse/SPARK-32704 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 3.1.0 > > > Since we only log plan changes for analyzer/optimizer now, this ticket > targets adding code to log plan changes in the preparation phase in > QueryExecution for execution. > {code} > scala> spark.sql("SET spark.sql.optimizer.planChangeLog.level=WARN") > scala> spark.range(10).groupBy("id").count().queryExecution.executedPlan > ... > 20/08/26 09:32:36 WARN PlanChangeLogger: > === Applying Rule org.apache.spark.sql.execution.CollapseCodegenStages === > !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, > count#23L]) *(1) HashAggregate(keys=[id#19L], > functions=[count(1)], output=[id#19L, count#23L]) > !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], > output=[id#19L, count#27L]) +- *(1) HashAggregate(keys=[id#19L], > functions=[partial_count(1)], output=[id#19L, count#27L]) > ! +- Range (0, 10, step=1, splits=4) > +- *(1) Range (0, 10, step=1, splits=4) > > 20/08/26 09:32:36 WARN PlanChangeLogger: > === Result of Batch Preparations === > !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, > count#23L]) *(1) HashAggregate(keys=[id#19L], > functions=[count(1)], output=[id#19L, count#23L]) > !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], > output=[id#19L, count#27L]) +- *(1) HashAggregate(keys=[id#19L], > functions=[partial_count(1)], output=[id#19L, count#27L]) > ! 
+- Range (0, 10, step=1, splits=4)
> {code}
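The PlanChangeLogger pattern shown in the log above can be sketched in a few lines of Python (operator strings stand in for SparkPlan trees; all names are hypothetical, not Spark's API): apply each preparation rule in turn and emit a before/after diff only for rules that actually changed the plan:

```python
def log_plan_changes(plan, rules, log=print):
    """Apply each rule to `plan` (a list of operator strings) and log a
    diff only when the rule changed something, mirroring how the real
    logger prints '=== Applying Rule ... ===' blocks per effective rule."""
    for rule in rules:
        new_plan = rule(plan)
        if new_plan != plan:  # log only rules that changed the plan
            log(f"=== Applying Rule {rule.__name__} ===")
            for old, new in zip(plan, new_plan):
                log(f"!{old}  =>  {new}")
        plan = new_plan
    return plan

def collapse_codegen_stages(ops):
    # Toy stand-in for CollapseCodegenStages: mark every operator as part
    # of whole-stage-codegen stage 1, mirroring the "*(1)" prefix in the log.
    return [f"*(1) {op}" for op in ops]

messages = []
final = log_plan_changes(
    ["HashAggregate(keys=[id], functions=[count(1)])", "Range (0, 10)"],
    [collapse_codegen_stages],
    messages.append)
assert final == ["*(1) HashAggregate(keys=[id], functions=[count(1)])",
                 "*(1) Range (0, 10)"]
assert messages[0] == "=== Applying Rule collapse_codegen_stages ==="
```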
[jira] [Commented] (SPARK-32704) Logging plan changes for execution
[ https://issues.apache.org/jira/browse/SPARK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196612#comment-17196612 ] Apache Spark commented on SPARK-32704:
--
User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29766
[jira] [Assigned] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32888: Assignee: (was: Apache Spark)
> reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
> Reporter: Punit Shah
> Priority: Minor
>
> * Imagine a two-row csv file like so (where the header and first record are duplicate rows):
> aaa,bbb
> aaa,bbb
> * The following is pyspark code:
> * create a parallelized rdd like: {{prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)}}
> * create a df like so: {{mydf = spark.read.csv(prdd, header=True)}}
> * {{mydf.count()}} will result in a record count of zero (when it should be 1)
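The zero count comes from how Spark's CSV reader handles headers: with header=True, any later line identical to the header line is also dropped (this is how Spark skips headers repeated at the start of each partition). A minimal plain-Python sketch of that behavior, as an illustration only and not Spark's actual code:

```python
# Hypothetical simulation of Spark's CSV header handling when reading an
# RDD of lines with header=True: the first line becomes the header, and
# every later line identical to the header line is dropped (Spark does
# this to skip headers repeated at the start of each partition).
def read_csv_lines(lines, header=True):
    if not lines:
        return [], []
    if header:
        header_row = lines[0].split(",")
        # Drop ALL lines equal to the header line, not just the first one.
        records = [l.split(",") for l in lines[1:] if l != lines[0]]
        return header_row, records
    return [], [l.split(",") for l in lines]

header, rows = read_csv_lines(["aaa,bbb", "aaa,bbb"])
# The data row "aaa,bbb" equals the header line, so it is filtered out:
# count is 0 instead of the expected 1.
```

With distinct data ("aaa,bbb" followed by "ccc,ddd") the second line survives, which matches the reporter's observation that only duplicate-of-header rows disappear.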
[jira] [Commented] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196591#comment-17196591 ] Apache Spark commented on SPARK-32888: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/29765
> reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
> Reporter: Punit Shah
> Priority: Minor
>
> * Imagine a two-row csv file like so (where the header and first record are duplicate rows):
> aaa,bbb
> aaa,bbb
> * The following is pyspark code:
> * create a parallelized rdd like: {{prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)}}
> * create a df like so: {{mydf = spark.read.csv(prdd, header=True)}}
> * {{mydf.count()}} will result in a record count of zero (when it should be 1)
[jira] [Assigned] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32888: Assignee: Apache Spark
> reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
> Reporter: Punit Shah
> Assignee: Apache Spark
> Priority: Minor
>
> * Imagine a two-row csv file like so (where the header and first record are duplicate rows):
> aaa,bbb
> aaa,bbb
> * The following is pyspark code:
> * create a parallelized rdd like: {{prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)}}
> * create a df like so: {{mydf = spark.read.csv(prdd, header=True)}}
> * {{mydf.count()}} will result in a record count of zero (when it should be 1)
[jira] [Commented] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196590#comment-17196590 ] L. C. Hsieh commented on SPARK-32888: - This behavior is documented in the CSV-related code, but it does not seem to be covered in the user-facing documentation.
> reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
> Reporter: Punit Shah
> Priority: Minor
>
> * Imagine a two-row csv file like so (where the header and first record are duplicate rows):
> aaa,bbb
> aaa,bbb
> * The following is pyspark code:
> * create a parallelized rdd like: {{prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)}}
> * create a df like so: {{mydf = spark.read.csv(prdd, header=True)}}
> * {{mydf.count()}} will result in a record count of zero (when it should be 1)
[jira] [Updated] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-32888: Issue Type: Documentation (was: Bug)
> reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
> Reporter: Punit Shah
> Priority: Minor
>
> * Imagine a two-row csv file like so (where the header and first record are duplicate rows):
> aaa,bbb
> aaa,bbb
> * The following is pyspark code:
> * create a parallelized rdd like: {{prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)}}
> * create a df like so: {{mydf = spark.read.csv(prdd, header=True)}}
> * {{mydf.count()}} will result in a record count of zero (when it should be 1)
[jira] [Updated] (SPARK-32891) Enhance UTF8String.trim
[ https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-32891: Description: {{UTF8String.trim}} does not appear to be implemented efficiently. We may need to look at how {{java.lang.String.trim}} is implemented. Please see comments: [https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675] [https://github.com/apache/spark/pull/29731#discussion_r487709672] was: Please see comment: https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675 https://github.com/apache/spark/pull/29731#discussion_r487709672
> Enhance UTF8String.trim
> Key: SPARK-32891
> URL: https://issues.apache.org/jira/browse/SPARK-32891
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Major
>
> {{UTF8String.trim}} does not appear to be implemented efficiently. We may need to look at how {{java.lang.String.trim}} is implemented.
> Please see comments:
> [https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675]
> [https://github.com/apache/spark/pull/29731#discussion_r487709672]
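For reference, {{java.lang.String.trim}} scans inward from both ends past characters <= ' ', then slices once, and returns the original object unchanged when nothing needs trimming (avoiding an allocation). A rough Python sketch of that approach, as an illustration of the idea rather than the actual JDK or UTF8String code:

```python
# Hypothetical sketch of the java.lang.String.trim strategy: advance two
# indices inward past characters <= ' ' (space), slice exactly once, and
# return the original object when no trimming is needed (no copy).
def trim(s: str) -> str:
    st, end = 0, len(s)
    while st < end and s[st] <= ' ':
        st += 1
    while st < end and s[end - 1] <= ' ':
        end -= 1
    # Same object back when the string was already trimmed: no allocation.
    return s if (st == 0 and end == len(s)) else s[st:end]
```

The no-copy fast path for already-trimmed strings is the kind of detail worth checking in {{UTF8String.trim}} as well.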
[jira] [Created] (SPARK-32895) DataSourceV2 allow ACCEPT_ANY_SCHEMA in write path
Sebastian Herold created SPARK-32895: Summary: DataSourceV2 allow ACCEPT_ANY_SCHEMA in write path Key: SPARK-32895 URL: https://issues.apache.org/jira/browse/SPARK-32895 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Sebastian Herold
During the development of a Spark-Collibra connector using the DataSourceV2 framework, I found a blocking limitation in the current version. The connector should accept DataFrames of arbitrary schemas and send them to the Import API of Collibra. The problem is the {{inferSchema}} method of {{TableProvider}}: although my {{Table}} implementation declares the {{ACCEPT_ANY_SCHEMA}} capability, {{inferSchema}} is still called on the write path, where I would have to infer a schema without knowing the actual schema of the data frame, which is impossible. This behaviour may be intended when writing to an existing table with a fixed schema, but not for a sink that accepts any schema; such cases cannot be implemented right now. I found in [{{DataFrameWriter.scala}}|https://github.com/apache/spark/blob/4fac6d501a5d97530edb712ff3450890ac10e413/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L333] that data sources inherited from {{FileDataSourceV2}} are treated as an exception: {{inferSchema}} is not called on the write path, and {{getTable}} is called with the schema of the actual data frame. This is why it works for data sources derived from {{FileDataSourceV2}}. I would expect similar behaviour for my data source, which declares the capability to accept any schema. The problem is that the capabilities are retrieved from the {{Table}} implementation, but to get a table via {{getTable}} you need a schema.
I guess the interface should be designed differently, with two different methods to infer the schema:
* one for the read path, like the current implementation
* one for the write path, which receives the actual schema of the data frame as a parameter; this allows the implementation to decide:
** Do I accept all schemas and just return the schema of the data frame?
** Do I know the schema of the target and ignore the schema of the actual data frame?
** Can the schema of the target be evolved, so I check that the schema of the data frame is a valid evolution of the target schema?
If you agree, I'm willing to make a PR.
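The proposed split could be sketched roughly as follows. This is a hypothetical illustration in Python, not the real DataSourceV2 Scala API; all names ({{infer_read_schema}}, {{resolve_write_schema}}, {{SchemaError}}) are made up for the sketch:

```python
# Hypothetical sketch of the proposed TableProvider redesign: separate
# schema resolution for the read and write paths, so that a sink with
# ACCEPT_ANY_SCHEMA never has to "infer" a schema it cannot know.
from abc import ABC, abstractmethod


class SchemaError(Exception):
    pass


class TableProvider(ABC):
    @abstractmethod
    def infer_read_schema(self):
        """Read path: discover the schema from the source itself."""

    def resolve_write_schema(self, incoming_schema):
        """Write path: given the data frame's actual schema, decide what to do.

        Default behavior for a fixed-schema target: require an exact match.
        """
        target = self.infer_read_schema()
        if incoming_schema != target:
            raise SchemaError(f"schema mismatch: {incoming_schema} vs {target}")
        return target


class AcceptAnySchemaProvider(TableProvider):
    """A sink like the Collibra connector: no readable schema of its own."""

    def infer_read_schema(self):
        raise SchemaError("this sink has no readable schema")

    def resolve_write_schema(self, incoming_schema):
        # ACCEPT_ANY_SCHEMA: simply take whatever the data frame provides.
        return incoming_schema


p = AcceptAnySchemaProvider()
# The write path never needs to infer anything for an accept-any sink:
schema = p.resolve_write_schema(["id: long", "name: string"])
```

A schema-evolution-capable target would override {{resolve_write_schema}} to validate the incoming schema against its own, covering the third bullet above.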
[jira] [Updated] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster
[ https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-32893: Priority: Major (was: Blocker)
> Structured Streaming and Dynamic Allocation on StandaloneCluster
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.0.1
> Reporter: Duarte Ferreira
> Priority: Major
>
> We are currently using a Spark 3.0.1 Standalone cluster to run our Structured Streaming applications.
> We set the following configurations when running the application in cluster mode:
> * spark.dynamicAllocation.enabled = true
> * spark.shuffle.service.enabled = true
> * spark.cores.max = 5
> * spark.executor.memory = 1G
> * spark.executor.cores = 1
> We also have the configuration set to enable spark.shuffle.service.enabled on each worker, and a cluster composed of 1 master and 2 slaves.
> The application reads data from a Kafka topic (readTopic) following [this documentation|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html], applies some transformations on the DataSet using Spark SQL, and writes the data to another Kafka topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 executors and, as we start feeding data to the readTopic, it increases the number of executors until it reaches the 5-executor limit, and all messages are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as expected and starts killing executors that are no longer needed, until we stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it again behaves as expected and increases the number of executors.
But if we leave the > application in idle at 0 executors for around 10 minutes we start getting > errors like this: > {noformat} > *no* > 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC > 7570256331800450365 to sparkmaster/10.0.12.231:7077: > java.nio.channels.ClosedChannelException > java.nio.channels.ClosedChannelException > at > io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) > at > io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468) > at > 
org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148) > at > org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123) > at > io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362) > at > io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235) > at > io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209) > at > io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) > at > io.nett
[jira] [Created] (SPARK-32894) Timestamp cast in external orc table
Grigory Skvortsov created SPARK-32894: - Summary: Timestamp cast in external orc table Key: SPARK-32894 URL: https://issues.apache.org/jira/browse/SPARK-32894 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 3.0.0 Environment: Spark 3.0.0 Java 1.8 Hadoop 3.3.0 Hive 3.1.2 Python 3.7 (from pyspark) Reporter: Grigory Skvortsov I have an external Hive table stored as ORC. I want to work with the timestamp column in my table using pyspark. For example, I try this: spark.sql('select id, time_ from mydb.table1').show() Py4JJavaError: An error occurred while calling o2877.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 19, 172.29.14.241, executor 1): java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) at org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148) at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228) at org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730) at org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730) at org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950) at scala.Option.foreach(Option.scala:407) at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133) at or
[jira] [Updated] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster
[ https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duarte Ferreira updated SPARK-32893: Priority: Blocker (was: Major)
> Structured Streaming and Dynamic Allocation on StandaloneCluster
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.0.1
> Reporter: Duarte Ferreira
> Priority: Blocker
>
> We are currently using a Spark 3.0.1 Standalone cluster to run our Structured Streaming applications.
> We set the following configurations when running the application in cluster mode:
> * spark.dynamicAllocation.enabled = true
> * spark.shuffle.service.enabled = true
> * spark.cores.max = 5
> * spark.executor.memory = 1G
> * spark.executor.cores = 1
> We also have the configuration set to enable spark.shuffle.service.enabled on each worker, and a cluster composed of 1 master and 2 slaves.
> The application reads data from a Kafka topic (readTopic) following [this documentation|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html], applies some transformations on the DataSet using Spark SQL, and writes the data to another Kafka topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 executors and, as we start feeding data to the readTopic, it increases the number of executors until it reaches the 5-executor limit, and all messages are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as expected and starts killing executors that are no longer needed, until we stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it again behaves as expected and increases the number of executors.
But if we leave the > application in idle at 0 executors for around 10 minutes we start getting > errors like this: > {noformat} > *no* > 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC > 7570256331800450365 to sparkmaster/10.0.12.231:7077: > java.nio.channels.ClosedChannelException > java.nio.channels.ClosedChannelException > at > io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) > at > io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468) > at > 
org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148) > at > org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123) > at > io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362) > at > io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235) > at > io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209) > at > io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) > at
[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error
[ https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196433#comment-17196433 ] Apache Spark commented on SPARK-32738: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/29764
> thread safe endpoints may hang due to fatal error
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.4, 2.4.6, 3.0.0
> Reporter: Zhenhua Wang
> Assignee: Zhenhua Wang
> Priority: Major
> Fix For: 3.1.0
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in `Inbox`. If a fatal error occurs during `Inbox.process`, 'numActiveThreads' is not decremented. Other threads then cannot process messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in earlier Spark 2.x versions, since the driver, executor and block manager endpoints are all thread-safe endpoints.
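The failure mode described above, and the shape of the fix, can be illustrated with a small sketch. This is a hypothetical Python simplification, not Spark's actual Inbox code: if the active-thread counter is not decremented in a finally block, a fatal error escaping the handler leaves the inbox permanently "busy".

```python
import threading

# Hypothetical simplification of the Inbox bug: for a thread-safe
# endpoint, only one thread may drain the inbox at a time, tracked by
# num_active_threads. If a fatal error escapes process() without the
# counter being decremented, every later caller believes another thread
# is still active and the endpoint hangs. The fix: decrement in finally.
class Inbox:
    def __init__(self):
        self._lock = threading.Lock()
        self.num_active_threads = 0
        self._messages = []

    def post(self, msg):
        with self._lock:
            self._messages.append(msg)

    def process(self, handler):
        with self._lock:
            if self.num_active_threads > 0:
                return []  # a (possibly dead) thread already owns the inbox
            self.num_active_threads += 1
            batch, self._messages = self._messages, []
        try:
            return [handler(m) for m in batch]  # handler may raise a fatal error
        finally:
            # Runs even when a fatal (OutOfMemoryError-style) error is
            # raised, so the inbox is never left permanently "busy".
            with self._lock:
                self.num_active_threads -= 1
```

Without the finally block, the counter would stay at 1 after a fatal error and every subsequent process() call would return immediately, which is exactly the hang the ticket describes.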
[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error
[ https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196300#comment-17196300 ] Apache Spark commented on SPARK-32738: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/29763
> thread safe endpoints may hang due to fatal error
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.4, 2.4.6, 3.0.0
> Reporter: Zhenhua Wang
> Assignee: Zhenhua Wang
> Priority: Major
> Fix For: 3.1.0
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in `Inbox`. If a fatal error occurs during `Inbox.process`, 'numActiveThreads' is not decremented. Other threads then cannot process messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in earlier Spark 2.x versions, since the driver, executor and block manager endpoints are all thread-safe endpoints.
[jira] [Resolved] (SPARK-32884) Mark TPCDSQuery*Suite as ExtendedSQLTest
[ https://issues.apache.org/jira/browse/SPARK-32884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32884. --- Fix Version/s: 3.1.0 Assignee: Dongjoon Hyun Resolution: Fixed
> Mark TPCDSQuery*Suite as ExtendedSQLTest
> Key: SPARK-32884
> URL: https://issues.apache.org/jira/browse/SPARK-32884
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.1.0
[jira] [Resolved] (SPARK-32827) Add spark.sql.maxMetadataStringLength config
[ https://issues.apache.org/jira/browse/SPARK-32827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32827. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29688 [https://github.com/apache/spark/pull/29688]
> Add spark.sql.maxMetadataStringLength config
> Key: SPARK-32827
> URL: https://issues.apache.org/jira/browse/SPARK-32827
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: ulysses you
> Assignee: ulysses you
> Priority: Minor
> Fix For: 3.1.0
>
> Add a new config `spark.sql.maxMetadataStringLength`. This config aims to limit the length of metadata values, e.g. a file location.
> I found that metadata values were abbreviated with `...` when I tried to add a test in `SQLQueryTestSuite`. Because of that, we could not replace the location value with `className`, since the `className` itself had been abbreviated.
[jira] [Assigned] (SPARK-32827) Add spark.sql.maxMetadataStringLength config
[ https://issues.apache.org/jira/browse/SPARK-32827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32827: --- Assignee: ulysses you > Add spark.sql.maxMetadataStringLength config > > > Key: SPARK-32827 > URL: https://issues.apache.org/jira/browse/SPARK-32827 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Minor > > > Add a new config, `spark.sql.maxMetadataStringLength`. This config aims to > limit the length of metadata values, e.g. the file location. > We found that metadata values are abbreviated with `...` when we tried to add a test > in `SQLQueryTestSuite`. Because of that, we cannot replace the location value with > `className`, since `className` itself has been abbreviated.
[jira] [Created] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster
Duarte Ferreira created SPARK-32893: --- Summary: Structured Streaming and Dynamic Allocation on StandaloneCluster Key: SPARK-32893 URL: https://issues.apache.org/jira/browse/SPARK-32893 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.1 Reporter: Duarte Ferreira We are currently using a Spark 3.0.1 standalone cluster to run our Structured Streaming applications. We set the following configurations when running the application in cluster mode: * spark.dynamicAllocation.enabled = true * spark.shuffle.service.enabled = true * spark.cores.max = 5 * spark.executor.memory = 1G * spark.executor.cores = 1 We also enable spark.shuffle.service.enabled on each worker and have a cluster composed of 1 master and 2 slaves. The application reads data from a Kafka topic (readTopic) following [the Structured Streaming + Kafka integration guide|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html], applies some transformations to the DataSet using Spark SQL, and writes data to another Kafka topic (writeTopic). When we start the application it behaves correctly: it starts with 0 executors and, as we start feeding data to the readTopic, it increases the number of executors until it reaches the 5-executor limit, and all messages are transformed and written to the writeTopic in Kafka. If we stop feeding messages to the readTopic, the application works as expected and starts killing executors that are no longer needed, until we stop sending data completely and it reaches 0 running executors. If we start sending data again right away, it behaves just as expected and starts increasing the number of executors again.
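For reference, the settings listed in the report map onto a submission command roughly like this; the master URL, deploy mode, and application jar below are placeholders, not taken from the ticket:

```shell
# Hypothetical submission; only the --conf values come from the report above.
spark-submit \
  --master spark://sparkmaster:7077 \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.cores.max=5 \
  --conf spark.executor.memory=1G \
  --conf spark.executor.cores=1 \
  streaming-app.jar
```

Dynamic allocation on standalone clusters also requires the external shuffle service to be running on each worker, which matches the per-worker configuration the reporter describes.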
But if we leave the application idle at 0 executors for around 10 minutes, we start getting errors like this: {noformat} 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 7570256331800450365 to sparkmaster/10.0.12.231:7077: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468) at org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148) at 
org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123) at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362) at io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235) at io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209) at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:897) at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerC
[jira] [Commented] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196168#comment-17196168 ] Apache Spark commented on SPARK-32892: -- User 'mundaym' has created a pull request for this issue: https://github.com/apache/spark/pull/29762 > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Priority: Minor > Labels: big-endian > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review.
[jira] [Commented] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196167#comment-17196167 ] Apache Spark commented on SPARK-32892: -- User 'mundaym' has created a pull request for this issue: https://github.com/apache/spark/pull/29762 > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Priority: Minor > Labels: big-endian > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review.
[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32892: Assignee: Apache Spark > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Assignee: Apache Spark >Priority: Minor > Labels: big-endian > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review.
[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32892: Assignee: (was: Apache Spark) > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Priority: Minor > Labels: big-endian > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review.
[jira] [Created] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
Michael Munday created SPARK-32892: -- Summary: Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms Key: SPARK-32892 URL: https://issues.apache.org/jira/browse/SPARK-32892 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.0.1 Reporter: Michael Munday The Murmur3 and xxHash64 implementations in Spark do not produce the correct results on big-endian systems. This causes test failures on my target platform (s390x). These hash functions require that multi-byte chunks be interpreted as integers encoded in *little-endian* byte order. This requires byte reversal when using multi-byte unsafe operations on big-endian platforms. I have a PR ready for discussion and review.
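The requirement can be demonstrated without Spark. The Python sketch below (illustrative, not Spark's code) shows that decoding a chunk as a *little-endian* integer is platform-independent, while a native-order word load matches it only on little-endian hosts — which is why big-endian platforms such as s390x need a byte reversal before hashing:

```python
import struct
import sys

chunk = b"\x01\x02\x03\x04"  # a 4-byte chunk of hash input

# What Murmur3/xxHash64-style hashes expect: little-endian interpretation.
little = struct.unpack("<i", chunk)[0]

# What a raw (native byte order) word load gives on this machine.
native = struct.unpack("=i", chunk)[0]

print(hex(little))       # 0x4030201 on every platform
print(little == native)  # True on little-endian hosts, False on big-endian
print(sys.byteorder)
```

On a big-endian host `native` would be `0x01020304`, so the unsafe loads must reverse bytes to keep hash outputs identical across platforms.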
[jira] [Assigned] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes
[ https://issues.apache.org/jira/browse/SPARK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-31448: Assignee: Abhishek Dixit > Difference in Storage Levels used in cache() and persist() for pyspark > dataframes > - > > Key: SPARK-31448 > URL: https://issues.apache.org/jira/browse/SPARK-31448 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Abhishek Dixit >Assignee: Abhishek Dixit >Priority: Major > > There is a difference in default storage level *MEMORY_AND_DISK* in pyspark > and scala. > *Scala*: StorageLevel(true, true, false, true) > *Pyspark:* StorageLevel(True, True, False, False) > > *Problem Description:* > Calling *df.cache()* for pyspark dataframe directly invokes Scala method > cache() and Storage Level used is StorageLevel(true, true, false, true). > But calling *df.persist()* for pyspark dataframe sets the > newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and > then invokes Scala function persist(newStorageLevel). > *Possible Fix:* > Invoke pyspark function persist inside pyspark function cache instead of > calling the scala function directly. > I can raise a PR for this fix if someone can confirm that this is a bug and > the possible fix is the correct approach.
[jira] [Resolved] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes
[ https://issues.apache.org/jira/browse/SPARK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31448. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29242 [https://github.com/apache/spark/pull/29242] > Difference in Storage Levels used in cache() and persist() for pyspark > dataframes > - > > Key: SPARK-31448 > URL: https://issues.apache.org/jira/browse/SPARK-31448 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Abhishek Dixit >Assignee: Abhishek Dixit >Priority: Major > Fix For: 3.1.0 > > > There is a difference in default storage level *MEMORY_AND_DISK* in pyspark > and scala. > *Scala*: StorageLevel(true, true, false, true) > *Pyspark:* StorageLevel(True, True, False, False) > > *Problem Description:* > Calling *df.cache()* for pyspark dataframe directly invokes Scala method > cache() and Storage Level used is StorageLevel(true, true, false, true). > But calling *df.persist()* for pyspark dataframe sets the > newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and > then invokes Scala function persist(newStorageLevel). > *Possible Fix:* > Invoke pyspark function persist inside pyspark function cache instead of > calling the scala function directly. > I can raise a PR for this fix if someone can confirm that this is a bug and > the possible fix is the correct approach.
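The mismatch is a single flag in the storage-level tuple. Below is a toy model of the two defaults quoted above — a stand-in namedtuple for illustration, not the real `pyspark.StorageLevel` class:

```python
from collections import namedtuple

# Flags in the order the report quotes them:
# (useDisk, useMemory, useOffHeap, deserialized)
StorageLevel = namedtuple(
    "StorageLevel", "use_disk use_memory use_off_heap deserialized"
)

scala_default = StorageLevel(True, True, False, True)     # level df.cache() ended up using
pyspark_default = StorageLevel(True, True, False, False)  # level df.persist() passed

# The proposed fix routes cache() through persist() so both paths
# use the same Python-side default.
print(scala_default == pyspark_default)  # False — the two code paths disagree
```

Only the `deserialized` flag differs, but that changes whether cached blocks are stored as deserialized objects or serialized bytes, so the two defaults are genuinely different levels.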
[jira] [Commented] (SPARK-32891) Enhance UTF8String.trim
[ https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196157#comment-17196157 ] Sean R. Owen commented on SPARK-32891: -- Can you inline a basic description here? What are you proposing? > Enhance UTF8String.trim > --- > > Key: SPARK-32891 > URL: https://issues.apache.org/jira/browse/SPARK-32891 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Please see comment: > https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675 > https://github.com/apache/spark/pull/29731#discussion_r487709672
[jira] [Created] (SPARK-32891) Enhance UTF8String.trim
Yuming Wang created SPARK-32891: --- Summary: Enhance UTF8String.trim Key: SPARK-32891 URL: https://issues.apache.org/jira/browse/SPARK-32891 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang Please see comment: https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675 https://github.com/apache/spark/pull/29731#discussion_r487709672
[jira] [Commented] (SPARK-32889) orc table column name doesn't support special characters.
[ https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196143#comment-17196143 ] Apache Spark commented on SPARK-32889: -- User 'jzc928' has created a pull request for this issue: https://github.com/apache/spark/pull/29761 > orc table column name doesn't support special characters. > - > > Key: SPARK-32889 > URL: https://issues.apache.org/jira/browse/SPARK-32889 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: jason jin >Priority: Major > > When executing > "CREATE TABLE tbl(`$` INT, b INT) using orc", > the error below occurs; the same statement is fine in Hive. > Column name "$" contains invalid character(s). Please use alias to rename > it.;Column name "$" contains invalid character(s). Please use alias to rename > it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid > character(s). Please use alias to rename it.; at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913) > at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) > at
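The exception above comes from a field-name validity check in `OrcFileFormat`. Here is a rough Python re-implementation of that kind of check; the exact character rule below is an assumption for illustration, not Spark's actual regex:

```python
import re

# Assumed rule (illustrative): field names limited to letters,
# digits, and underscores, starting with a letter or underscore.
VALID_FIELD = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def check_field_name(name: str) -> None:
    """Raise on invalid names, like checkFieldName in OrcFileFormat.scala."""
    if not VALID_FIELD.match(name):
        raise ValueError(
            f'Column name "{name}" contains invalid character(s). '
            "Please use alias to rename it."
        )

check_field_name("b")  # accepted
try:
    check_field_name("$")  # rejected, as in the report
except ValueError as err:
    print(err)
```

The report's point is that Hive accepts such names, so the Spark-side check is stricter than the underlying format requires.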
[jira] [Assigned] (SPARK-32889) orc table column name doesn't support special characters.
[ https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32889: Assignee: (was: Apache Spark) > orc table column name doesn't support special characters. > - > > Key: SPARK-32889 > URL: https://issues.apache.org/jira/browse/SPARK-32889 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: jason jin >Priority: Major > > When executing > "CREATE TABLE tbl(`$` INT, b INT) using orc", > the error below occurs; the same statement is fine in Hive. > Column name "$" contains invalid character(s). Please use alias to rename > it.;Column name "$" contains invalid character(s). Please use alias to rename > it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid > character(s). Please use alias to rename it.; at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913) > at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) > at
[jira] [Assigned] (SPARK-32889) orc table column name doesn't support special characters.
[ https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32889: Assignee: Apache Spark > orc table column name doesn't support special characters. > - > > Key: SPARK-32889 > URL: https://issues.apache.org/jira/browse/SPARK-32889 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: jason jin >Assignee: Apache Spark >Priority: Major > > When executing > "CREATE TABLE tbl(`$` INT, b INT) using orc", > the error below occurs; the same statement is fine in Hive. > Column name "$" contains invalid character(s). Please use alias to rename > it.;Column name "$" contains invalid character(s). Please use alias to rename > it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid > character(s). Please use alias to rename it.; at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924) > at > org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913) > at > org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231) > at > org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) > at
[jira] [Assigned] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32890: Assignee: Apache Spark > Pass all `sql/hive` module UTs in Scala 2.13 > > > Key: SPARK-32890 > URL: https://issues.apache.org/jira/browse/SPARK-32890 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > Only 4 test cases fail in the sql/hive module with the commands > > {code:java} > mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive > mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code} > > The failed cases are as follows: > * HiveSchemaInferenceSuite (1 FAILED) > * HiveSparkSubmitSuite (1 FAILED) > * StatisticsSuite (1 FAILED) > * HiveDDLSuite (1 FAILED) >
[jira] [Commented] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196132#comment-17196132 ] Apache Spark commented on SPARK-32890: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/29760 > Pass all `sql/hive` module UTs in Scala 2.13 > > > Key: SPARK-32890 > URL: https://issues.apache.org/jira/browse/SPARK-32890 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > Only 4 test cases fail in the sql/hive module with the commands > > {code:java} > mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive > mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code} > > The failed cases are as follows: > * HiveSchemaInferenceSuite (1 FAILED) > * HiveSparkSubmitSuite (1 FAILED) > * StatisticsSuite (1 FAILED) > * HiveDDLSuite (1 FAILED) >
[jira] [Assigned] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32890: Assignee: (was: Apache Spark) > Pass all `sql/hive` module UTs in Scala 2.13 > > > Key: SPARK-32890 > URL: https://issues.apache.org/jira/browse/SPARK-32890 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > Only 4 test cases fail in the sql/hive module with the commands > > {code:java} > mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive > mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code} > > The failed cases are as follows: > * HiveSchemaInferenceSuite (1 FAILED) > * HiveSparkSubmitSuite (1 FAILED) > * StatisticsSuite (1 FAILED) > * HiveDDLSuite (1 FAILED) >
[jira] [Resolved] (SPARK-32874) Enhance result set meta data check for execute statement operation for thrift server
[ https://issues.apache.org/jira/browse/SPARK-32874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32874. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29746 [https://github.com/apache/spark/pull/29746] > Enhance result set meta data check for execute statement operation for thrift > server > > > Key: SPARK-32874 > URL: https://issues.apache.org/jira/browse/SPARK-32874 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.0 > > > Add test cases to ensure stability for JDBC api
[jira] [Assigned] (SPARK-32874) Enhance result set meta data check for execute statement operation for thrift server
[ https://issues.apache.org/jira/browse/SPARK-32874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32874: --- Assignee: Kent Yao > Enhance result set meta data check for execute statement operation for thrift > server
[jira] [Created] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13
Yang Jie created SPARK-32890: Summary: Pass all `sql/hive` module UTs in Scala 2.13 Key: SPARK-32890 URL: https://issues.apache.org/jira/browse/SPARK-32890 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Yang Jie Only 4 test cases fail in the sql/hive module with the commands {code:java} mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code} The failed cases are as follows: * HiveSchemaInferenceSuite (1 FAILED) * HiveSparkSubmitSuite (1 FAILED) * StatisticsSuite (1 FAILED) * HiveDDLSuite (1 FAILED)
[jira] [Created] (SPARK-32889) orc table column name doesn't support special characters.
jason jin created SPARK-32889: - Summary: orc table column name doesn't support special characters. Key: SPARK-32889 URL: https://issues.apache.org/jira/browse/SPARK-32889 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: jason jin When executing "CREATE TABLE tbl(`$` INT, b INT) USING orc", the error below occurs, but the same statement works in Hive. org.apache.spark.sql.AnalysisException: Column name "$" contains invalid character(s). Please use alias to rename it.; at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59) at org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924) at org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913) at scala.Option.foreach(Option.scala:407) at org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913) at org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908) at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231) at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80) at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72) at -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
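The stack trace above shows the rejection coming from OrcFileFormat.checkFieldName. As a rough illustration of what such a validation does (a Python sketch of my own, not Spark's actual Scala code; the exact set of characters Spark rejects may differ):

```python
import re

# Assumed rule for illustration: only letters, digits and underscores
# are accepted, so a column named "$" is rejected.
_VALID_FIELD = re.compile(r"^[A-Za-z0-9_]+$")

def check_field_name(name: str) -> None:
    """Raise an error for a column name with invalid characters."""
    if not _VALID_FIELD.match(name):
        raise ValueError(
            f'Column name "{name}" contains invalid character(s). '
            "Please use alias to rename it."
        )

def check_field_names(names):
    """Validate every column name in a schema."""
    for name in names:
        check_field_name(name)
```

Under this sketch, check_field_names(["a", "b"]) passes silently, while check_field_names(["$", "b"]) raises the same message quoted in the report. Hive applies no such check, which is why the statement succeeds there.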
[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196024#comment-17196024 ] Apache Spark commented on SPARK-32887: -- User 'Udbhav30' has created a pull request for this issue: https://github.com/apache/spark/pull/29758 > Example command in > https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be > changed > > > Key: SPARK-32887 > URL: https://issues.apache.org/jira/browse/SPARK-32887 > Project: Spark > Issue Type: Bug > Components: docs >Affects Versions: 3.0.0 > Environment: Spark 2.4.5, Spark 3.0.0 >Reporter: Chetan Bhat >Priority: Minor > > In the link > [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the > below command example mentioned is wrong. > SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); > > Complete example executed throws below error. > CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as > parquet; > INSERT INTO employee PARTITION (grade = 1) VALUES ('sam'); > INSERT INTO employee PARTITION (grade = 2) VALUES ('suj'); > spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION > ('grade=1'); > **Error in query:** > ``` > mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', > 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', > 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', > 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', > 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', > 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', > 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', > 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', > DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', > 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', > 'DROP', 
'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', > 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', > 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', > 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', > 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', > 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', > 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', > 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', > 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', > 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', > 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', > 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', > 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', > 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', > 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', > 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', > 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', > 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', > 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', > 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', > 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', > 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', > 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', > 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', > 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', > 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 
59) > == SQL == > SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1') > ---^^^ > ``` > > Expected: the partition spec should not be quoted as a single string; for an int value it should be PARTITION (grade=1), and if the partition value is a string it can be given as grade='abc'
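The parser fails because 'grade=1' quotes the whole key=value pair as one string literal; only a string value should be quoted, never the key. A hypothetical helper (the function names and quoting rule are my own illustration, not a Spark API) that renders the spec the way the parser expects:

```python
def partition_spec(**parts):
    """Render a partition spec, quoting only string values."""
    rendered = []
    for key, value in parts.items():
        if isinstance(value, str):
            rendered.append(f"{key}='{value}'")  # quote the value only
        else:
            rendered.append(f"{key}={value}")    # numeric: no quotes
    return "PARTITION (" + ", ".join(rendered) + ")"

def show_table_extended(db, pattern, **parts):
    """Build a SHOW TABLE EXTENDED statement with a valid spec."""
    return (f"SHOW TABLE EXTENDED IN {db} LIKE '{pattern}' "
            + partition_spec(**parts))
```

For example, show_table_extended("default", "employee", grade=1) produces SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1), which is the corrected form of the documented example.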
[jira] [Commented] (SPARK-32481) Support truncate table to move the data to trash
[ https://issues.apache.org/jira/browse/SPARK-32481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196025#comment-17196025 ] Apache Spark commented on SPARK-32481: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29759 > Support truncate table to move the data to trash > > > Key: SPARK-32481 > URL: https://issues.apache.org/jira/browse/SPARK-32481 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.1.0 >Reporter: jobit mathew >Assignee: Udbhav Agrawal >Priority: Minor > Fix For: 3.1.0 > > > *Instead of deleting the data, move it to the trash. From the trash, the data can then be deleted permanently based on configuration.*
[jira] [Commented] (SPARK-32481) Support truncate table to move the data to trash
[ https://issues.apache.org/jira/browse/SPARK-32481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196023#comment-17196023 ] Apache Spark commented on SPARK-32481: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29759 > Support truncate table to move the data to trash
[jira] [Assigned] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32887: Assignee: (was: Apache Spark) > Example command in > https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be > changed
[jira] [Assigned] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32887: Assignee: Apache Spark > Example command in > https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be > changed
[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196022#comment-17196022 ] Apache Spark commented on SPARK-32887: -- User 'Udbhav30' has created a pull request for this issue: https://github.com/apache/spark/pull/29758 > Example command in > https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be > changed
[jira] [Created] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv
Punit Shah created SPARK-32888: -- Summary: reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv Key: SPARK-32888 URL: https://issues.apache.org/jira/browse/SPARK-32888 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.1, 3.0.0, 2.4.7, 2.4.6, 2.4.5 Reporter: Punit Shah * Imagine a two-row csv file like so (where the header and first record are duplicate rows): aaa,bbb aaa,bbb * The following is pyspark code: * create a parallelized rdd: prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x) * create a df: mydf = spark.read.csv(prdd, header=True) * mydf.count() will result in a record count of zero (when it should be 1)
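One plausible explanation for the zero count (my assumption; the ticket itself does not identify the cause) is that when a csv is parsed from an RDD of lines with header=True, every line equal to the header is filtered out, so a data row that duplicates the header disappears along with it. Sketched in plain Python:

```python
def read_csv_with_header(lines):
    """Mimic header handling that drops every line equal to the header.

    This models one *possible* cause of the reported behaviour; it is
    an illustration, not Spark's actual implementation.
    """
    if not lines:
        return [], []
    header = lines[0].split(",")
    # Dropping ALL occurrences of the header line, not just the first,
    # also removes data rows that happen to duplicate the header.
    data = [line.split(",") for line in lines if line != lines[0]]
    return header, data

header, rows = read_csv_with_header(["aaa,bbb", "aaa,bbb"])
# Under this model, header is ["aaa", "bbb"] and rows is empty,
# matching the reported count of 0 instead of 1.
```

With distinct rows (e.g. ["a,b", "1,2"]) the same logic returns the single data row, which is why the bug only surfaces when a record duplicates the header.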
[jira] [Comment Edited] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013 ] Attila Zsolt Piros edited comment on SPARK-28210 at 9/15/20, 9:38 AM: -- [~tianczha] [~devaraj] I would like to work on this issue if that's fine for you. I intend to progress along the ideas of the linked PR: to pass the metadata when the reducer task is constructed. was (Author: attilapiros): [~tianczha] [~devaraj] I would like to work on this issue if that's fine for you. I would like to progress along the ideas of the linked PR: to pass the metadata when the reducer task is constructed. > Shuffle Storage API: Reads > -- > > Key: SPARK-28210 > URL: https://issues.apache.org/jira/browse/SPARK-28210 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Matt Cheah >Priority: Major > > As part of the effort to store shuffle data in arbitrary places, this issue > tracks implementing an API for reading the shuffle data stored by the write > API. Also ensure that the existing shuffle implementation is refactored to > use the API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013 ] Attila Zsolt Piros commented on SPARK-28210: [~tianczha] [~devaraj] I would like to work on this issue if that's fine for you. I would like to progress along the ideas of the linked PR: to pass the metadata when the reducer task is constructed. > Shuffle Storage API: Reads > -- > > Key: SPARK-28210 > URL: https://issues.apache.org/jira/browse/SPARK-28210 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Matt Cheah >Priority: Major > > As part of the effort to store shuffle data in arbitrary places, this issue > tracks implementing an API for reading the shuffle data stored by the write > API. Also ensure that the existing shuffle implementation is refactored to > use the API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195968#comment-17195968 ] Apache Spark commented on SPARK-32886: -- User 'zhli1142015' has created a pull request for this issue: https://github.com/apache/spark/pull/29757 > '.../jobs/undefined' link from "Event Timeline" in jobs page > > > Key: SPARK-32886 > URL: https://issues.apache.org/jira/browse/SPARK-32886 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.0 >Reporter: Zhen Li >Priority: Minor > Attachments: undefinedlink.JPG > > > In the event timeline view of the jobs page, clicking a job item should redirect you to the corresponding job page. When there are too many jobs, some job items' links redirect to a wrong URL like '.../jobs/undefined'
[jira] [Assigned] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32886: Assignee: Apache Spark > '.../jobs/undefined' link from "Event Timeline" in jobs page
[jira] [Assigned] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32886: Assignee: (was: Apache Spark) > '.../jobs/undefined' link from "Event Timeline" in jobs page
[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195965#comment-17195965 ] Udbhav Agrawal commented on SPARK-32887: Thanks for reporting. This looks like a documentation typo; since it is misleading, I will raise an MR to correct it. > Example command in > https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be > changed > > > Key: SPARK-32887 > URL: https://issues.apache.org/jira/browse/SPARK-32887 > Project: Spark > Issue Type: Bug > Components: docs >Affects Versions: 3.0.0 > Environment: Spark 2.4.5, Spark 3.0.0 >Reporter: Chetan Bhat >Priority: Minor > > In the link > [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the > below command example mentioned is wrong. > SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); > > Complete example executed throws below error. > CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as > parquet; > INSERT INTO employee PARTITION (grade = 1) VALUES ('sam'); > INSERT INTO employee PARTITION (grade = 2) VALUES ('suj'); > spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION > ('grade=1'); > **Error in query:** > ``` > mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', > 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', > 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', > 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', > 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', > 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', > 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', > 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', > DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', > 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 
'DISTRIBUTE', > 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', > 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', > 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', > 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', > 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', > 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', > 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', > 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', > 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', > 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', > 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', > 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', > 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', > 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', > 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', > 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', > 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', > 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', > 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', > 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', > 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', > 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', > 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', > 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', > 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', > 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', > IDENTIFIER, 
BACKQUOTED_IDENTIFIER}(line 1, pos 59) > == SQL == > SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1') > ---^^^ > ``` > > Expected: if the partition value is a string, it should be given as grade='abc'
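For context, the parse error comes from quoting the entire partition spec as one string literal; the documented example presumably should pass an unquoted key = value pair, quoting only string-typed values. A sketch of the corrected commands against the example table above:

```sql
-- Partition spec as key = value; the spec itself is not a string literal
SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade = 1);

-- For a string-typed partition column, quote only the value, e.g. on a
-- hypothetical table partitioned by a STRING column named grade:
-- SHOW TABLE EXTENDED IN default LIKE 'employee_str' PARTITION (grade = 'abc');
```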
[jira] [Updated] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
[ https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated SPARK-32887: Description: In the link [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the below command example mentioned is wrong. SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); Complete example executed throws below error. CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet; INSERT INTO employee PARTITION (grade = 1) VALUES ('sam'); INSERT INTO employee PARTITION (grade = 2) VALUES ('suj'); spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); **Error in query:** ``` mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 
'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59) == SQL == SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1') ---^^^ ``` Expected : - If that partition value is string we can give like this grade ='abc' was: In the link [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the below command example mentioned is wrong. 
SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet; INSERT INTO employee PARTITION (grade = 1) VALUES ('sam'); INSERT INTO employee PARTITION (grade = 2) VALUES ('suj'); spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); **Error in query:** ``` mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'EL
[jira] [Created] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed
Chetan Bhat created SPARK-32887: --- Summary: Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed Key: SPARK-32887 URL: https://issues.apache.org/jira/browse/SPARK-32887 Project: Spark Issue Type: Bug Components: docs Affects Versions: 3.0.0 Environment: Spark 2.4.5, Spark 3.0.0 Reporter: Chetan Bhat In the link [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the below command example mentioned is wrong. SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet; INSERT INTO employee PARTITION (grade = 1) VALUES ('sam'); INSERT INTO employee PARTITION (grade = 2) VALUES ('suj'); spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1'); **Error in query:** ``` mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 
'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59) == SQL == SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1') ---^^^ ``` Expected : - If that partition value is string we can give like this grade ='abc' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32578) PageRank not sending the correct values in Pregel sendMessage
[ https://issues.apache.org/jira/browse/SPARK-32578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-32578. -- Resolution: Invalid > PageRank not sending the correct values in Pregel sendMessage > - > > Key: SPARK-32578 > URL: https://issues.apache.org/jira/browse/SPARK-32578 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.3.0, 2.4.0, 3.0.0 >Reporter: Shay Elbaz >Priority: Major > > The core sendMessage method is incorrect: > {code:java} > def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = { > if (edge.srcAttr._2 > tol) { >Iterator((edge.dstId, edge.srcAttr._2 * edge.attr)) > // *** THIS ^ *** > } else { >Iterator.empty > } > }{code} > > Instead of using the source PR value, it uses the PR delta (the 2nd tuple > element). This is not the documented behavior, nor a valid PR algorithm AFAIK. > This code is 7 years old; all versions are affected.
[jira] [Commented] (SPARK-32578) PageRank not sending the correct values in Pergel sendMessage
[ https://issues.apache.org/jira/browse/SPARK-32578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195948#comment-17195948 ] Shay Elbaz commented on SPARK-32578: It turned out the problem was in my benchmark, sorry about that. > PageRank not sending the correct values in Pregel sendMessage > - > > Key: SPARK-32578 > URL: https://issues.apache.org/jira/browse/SPARK-32578 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.3.0, 2.4.0, 3.0.0 >Reporter: Shay Elbaz >Priority: Major > > The core sendMessage method is incorrect: > {code:java} > def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = { > if (edge.srcAttr._2 > tol) { >Iterator((edge.dstId, edge.srcAttr._2 * edge.attr)) > // *** THIS ^ *** > } else { >Iterator.empty > } > }{code} > > Instead of using the source PR value, it uses the PR delta (the 2nd tuple > element). This is not the documented behavior, nor a valid PR algorithm AFAIK. > This code is 7 years old; all versions are affected.
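The Invalid resolution is consistent with GraphX's delta-based PageRank formulation: in the convergence-based variant, each message intentionally carries the rank *delta* rather than the full rank, and the propagated deltas still sum to the standard PageRank fixed point. A minimal standalone sketch in plain Python (not Spark code; the graph, reset probability, and tolerance are made-up illustration values) compares the two formulations:

```python
# Delta propagation vs. plain power iteration for (unnormalized) PageRank.
# Illustrative graph: vertex -> list of out-neighbours; every vertex has
# out-edges, so no dangling-node handling is needed here.
edges = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
vertices = sorted(edges)
alpha = 0.15   # reset probability
tol = 1e-12    # convergence tolerance

def out_deg(u):
    return len(edges[u])

# 1) Plain power iteration on r = alpha + (1 - alpha) * M r
r = {v: 1.0 for v in vertices}
for _ in range(300):
    nxt = {v: alpha for v in vertices}
    for u, dsts in edges.items():
        for v in dsts:
            nxt[v] += (1 - alpha) * r[u] / out_deg(u)
    r = nxt

# 2) Pregel-style delta propagation: each message carries the *delta*,
#    exactly like the sendMessage snippet above (srcAttr._2 is the delta).
pr = {v: alpha for v in vertices}      # rank accumulated so far
delta = {v: alpha for v in vertices}   # change since the last round
while max(delta.values()) > tol:
    msg = {v: 0.0 for v in vertices}
    for u, dsts in edges.items():
        if delta[u] > tol:             # mirrors `edge.srcAttr._2 > tol`
            for v in dsts:
                msg[v] += (1 - alpha) * delta[u] / out_deg(u)
    pr = {v: pr[v] + msg[v] for v in vertices}
    delta = msg

# Both formulations converge to the same fixed point.
for v in vertices:
    assert abs(r[v] - pr[v]) < 1e-8
```

Sending the full rank instead of the delta, as the report suggested, would double-count mass under this accumulation scheme, which is why the snippet quoted above is correct as written.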
[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-32886: Attachment: undefinedlink.JPG > '.../jobs/undefined' link from EvenTimeline view > > > Key: SPARK-32886 > URL: https://issues.apache.org/jira/browse/SPARK-32886 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.0 >Reporter: Zhen Li >Priority: Minor > Attachments: undefinedlink.JPG > > > In the event timeline view of the jobs page, clicking a job item redirects you to > the corresponding job page. When there are too many jobs, some job items' links > redirect to a wrong link like '.../jobs/undefined'
[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-32886: Summary: '.../jobs/undefined' link from "Event Timeline" in jobs page (was: '.../jobs/undefined' link from EvenTimeline view) > '.../jobs/undefined' link from "Event Timeline" in jobs page > > > Key: SPARK-32886 > URL: https://issues.apache.org/jira/browse/SPARK-32886 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.0 >Reporter: Zhen Li >Priority: Minor > Attachments: undefinedlink.JPG > > > In the event timeline view of the jobs page, clicking a job item redirects you to > the corresponding job page. When there are too many jobs, some job items' links > redirect to a wrong link like '.../jobs/undefined'
[jira] [Created] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view
Zhen Li created SPARK-32886: --- Summary: '.../jobs/undefined' link from EvenTimeline view Key: SPARK-32886 URL: https://issues.apache.org/jira/browse/SPARK-32886 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.0.0, 3.1.0 Reporter: Zhen Li In the event timeline view of the jobs page, clicking a job item redirects you to the corresponding job page. When there are too many jobs, some job items' links redirect to a wrong link like '.../jobs/undefined'