[jira] [Comment Edited] (SPARK-32778) Accidental Data Deletion on calling saveAsTable

2020-09-15 Thread Aman Rastogi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196721#comment-17196721
 ] 

Aman Rastogi edited comment on SPARK-32778 at 9/16/20, 6:46 AM:


I have reproduced the issue with v2.4.4. The code is similar to what it was in 
v2.2.0.

 

Line: 176

[https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala]


was (Author: amanr):
I have reproduced the issue with v2.4.4. The code is similar to what it was in 
v2.2.0.

https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala

> Accidental Data Deletion on calling saveAsTable
> ---
>
> Key: SPARK-32778
> URL: https://issues.apache.org/jira/browse/SPARK-32778
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Aman Rastogi
>Priority: Major
>
> {code:java}
> df.write.option("path", 
> "/already/existing/path").mode(SaveMode.Append).format("json").saveAsTable(db.table)
> {code}
> The above code deleted the data present at path "/already/existing/path". This 
> happened because the table did not yet exist in the Hive metastore, but the 
> given path already contained data. If the table is not present in the Hive 
> metastore, the SaveMode is internally changed to SaveMode.Overwrite regardless 
> of what the user provided, which leads to data deletion. This change was 
> introduced as part of https://issues.apache.org/jira/browse/SPARK-19583. 
> Now suppose the user is not using an external Hive metastore (the metastore is 
> tied to a cluster) and the cluster goes down, or for some other reason the 
> user has to migrate to a new cluster. Once the user tries to save data using 
> the above code on the new cluster, it will first delete the existing data. 
> This could be production data, and the user is completely unaware of the 
> deletion because they provided SaveMode.Append or ErrorIfExists. This is 
> accidental data deletion.
>  
> Repro Steps:
>  
>  # Save data through a Hive table as shown in the code above
>  # Create another cluster and save data into a new table on the new cluster, 
> giving the same path
>  
> Proposed Fix:
> Instead of modifying the SaveMode to Overwrite, we should modify it to 
> ErrorIfExists in the class CreateDataSourceTableAsSelectCommand.
> Change (line 154)
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.Overwrite, tableExists = 
> false)
>  
> {code}
> to
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.ErrorIfExists, 
> tableExists = false){code}
> This should not break CTAS. Even in the case of CTAS, the user may not want to 
> delete data that already exists, as doing so could be accidental.
>  
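
For illustration, here is a user-side sketch of the scenario (with a defensive guard); it is 
not part of Spark itself. It assumes a Hive-enabled session, and the path and table name are 
the illustrative ones from the report above:

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{SaveMode, SparkSession}

// User-side sketch only (not Spark code): refuse to call saveAsTable when the metastore does
// not know the table but the target path already holds data, which is exactly the situation
// in which the mode is silently switched to Overwrite and the data is deleted.
object SafeSaveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("safe-save-sketch")              // illustrative app name
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.range(10).toDF("id")
    val targetPath = "/already/existing/path"   // illustrative path from the report
    val (db, table) = ("db", "table")           // illustrative table name from the report

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val target = new Path(targetPath)
    val pathHasData = fs.exists(target) && fs.listStatus(target).nonEmpty
    val tableKnown = spark.catalog.tableExists(db, table)

    if (!tableKnown && pathHasData) {
      // Without this guard, saveAsTable would internally use SaveMode.Overwrite here
      // and wipe the existing data, even though Append was requested.
      sys.error(s"Refusing to write: $db.$table is missing from the metastore " +
        s"but $targetPath already contains data")
    } else {
      df.write.option("path", targetPath)
        .mode(SaveMode.Append).format("json").saveAsTable(s"$db.$table")
    }
  }
}
{code}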



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-32778) Accidental Data Deletion on calling saveAsTable

2020-09-15 Thread Aman Rastogi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Rastogi reopened SPARK-32778:
--

I have reproduced the issue with v2.4.4. The code is similar to what it was in 
v2.2.0.

https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala

> Accidental Data Deletion on calling saveAsTable
> ---
>
> Key: SPARK-32778
> URL: https://issues.apache.org/jira/browse/SPARK-32778
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Aman Rastogi
>Priority: Major
>
> {code:java}
> df.write.option("path", 
> "/already/existing/path").mode(SaveMode.Append).format("json").saveAsTable(db.table)
> {code}
> The above code deleted the data present at path "/already/existing/path". This 
> happened because the table did not yet exist in the Hive metastore, but the 
> given path already contained data. If the table is not present in the Hive 
> metastore, the SaveMode is internally changed to SaveMode.Overwrite regardless 
> of what the user provided, which leads to data deletion. This change was 
> introduced as part of https://issues.apache.org/jira/browse/SPARK-19583. 
> Now suppose the user is not using an external Hive metastore (the metastore is 
> tied to a cluster) and the cluster goes down, or for some other reason the 
> user has to migrate to a new cluster. Once the user tries to save data using 
> the above code on the new cluster, it will first delete the existing data. 
> This could be production data, and the user is completely unaware of the 
> deletion because they provided SaveMode.Append or ErrorIfExists. This is 
> accidental data deletion.
>  
> Repro Steps:
>  
>  # Save data through a Hive table as shown in the code above
>  # Create another cluster and save data into a new table on the new cluster, 
> giving the same path
>  
> Proposed Fix:
> Instead of modifying the SaveMode to Overwrite, we should modify it to 
> ErrorIfExists in the class CreateDataSourceTableAsSelectCommand.
> Change (line 154)
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.Overwrite, tableExists = 
> false)
>  
> {code}
> to
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.ErrorIfExists, 
> tableExists = false){code}
> This should not break CTAS. Even in the case of CTAS, the user may not want to 
> delete data that already exists, as doing so could be accidental.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32778) Accidental Data Deletion on calling saveAsTable

2020-09-15 Thread Aman Rastogi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Rastogi updated SPARK-32778:
-
Affects Version/s: (was: 2.2.0)
   2.4.4

> Accidental Data Deletion on calling saveAsTable
> ---
>
> Key: SPARK-32778
> URL: https://issues.apache.org/jira/browse/SPARK-32778
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Aman Rastogi
>Priority: Major
>
> {code:java}
> df.write.option("path", 
> "/already/existing/path").mode(SaveMode.Append).format("json").saveAsTable(db.table)
> {code}
> The above code deleted the data present at path "/already/existing/path". This 
> happened because the table did not yet exist in the Hive metastore, but the 
> given path already contained data. If the table is not present in the Hive 
> metastore, the SaveMode is internally changed to SaveMode.Overwrite regardless 
> of what the user provided, which leads to data deletion. This change was 
> introduced as part of https://issues.apache.org/jira/browse/SPARK-19583. 
> Now suppose the user is not using an external Hive metastore (the metastore is 
> tied to a cluster) and the cluster goes down, or for some other reason the 
> user has to migrate to a new cluster. Once the user tries to save data using 
> the above code on the new cluster, it will first delete the existing data. 
> This could be production data, and the user is completely unaware of the 
> deletion because they provided SaveMode.Append or ErrorIfExists. This is 
> accidental data deletion.
>  
> Repro Steps:
>  
>  # Save data through a Hive table as shown in the code above
>  # Create another cluster and save data into a new table on the new cluster, 
> giving the same path
>  
> Proposed Fix:
> Instead of modifying the SaveMode to Overwrite, we should modify it to 
> ErrorIfExists in the class CreateDataSourceTableAsSelectCommand.
> Change (line 154)
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.Overwrite, tableExists = 
> false)
>  
> {code}
> to
>  
> {code:java}
> val result = saveDataIntoTable(
>  sparkSession, table, tableLocation, child, SaveMode.ErrorIfExists, 
> tableExists = false){code}
> This should not break CTAS. Even in the case of CTAS, the user may not want to 
> delete data that already exists, as doing so could be accidental.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-15 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-32898:
---

 Summary: totalExecutorRunTimeMs is too big
 Key: SPARK-32898
 URL: https://issues.apache.org/jira/browse/SPARK-32898
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Linhong Liu


This might be caused by incorrectly calculating executorRunTimeMs in 
Executor.scala.
The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be 
called before taskStartTimeNs has been set (i.e., while it is still 0).

As of now, in the master branch, here is the problematic code: 

[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]

 

An exception is thrown before this line, yet the catch branch still updates the 
metric.
However, the query shows as SUCCESSful in QPL. Maybe this task is speculative. 
Not sure.

 

submissionTime in LiveExecutionData may also have a similar problem.

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]
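
As a rough illustration of the first problem above, a hedged sketch (identifier names are 
illustrative, not the exact Executor.scala code): if taskStartTimeNs is still 0 when the 
failure handler runs, "now - taskStartTimeNs" is roughly the absolute value of the monotonic 
clock, which inflates executorRunTimeMs to an implausibly large number.

{code:scala}
// Illustrative sketch only. A guard like the one below avoids reporting a huge run time
// when the task never actually started (taskStartTimeNs was never assigned).
def executorRunTimeMs(taskStartTimeNs: Long): Long = {
  val nowNs = System.nanoTime()
  if (taskStartTimeNs > 0) {
    (nowNs - taskStartTimeNs) / 1000000L  // normal case: elapsed time since the task started
  } else {
    0L                                    // task never started; report 0 instead of nowNs / 1e6
  }
}
{code}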

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32804) run-example failed in standalone cluster mode

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196660#comment-17196660
 ] 

Apache Spark commented on SPARK-32804:
--

User 'KevinSmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/29769

> run-example failed in standalone cluster mode
> -
>
> Key: SPARK-32804
> URL: https://issues.apache.org/jira/browse/SPARK-32804
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Examples
>Affects Versions: 2.4.0, 3.0.0
> Environment: Spark 3.0 
>Reporter: Kevin Wang
>Assignee: Kevin Wang
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-09-05-21-55-00-227.png
>
>
> run-example failed in standalone cluster mode (something seems to be wrong in 
> the SparkSubmitCommand build): 
>  
>   !image-2020-09-05-21-55-00-227.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32804) run-example failed in standalone cluster mode

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196658#comment-17196658
 ] 

Apache Spark commented on SPARK-32804:
--

User 'KevinSmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/29769

> run-example failed in standalone cluster mode
> -
>
> Key: SPARK-32804
> URL: https://issues.apache.org/jira/browse/SPARK-32804
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Examples
>Affects Versions: 2.4.0, 3.0.0
> Environment: Spark 3.0 
>Reporter: Kevin Wang
>Assignee: Kevin Wang
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-09-05-21-55-00-227.png
>
>
> run-example failed in standalone cluster mode (something seems to be wrong in 
> the SparkSubmitCommand build): 
>  
>   !image-2020-09-05-21-55-00-227.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32894) Timestamp cast in external orc table

2020-09-15 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196651#comment-17196651
 ] 

Hyukjin Kwon commented on SPARK-32894:
--

How did you create the Hive table?

> Timestamp cast in external orc table
> ---
>
> Key: SPARK-32894
> URL: https://issues.apache.org/jira/browse/SPARK-32894
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.0.0
> Environment: Spark 3.0.0
> Java 1.8
> Hadoop 3.3.0
> Hive 3.1.2
> Python 3.7 (from pyspark)
>Reporter: Grigory Skvortsov
>Priority: Major
>
> I have an external Hive table stored as ORC. I want to work with a timestamp 
> column in my table using PySpark.
> For example, I try this:
>  spark.sql('select id, time_ from mydb.table1').show()
>  
>  Py4JJavaError: An error occurred while calling o2877.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 
> (TID 19, 172.29.14.241, executor 1): java.lang.ClassCastException: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long
> at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> at 
> org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148)
> at 
> org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228)
> at 
> org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730)
> at 
> org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730)
> at 
> org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:127)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
> at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
> at scala.Option.foreach(Option.scala:407)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
> at 
> org.apache.spark.sche

[jira] [Assigned] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32897:


Assignee: Apache Spark

> SparkSession.builder.getOrCreate should not show deprecation warning of 
> SQLContext
> --
>
> Key: SPARK-32897
> URL: https://issues.apache.org/jira/browse/SPARK-32897
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.7, 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> In PySpark shell:
> {code}
> import warnings
> from pyspark.sql import SparkSession, SQLContext
> warnings.simplefilter('always', DeprecationWarning)
> spark.stop()
> SparkSession.builder.getOrCreate()
> {code}
> shows a deprecation warning from {{SQLContext}}
> {code}
> /.../spark/python/pyspark/sql/context.py:72: DeprecationWarning: Deprecated 
> in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
>   DeprecationWarning)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32897:


Assignee: (was: Apache Spark)

> SparkSession.builder.getOrCreate should not show deprecation warning of 
> SQLContext
> --
>
> Key: SPARK-32897
> URL: https://issues.apache.org/jira/browse/SPARK-32897
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.7, 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In PySpark shell:
> {code}
> import warnings
> from pyspark.sql import SparkSession, SQLContext
> warnings.simplefilter('always', DeprecationWarning)
> spark.stop()
> SparkSession.builder.getOrCreate()
> {code}
> shows a deprecation warning from {{SQLContext}}
> {code}
> /.../spark/python/pyspark/sql/context.py:72: DeprecationWarning: Deprecated 
> in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
>   DeprecationWarning)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196648#comment-17196648
 ] 

Apache Spark commented on SPARK-32897:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29768

> SparkSession.builder.getOrCreate should not show deprecation warning of 
> SQLContext
> --
>
> Key: SPARK-32897
> URL: https://issues.apache.org/jira/browse/SPARK-32897
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.7, 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In PySpark shell:
> {code}
> import warnings
> from pyspark.sql import SparkSession, SQLContext
> warnings.simplefilter('always', DeprecationWarning)
> spark.stop()
> SparkSession.builder.getOrCreate()
> {code}
> shows a deprecation warning from {{SQLContext}}
> {code}
> /.../spark/python/pyspark/sql/context.py:72: DeprecationWarning: Deprecated 
> in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
>   DeprecationWarning)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32688) LiteralGenerator for float and double does not generate special values

2020-09-15 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-32688.
--
Fix Version/s: 3.1.0
   3.0.2
 Assignee: Tanel Kiis
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29515

> LiteralGenerator for float and double does not generate special values
> --
>
> Key: SPARK-32688
> URL: https://issues.apache.org/jira/browse/SPARK-32688
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>
> Values like Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity are 
> never returned.
> The main usage of LiteralGenerator is in the 
> checkConsistencyBetweenInterpretedAndCodegen method.
> This would have detected SPARK-32640 
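
For reference, a hedged sketch (not the actual Spark LiteralGenerator code; it assumes 
ScalaCheck, which Spark's test suites use) of how a Double generator can be made to 
occasionally emit the special values that the ticket says are currently never produced:

{code:scala}
import org.scalacheck.Gen

// Special floating-point values that plain range-based generators tend to miss.
val specialDoubles: Gen[Double] = Gen.oneOf(
  Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity,
  Double.MinValue, Double.MaxValue, 0.0, -0.0)

// Mix ordinary finite doubles with the special values (roughly 10% of the time).
val doubleGen: Gen[Double] = Gen.frequency(
  9 -> Gen.choose(Double.MinValue, Double.MaxValue),
  1 -> specialDoubles)
{code}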



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32896) Add DataStreamWriter.table API

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32896:


Assignee: Apache Spark  (was: Jungtaek Lim)

> Add DataStreamWriter.table API
> --
>
> Key: SPARK-32896
> URL: https://issues.apache.org/jira/browse/SPARK-32896
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> For now, there's no way to write to a table (especially a catalog table) even 
> if the table is capable of handling streaming writes.
> We can add a DataStreamWriter.table API to let end users specify a table as 
> the provider, and let the streaming query write into that table. It only 
> specifies the table; the overall usage of DataStreamWriter isn't changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32896) Add DataStreamWriter.table API

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196644#comment-17196644
 ] 

Apache Spark commented on SPARK-32896:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29767

> Add DataStreamWriter.table API
> --
>
> Key: SPARK-32896
> URL: https://issues.apache.org/jira/browse/SPARK-32896
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> For now, there's no way to write to a table (especially a catalog table) even 
> if the table is capable of handling streaming writes.
> We can add a DataStreamWriter.table API to let end users specify a table as 
> the provider, and let the streaming query write into that table. It only 
> specifies the table; the overall usage of DataStreamWriter isn't changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32896) Add DataStreamWriter.table API

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32896:


Assignee: Jungtaek Lim  (was: Apache Spark)

> Add DataStreamWriter.table API
> --
>
> Key: SPARK-32896
> URL: https://issues.apache.org/jira/browse/SPARK-32896
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> For now, there's no way to write to a table (especially a catalog table) even 
> if the table is capable of handling streaming writes.
> We can add a DataStreamWriter.table API to let end users specify a table as 
> the provider, and let the streaming query write into that table. It only 
> specifies the table; the overall usage of DataStreamWriter isn't changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32897) SparkSession.builder.getOrCreate should not show deprecation warning of SQLContext

2020-09-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-32897:


 Summary: SparkSession.builder.getOrCreate should not show 
deprecation warning of SQLContext
 Key: SPARK-32897
 URL: https://issues.apache.org/jira/browse/SPARK-32897
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.0.1, 2.4.7, 3.1.0
Reporter: Hyukjin Kwon


In PySpark shell:

{code}
import warnings
from pyspark.sql import SparkSession, SQLContext
warnings.simplefilter('always', DeprecationWarning)
spark.stop()
SparkSession.builder.getOrCreate()
{code}

shows a deprecation warning from {{SQLContext}}

{code}
/.../spark/python/pyspark/sql/context.py:72: DeprecationWarning: Deprecated in 
3.0.0. Use SparkSession.builder.getOrCreate() instead.
  DeprecationWarning)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32187) User Guide - Shipping Python Package

2020-09-15 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196634#comment-17196634
 ] 

Hyukjin Kwon commented on SPARK-32187:
--

Thank you so much [~fhoering]!

> User Guide - Shipping Python Package
> 
>
> Key: SPARK-32187
> URL: https://issues.apache.org/jira/browse/SPARK-32187
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Fabian Höring
>Priority: Major
>
> - Zipped file
> - Python files
> - Virtualenv with Yarn
> - PEX \(?\) (see also SPARK-25433)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32896) Add DataStreamWriter.table API

2020-09-15 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-32896:


 Summary: Add DataStreamWriter.table API
 Key: SPARK-32896
 URL: https://issues.apache.org/jira/browse/SPARK-32896
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim


For now, there's no way to write to a table (especially a catalog table) even 
if the table is capable of handling streaming writes.

We can add a DataStreamWriter.table API to let end users specify a table as the 
provider, and let the streaming query write into that table. It only specifies 
the table; the overall usage of DataStreamWriter isn't changed.
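
A hypothetical usage sketch of the proposed API (the {{table}} method does not exist in 
DataStreamWriter at the time of this ticket; its name, and the assumption that it starts the 
query and returns a StreamingQuery, follow the description above):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stream-to-table-sketch").getOrCreate()

// Any streaming source works; the "rate" source is used here only for illustration.
val events = spark.readStream.format("rate").load()

// Proposed: name a catalog table as the sink instead of a format/path pair.
// Whether .table(...) also starts the query is part of the API discussion; this sketch
// assumes it does and returns a StreamingQuery.
val query = events.writeStream
  .option("checkpointLocation", "/tmp/checkpoints/stream-to-table")  // illustrative path
  .table("db.events")                                                // illustrative table name

query.awaitTermination()
{code}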



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32704) Logging plan changes for execution

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196613#comment-17196613
 ] 

Apache Spark commented on SPARK-32704:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29766

> Logging plan changes for execution
> --
>
> Key: SPARK-32704
> URL: https://issues.apache.org/jira/browse/SPARK-32704
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.1.0
>
>
> Since we only log plan changes for analyzer/optimizer now, this ticket 
> targets adding code to log plan changes in the preparation phase in 
> QueryExecution for execution.
> {code}
> scala> spark.sql("SET spark.sql.optimizer.planChangeLog.level=WARN")
> scala> spark.range(10).groupBy("id").count().queryExecution.executedPlan
> ...
> 20/08/26 09:32:36 WARN PlanChangeLogger: 
> === Applying Rule org.apache.spark.sql.execution.CollapseCodegenStages ===
> !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, 
> count#23L])  *(1) HashAggregate(keys=[id#19L], 
> functions=[count(1)], output=[id#19L, count#23L])
> !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], 
> output=[id#19L, count#27L])   +- *(1) HashAggregate(keys=[id#19L], 
> functions=[partial_count(1)], output=[id#19L, count#27L])
> !   +- Range (0, 10, step=1, splits=4)
>   +- *(1) Range (0, 10, step=1, splits=4)
>  
> 20/08/26 09:32:36 WARN PlanChangeLogger: 
> === Result of Batch Preparations ===
> !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, 
> count#23L])  *(1) HashAggregate(keys=[id#19L], 
> functions=[count(1)], output=[id#19L, count#23L])
> !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], 
> output=[id#19L, count#27L])   +- *(1) HashAggregate(keys=[id#19L], 
> functions=[partial_count(1)], output=[id#19L, count#27L])
> !   +- Range (0, 10, step=1, splits=4)  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32704) Logging plan changes for execution

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196612#comment-17196612
 ] 

Apache Spark commented on SPARK-32704:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29766

> Logging plan changes for execution
> --
>
> Key: SPARK-32704
> URL: https://issues.apache.org/jira/browse/SPARK-32704
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.1.0
>
>
> Since we only log plan changes for analyzer/optimizer now, this ticket 
> targets adding code to log plan changes in the preparation phase in 
> QueryExecution for execution.
> {code}
> scala> spark.sql("SET spark.sql.optimizer.planChangeLog.level=WARN")
> scala> spark.range(10).groupBy("id").count().queryExecution.executedPlan
> ...
> 20/08/26 09:32:36 WARN PlanChangeLogger: 
> === Applying Rule org.apache.spark.sql.execution.CollapseCodegenStages ===
> !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, 
> count#23L])  *(1) HashAggregate(keys=[id#19L], 
> functions=[count(1)], output=[id#19L, count#23L])
> !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], 
> output=[id#19L, count#27L])   +- *(1) HashAggregate(keys=[id#19L], 
> functions=[partial_count(1)], output=[id#19L, count#27L])
> !   +- Range (0, 10, step=1, splits=4)
>   +- *(1) Range (0, 10, step=1, splits=4)
>  
> 20/08/26 09:32:36 WARN PlanChangeLogger: 
> === Result of Batch Preparations ===
> !HashAggregate(keys=[id#19L], functions=[count(1)], output=[id#19L, 
> count#23L])  *(1) HashAggregate(keys=[id#19L], 
> functions=[count(1)], output=[id#19L, count#23L])
> !+- HashAggregate(keys=[id#19L], functions=[partial_count(1)], 
> output=[id#19L, count#27L])   +- *(1) HashAggregate(keys=[id#19L], 
> functions=[partial_count(1)], output=[id#19L, count#27L])
> !   +- Range (0, 10, step=1, splits=4)  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32888:


Assignee: (was: Apache Spark)

> reading a parallelized rdd with two identical records results in a zero count 
> df when read via spark.read.csv
> ---
>
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Minor
>
> * Imagine a two-row CSV file like so (where the header and first record are 
> duplicate rows):
> aaa,bbb
> aaa,bbb
>  * The following is PySpark code:
>  * Create a parallelized RDD like: prdd = 
> spark.read.text("test.csv").rdd.flatMap(lambda x : x)
>  * Create a DataFrame like so: mydf = spark.read.csv(prdd, header=True)
>  * mydf.count() will result in a record count of zero (when it should be 1)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196591#comment-17196591
 ] 

Apache Spark commented on SPARK-32888:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/29765

> reading a parallelized rdd with two identical records results in a zero count 
> df when read via spark.read.csv
> ---
>
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Minor
>
> * Imagine a two-row CSV file like so (where the header and first record are 
> duplicate rows):
> aaa,bbb
> aaa,bbb
>  * The following is PySpark code:
>  * Create a parallelized RDD like: prdd = 
> spark.read.text("test.csv").rdd.flatMap(lambda x : x)
>  * Create a DataFrame like so: mydf = spark.read.csv(prdd, header=True)
>  * mydf.count() will result in a record count of zero (when it should be 1)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32888:


Assignee: Apache Spark

> reading a parallelized rdd with two identical records results in a zero count 
> df when read via spark.read.csv
> ---
>
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Assignee: Apache Spark
>Priority: Minor
>
> * Imagine a two-row CSV file like so (where the header and first record are 
> duplicate rows):
> aaa,bbb
> aaa,bbb
>  * The following is PySpark code:
>  * Create a parallelized RDD like: prdd = 
> spark.read.text("test.csv").rdd.flatMap(lambda x : x)
>  * Create a DataFrame like so: mydf = spark.read.csv(prdd, header=True)
>  * mydf.count() will result in a record count of zero (when it should be 1)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196590#comment-17196590
 ] 

L. C. Hsieh commented on SPARK-32888:
-

This behavior is documented in the CSV-related code, although it seems not to 
be documented in the user documentation.
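
For reference, a minimal Scala sketch of the behaviour described in this ticket (the original 
report uses PySpark; the app name and column names here are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("dup-header-sketch").getOrCreate()
import spark.implicits._

// Header row plus an identical data row, fed in as a Dataset[String] rather than a file path.
val lines = Seq("aaa,bbb", "aaa,bbb").toDS()

// With header=true, every line identical to the header is treated as a header and dropped,
// so the duplicate data row disappears and the count is 0.
val withHeader = spark.read.option("header", "true").csv(lines)
println(withHeader.count())  // 0

// Without the header option, both lines are kept as data rows.
val noHeader = spark.read.csv(lines)
println(noHeader.count())    // 2
{code}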

> reading a parallelized rdd with two identical records results in a zero count 
> df when read via spark.read.csv
> ---
>
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Minor
>
> * Imagine a two-row CSV file like so (where the header and first record are 
> duplicate rows):
> aaa,bbb
> aaa,bbb
>  * The following is PySpark code:
>  * Create a parallelized RDD like: prdd = 
> spark.read.text("test.csv").rdd.flatMap(lambda x : x)
>  * Create a DataFrame like so: mydf = spark.read.csv(prdd, header=True)
>  * mydf.count() will result in a record count of zero (when it should be 1)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-32888:

Issue Type: Documentation  (was: Bug)

> reading a parallelized rdd with two identical records results in a zero count 
> df when read via spark.read.csv
> ---
>
> Key: SPARK-32888
> URL: https://issues.apache.org/jira/browse/SPARK-32888
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Minor
>
> * Imagine a two-row CSV file like so (where the header and first record are 
> duplicate rows):
> aaa,bbb
> aaa,bbb
>  * The following is PySpark code:
>  * Create a parallelized RDD like: prdd = 
> spark.read.text("test.csv").rdd.flatMap(lambda x : x)
>  * Create a DataFrame like so: mydf = spark.read.csv(prdd, header=True)
>  * mydf.count() will result in a record count of zero (when it should be 1)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32891) Enhance UTF8String.trim

2020-09-15 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-32891:

Description: 
It sounds like {{UTF8String.trim}} is not implemented well. We may need to look 
at how {{java.lang.String.trim}} is implemented.

Please see comment:
 
[https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675]

[https://github.com/apache/spark/pull/29731#discussion_r487709672]

  was:
Please see comment:
https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675

https://github.com/apache/spark/pull/29731#discussion_r487709672


> Enhance UTF8String.trim
> ---
>
> Key: SPARK-32891
> URL: https://issues.apache.org/jira/browse/SPARK-32891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> It sounds like {{UTF8String.trim}} is not implemented well. We may need to 
> look at how {{java.lang.String.trim}} is implemented.
> Please see comment:
>  
> [https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675]
> [https://github.com/apache/spark/pull/29731#discussion_r487709672]
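
For reference, a hedged sketch of the approach {{java.lang.String.trim}} takes (scan indices 
from both ends, then take a single slice), which the ticket suggests looking at. This operates 
on a plain String; the real {{UTF8String.trim}} would have to work on UTF-8 encoded bytes:

{code:scala}
// Illustrative only: mirrors the java.lang.String.trim strategy of finding the first and last
// non-space positions and copying once, instead of repeatedly stripping characters.
def trimLikeJavaString(s: String): String = {
  var start = 0
  var end = s.length
  // Advance past leading characters <= ' ' (the same rule java.lang.String.trim uses).
  while (start < end && s.charAt(start) <= ' ') start += 1
  // Retreat past trailing characters <= ' '.
  while (end > start && s.charAt(end - 1) <= ' ') end -= 1
  if (start == 0 && end == s.length) s else s.substring(start, end)
}
{code}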



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32895) DataSourceV2 allow ACCEPT_ANY_SCHEMA in write path

2020-09-15 Thread Sebastian Herold (Jira)
Sebastian Herold created SPARK-32895:


 Summary: DataSourceV2 allow ACCEPT_ANY_SCHEMA in write path
 Key: SPARK-32895
 URL: https://issues.apache.org/jira/browse/SPARK-32895
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Sebastian Herold


During the development of a Spark-Collibra connector using the DataSourceV2 
framework, I found a blocking limitation in the current version.

The connector should accept DataFrames of arbitrary schemas and send them to 
the Import API of Collibra. The problem is the method {{inferSchema}} of the 
{{TableProvider}}. Although my {{Table}} implementation has the 
{{ACCEPT_ANY_SCHEMA}} capability, I need to infer the schema without knowing 
the actual schema of the data frame, which is impossible. The behaviour may be 
intended if you are writing to an existing table with a fixed schema, but not 
if you accept any schema. Such cases cannot be implemented right now. I found in 
[{{DataFrameWriter.scala}}|https://github.com/apache/spark/blob/4fac6d501a5d97530edb712ff3450890ac10e413/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L333]
 that for data sources inherited from {{FileDataSourceV2}} there is an 
exception: {{inferSchema}} is not called on the write path and {{getTable}} is 
called with the schema of the actual data frame. This is the reason why it 
works for data sources derived from {{FileDataSourceV2}}. I would expect 
similar behaviour for my data source, which has the capability to accept any 
schema. The problem is that the capabilities are retrieved from the {{Table}} 
implementation, but to get a table via {{getTable}} you need a schema. I guess 
the interface should be designed differently (see the sketch after this list):
* two different methods to infer the schema: 
** one for the read path, like the current implementation
** one for the write path, getting the actual schema of the data frame as a 
parameter; this allows the implementation to decide:
*** Do I accept all schemas and just return the schema of the data frame?
*** Do I know the schema of the target and ignore the schema of the actual data 
frame?
*** Can the schema of the target be evolved, and do I check the schema of the 
data frame to be a valid evolution of the target schema?

If you agree, I'm willing to make a PR.
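
A hedged sketch of the interface split proposed above (the trait and method names are 
illustrative and are not part of the existing DataSourceV2 {{TableProvider}} API):

{code:scala}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Sketch of the proposal: separate schema inference for the read and write paths.
trait SchemaInferringTableProvider {
  // Read path: behaves like today's inferSchema, deriving the schema from options alone.
  def inferReadSchema(options: CaseInsensitiveStringMap): StructType

  // Write path (proposed): the actual DataFrame schema is passed in, so the source can accept
  // it as-is, ignore it in favour of a fixed target schema, or validate it as a schema evolution.
  def inferWriteSchema(options: CaseInsensitiveStringMap, dataFrameSchema: StructType): StructType
}

// A source with ACCEPT_ANY_SCHEMA would simply echo the incoming schema on the write path.
class AcceptAnySchemaProvider extends SchemaInferringTableProvider {
  override def inferReadSchema(options: CaseInsensitiveStringMap): StructType =
    throw new UnsupportedOperationException("read not supported by this sink")

  override def inferWriteSchema(
      options: CaseInsensitiveStringMap,
      dataFrameSchema: StructType): StructType = dataFrameSchema
}
{code}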



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster

2020-09-15 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-32893:

Priority: Major  (was: Blocker)

> Structured Streaming and Dynamic Allocation on StandaloneCluster
> 
>
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Duarte Ferreira
>Priority: Major
>
> We are currently using Spark 3.0.1 Standalone cluster to run our Structured 
> streaming applications.
> We set the following configurations when running the application in cluster 
> mode:
>  * spark.dynamicAllocation.enabled = true
>  * spark.shuffle.service.enabled = true
>  * spark.cores.max =5
>  * spark.executor.memory = 1G
>  * spark.executor.cores = 1
> We also have the configurations set to enable spark.shuffle.service.enabled 
> on each worker and have a cluster composed of 1 master and 2 slaves.
> The application reads data from a Kafka topic (readTopic) using [this 
> documentation|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html],
>  applies some transformations on the DataSet using Spark SQL, and writes data 
> to another Kafka topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 
> executors and, as we start feeding data to the readTopic, it keeps increasing 
> the number of executors until it reaches the 5-executor limit and all 
> messages are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as 
> expected and starts killing executors that are no longer needed, until we 
> stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it behaves just as expected and 
> starts increasing the number of executors again. But if we leave the 
> application idle at 0 executors for around 10 minutes, we start getting 
> errors like this:
> {noformat}
> *no*
> 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 
> 7570256331800450365 to sparkmaster/10.0.12.231:7077: 
> java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
>   at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930)
>   at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
>   at 
> io.nett

[jira] [Created] (SPARK-32894) Timestamp cast in external orc table

2020-09-15 Thread Grigory Skvortsov (Jira)
Grigory Skvortsov created SPARK-32894:
-

 Summary: Timestamp cast in external orc table
 Key: SPARK-32894
 URL: https://issues.apache.org/jira/browse/SPARK-32894
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 3.0.0
 Environment: Spark 3.0.0

Java 1.8

Hadoop 3.3.0

Hive 3.1.2

Python 3.7 (from pyspark)
Reporter: Grigory Skvortsov


I have an external Hive table stored as ORC. I want to work with a timestamp 
column in my table using PySpark.

For example, I try this:
 spark.sql('select id, time_ from mydb.table1').show()
 
 Py4JJavaError: An error occurred while calling o2877.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 
19, 172.29.14.241, executor 1): java.lang.ClassCastException: 
org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long
at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
at 
org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148)
at 
org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228)
at 
org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730)
at 
org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730)
at 
org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
at scala.Option.foreach(Option.scala:407)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133)
at or

[jira] [Updated] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster

2020-09-15 Thread Duarte Ferreira (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duarte Ferreira updated SPARK-32893:

Priority: Blocker  (was: Major)

> Structured Streaming and Dynamic Allocation on StandaloneCluster
> 
>
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Duarte Ferreira
>Priority: Blocker
>
> We are currently using Spark 3.0.1 Standalone cluster to run our Structured 
> streaming applications.
> We set the following configurations when running the application in cluster 
> mode:
>  * spark.dynamicAllocation.enabled = true
>  * spark.shuffle.service.enabled = true
>  * spark.cores.max =5
>  * spark.executor.memory = 1G
>  * spark.executor.cores = 1
> We also have the configurations set to enable spark.shuffle.service.enabled 
> on each worker and have a cluster composed of 1 master and 2 slaves.
> The application reads data from a Kafka topic (readTopic) using [this 
> documentation|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html],
>  applies some transformations on the DataSet using Spark SQL, and writes data 
> to another Kafka topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 
> executors and, as we start feeding data to the readTopic, it keeps increasing 
> the number of executors until it reaches the 5-executor limit and all 
> messages are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as 
> expected and starts killing executors that are no longer needed, until we 
> stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it behaves just as expected and 
> starts increasing the number of executors again. But if we leave the 
> application idle at 0 executors for around 10 minutes, we start getting 
> errors like this:
> {noformat}
> *no*
> 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 
> 7570256331800450365 to sparkmaster/10.0.12.231:7077: 
> java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
>   at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930)
>   at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
>   at 

[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196433#comment-17196433
 ] 

Apache Spark commented on SPARK-32738:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/29764

> thread safe endpoints may hang due to fatal error
> -
>
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in 
> `Inbox`. Now if any fatal error happens during `Inbox.process`, 
> 'numActiveThreads' is not reduced. Then other threads can not process 
> messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in previous Spark 2.x versions since the driver, 
> executor and block manager endpoints are all thread safe endpoints.
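
A minimal illustrative sketch (not Spark's actual Inbox code; the class and 
method names here are stand-ins) of the pattern that avoids this: the 
active-thread counter is decremented in a finally block, so even a fatal error 
thrown by the handler cannot leave the inbox permanently marked as busy.

{code:scala}
import java.util.concurrent.atomic.AtomicInteger

// Simplified stand-in for an inbox guarded by an active-thread counter.
class ToyInbox {
  private val numActiveThreads = new AtomicInteger(0)

  def process(handle: String => Unit, message: String): Unit = {
    numActiveThreads.incrementAndGet()
    try {
      handle(message)  // may throw a fatal error, e.g. OutOfMemoryError
    } finally {
      // Restores the counter even on fatal errors, so other threads can
      // still pick up messages from this inbox.
      numActiveThreads.decrementAndGet()
    }
  }
}
{code}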



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196432#comment-17196432
 ] 

Apache Spark commented on SPARK-32738:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/29764

> thread safe endpoints may hang due to fatal error
> -
>
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in 
> `Inbox`. Now if any fatal error happens during `Inbox.process`, 
> 'numActiveThreads' is not reduced. Then other threads can not process 
> messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in previous Spark 2.x versions since the driver, 
> executor and block manager endpoints are all thread safe endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196300#comment-17196300
 ] 

Apache Spark commented on SPARK-32738:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/29763

> thread safe endpoints may hang due to fatal error
> -
>
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in 
> `Inbox`. Now if any fatal error happens during `Inbox.process`, 
> 'numActiveThreads' is not reduced. Then other threads can not process 
> messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in previous Spark 2.x versions since the driver, 
> executor and block manager endpoints are all thread safe endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32738) thread safe endpoints may hang due to fatal error

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196309#comment-17196309
 ] 

Apache Spark commented on SPARK-32738:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/29763

> thread safe endpoints may hang due to fatal error
> -
>
> Key: SPARK-32738
> URL: https://issues.apache.org/jira/browse/SPARK-32738
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Processing for `ThreadSafeRpcEndpoint` is controlled by 'numActiveThreads' in 
> `Inbox`. Now if any fatal error happens during `Inbox.process`, 
> 'numActiveThreads' is not reduced. Then other threads can not process 
> messages in that inbox, which causes the endpoint to "hang".
> This problem is more serious in previous Spark 2.x versions since the driver, 
> executor and block manager endpoints are all thread safe endpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32884) Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32884.
---
Fix Version/s: 3.1.0
 Assignee: Dongjoon Hyun
   Resolution: Fixed

> Mark TPCDSQuery*Suite as ExtendedSQLTest
> 
>
> Key: SPARK-32884
> URL: https://issues.apache.org/jira/browse/SPARK-32884
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32827) Add spark.sql.maxMetadataStringLength config

2020-09-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32827.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29688
[https://github.com/apache/spark/pull/29688]

> Add spark.sql.maxMetadataStringLength config
> 
>
> Key: SPARK-32827
> URL: https://issues.apache.org/jira/browse/SPARK-32827
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Add a new config `spark.sql.maxMetadataStringLength`. This config aims to 
> limit metadata value length, e.g. the file location.
> We found that metadata was abbreviated with `...` when trying to add a test 
> in `SQLQueryTestSuite`. As a result, we could not replace the location value 
> with `className`, since `className` itself had been abbreviated.
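
A minimal sketch of how such a session-level SQL config would typically be set 
(the value 1000 is only an example; the config's default and exact semantics 
are defined by the pull request above, and `spark` is assumed to be an 
existing SparkSession):

{code:scala}
// Illustrative only: raise the metadata string limit so long values such as
// file locations are not abbreviated with `...` in plans and test output.
spark.conf.set("spark.sql.maxMetadataStringLength", "1000")
{code}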



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32827) Add spark.sql.maxMetadataStringLength config

2020-09-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32827:
---

Assignee: ulysses you

> Add spark.sql.maxMetadataStringLength config
> 
>
> Key: SPARK-32827
> URL: https://issues.apache.org/jira/browse/SPARK-32827
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
>
> Add a new config `spark.sql.maxMetadataStringLength`. This config aims to 
> limit metadata value length, e.g. the file location.
> We found that metadata was abbreviated with `...` when trying to add a test 
> in `SQLQueryTestSuite`. As a result, we could not replace the location value 
> with `className`, since `className` itself had been abbreviated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster

2020-09-15 Thread Duarte Ferreira (Jira)
Duarte Ferreira created SPARK-32893:
---

 Summary: Structured Streaming and Dynamic Allocation on 
StandaloneCluster
 Key: SPARK-32893
 URL: https://issues.apache.org/jira/browse/SPARK-32893
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.0.1
Reporter: Duarte Ferreira


We are currently using Spark 3.0.1 Standalone cluster to run our Structured 
streaming applications.

We set the following configurations when running the application in cluster 
mode:
 * spark.dynamicAllocation.enabled = true
 * spark.shuffle.service.enabled = true
 * spark.cores.max = 5
 * spark.executor.memory = 1G
 * spark.executor.cores = 1

We also have the configurations set to enable spark.shuffle.service.enabled on 
each worker and have a cluster composed of 1 master and 2 slaves.
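
For reference, a minimal sketch of how these settings might be supplied 
programmatically; the application name and master URL below are assumptions, 
and only the configuration keys and values come from the list above (the same 
settings could equally be passed via spark-submit --conf):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("structured-streaming-app")   // assumed name
  .master("spark://sparkmaster:7077")    // assumed master URL
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.cores.max", "5")
  .config("spark.executor.memory", "1G")
  .config("spark.executor.cores", "1")
  .getOrCreate()
{code}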

The application reads data from a Kafka topic (readTopic) using [this 
documentation|https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html], 
applies some transformations to the Dataset using Spark SQL, and writes data to 
another Kafka topic (writeTopic).

When we start the application it behaves correctly: it starts with 0 executors 
and, as we start feeding data to the readTopic, it increases the number of 
executors until it reaches the 5-executor limit and all messages are 
transformed and written to the writeTopic in Kafka.

If we stop feeding messages to the readTopic, the application works as expected 
and starts killing executors that are no longer needed, until we stop sending 
data completely and it reaches 0 running executors.

If we start sending data again right away, it behaves as expected and starts 
increasing the number of executors again. But if we leave the application idle 
at 0 executors for around 10 minutes, we start getting errors like this:
{noformat}
*no*
20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 
7570256331800450365 to sparkmaster/10.0.12.231:7077: 
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
at 
org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148)
at 
org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362)
at 
io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235)
at 
io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:897)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
at 
io.netty.channel.AbstractChannelHandlerC

[jira] [Commented] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196168#comment-17196168
 ] 

Apache Spark commented on SPARK-32892:
--

User 'mundaym' has created a pull request for this issue:
https://github.com/apache/spark/pull/29762

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Priority: Minor
>  Labels: big-endian
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196167#comment-17196167
 ] 

Apache Spark commented on SPARK-32892:
--

User 'mundaym' has created a pull request for this issue:
https://github.com/apache/spark/pull/29762

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Priority: Minor
>  Labels: big-endian
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32892:


Assignee: Apache Spark

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Assignee: Apache Spark
>Priority: Minor
>  Labels: big-endian
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32892:


Assignee: (was: Apache Spark)

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Priority: Minor
>  Labels: big-endian
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-15 Thread Michael Munday (Jira)
Michael Munday created SPARK-32892:
--

 Summary: Murmur3 and xxHash64 implementations do not produce the 
correct results on big-endian platforms
 Key: SPARK-32892
 URL: https://issues.apache.org/jira/browse/SPARK-32892
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.0.1
Reporter: Michael Munday


The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
results on big-endian systems. This causes test failures on my target platform 
(s390x).

These hash functions require that multi-byte chunks be interpreted as integers 
encoded in *little-endian* byte order. This requires byte reversal when using 
multi-byte unsafe operations on big-endian platforms.

I have a PR ready for discussion and review.
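
To illustrate the byte-reversal requirement, a minimal sketch (not the actual 
Spark code) of normalising an 8-byte word loaded in native byte order to the 
little-endian interpretation the hash functions expect:

{code:scala}
import java.nio.ByteOrder

// Murmur3 and xxHash64 define their input words as little-endian, so a value
// loaded in native order on a big-endian platform (e.g. s390x) must have its
// bytes reversed before entering the mixing steps.
val isBigEndian: Boolean = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN

def toLittleEndian(nativeWord: Long): Long =
  if (isBigEndian) java.lang.Long.reverseBytes(nativeWord) else nativeWord
{code}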



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes

2020-09-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-31448:


Assignee: Abhishek Dixit

> Difference in Storage Levels used in cache() and persist() for pyspark 
> dataframes
> -
>
> Key: SPARK-31448
> URL: https://issues.apache.org/jira/browse/SPARK-31448
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.3
>Reporter: Abhishek Dixit
>Assignee: Abhishek Dixit
>Priority: Major
>
> There is a difference in default storage level *MEMORY_AND_DISK* in pyspark 
> and scala.
> *Scala*: StorageLevel(true, true, false, true)
> *Pyspark:* StorageLevel(True, True, False, False)
>  
> *Problem Description:* 
> Calling *df.cache()*  for pyspark dataframe directly invokes Scala method 
> cache() and Storage Level used is StorageLevel(true, true, false, true).
> But calling *df.persist()* for pyspark dataframe sets the 
> newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and 
> then invokes Scala function persist(newStorageLevel).
> *Possible Fix:*
> Invoke pyspark function persist inside pyspark function cache instead of 
> calling the scala function directly.
> I can raise a PR for this fix if someone can confirm that this is a bug and 
> the possible fix is the correct approach.
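
A minimal Scala sketch contrasting the two levels described above (the 
constructor argument order is useDisk, useMemory, useOffHeap, deserialized):

{code:scala}
import org.apache.spark.storage.StorageLevel

// Level used when cache() goes through the Scala side: deserialized = true.
val scalaDefault = StorageLevel.MEMORY_AND_DISK        // (true, true, false, true)

// Level PySpark's persist() builds by default: deserialized = false,
// i.e. the data is kept serialized in memory.
val pysparkDefault = StorageLevel(true, true, false, false)

// The mismatch this issue describes:
assert(scalaDefault != pysparkDefault)
{code}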



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes

2020-09-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31448.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29242
[https://github.com/apache/spark/pull/29242]

> Difference in Storage Levels used in cache() and persist() for pyspark 
> dataframes
> -
>
> Key: SPARK-31448
> URL: https://issues.apache.org/jira/browse/SPARK-31448
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.3
>Reporter: Abhishek Dixit
>Assignee: Abhishek Dixit
>Priority: Major
> Fix For: 3.1.0
>
>
> There is a difference in default storage level *MEMORY_AND_DISK* in pyspark 
> and scala.
> *Scala*: StorageLevel(true, true, false, true)
> *Pyspark:* StorageLevel(True, True, False, False)
>  
> *Problem Description:* 
> Calling *df.cache()*  for pyspark dataframe directly invokes Scala method 
> cache() and Storage Level used is StorageLevel(true, true, false, true).
> But calling *df.persist()* for pyspark dataframe sets the 
> newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and 
> then invokes Scala function persist(newStorageLevel).
> *Possible Fix:*
> Invoke pyspark function persist inside pyspark function cache instead of 
> calling the scala function directly.
> I can raise a PR for this fix if someone can confirm that this is a bug and 
> the possible fix is the correct approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32891) Enhance UTF8String.trim

2020-09-15 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196157#comment-17196157
 ] 

Sean R. Owen commented on SPARK-32891:
--

Can you inline a basic description here? what are you proposing?

> Enhance UTF8String.trim
> ---
>
> Key: SPARK-32891
> URL: https://issues.apache.org/jira/browse/SPARK-32891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Please see comment:
> https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675
> https://github.com/apache/spark/pull/29731#discussion_r487709672



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32891) Enhance UTF8String.trim

2020-09-15 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-32891:
---

 Summary: Enhance UTF8String.trim
 Key: SPARK-32891
 URL: https://issues.apache.org/jira/browse/SPARK-32891
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang


Please see comment:
https://github.com/apache/spark/blob/7eb76d698836a251065753117e22285dd1a8aa8f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L674-L675

https://github.com/apache/spark/pull/29731#discussion_r487709672



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32889) orc table column name doesn't support special characters.

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196143#comment-17196143
 ] 

Apache Spark commented on SPARK-32889:
--

User 'jzc928' has created a pull request for this issue:
https://github.com/apache/spark/pull/29761

> orc table column name doesn't support special characters.
> -
>
> Key: SPARK-32889
> URL: https://issues.apache.org/jira/browse/SPARK-32889
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jason jin
>Priority: Major
>
> When executing
> "CREATE TABLE tbl(`$` INT, b INT) using orc";
> the error below occurs, but the same statement works in Hive.
> Column name "$" contains invalid character(s). Please use alias to rename 
> it.;Column name "$" contains invalid character(s). Please use alias to rename 
> it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid 
> character(s). Please use alias to rename it.; at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
>  at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
>  at 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32889) orc table column name doesn't support special characters.

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32889:


Assignee: (was: Apache Spark)

> orc table column name doesn't support special characters.
> -
>
> Key: SPARK-32889
> URL: https://issues.apache.org/jira/browse/SPARK-32889
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jason jin
>Priority: Major
>
> When executing
> "CREATE TABLE tbl(`$` INT, b INT) using orc";
> the error below occurs, but the same statement works in Hive.
> Column name "$" contains invalid character(s). Please use alias to rename 
> it.;Column name "$" contains invalid character(s). Please use alias to rename 
> it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid 
> character(s). Please use alias to rename it.; at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
>  at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
>  at 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32889) orc table column name doesn't support special characters.

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32889:


Assignee: Apache Spark

> orc table column name doesn't support special characters.
> -
>
> Key: SPARK-32889
> URL: https://issues.apache.org/jira/browse/SPARK-32889
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jason jin
>Assignee: Apache Spark
>Priority: Major
>
> When executing
> "CREATE TABLE tbl(`$` INT, b INT) using orc";
> the error below occurs, but the same statement works in Hive.
> Column name "$" contains invalid character(s). Please use alias to rename 
> it.;Column name "$" contains invalid character(s). Please use alias to rename 
> it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid 
> character(s). Please use alias to rename it.; at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
>  at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913)
>  at 
> org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
>  at 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32890:


Assignee: Apache Spark

> Pass all `sql/hive` module UTs in Scala 2.13
> 
>
> Key: SPARK-32890
> URL: https://issues.apache.org/jira/browse/SPARK-32890
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> Only 4 test cases fail in the sql/hive module with the following commands: 
>  
> {code:java}
> mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive
> mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code}
>  
> The failed cases are as follows:
>  * HiveSchemaInferenceSuite (1 FAILED)
>  * HiveSparkSubmitSuite (1 FAILED)
>  * StatisticsSuite (1 FAILED)
>  * HiveDDLSuite (1 FAILED)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196132#comment-17196132
 ] 

Apache Spark commented on SPARK-32890:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29760

> Pass all `sql/hive` module UTs in Scala 2.13
> 
>
> Key: SPARK-32890
> URL: https://issues.apache.org/jira/browse/SPARK-32890
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> Only 4 test cases fail in the sql/hive module with the following commands: 
>  
> {code:java}
> mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive
> mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code}
>  
> The failed cases are as follows:
>  * HiveSchemaInferenceSuite (1 FAILED)
>  * HiveSparkSubmitSuite (1 FAILED)
>  * StatisticsSuite (1 FAILED)
>  * HiveDDLSuite (1 FAILED)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32890:


Assignee: (was: Apache Spark)

> Pass all `sql/hive` module UTs in Scala 2.13
> 
>
> Key: SPARK-32890
> URL: https://issues.apache.org/jira/browse/SPARK-32890
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> Only 4 test cases fail in the sql/hive module with the following commands: 
>  
> {code:java}
> mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive
> mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code}
>  
> The failed cases are as follows:
>  * HiveSchemaInferenceSuite (1 FAILED)
>  * HiveSparkSubmitSuite (1 FAILED)
>  * StatisticsSuite (1 FAILED)
>  * HiveDDLSuite (1 FAILED)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32874) Enhance result set meta data check for execute statement operation for thrift server

2020-09-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32874.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29746
[https://github.com/apache/spark/pull/29746]

> Enhance result set meta data check for execute statement operation for thrift 
> server
> 
>
> Key: SPARK-32874
> URL: https://issues.apache.org/jira/browse/SPARK-32874
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> Add test cases to ensure stability for JDBC api



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32874) Enhance result set meta data check for execute statement operation for thrift server

2020-09-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32874:
---

Assignee: Kent Yao

> Enhance result set meta data check for execute statement operation for thrift 
> server
> 
>
> Key: SPARK-32874
> URL: https://issues.apache.org/jira/browse/SPARK-32874
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Add test cases to ensure stability for JDBC api



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32890) Pass all `sql/hive` module UTs in Scala 2.13

2020-09-15 Thread Yang Jie (Jira)
Yang Jie created SPARK-32890:


 Summary: Pass all `sql/hive` module UTs in Scala 2.13
 Key: SPARK-32890
 URL: https://issues.apache.org/jira/browse/SPARK-32890
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yang Jie


Only 4 test cases fail in the sql/hive module with the following commands: 

 
{code:java}
mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive
mvn clean test -pl sql/hive -Pscala-2.13 -Phive{code}
 

The failed cases are as follows:
 * HiveSchemaInferenceSuite (1 FAILED)
 * HiveSparkSubmitSuite (1 FAILED)
 * StatisticsSuite (1 FAILED)
 * HiveDDLSuite (1 FAILED)

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32889) orc table column name doesn't support special characters.

2020-09-15 Thread jason jin (Jira)
jason jin created SPARK-32889:
-

 Summary: orc table column name doesn't support special characters.
 Key: SPARK-32889
 URL: https://issues.apache.org/jira/browse/SPARK-32889
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: jason jin


When executing

"CREATE TABLE tbl(`$` INT, b INT) using orc";

the error below occurs, but the same statement works in Hive.

Column name "$" contains invalid character(s). Please use alias to rename 
it.;Column name "$" contains invalid character(s). Please use alias to rename 
it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid 
character(s). Please use alias to rename it.; at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
 at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) 
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldNames(OrcFileFormat.scala:59)
 at 
org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1(ddl.scala:924)
 at 
org.apache.spark.sql.execution.command.DDLUtils$.$anonfun$checkDataColNames$1$adapted(ddl.scala:913)
 at scala.Option.foreach(Option.scala:407) at 
org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:913)
 at 
org.apache.spark.sql.execution.command.DDLUtils$.checkDataColNames(ddl.scala:908)
 at 
org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:231)
 at 
org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:80)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:207)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
 at 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196024#comment-17196024
 ] 

Apache Spark commented on SPARK-32887:
--

User 'Udbhav30' has created a pull request for this issue:
https://github.com/apache/spark/pull/29758

> Example command in 
> https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
> changed
> 
>
> Key: SPARK-32887
> URL: https://issues.apache.org/jira/browse/SPARK-32887
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0
> Environment: Spark 2.4.5, Spark 3.0.0
>Reporter: Chetan Bhat
>Priority: Minor
>
> In the link 
> [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the 
> command example below is wrong.
> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');
>  
> The complete example, when executed, throws the error below.
> CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as 
> parquet;
> INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');
> INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');
> spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION 
> ('grade=1');
> **Error in query:**
> ```
>  mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
> DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
> 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
> 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
> 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 
> 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 
> 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 
> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
> 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 
> 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 
> 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 
> 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
> 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 
> 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 
> 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 
> 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 
> 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)
> == SQL ==
>  SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
>  ---^^^
> ```
>  
> Expected: if the partition value is a string, it should be given like this: 
> grade = 'abc'.
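
For illustration, a sketch of the form the report expects to work instead 
(assuming the employee table created in the steps above and an existing 
`spark` session); the partition specification itself is not quoted:

{code:scala}
// Note PARTITION (grade = 1) rather than PARTITION ('grade=1').
spark.sql("SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade = 1)").show(false)
{code}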



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32481) Support truncate table to move the data to trash

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196025#comment-17196025
 ] 

Apache Spark commented on SPARK-32481:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29759

> Support truncate table to move the data to trash
> 
>
> Key: SPARK-32481
> URL: https://issues.apache.org/jira/browse/SPARK-32481
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Assignee: Udbhav Agrawal
>Priority: Minor
> Fix For: 3.1.0
>
>
> *Instead of deleting the data, move it to trash. From trash, the data can 
> then be deleted permanently based on configuration.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32481) Support truncate table to move the data to trash

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196023#comment-17196023
 ] 

Apache Spark commented on SPARK-32481:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29759

> Support truncate table to move the data to trash
> 
>
> Key: SPARK-32481
> URL: https://issues.apache.org/jira/browse/SPARK-32481
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Assignee: Udbhav Agrawal
>Priority: Minor
> Fix For: 3.1.0
>
>
> *Instead of deleting the data, move it to trash. From trash, the data can 
> then be deleted permanently based on configuration.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32887:


Assignee: (was: Apache Spark)

> Example command in 
> https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
> changed
> 
>
> Key: SPARK-32887
> URL: https://issues.apache.org/jira/browse/SPARK-32887
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0
> Environment: Spark 2.4.5, Spark 3.0.0
>Reporter: Chetan Bhat
>Priority: Minor
>
> In the link 
> [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the 
> command example below is wrong.
> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');
>  
> Complete example executed throws below error.
> CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as 
> parquet;
> INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');
> INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');
> spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION 
> ('grade=1');
> **Error in query:**
> ```
>  mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
> DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
> 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
> 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
> 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 
> 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 
> 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 
> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
> 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 
> 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 
> 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 
> 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
> 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 
> 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 
> 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 
> 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 
> 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)
> == SQL ==
>  SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
>  ---^^^
> ```
>  
> Expected: only if the partition value is a string should it be quoted, e.g. 
> grade='abc'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32887:


Assignee: Apache Spark

> Example command in 
> https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
> changed
> 
>
> Key: SPARK-32887
> URL: https://issues.apache.org/jira/browse/SPARK-32887
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0
> Environment: Spark 2.4.5, Spark 3.0.0
>Reporter: Chetan Bhat
>Assignee: Apache Spark
>Priority: Minor
>
> In the link 
> [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html], the 
> command example below is wrong.
> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');
>  
> Executing the complete example throws the error below.
> CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as 
> parquet;
> INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');
> INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');
> spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION 
> ('grade=1');
> **Error in query:**
> ```
>  mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
> DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
> 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
> 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
> 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 
> 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 
> 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 
> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
> 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 
> 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 
> 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 
> 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
> 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 
> 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 
> 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 
> 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 
> 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)
> == SQL ==
>  SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
>  ---^^^
> ```
>  
> Expected: only if the partition value is a string should it be quoted, e.g. 
> grade='abc'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196022#comment-17196022
 ] 

Apache Spark commented on SPARK-32887:
--

User 'Udbhav30' has created a pull request for this issue:
https://github.com/apache/spark/pull/29758

> Example command in 
> https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
> changed
> 
>
> Key: SPARK-32887
> URL: https://issues.apache.org/jira/browse/SPARK-32887
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0
> Environment: Spark 2.4.5, Spark 3.0.0
>Reporter: Chetan Bhat
>Priority: Minor
>
> In the link 
> [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html], the 
> command example below is wrong.
> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');
>  
> Executing the complete example throws the error below.
> CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as 
> parquet;
> INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');
> INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');
> spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION 
> ('grade=1');
> **Error in query:**
> ```
>  mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
> DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
> 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
> 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
> 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 
> 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 
> 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 
> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
> 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 
> 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 
> 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 
> 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
> 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 
> 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 
> 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 
> 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 
> 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)
> == SQL ==
>  SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
>  ---^^^
> ```
>  
> Expected: only if the partition value is a string should it be quoted, e.g. 
> grade='abc'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32888) reading a parallelized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread Punit Shah (Jira)
Punit Shah created SPARK-32888:
--

 Summary: reading a parallelized rdd with two identical records 
results in a zero count df when read via spark.read.csv
 Key: SPARK-32888
 URL: https://issues.apache.org/jira/browse/SPARK-32888
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1, 3.0.0, 2.4.7, 2.4.6, 2.4.5
Reporter: Punit Shah


* Imagine a two-row csv file like so (where the header and first record are 
duplicate rows):

aaa,bbb

aaa,bbb
 * The following is pyspark code:
 * create a parallelized rdd like: prdd = spark.read.text("test.csv").rdd.flatMap(lambda x : x)
 * create a df like so: mydf = spark.read.csv(prdd, header=True)
 * mydf.count() will result in a record count of zero (when it should be 1)
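Putting the steps above together, a self-contained repro sketch (assuming the two-line test.csv from the description is in the working directory):

{code:python}
# Repro sketch for the behaviour described above; test.csv contains the two
# identical lines "aaa,bbb", the first of which acts as the header.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parallelized RDD of raw CSV lines (flatMap over Rows yields the plain strings).
prdd = spark.read.text("test.csv").rdd.flatMap(lambda x: x)

mydf = spark.read.csv(prdd, header=True)
print(mydf.count())  # reported to print 0, where 1 data row is expected
{code}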



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28210) Shuffle Storage API: Reads

2020-09-15 Thread Attila Zsolt Piros (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013
 ] 

Attila Zsolt Piros edited comment on SPARK-28210 at 9/15/20, 9:38 AM:
--

 [~tianczha] [~devaraj] I would like to work on this issue if that's fine for 
you. I intend to progress along the ideas of the linked PR: to pass the 
metadata when the reducer task is constructed. 


was (Author: attilapiros):
 [~tianczha] [~devaraj] I would like to work on this issue if that's fine for 
you. I would like to progress along the ideas of the linked PR: to pass the 
metadata when the reducer task is constructed. 

> Shuffle Storage API: Reads
> --
>
> Key: SPARK-28210
> URL: https://issues.apache.org/jira/browse/SPARK-28210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Matt Cheah
>Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue 
> tracks implementing an API for reading the shuffle data stored by the write 
> API. Also ensure that the existing shuffle implementation is refactored to 
> use the API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-09-15 Thread Attila Zsolt Piros (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013
 ] 

Attila Zsolt Piros commented on SPARK-28210:


 [~tianczha] [~devaraj] I would like to work on this issue if that's fine for 
you. I would like to progress along the ideas of the linked PR: to pass the 
metadata when the reducer task is constructed. 

> Shuffle Storage API: Reads
> --
>
> Key: SPARK-28210
> URL: https://issues.apache.org/jira/browse/SPARK-28210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Matt Cheah
>Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue 
> tracks implementing an API for reading the shuffle data stored by the write 
> API. Also ensure that the existing shuffle implementation is refactored to 
> use the API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195968#comment-17195968
 ] 

Apache Spark commented on SPARK-32886:
--

User 'zhli1142015' has created a pull request for this issue:
https://github.com/apache/spark/pull/29757

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32886:


Assignee: Apache Spark

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32886:


Assignee: (was: Apache Spark)

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32886:


Assignee: Apache Spark

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Udbhav Agrawal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195965#comment-17195965
 ] 

Udbhav Agrawal commented on SPARK-32887:


Thanks for reporting; this seems to be a documentation typo. Since it is 
misleading, I will raise an MR to correct it.

> Example command in 
> https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
> changed
> 
>
> Key: SPARK-32887
> URL: https://issues.apache.org/jira/browse/SPARK-32887
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0
> Environment: Spark 2.4.5, Spark 3.0.0
>Reporter: Chetan Bhat
>Priority: Minor
>
> In the link 
> [https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html], the 
> command example below is wrong.
> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');
>  
> Executing the complete example throws the error below.
> CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as 
> parquet;
> INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');
> INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');
> spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION 
> ('grade=1');
> **Error in query:**
> ```
>  mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
> 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
> 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
> 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
> 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
> 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
> 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
> 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
> DATABASES, 'DAY', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 
> 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 
> 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
> 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
> 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 
> 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 
> 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 
> 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 
> 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 
> 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 
> 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 
> 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 
> 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 
> 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 
> 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
> 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 
> 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 
> 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 
> 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 
> 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 
> 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 
> 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 
> 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 
> 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 
> 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
> 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
> 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)
> == SQL ==
>  SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
>  ---^^^
> ```
>  
> Expected: only if the partition value is a string should it be quoted, e.g. 
> grade='abc'.
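For reference, a small PySpark sketch (assuming the employee table created above) contrasting the documented form with one the parser accepts; the unquoted partition spec reflects the fix suggested here, not necessarily the wording the docs ended up with.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Documented form -- the whole partition spec is quoted and fails to parse:
#   SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')

# Accepted form -- the column name is unquoted; only a value literal may be quoted:
spark.sql(
    "SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1)"
).show(truncate=False)

# For a string-typed partition column the value itself would be quoted, e.g.
#   SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade='abc')
{code}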



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated SPARK-32887:

Description: 
In the link 
[https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html], the 
command example below is wrong.

SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

 

Executing the complete example throws the error below.

CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet;

INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');

INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');

spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

**Error in query:**

```
 mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 
'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 
'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 
'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 
'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 
'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 
'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 
'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 
'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 
'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 
'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 
'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 
'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 
'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 
'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 
'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 
'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 
'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 
'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 
'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 
'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 
'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 
'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 
'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 
'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 
'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 
'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 
'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 
'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)

== SQL ==
 SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
 ---^^^

```

 

Expected: only if the partition value is a string should it be quoted, e.g. 
grade='abc'.

  was:
In the link 
[https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html] the 
below command example mentioned is wrong.

SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet;

INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');

INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');

spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

**Error in query:**

```
mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 
'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 
'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 
'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 
'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 
'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 
'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'EL

[jira] [Created] (SPARK-32887) Example command in https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be changed

2020-09-15 Thread Chetan Bhat (Jira)
Chetan Bhat created SPARK-32887:
---

 Summary: Example command in 
https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html to be 
changed
 Key: SPARK-32887
 URL: https://issues.apache.org/jira/browse/SPARK-32887
 Project: Spark
  Issue Type: Bug
  Components: docs
Affects Versions: 3.0.0
 Environment: Spark 2.4.5, Spark 3.0.0
Reporter: Chetan Bhat


In the link 
[https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-table.html], the 
command example below is wrong.

SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

CREATE TABLE employee(name STRING)PARTITIONED BY (grade int) stored as parquet;

INSERT INTO employee PARTITION (grade = 1) VALUES ('sam');

INSERT INTO employee PARTITION (grade = 2) VALUES ('suj');

spark-sql> SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1');

**Error in query:**

```
mismatched input ''grade=1'' expecting \{'ADD', 'AFTER', 'ALL', 'ALTER', 
'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 
'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 
'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 
'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 
'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 
'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 
'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 
'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 
'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 
'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 
'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 
'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 
'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 
'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 
'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 
'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 
'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 
'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 
'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 
'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 
'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 
'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 
'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 
'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 
'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 
'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 
'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 
'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 
'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 
'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 
'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 
'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 
'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 
'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'DIV', 
IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 59)

== SQL ==
SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION ('grade=1')
---^^^

```

 

Expected: only if the partition value is a string should it be quoted, e.g. 
grade='abc'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32578) PageRank not sending the correct values in Pergel sendMessage

2020-09-15 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-32578.
--
Resolution: Invalid

> PageRank not sending the correct values in Pergel sendMessage
> -
>
> Key: SPARK-32578
> URL: https://issues.apache.org/jira/browse/SPARK-32578
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: Shay Elbaz
>Priority: Major
>
> The core sendMessage method is incorrect:
> {code:java}
> def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
>  if (edge.srcAttr._2 > tol) {
>Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
>   // *** THIS ^ ***
>  } else {
>Iterator.empty
>  }
> }{code}
>  
> Instead of using the source PR value, it's using the PR delta (2nd tuple 
> arg). This is not the documented behavior, nor a valid PR algorithm AFAIK.
> This code is 7 years old; all versions are affected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32578) PageRank not sending the correct values in Pergel sendMessage

2020-09-15 Thread Shay Elbaz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195948#comment-17195948
 ] 

Shay Elbaz commented on SPARK-32578:


It turned out the problem was in my benchmark, sorry about that.

> PageRank not sending the correct values in Pergel sendMessage
> -
>
> Key: SPARK-32578
> URL: https://issues.apache.org/jira/browse/SPARK-32578
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: Shay Elbaz
>Priority: Major
>
> The core sendMessage method is incorrect:
> {code:java}
> def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
>  if (edge.srcAttr._2 > tol) {
>Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
>   // *** THIS ^ ***
>  } else {
>Iterator.empty
>  }
> }{code}
>  
> Instead of using the source PR value, it's using the PR delta (2nd tuple 
> arg). This is not the documented behavior, nor a valid PR algorithm AFAIK.
> This code is 7 years old; all versions are affected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view

2020-09-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32886:

Attachment: undefinedlink.JPG

> '.../jobs/undefined' link from EvenTimeline view
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32886:

Summary: '.../jobs/undefined' link from "Event Timeline" in jobs page  
(was: '.../jobs/undefined' link from EvenTimeline view)

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you to 
> the corresponding job page. When there are too many jobs, some job items' links 
> redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view

2020-09-15 Thread Zhen Li (Jira)
Zhen Li created SPARK-32886:
---

 Summary: '.../jobs/undefined' link from EvenTimeline view
 Key: SPARK-32886
 URL: https://issues.apache.org/jira/browse/SPARK-32886
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0, 3.1.0
Reporter: Zhen Li


In the event timeline view of the jobs page, clicking a job item redirects you to 
the corresponding job page. When there are too many jobs, some job items' links 
redirect to a wrong link like '.../jobs/undefined'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org