[jira] [Created] (SPARK-31115) Lots of columns and distinct aggregation functions triggers compile exception on Janino

2020-03-11 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-31115:


 Summary: Lots of columns and distinct aggregation functions 
triggers compile exception on Janino
 Key: SPARK-31115
 URL: https://issues.apache.org/jira/browse/SPARK-31115
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5, 2.3.4, 3.0.0
Reporter: Jungtaek Lim


We received a report of a user query failing because Janino throws an error while 
compiling the generated code. The issue is tracked here: 
[janino-compiler/janino#113|https://github.com/janino-compiler/janino/issues/113]
 

It contains the generated code, the symptom (the error), and an analysis of the 
bug, so please refer to the link for more details.

It would be ideal to upgrade Janino to a version that contains the fix 
[janino-compiler/janino#114|https://github.com/janino-compiler/janino/pull/114] 
- SPARK-31101 tracks the effort to upgrade Janino - but since upgrading Janino 
to 3.1.1 seems to bring lots of test failures and there is no guarantee that 
Janino 3.0.16 will be released, we may need to provide a workaround to avoid 
hitting the Janino bug.
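
For illustration only (this sketch is not taken from the linked report, and the 
column names and counts are made up): the query shape named in the summary is a 
wide projection combined with several distinct aggregation functions.

{code:title=ManyDistinctAggsSketch.scala|borderStyle=solid}
// Sketch under assumptions: many derived columns plus several DISTINCT aggregates,
// the shape named in the summary. Whether this exact query reproduces the Janino
// compile exception depends on the Spark build and on how many columns are used.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("many-distinct-aggs").getOrCreate()
import spark.implicits._

// 100 derived columns over a tiny range, just to get a wide schema.
val wide = spark.range(1000).select((0 until 100).map(i => (col("id") + i).as(s"c$i")): _*)

val aggregated = wide
  .groupBy($"c0")
  .agg(countDistinct($"c1"), countDistinct($"c2"), countDistinct($"c3"), sum($"c4"))

aggregated.show()
{code}

If a query of this shape does hit the compile error, disabling whole-stage codegen 
(spark.sql.codegen.wholeStage=false) is a common first thing to try for 
generated-code compile failures, though this ticket does not prescribe a specific 
workaround and expression-level codegen still goes through Janino.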



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand commented on SPARK-8333:
-

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark 
[example|https://spark.apache.org/docs/latest/quick-start.html#self-contained-applications] 
from the official Spark 
[documentation page|https://spark.apache.org/docs/latest/quick-start.html].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.
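
For context, the "simple spark example" referenced above is the self-contained 
application from the linked quick-start page; a sketch of its Scala variant 
follows (the reporter ran the Java variant through spark-submit, and the path 
below is the docs' placeholder).

{code:title=SimpleApp.scala|borderStyle=solid}
// The quick-start self-contained application: count lines containing 'a' and 'b'.
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md" // replace with any text file on the machine
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop() // the temp-directory cleanup reported here fails during this shutdown path on Windows
  }
}
{code}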

> Spark failed to delete temp directory created by HiveContext
> 
>
> Key: SPARK-8333
> URL: https://issues.apache.org/jira/browse/SPARK-8333
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: Windows7 64bit
>Reporter: sheng
>Priority: Minor
>  Labels: Hive, bulk-closed, metastore, sparksql
> Attachments: test.tar
>
>
> Spark 1.4.0 failed to stop SparkContext.
> {code:title=LocalHiveTest.scala|borderStyle=solid}
>  val sc = new SparkContext("local", "local-hive-test", new SparkConf())
>  val hc = Utils.createHiveContext(sc)
>  ... // execute some HiveQL statements
>  sc.stop()
> {code}
> sc.stop() failed to execute; it threw the following exception:
> {quote}
> 15/06/13 03:19:06 INFO Utils: Shutdown hook called
> 15/06/13 03:19:06 INFO Utils: Deleting directory 
> C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
> 15/06/13 03:19:06 ERROR Utils: Exception while deleting Spark temp dir: 
> C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
> java.io.IOException: Failed to delete: 
> C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:963)
>   at 
> org.apache.spark.util.Utils$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(Utils.scala:204)
>   at 
> org.apache.spark.util.Utils$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(Utils.scala:201)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at org.apache.spark.util.Utils$$anonfun$1.apply$mcV$sp(Utils.scala:201)
>   at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2292)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2262)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2262)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
>   at scala.util.Try$.apply(Try.scala:161)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2262)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2244)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {quote}
> It seems this bug was introduced by SPARK-6907. In SPARK-6907, a local 
> Hive metastore is created in a temp directory. The problem is that the local 
> Hive metastore is not shut down correctly. At the end of the application, if 
> SparkContext.stop() is called, it tries to delete the temp directory, which is 
> still in use by the local Hive metastore, and throws an exception.
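
(For readers unfamiliar with the Windows behaviour the quoted description relies 
on, a minimal sketch, not from the ticket: on Windows a file with an open handle 
cannot be deleted, so a temp directory containing such a file cannot be removed 
until whatever holds the handle, here the local metastore, releases it. All names 
and paths below are made up.)

{code:title=DeleteWhileOpen.scala|borderStyle=solid}
// Minimal illustration of why a recursive delete fails while a handle is still open.
import java.io.{File, FileOutputStream}

object DeleteWhileOpen {
  def main(args: Array[String]): Unit = {
    val dir = new File(System.getProperty("java.io.tmpdir"), "delete-while-open-demo")
    dir.mkdirs()
    val held = new File(dir, "metastore.lck")   // stands in for a file the metastore keeps open
    val out = new FileOutputStream(held)        // handle stays open, like a live metastore

    println(s"delete open file:   ${held.delete()}") // false on Windows (usually true on POSIX)
    println(s"delete its dir:     ${dir.delete()}")  // false everywhere: directory is not empty

    out.close()                                      // once the handle is released...
    println(s"delete closed file: ${held.delete()}") // ...the file can be deleted
    println(s"delete empty dir:   ${dir.delete()}")  // ...and then the directory
  }
}
{code}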



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:14 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
Spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]).

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

Its seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark 
[example|[https://spark.apache.org/docs/latest/quick-start.html#self-contained-applications]]
 from the official spark [documentation 
page|[https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:15 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
Spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]).

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

Its seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31116) PrquetRowConverter does not follow case sensitivity

2020-03-11 Thread Tae-kyeom, Kim (Jira)
Tae-kyeom, Kim created SPARK-31116:
--

 Summary: PrquetRowConverter does not follow case sensitivity
 Key: SPARK-31116
 URL: https://issues.apache.org/jira/browse/SPARK-31116
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Tae-kyeom, Kim


After upgrading the Spark version to 3.0.0-SNAPSHOT, selecting Parquet columns 
throws an exception when the column names match only case-insensitively, even 
though spark.sql.caseSensitive is set to false. That is, reading Parquet fails 
with a schema whose columns are the same as the file's columns up to case (the 
Parquet columns and the Catalyst schema agree in a case-insensitive sense).

 

To reproduce the error, executing the following code causes a 
java.lang.IllegalArgumentException:

 
{code:java}
import org.apache.spark.sql.types.{LongType, StructType}

val path = "/some/temp/path"

// Write a struct whose fields are physically named "lowercase" and "camelCase".
spark
  .range(1L)
  .selectExpr("NAMED_STRUCT('lowercase', id, 'camelCase', id + 1) AS StructColumn")
  .write.parquet(path)

// Read it back with a schema that matches the file only case-insensitively.
val caseInsensitiveSchema = new StructType()
  .add(
    "StructColumn",
    new StructType()
      .add("LowerCase", LongType)
      .add("camelcase", LongType))

spark.read.schema(caseInsensitiveSchema).parquet(path).show(){code}
Then we get the following error:


{code:java}
23:57:09.077 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 215.0 (TID 366)
23:57:09.077 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 215.0 (TID 366)
java.lang.IllegalArgumentException: lowercase does not exist. Available: LowerCase, camelcase
  at org.apache.spark.sql.types.StructType.$anonfun$fieldIndex$1(StructType.scala:306)
  at scala.collection.immutable.Map$Map2.getOrElse(Map.scala:147)
  at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:305)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.$anonfun$fieldConverters$1(ParquetRowConverter.scala:182)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.<init>(ParquetRowConverter.scala:181)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:351)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.$anonfun$fieldConverters$1(ParquetRowConverter.scala:185)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.<init>(ParquetRowConverter.scala:181)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetRecordMaterializer.<init>(ParquetRecordMaterializer.scala:43)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport.prepareForRead(ParquetReadSupport.scala:130)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:341)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
  at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1229)
  at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1229)
  at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:21

[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:16 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
Spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]).

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:17 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark 
[example|https://spark.apache.org/docs/latest/quick-start.html] from the 
official Spark documentation page 
([https://spark.apache.org/docs/latest/quick-start.html]).

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark example 
([https://spark.apache.org/docs/latest/quick-start.html]) from the official 
spark documentation page 
[[https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:19 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark example from the official Spark 
[documentation page|https://spark.apache.org/docs/latest/quick-start.html].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark 
[example|[https://spark.apache.org/docs/latest/quick-start.html]] from the 
official spark documentation page 
[[https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:20 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark 
[example|https://spark.apache.org/docs/latest/quick-start.html#self-contained-applications] 
from the official Spark 
[documentation page|https://spark.apache.org/docs/latest/quick-start.html].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark [example 
|[https://spark.apache.org/docs/latest/quick-start.html]]from the official 
spark documentation 
page|[https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:20 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

I am using spark-submit to run a simple Spark 
[example|https://spark.apache.org/docs/latest/quick-start.html] from the 
official Spark 
[documentation page|https://spark.apache.org/docs/latest/quick-start.html].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark example from the official spark 
[documentation page|[https://spark.apache.org/docs/latest/quick-start.html]] [].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:21 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

Tnx.


was (Author: saeedhassanvand):
Hi,

It seems that this bug still exists! I encountered this issue in 
javaSparkContext, not hiveContext.

I using spark-submit to run a simple spark [example 
|[https://spark.apache.org/docs/latest/quick-start.html#self-contained-applications]]from
 the official spark documentation 
page|[https://spark.apache.org/docs/latest/quick-start.html]].

 

$HADOOP_HOME: C:\winutils\bin\winutils.exe

Spark Version: spark-2.4.5-bin-hadoop2.7

Windows 10

 

Tnx.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31116) PrquetRowConverter does not follow case sensitivity

2020-03-11 Thread Tae-kyeom, Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tae-kyeom, Kim updated SPARK-31116:
---
Priority: Blocker  (was: Major)


[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:25 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue with 
JavaSparkContext, not HiveContext.

 

{{20/03/10 15:28:12 INFO SparkUI: Stopped Spark web UI at 
http://DESKTOP-0H2AC9E:4040}}
{{20/03/10 15:28:12 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!}}
{{20/03/10 15:28:12 INFO MemoryStore: MemoryStore cleared}}
{{20/03/10 15:28:12 INFO BlockManager: BlockManager stopped}}
{{20/03/10 15:28:12 INFO BlockManagerMaster: BlockManagerMaster stopped}}
{{20/03/10 15:28:12 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!}}
{{20/03/10 15:28:12 WARN SparkEnv: Exception while deleting Spark temp dir: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
{{java.io.IOException: Failed to delete: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49\simple-spark-app-1.0-SNAPSHOT.jar}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)}}
{{ at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)}}
{{ at org.apache.spark.SparkEnv.stop(SparkEnv.scala:103)}}
{{ at 
org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1974)}}
{{ at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)}}
{{ at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)}}
{{ at org.apache.spark.sql.SparkSession.stop(SparkSession.scala:712)}}
{{ at org.example.Application.main(Application.java:18)}}
{{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{ at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
{{ at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{ at java.lang.reflect.Method.invoke(Method.java:498)}}
{{ at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)}}
{{ at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)}}
{{ at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)}}
{{ at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)}}
{{ at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)}}
{{ at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)}}
{{ at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)}}
{{ at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)}}
{{20/03/10 15:28:12 INFO SparkContext: Successfully stopped SparkContext}}
{{20/03/10 15:28:12 INFO ShutdownHookManager: Shutdown hook called}}
{{20/03/10 15:28:12 INFO ShutdownHookManager: Deleting directory 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
{{20/03/10 15:28:12 ERROR ShutdownHookManager: Exception while deleting Spark 
temp dir: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
{{java.io.IOException: Failed to delete: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49\simple-spark-app-1.0-SNAPSHOT.jar}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)}}
{{ at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)}}
{{ at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)}}
{{ at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)}}
{{ at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)}}
{{ at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)}}
{{ at 
org.apache.spark.util.Shutdow

[jira] [Updated] (SPARK-31114) Constraints inferred from inequality constraints(phase 2)

2020-03-11 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-31114:

Summary: Constraints inferred from inequality constraints(phase 2)  (was: 
isNotNull should be inferred from equality constraint without cast)

> Constraints inferred from inequality constraints(phase 2)
> -
>
> Key: SPARK-31114
> URL: https://issues.apache.org/jira/browse/SPARK-31114
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31117) reduce the test time of DateTimeUtilsSuite

2020-03-11 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31117:
---

 Summary: reduce the test time of DateTimeUtilsSuite
 Key: SPARK-31117
 URL: https://issues.apache.org/jira/browse/SPARK-31117
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8333) Spark failed to delete temp directory created by HiveContext

2020-03-11 Thread Saeed Hassanvand (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056710#comment-17056710
 ] 

Saeed Hassanvand edited comment on SPARK-8333 at 3/11/20, 7:35 AM:
---

Hi,

It seems that this bug still exists! I encountered this issue in 
JavaSparkContext, not HiveContext.

 

{{20/03/10 15:28:12 INFO SparkUI: Stopped Spark web UI at 
[http://DESKTOP-0H2AC9E:4040|http://desktop-0h2ac9e:4040/]}}
 {{20/03/10 15:28:12 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!}}
 {{20/03/10 15:28:12 INFO MemoryStore: MemoryStore cleared}}
 {{20/03/10 15:28:12 INFO BlockManager: BlockManager stopped}}
 {{20/03/10 15:28:12 INFO BlockManagerMaster: BlockManagerMaster stopped}}
 {{20/03/10 15:28:12 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!}}
 {{20/03/10 15:28:12 WARN SparkEnv: Exception while deleting Spark temp dir: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
 {{java.io.IOException: Failed to delete: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49\simple-spark-app-1.0-SNAPSHOT.jar}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)}}
 \{{ at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)}}
 \{{ at org.apache.spark.SparkEnv.stop(SparkEnv.scala:103)}}
 \{{ at 
org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1974)}}
 \{{ at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)}}
 \{{ at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)}}
 \{{ at org.apache.spark.sql.SparkSession.stop(SparkSession.scala:712)}}
 \{{ at org.example.Application.main(Application.java:18)}}
 \{{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
 \{{ at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
 \{{ at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
 \{{ at java.lang.reflect.Method.invoke(Method.java:498)}}
 \{{ at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)}}
 \{{ at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)}}
 \{{ at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)}}
 \{{ at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)}}
 \{{ at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)}}
 \{{ at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)}}
 \{{ at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)}}
 \{{ at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)}}
 {{20/03/10 15:28:12 INFO SparkContext: Successfully stopped SparkContext}}
 {{20/03/10 15:28:12 INFO ShutdownHookManager: Shutdown hook called}}
 {{20/03/10 15:28:12 INFO ShutdownHookManager: Deleting directory 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
 {{20/03/10 15:28:12 ERROR ShutdownHookManager: Exception while deleting Spark 
temp dir: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49}}
 {{java.io.IOException: Failed to delete: 
C:\Users\pc-monster\AppData\Local\Temp\spark-e5bd78e4-5161-471c-9a51-4cafd16ffd36\userFiles-624b6e50-2079-46eb-b703-a121925a4e49\simple-spark-app-1.0-SNAPSHOT.jar}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)}}
 \{{ at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)}}
 \{{ at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)}}
 \{{ at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)}}
 \{{ at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)}}
 \{{ at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)}}
 \{

[jira] [Updated] (SPARK-31114) Constraints inferred from inequality constraints(phase 2)

2020-03-11 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-31114:

Issue Type: Improvement  (was: Bug)

> Constraints inferred from inequality constraints(phase 2)
> -
>
> Key: SPARK-31114
> URL: https://issues.apache.org/jira/browse/SPARK-31114
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31118) Add version information to the configuration of K8S

2020-03-11 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-31118:
--

 Summary: Add version information to the configuration of K8S
 Key: SPARK-31118
 URL: https://issues.apache.org/jira/browse/SPARK-31118
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: jiaan.geng


resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31119) Add interval value support for extract expression as source

2020-03-11 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-31119:
-
Issue Type: Improvement  (was: Bug)

> Add interval value support for extract expression as source
> ---
>
> Key: SPARK-31119
> URL: https://issues.apache.org/jira/browse/SPARK-31119
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> <extract expression> ::= EXTRACT <left paren> <extract field> FROM <extract source> <right paren>
> <extract source> ::= <datetime value expression> | <interval value expression>
> {code}
> We currently only support datetime values as the extract source of the extract expression, but its 
> alternative function `date_part` supports both datetime and interval sources.
> For ANSI compliance and for consistency between extract and `date_part`, we should 
> support interval values as the source of extract expressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31119) Add interval value support for extract expression as source

2020-03-11 Thread Kent Yao (Jira)
Kent Yao created SPARK-31119:


 Summary: Add interval value support for extract expression as 
source
 Key: SPARK-31119
 URL: https://issues.apache.org/jira/browse/SPARK-31119
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0, 3.1.0
Reporter: Kent Yao


{code:java}
<extract expression> ::= EXTRACT <left paren> <extract field> FROM <extract source> <right paren>

<extract source> ::= <datetime value expression> | <interval value expression>
{code}

We currently only support datetime values as the extract source of the extract expression, but its 
alternative function `date_part` supports both datetime and interval sources.

For ANSI compliance and for consistency between extract and `date_part`, we should 
support interval values as the source of extract expressions.
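
For illustration, a minimal spark-shell sketch of the gap described above; the {{date_part}} call reflects the existing behaviour, while the {{extract}} form with an interval source is what this ticket proposes, so its output is an assumption:

{code:scala}
// date_part already accepts an interval source:
spark.sql("SELECT date_part('DAY', interval 2 days 3 hours)").show()

// Proposed by this ticket: the equivalent extract form should accept the same interval source.
spark.sql("SELECT extract(DAY FROM interval 2 days 3 hours)").show()
{code}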



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30541) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite

2020-03-11 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056820#comment-17056820
 ] 

Gabor Somogyi commented on SPARK-30541:
---

The first problem is obvious: Kafka does not come up consistently every time. 
There is not much to do there unless the Kafka community fixes the issue. As a 
temporary solution, a retry can be used in the test.

 
The second problem also comes from the Kafka side:
{code:java}
[info]   Cause: org.apache.kafka.common.KafkaException: 
javax.security.auth.login.LoginException: Client not found in Kerberos database 
(6) - Client not found in Kerberos database
{code}

When I reproduced the issue locally I realised that:
* KDC didn't throw any exception while the mentioned user was created
* The keytab file is readable and it is possible to do kinit with it

Maybe it's another flaky behaviour on the Kafka side?!

All in all, since the broker is flaky and KafkaAdminClient has also shown some 
flakiness, my suggestion is to use testRetry until the mentioned problems are 
solved in Kafka.
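
For illustration, a generic retry wrapper in the spirit of that suggestion; the helper name, the attempt count and the {{startKafkaForTest()}} call are made up here and are not Spark's actual test utilities:

{code:scala}
import scala.util.{Failure, Success, Try}

// Re-run a flaky setup step (e.g. starting an embedded Kafka broker) a few times
// before giving up; purely illustrative, not Spark's testRetry implementation.
def withRetry[T](maxAttempts: Int)(body: => T): T = {
  Try(body) match {
    case Success(value) => value
    case Failure(e) if maxAttempts > 1 =>
      println(s"Attempt failed (${e.getMessage}), retrying...")
      withRetry(maxAttempts - 1)(body)
    case Failure(e) => throw e
  }
}

// Usage sketch: withRetry(3) { startKafkaForTest() }
{code}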


> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
> ---
>
> Key: SPARK-30541
> URL: https://issues.apache.org/jira/browse/SPARK-30541
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Blocker
> Attachments: consoleText_NOK.txt, consoleText_OK.txt, 
> unit-tests_NOK.log, unit-tests_OK.log
>
>
> The test suite has been failing intermittently as of now:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116862/testReport/]
>  
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
>   
> {noformat}
> Error Details
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 3939 times over 
> 1.000122353532 minutes. Last failure message: KeeperErrorCode = 
> AuthFailed for /brokers/ids.
> Stack Trace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 3939 times over 
> 1.000122353532 minutes. Last failure message: KeeperErrorCode = 
> AuthFailed for /brokers/ids.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336)
>   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:292)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: 
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed for /brokers/ids
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>   at 
> kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
>   at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
>   at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
>   at 
> kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
>   at 
> org.scalatest.con

[jira] [Assigned] (SPARK-31071) Spark Encoders.bean() should allow marking non-null fields in its Spark schema

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31071:
---

Assignee: L. C. Hsieh

> Spark Encoders.bean() should allow marking non-null fields in its Spark schema
> --
>
> Key: SPARK-31071
> URL: https://issues.apache.org/jira/browse/SPARK-31071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Kyrill Alyoshin
>Assignee: L. C. Hsieh
>Priority: Major
>
> The Spark _Encoders.bean()_ method should allow the generated StructType schema 
> fields to be *non-nullable*.
> Currently, any non-primitive type is automatically _nullable_. It is 
> hard-coded in the _org.apache.spark.sql.catalyst.JavaTypeReference_ class.  
> This can lead to rather interesting situations... For example, let's say I 
> want to save a dataframe using an Avro format with my own non-spark generated 
> Avro schema. Let's also say that my Avro schema has a field that is non-null 
> (i.e., not a union type). Well, it appears *impossible* to store a dataframe 
> using such an Avro schema since Spark would assume that the field is nullable 
> (as it is in its own schema) which would conflict with Avro schema semantics 
> and throw an exception.
> I propose making a change to the _JavaTypeReference_ class to observe the 
> JSR-305 _Nonnull_ annotation (and its children) on the provided bean class 
> during StructType schema generation. This would allow bean creators to 
> control the resulting Spark schema so much better.
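
For illustration, a minimal sketch of what the proposal would enable; the bean, the JSR-305 dependency (com.google.code.findbugs:jsr305) and the resulting nullability are assumptions about the proposed behaviour, not current Spark behaviour:

{code:scala}
import javax.annotation.Nonnull  // JSR-305 annotation, provided by the jsr305 artifact
import org.apache.spark.sql.Encoders

// A bean-style class whose non-null field carries the JSR-305 annotation.
class Customer {
  @Nonnull private var id: String = _
  private var nickname: String = _
  def getId: String = id
  def setId(v: String): Unit = { id = v }
  def getNickname: String = nickname
  def setNickname(v: String): Unit = { nickname = v }
}

// Today this prints nullable = true for both fields; under the proposal the
// @Nonnull-annotated "id" field would become nullable = false.
Encoders.bean(classOf[Customer]).schema.printTreeString()
{code}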



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31071) Spark Encoders.bean() should allow marking non-null fields in its Spark schema

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31071.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 27851
[https://github.com/apache/spark/pull/27851]

> Spark Encoders.bean() should allow marking non-null fields in its Spark schema
> --
>
> Key: SPARK-31071
> URL: https://issues.apache.org/jira/browse/SPARK-31071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Kyrill Alyoshin
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
>
> The Spark _Encoders.bean()_ method should allow the generated StructType schema 
> fields to be *non-nullable*.
> Currently, any non-primitive type is automatically _nullable_. It is 
> hard-coded in the _org.apache.spark.sql.catalyst.JavaTypeReference_ class.  
> This can lead to rather interesting situations... For example, let's say I 
> want to save a dataframe using an Avro format with my own non-spark generated 
> Avro schema. Let's also say that my Avro schema has a field that is non-null 
> (i.e., not a union type). Well, it appears *impossible* to store a dataframe 
> using such an Avro schema since Spark would assume that the field is nullable 
> (as it is in its own schema) which would conflict with Avro schema semantics 
> and throw an exception.
> I propose making a change to the _JavaTypeReference_ class to observe the 
> JSR-305 _Nonnull_ annotation (and its children) on the provided bean class 
> during StructType schema generation. This would allow bean creators to 
> control the resulting Spark schema so much better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31120:
---

 Summary: Support enabling maven profiles while importing via sbt 
on Intellij IDEA.
 Key: SPARK-31120
 URL: https://issues.apache.org/jira/browse/SPARK-31120
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0
Reporter: Prashant Sharma


At the moment there is no easy way to enable maven profiles if the IntelliJ 
IDEA project is imported via SBT. The only other workaround is to set the OS-level 
environment variable SBT_MAVEN_PROFILES. 
So, in this patch we add a property, sbt.maven.profiles, which can be configured 
at the time of importing Spark in IntelliJ IDEA. 
See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Attachment: Screenshot 2020-03-11 at 4.09.57 PM.png

> Support enabling maven profiles while importing via sbt on Intellij IDEA.
> -
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png
>
>
> At the moment there is no easy way to enable maven profiles if the IntelliJ 
> IDEA project is imported via SBT. The only other workaround is to set the OS-level 
> environment variable SBT_MAVEN_PROFILES. 
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark in IntelliJ IDEA. 
> See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Attachment: Screenshot 2020-03-11 at 4.18.09 PM.png

> Support enabling maven profiles while importing via sbt on Intellij IDEA.
> -
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png, Screenshot 
> 2020-03-11 at 4.18.09 PM.png
>
>
> At the moment there is no easy way to enable maven profiles if the IntelliJ 
> IDEA project is imported via SBT. The only other workaround is to set the OS-level 
> environment variable SBT_MAVEN_PROFILES. 
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark in IntelliJ IDEA. 
> See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31039) Unable to use vendor specific datatypes with JDBC (MSSQL)

2020-03-11 Thread Frank Oosterhuis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056863#comment-17056863
 ] 

Frank Oosterhuis commented on SPARK-31039:
--

[~hyukjin.kwon]: I'm not sure I agree with the title change. I suppose this is 
potentially a problem for any database.

> Unable to use vendor specific datatypes with JDBC (MSSQL)
> -
>
> Key: SPARK-31039
> URL: https://issues.apache.org/jira/browse/SPARK-31039
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Frank Oosterhuis
>Priority: Major
>
> I'm trying to create a table in MSSQL with a time(7) type.
> For this I'm using the createTableColumnTypes option like "CallStartTime 
> time(7)", with the driver 
> "com.microsoft.sqlserver.jdbc.SQLServerDriver".
> I'm getting an error:  
> org.apache.spark.sql.catalyst.parser.ParseException: DataType 
> time(7) is not supported.(line 1, pos 43)
> What, then, is the point of using this option?
>  
>  
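
For illustration, a minimal reproduction sketch of the write described above; the JDBC URL, table name, credentials and data are placeholders, and the {{createTableColumnTypes}} value is exactly the part the parser currently rejects:

{code:scala}
import java.util.Properties
import org.apache.spark.sql.SparkSession

object CreateTableColumnTypesRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("mssql-types").getOrCreate()
    import spark.implicits._

    val df = Seq(("c1", "12:34:56.1234567")).toDF("CallId", "CallStartTime")

    val props = new Properties()
    props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    props.setProperty("user", "username")      // placeholder
    props.setProperty("password", "password")  // placeholder

    df.write
      // Fails today with "ParseException: DataType time(7) is not supported"
      .option("createTableColumnTypes", "CallStartTime time(7)")
      .jdbc("jdbc:sqlserver://host:1433;databaseName=db", "dbo.Calls", props)

    spark.stop()
  }
}
{code}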



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles for importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Summary: Support enabling maven profiles for importing via sbt on Intellij 
IDEA.  (was: Support enabling maven profiles while importing via sbt on 
Intellij IDEA.)

> Support enabling maven profiles for importing via sbt on Intellij IDEA.
> ---
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png, Screenshot 
> 2020-03-11 at 4.18.09 PM.png
>
>
> At the moment there is no easy way to enable maven profiles if the IntelliJ 
> IDEA project is imported via SBT. The only other workaround is to set the OS-level 
> environment variable SBT_MAVEN_PROFILES. 
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark in IntelliJ IDEA. 
> See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31121) Spark API for Table Metadata

2020-03-11 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31121:
---

 Summary: Spark API for Table Metadata
 Key: SPARK-31121
 URL: https://issues.apache.org/jira/browse/SPARK-31121
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Ryan Blue


For details, please see the SPIP doc: 
https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31121) Spark API for Table Metadata

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31121.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Spark API for Table Metadata
> 
>
> Key: SPARK-31121
> URL: https://issues.apache.org/jira/browse/SPARK-31121
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> For details, please see the SPIP doc: 
> https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24252) DataSourceV2: Add catalog support

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-24252:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Add catalog support
> -
>
> Key: SPARK-24252
> URL: https://issues.apache.org/jira/browse/SPARK-24252
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> DataSourceV2 needs to support create and drop catalog operations in order to 
> support logical plans like CTAS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27661) Add SupportsNamespaces interface for v2 catalogs

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-27661:

Parent: SPARK-31121
Issue Type: Sub-task  (was: Improvement)

> Add SupportsNamespaces interface for v2 catalogs
> 
>
> Key: SPARK-27661
> URL: https://issues.apache.org/jira/browse/SPARK-27661
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> Some catalogs support namespace operations, like creating or dropping 
> namespaces. The v2 API should have a way to expose these operations to Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28341) create a public API for V2SessionCatalog

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-28341:

Parent: SPARK-31121
Issue Type: Sub-task  (was: Improvement)

> create a public API for V2SessionCatalog
> 
>
> Key: SPARK-28341
> URL: https://issues.apache.org/jira/browse/SPARK-28341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-29511:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support CREATE NAMESPACE
> --
>
> Key: SPARK-29511
> URL: https://issues.apache.org/jira/browse/SPARK-29511
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> CREATE NAMESPACE needs to support v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29039) centralize the catalog and table lookup logic

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-29039:

Parent: SPARK-31121
Issue Type: Sub-task  (was: Improvement)

> centralize the catalog and table lookup logic
> -
>
> Key: SPARK-29039
> URL: https://issues.apache.org/jira/browse/SPARK-29039
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29609) DataSourceV2: Support DROP NAMESPACE

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-29609:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support DROP NAMESPACE
> 
>
> Key: SPARK-29609
> URL: https://issues.apache.org/jira/browse/SPARK-29609
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> DROP NAMESPACE needs to support v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29734) DataSourceV2: Support SHOW CURRENT NAMESPACE

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-29734:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support SHOW CURRENT NAMESPACE
> 
>
> Key: SPARK-29734
> URL: https://issues.apache.org/jira/browse/SPARK-29734
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> Datasource V2 can support multiple catalogs/namespaces. Having "SHOW CURRENT 
> NAMESPACE" to retrieve the current catalog/namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30352) DataSourceV2: Add CURRENT_CATALOG function

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30352:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Add CURRENT_CATALOG function
> --
>
> Key: SPARK-30352
> URL: https://issues.apache.org/jira/browse/SPARK-30352
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> CURRENT_CATALOG is a general value specification in SQL Standard, described 
> as:
> {quote}The value specified by CURRENT_CATALOG is the character string that 
> represents the current default catalog name.{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28856) DataSourceV2: Support SHOW DATABASES

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-28856:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support SHOW DATABASES
> 
>
> Key: SPARK-28856
> URL: https://issues.apache.org/jira/browse/SPARK-28856
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> SHOW DATABASES needs to support v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27857) DataSourceV2: Support ALTER TABLE statements in catalyst SQL parser

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-27857:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support ALTER TABLE statements in catalyst SQL parser
> ---
>
> Key: SPARK-27857
> URL: https://issues.apache.org/jira/browse/SPARK-27857
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> ALTER TABLE statements should be supported for v2 tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28319) DataSourceV2: Support SHOW TABLES

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-28319:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Support SHOW TABLES
> -
>
> Key: SPARK-28319
> URL: https://issues.apache.org/jira/browse/SPARK-28319
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> SHOW TABLES needs to support v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31121) Spark API for Table Metadata

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31121:

Description: 
For details, please see the SPIP doc: 
https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d

This will bring multi-catalog support to Spark and allow external catalog 
implementations.

  was:Details please see the SPIP doc: 
https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d


> Spark API for Table Metadata
> 
>
> Key: SPARK-31121
> URL: https://issues.apache.org/jira/browse/SPARK-31121
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> For details, please see the SPIP doc: 
> https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d
> This will bring multi-catalog support to Spark and allow external catalog 
> implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28139) DataSourceV2: Add AlterTable v2 implementation

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-28139:

Parent Issue: SPARK-31121  (was: SPARK-22386)

> DataSourceV2: Add AlterTable v2 implementation
> --
>
> Key: SPARK-28139
> URL: https://issues.apache.org/jira/browse/SPARK-28139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-27857 updated the parser for v2 ALTER TABLE statements. This tracks 
> implementing those using a v2 catalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31122) Add support for sparse matrix multiplication

2020-03-11 Thread Alex Favaro (Jira)
Alex Favaro created SPARK-31122:
---

 Summary: Add support for sparse matrix multiplication
 Key: SPARK-31122
 URL: https://issues.apache.org/jira/browse/SPARK-31122
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 2.4.5
Reporter: Alex Favaro


MLlib does not currently support multiplication of sparse matrices. When 
multiplying block matrices with sparse blocks, the sparse blocks are first 
converted to dense matrices. This leads to large increases in memory 
utilization for certain problems.

I'd like to propose adding support for local sparse matrix multiplication to 
MLlib, as well as local dense-sparse matrix multiplication. With these changes, 
the case clause which converts sparse blocks to dense matrices in the block 
matrix multiply method could be removed.

One question is whether the result of sparse-sparse matrix multiplication 
should be sparse or dense, since the product of two sparse matrices can be 
quite dense depending on the matrices. I propose returning a sparse matrix, 
however, and letting the application convert the result to a dense matrix if 
necessary. There is some precedent for this with the block matrix add method, 
which returns sparse matrix blocks even when adding a sparse matrix block to a 
dense matrix block.

As for the implementation, one option would be to leverage Breeze's existing 
sparse matrix multiplication, as MLlib currently does for matrix addition. 
Another would be to add support for sparse-sparse multiplication to the BLAS 
wrapper, which would be consistent with the sparse-dense multiplication 
implementation and could support a more efficient routine for transposed 
matrices (as Breeze does not support transposed matrices). The exact algorithm 
would follow that laid out in ["Sparse Matrix Multiplication Package 
(SMMP)"|https://www.i2m.univ-amu.fr/perso/abdallah.bradji/multp_sparse.pdf].

This would likely not be a huge change but would take some time to test and 
benchmark properly, so before I put up a code diff I would be curious to know:
* Is there any interest in supporting this functionality in MLlib?
* Is there a preference for the return type of sparse-sparse multiplication? 
(i.e. sparse or dense)
* Is there a preference for the implementation? (Breeze vs a built-in one)

Some tickets which include related functionality or identified this particular 
issue but never solved it: SPARK-16820, SPARK-3418.
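
For illustration, a small sketch of the Breeze option mentioned above; this is plain Breeze usage outside MLlib, and the claim that the product stays sparse reflects Breeze's CSCMatrix behaviour rather than any MLlib API:

{code:scala}
import breeze.linalg.CSCMatrix

// Build two small sparse matrices and multiply them; Breeze keeps the result
// as a CSCMatrix, matching the proposal to return sparse from sparse * sparse.
val a = CSCMatrix.zeros[Double](3, 3)
a(0, 0) = 1.0
a(2, 1) = 4.0

val b = CSCMatrix.zeros[Double](3, 2)
b(0, 1) = 2.0
b(1, 0) = 3.0

val c = a * b   // c(0,1) == 2.0, c(2,0) == 12.0; everything else stays an implicit zero
println(c)
{code}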



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31111) Fix interval output issue in ExtractBenchmark

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-3.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27867
[https://github.com/apache/spark/pull/27867]

> Fix interval output issue in ExtractBenchmark 
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31111) Fix interval output issue in ExtractBenchmark

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-3:
---

Assignee: Kent Yao

> Fix interval output issue in ExtractBenchmark 
> --
>
> Key: SPARK-3
> URL: https://issues.apache.org/jira/browse/SPARK-3
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-31031) Backward Compatibility for Parsing Datetime

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan deleted SPARK-31031:



> Backward Compatibility for Parsing Datetime
> ---
>
> Key: SPARK-31031
> URL: https://issues.apache.org/jira/browse/SPARK-31031
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Yuanjian Li
>Priority: Major
>
> Mirror issue for SPARK-31030, because of the sub-task can't add sub-task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31030) Backward Compatibility for Parsing and Formatting Datetime

2020-03-11 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-31030:

Parent: SPARK-26904
Issue Type: Sub-task  (was: Improvement)

> Backward Compatibility for Parsing and Formatting Datetime
> --
>
> Key: SPARK-31030
> URL: https://issues.apache.org/jira/browse/SPARK-31030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2020-03-04-10-54-05-208.png, 
> image-2020-03-04-10-54-13-238.png
>
>
> *Background*
> In Spark version 2.4 and earlier, datetime parsing, formatting and conversion 
> are performed by using the hybrid calendar ([Julian + 
> Gregorian|https://docs.oracle.com/javase/7/docs/api/java/util/GregorianCalendar.html]).
>  
> Since the Proleptic Gregorian calendar is the de-facto calendar worldwide, as 
> well as the one chosen in the ANSI SQL standard, Spark 3.0 switches to it by 
> using Java 8 API classes (the java.time packages that are based on [ISO 
> chronology|https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html]).
> The switch was completed in SPARK-26651. 
>  
> *Problem*
> Switching to the Java 8 datetime API breaks backward compatibility with Spark 
> 2.4 and earlier when parsing datetimes. Spark needs its own pattern definitions 
> for datetime parsing and formatting.
>  
> *Solution*
> To avoid unexpected result changes after the underlying datetime API switch, 
> we propose the following solution. 
>  * Introduce the fallback mechanism: when the Java 8-based parser fails, we 
> need to detect these behavior differences by falling back to the legacy 
> parser, and fail with a user-friendly error message to tell users what gets 
> changed and how to fix the pattern.
>  * Document Spark's datetime patterns: Spark's date-time formatter 
> is decoupled from the Java patterns. Spark's patterns are mainly based on 
> the [Java 7’s 
> pattern|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]
>  (for better backward compatibility) with the customized logic (caused by the 
> breaking changes between [Java 
> 7|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html] 
> and [Java 
> 8|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]
>  pattern string). Below are the customized rules:
> ||Pattern||Java 7||Java 8|| Example||Rule||
> |u|Day number of week (1 = Monday, ..., 7 = Sunday)|Year (Different with y, u 
> accept a negative value to represent BC, while y should be used together with 
> G to do the same thing.)|!image-2020-03-04-10-54-05-208.png!  |Substitute ‘u’ 
> to ‘e’ and use Java 8 parser to parse the string. If parsable, return the 
> result; otherwise, fall back to ‘u’, and then use the legacy Java 7 parser to 
> parse. When it is successfully parsed, throw an exception and ask users to 
> change the pattern strings or turn on the legacy mode; otherwise, return NULL 
> as what Spark 2.4 does.|
> | z| General time zone which also accepts
>  [RFC 822 time zones|#rfc822timezone]]|Only accept time-zone name, e.g. 
> Pacific Standard Time; PST|!image-2020-03-04-10-54-13-238.png!  |The 
> semantics of ‘z’ are different between Java 7 and Java 8. Here, Spark 3.0 
> follows the semantics of Java 8. 
>  Use Java 8 to parse the string. If parsable, return the result; otherwise, 
> use the legacy Java 7 parser to parse. When it is successfully parsed, throw 
> an exception and ask users to change the pattern strings or turn on the 
> legacy mode; otherwise, return NULL as what Spark 2.4 does.|
>  
>  
>  
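
For illustration, a conceptual sketch of the fallback flow proposed above; Spark's actual implementation lives in its internal formatter classes, so the method name and pattern strings here are only stand-ins:

{code:scala}
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Try the Java 8 parser first; if it fails but the legacy Java 7 parser would
// have succeeded, surface an actionable error instead of silently changing results.
def parseDateWithFallback(s: String, java8Pattern: String, legacyPattern: String): Option[LocalDate] = {
  val viaJava8 = Try(LocalDate.parse(s, DateTimeFormatter.ofPattern(java8Pattern)))
  if (viaJava8.isSuccess) {
    viaJava8.toOption
  } else if (Try(new SimpleDateFormat(legacyPattern).parse(s)).isSuccess) {
    throw new IllegalArgumentException(
      s"Pattern '$java8Pattern' fails on '$s' but the legacy pattern parses it; " +
        "change the pattern string or enable the legacy parser mode.")
  } else {
    None  // behave like Spark 2.4: unparsable input becomes NULL
  }
}
{code}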



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31074) Avro serializer should not fail when a nullable Spark field is written to a non-null Avro column

2020-03-11 Thread Kyrill Alyoshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056972#comment-17056972
 ] 

Kyrill Alyoshin commented on SPARK-31074:
-

Yes,
 # Create a simple Avro schema file with two properties in it, '*f1*' and '*f2*' - 
their types can be Strings.
 # Create a Spark dataframe with two fields in it, '*f1*' and '*f2*', of String 
type that are *nullable*.
 # Write out this dataframe to a file using the Avro schema created in step 1, via the 
'{{avroSchema}}' option.
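
For illustration, a rough sketch of those three steps; the schema string, output path and column values are placeholders rather than anything taken from the report, and the spark-avro module is assumed to be on the classpath:

{code:scala}
import org.apache.spark.sql.SparkSession

object AvroNonNullRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("avro-nonnull").getOrCreate()
    import spark.implicits._

    // Step 1: an Avro schema with two non-null (non-union) string fields.
    val avroSchema =
      """{"type":"record","name":"Rec","fields":[
        |  {"name":"f1","type":"string"},
        |  {"name":"f2","type":"string"}
        |]}""".stripMargin

    // Step 2: a DataFrame whose f1/f2 columns are nullable, as Spark infers by default.
    val df = Seq(("a", "b"), ("c", "d")).toDF("f1", "f2")

    // Step 3: writing with the user-supplied schema triggers the AvroRuntimeException
    // described in the issue.
    df.write.format("avro").option("avroSchema", avroSchema).save("/tmp/avro-nonnull-repro")

    spark.stop()
  }
}
{code}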

 

> Avro serializer should not fail when a nullable Spark field is written to a 
> non-null Avro column
> 
>
> Key: SPARK-31074
> URL: https://issues.apache.org/jira/browse/SPARK-31074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Kyrill Alyoshin
>Priority: Major
>
> Spark StructType schemas are strongly biased towards having _nullable_ fields. 
> In fact, this is what _Encoders.bean()_ does - any non-primitive field is 
> automatically _nullable_. When we attempt to serialize dataframes into 
> *user-supplied* Avro schemas where the corresponding fields are marked as 
> _non-null_ (i.e., they are not of _union_ type), any such attempt will fail 
> with the following exception:
>  
> {code:java}
> Caused by: org.apache.avro.AvroRuntimeException: Not a union: "string"
>   at org.apache.avro.Schema.getTypes(Schema.java:299)
>   at 
> org.apache.spark.sql.avro.AvroSerializer.org$apache$spark$sql$avro$AvroSerializer$$resolveNullableType(AvroSerializer.scala:229)
>   at 
> org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:209)
>  {code}
> This seems rather draconian. We should certainly be able to write a field 
> of the same type and with the same name, as long as its value is not null, into a 
> non-nullable Avro column. In fact, the problem is so *severe* that it is not 
> clear what should be done in such situations when the Avro schema is given to you 
> as part of an API communication contract (i.e., it cannot be changed).
> This is an important issue.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31123) Drop does not work after join with aliases

2020-03-11 Thread Mikel San Vicente (Jira)
Mikel San Vicente created SPARK-31123:
-

 Summary: Drop does not work after join with aliases
 Key: SPARK-31123
 URL: https://issues.apache.org/jira/browse/SPARK-31123
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.2
Reporter: Mikel San Vicente


 

Hi,

I am seeing a really strange behaviour in the drop method after a join with 
aliases. It doesn't seem to find the column when I reference it using the 
dataframe("columnName") syntax, but it does work with other combinators like 
select:
{code:java}
import org.apache.spark.sql.{functions => func}

case class Record(a: String, dup: String)
case class Record2(b: String, dup: String)
val df = Seq(Record("a", "dup")).toDF
val df2 = Seq(Record2("b", "dup")).toDF
val joined = df.alias("a").join(df2.alias("b"), df("a") === df2("b"))
val dupCol = df("dup")
joined.drop(dupCol) // Does not drop anything
joined.drop(func.col("a.dup")) // It works!  
joined.select(dupCol) // It works!
{code}
 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31123) Drop does not work after join with aliases

2020-03-11 Thread Mikel San Vicente (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikel San Vicente updated SPARK-31123:
--
Description: 
 

Hi,

I am seeing a really strange behaviour in the drop method after a join with 
aliases. It doesn't seem to find the column when I reference it using the 
dataframe("columnName") syntax, but it does work with other combinators like 
select:
{code:java}
import org.apache.spark.sql.{functions => func}

case class Record(a: String, dup: String)
case class Record2(b: String, dup: String)
val df = Seq(Record("a", "dup")).toDF
val df2 = Seq(Record2("b", "dup")).toDF
val joined = df.alias("a").join(df2.alias("b"), df("a") === df2("b"))
val dupCol = df("dup")
joined.drop(dupCol) // Does not drop anything
joined.drop(func.col("a.dup")) // It drops the column  
joined.select(dupCol) // It selects the column
{code}
 

 

 

  was:
 

Hi,

I am seeing a really strange behaviour in drop method after a join with 
aliases. It doesn't seem to find the column when I reference to it using 
dataframe("columnName") syntax, but it does work with other combinators like 
select
{code:java}
case class Record(a: String, dup: String)
case class Record2(b: String, dup: String)
val df = Seq(Record("a", "dup")).toDF
val joined = df.alias("a").join(df2.alias("b"), df("a") === df2("b"))
val dupCol = df("dup")
joined.drop(dupCol) // Does not drop anything
joined.drop(func.col("a.dup")) // It works!  
joined.select(dupCol) // It works!
{code}
 

 

 


> Drop does not work after join with aliases
> --
>
> Key: SPARK-31123
> URL: https://issues.apache.org/jira/browse/SPARK-31123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: Mikel San Vicente
>Priority: Minor
>
>  
> Hi,
> I am seeing a really strange behaviour in the drop method after a join with 
> aliases. It doesn't seem to find the column when I reference it using the 
> dataframe("columnName") syntax, but it does work with other combinators like 
> select:
> {code:java}
> import org.apache.spark.sql.{functions => func}
>
> case class Record(a: String, dup: String)
> case class Record2(b: String, dup: String)
> val df = Seq(Record("a", "dup")).toDF
> val df2 = Seq(Record2("b", "dup")).toDF
> val joined = df.alias("a").join(df2.alias("b"), df("a") === df2("b"))
> val dupCol = df("dup")
> joined.drop(dupCol) // Does not drop anything
> joined.drop(func.col("a.dup")) // It drops the column  
> joined.select(dupCol) // It selects the column
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31076) Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31076:
---

Assignee: Maxim Gekk

> Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time
> 
>
> Key: SPARK-31076
> URL: https://issues.apache.org/jira/browse/SPARK-31076
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> By default, collect() returns java.sql.Timestamp/Date instances with offsets 
> derived from the internal values of Catalyst's TIMESTAMP/DATE, which store 
> microseconds since the epoch. The conversion from internal values to 
> java.sql.Timestamp/Date is based on the Proleptic Gregorian calendar, but converting 
> the resulting values before the year 1582 to strings produces timestamp/date 
> strings in the Julian calendar. For example:
> {code}
> scala> sql("select date '1100-10-10'").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([1100-10-03])
> {code} 
> This can be fixed if the internal Catalyst values are converted to a local 
> date-time in the Gregorian calendar, and java.sql.Timestamp/Date is then constructed 
> from the resulting year, month, ..., seconds in the Julian calendar.
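
For illustration, a sketch of that conversion route using plain Java time APIs; the 1100-10-10 value mirrors the example above, and the printed result is the expected outcome of the approach rather than a verified guarantee:

{code:scala}
import java.sql.Date
import java.time.LocalDate

// Interpret the internal days-since-epoch value in the Proleptic Gregorian
// calendar first, then rebuild java.sql.Date from the local year/month/day
// fields, which java.sql.Date interprets in the hybrid Julian calendar.
val daysSinceEpoch = LocalDate.of(1100, 10, 10).toEpochDay  // stand-in for Catalyst's internal DATE value
val local = LocalDate.ofEpochDay(daysSinceEpoch)            // Proleptic Gregorian local date
val sqlDate = Date.valueOf(local)                           // built from the year, month, day fields
println(sqlDate)  // expected to print 1100-10-10 instead of shifting to 1100-10-03
{code}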



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31076) Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31076.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27807
[https://github.com/apache/spark/pull/27807]

> Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time
> 
>
> Key: SPARK-31076
> URL: https://issues.apache.org/jira/browse/SPARK-31076
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> By default, collect() returns java.sql.Timestamp/Date instances with offsets 
> derived from the internal values of Catalyst's TIMESTAMP/DATE, which store 
> microseconds since the epoch. The conversion from internal values to 
> java.sql.Timestamp/Date is based on the Proleptic Gregorian calendar, but converting 
> the resulting values before the year 1582 to strings produces timestamp/date 
> strings in the Julian calendar. For example:
> {code}
> scala> sql("select date '1100-10-10'").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([1100-10-03])
> {code} 
> This can be fixed if the internal Catalyst values are converted to a local 
> date-time in the Gregorian calendar, and java.sql.Timestamp/Date is then constructed 
> from the resulting year, month, ..., seconds in the Julian calendar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31041) Show Maven errors from within make-distribution.sh

2020-03-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-31041:


Assignee: Nicholas Chammas

> Show Maven errors from within make-distribution.sh
> --
>
> Key: SPARK-31041
> URL: https://issues.apache.org/jira/browse/SPARK-31041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Trivial
>
> This works:
> {code:java}
> ./dev/make-distribution.sh \
>  --pip \
>  -Phadoop-2.7 -Phive -Phadoop-cloud {code}
>  
>  But this doesn't:
> {code:java}
>  ./dev/make-distribution.sh \
>  -Phadoop-2.7 -Phive -Phadoop-cloud \
>  --pip{code}
>  
> The latter invocation yields the following, confusing output:
> {code:java}
>  + VERSION=' -X,--debug Produce execution debug output'{code}
>  That's because Maven is accepting {{--pip}} as an option and failing, but 
> the user doesn't get to see the error from Maven.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31041) Show Maven errors from within make-distribution.sh

2020-03-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31041.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 27800
[https://github.com/apache/spark/pull/27800]

> Show Maven errors from within make-distribution.sh
> --
>
> Key: SPARK-31041
> URL: https://issues.apache.org/jira/browse/SPARK-31041
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Trivial
> Fix For: 3.1.0
>
>
> This works:
> {code:java}
> ./dev/make-distribution.sh \
>  --pip \
>  -Phadoop-2.7 -Phive -Phadoop-cloud {code}
>  
>  But this doesn't:
> {code:java}
>  ./dev/make-distribution.sh \
>  -Phadoop-2.7 -Phive -Phadoop-cloud \
>  --pip{code}
>  
> The latter invocation yields the following, confusing output:
> {code:java}
>  + VERSION=' -X,--debug Produce execution debug output'{code}
>  That's because Maven is accepting {{--pip}} as an option and failing, but 
> the user doesn't get to see the error from Maven.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31081) Make the display of stageId/stageAttemptId/taskId of sql metrics configurable in UI

2020-03-11 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-31081:
-
Summary: Make the display of stageId/stageAttemptId/taskId of sql metrics 
configurable in UI   (was: Make SQLMetrics more readable from UI)

> Make the display of stageId/stageAttemptId/taskId of sql metrics configurable 
> in UI 
> 
>
> Key: SPARK-31081
> URL: https://issues.apache.org/jira/browse/SPARK-31081
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> It makes the metrics harder to read after SPARK-30209, and users may not be 
> interested in the extra info ({{stageId/stageAttemptId/taskId}}) when they do 
> not need to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31124) change the default value of minPartitionNum in AQE

2020-03-11 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31124:
---

 Summary: change the default value of minPartitionNum in AQE
 Key: SPARK-31124
 URL: https://issues.apache.org/jira/browse/SPARK-31124
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30931) ML 3.0 QA: API: Python API coverage

2020-03-11 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057146#comment-17057146
 ] 

Huaxin Gao commented on SPARK-30931:


cc [~podongfeng]
I didn't see anything else that needs to be changed. This ticket can be marked as 
complete. 

> ML 3.0 QA: API: Python API coverage
> ---
>
> Key: SPARK-30931
> URL: https://issues.apache.org/jira/browse/SPARK-30931
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Priority: Major
>
> For new public APIs added to MLlib ({{spark.ml}} only), we need to check the 
> generated HTML doc and compare the Scala & Python versions.
>  * *GOAL*: Audit and create JIRAs to fix in the next release.
>  * *NON-GOAL*: This JIRA is _not_ for fixing the API parity issues.
> We need to track:
>  * Inconsistency: Do class/method/parameter names match?
>  * Docs: Is the Python doc missing or just a stub? We want the Python doc to 
> be as complete as the Scala doc.
>  * API breaking changes: These should be very rare but are occasionally 
> either necessary (intentional) or accidental. These must be recorded and 
> added in the Migration Guide for this release.
>  ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
>  * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle. 
> *Please use a _separate_ JIRA (linked below as "requires") for this list of 
> to-do items.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31117) reduce the test time of DateTimeUtilsSuite

2020-03-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31117.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27873
[https://github.com/apache/spark/pull/27873]

> reduce the test time of DateTimeUtilsSuite
> --
>
> Key: SPARK-31117
> URL: https://issues.apache.org/jira/browse/SPARK-31117
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30935) Update MLlib, GraphX websites for 3.0

2020-03-11 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057154#comment-17057154
 ] 

Huaxin Gao commented on SPARK-30935:


cc [~podongfeng]
I think all the docs are OK now. This can be marked as complete. 

> Update MLlib, GraphX websites for 3.0
> -
>
> Key: SPARK-30935
> URL: https://issues.apache.org/jira/browse/SPARK-30935
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Priority: Critical
>
> Update the sub-projects' websites to include new features in this release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31076) Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31076:
--
Labels: correctness  (was: )

> Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time
> 
>
> Key: SPARK-31076
> URL: https://issues.apache.org/jira/browse/SPARK-31076
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> By default, collect() returns java.sql.Timestamp/Date instances with offsets 
> derived from Catalyst's internal TIMESTAMP/DATE values, which store 
> microseconds since the epoch. The conversion from the internal values to 
> java.sql.Timestamp/Date is based on the Proleptic Gregorian calendar, but 
> converting the resulting values for dates before the year 1582 to strings 
> produces timestamp/date strings in the Julian calendar. For example:
> {code}
> scala> sql("select date '1100-10-10'").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([1100-10-03])
> {code}
> This can be fixed by converting the internal Catalyst values to a local 
> date-time in the Gregorian calendar, and then constructing java.sql.Timestamp/Date 
> from the resulting year, month, ..., seconds, which java.sql interprets in the 
> Julian calendar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31091) Revert SPARK-24640 "Return `NULL` from `size(NULL)` by default"

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31091.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27834
[https://github.com/apache/spark/pull/27834]

> Revert SPARK-24640 "Return `NULL` from `size(NULL)` by default"
> ---
>
> Key: SPARK-31091
> URL: https://issues.apache.org/jira/browse/SPARK-31091
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057207#comment-17057207
 ] 

Dongjoon Hyun edited comment on SPARK-24640 at 3/11/20, 4:57 PM:
-

This is reverted via https://github.com/apache/spark/pull/27834


was (Author: dongjoon):
This is reverted.

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Priority: Major
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 
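For reference, the behaviour under discussion is controlled by the `spark.sql.legacy.sizeOfNull` SQL config (whose default differs between releases); a small illustration, with the cast added only to give the NULL an array type:
{code:java}
// Legacy behaviour: size of a null collection is -1.
spark.conf.set("spark.sql.legacy.sizeOfNull", "true")
spark.sql("SELECT size(CAST(NULL AS ARRAY<INT>))").show()  // -1

// Non-legacy behaviour: size of a null collection is NULL.
spark.conf.set("spark.sql.legacy.sizeOfNull", "false")
spark.sql("SELECT size(CAST(NULL AS ARRAY<INT>))").show()  // null
{code}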



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24640:
--
Fix Version/s: (was: 3.0.0)

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Priority: Major
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-24640:
-

Assignee: (was: Maxim Gekk)

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Priority: Major
> Fix For: 3.0.0
>
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-24640.
---
Resolution: Won't Do

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Priority: Major
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-24640:
---

This is reverted.

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-24640) size(null) returns null

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-24640.
-

> size(null) returns null 
> 
>
> Key: SPARK-24640
> URL: https://issues.apache.org/jira/browse/SPARK-24640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Priority: Major
>
> Size(null) should return null instead of -1 in 3.0 release. This is a 
> behavior change. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31099) Create migration script for metastore_db

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057220#comment-17057220
 ] 

Dongjoon Hyun commented on SPARK-31099:
---

To [~kabhwan]. Yes, I mean those corner cases. It's the same here: we can remove 
the local Derby files.
To [~rednaxelafx]. For a remote HMS, we have `spark.sql.hive.metastore.version`. 
And SPARK-27686 was the issue for `Update migration guide for make Hive 2.3 
dependency by default`.
To [~cloud_fan], yes. The scope of this issue is the local Hive metastore. For a 
remote HMS, we should follow up at SPARK-27686.

cc [~smilegator] and [~yumwang] since we worked together at SPARK-27686.

cc [~rxin] since he is a release manager for 3.0.0.
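For the remote-HMS case mentioned above, a sketch of pinning the client-side metastore version so a Hive 2.3-built Spark does not try to touch an older schema (the config keys are the documented ones; the version value is illustrative):
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("remote-hms")
  .enableHiveSupport()
  // Match the version of the remote Hive metastore instead of the built-in 2.3 client.
  .config("spark.sql.hive.metastore.version", "1.2.1")
  // Download the matching metastore client jars (or point this at a local classpath).
  .config("spark.sql.hive.metastore.jars", "maven")
  .getOrCreate()
{code}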



> Create migration script for metastore_db
> 
>
> Key: SPARK-31099
> URL: https://issues.apache.org/jira/browse/SPARK-31099
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
> When a Derby database (in ./metastore_db) created by the Hive 1.2.x profile 
> already exists, it'll fail to upgrade itself to the Hive 2.3.x profile.
> Repro steps:
> 1. Build OSS or DBR master with SBT with -Phive-1.2 -Phive 
> -Phive-thriftserver. Make sure there's no existing ./metastore_db directory 
> in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This will 
> populate the ./metastore_db directory, where the Derby-based metastore 
> database is hosted. This database is populated from Hive 1.2.x.
> 3. Re-build OSS or DBR master with SBT with -Phive -Phive-thriftserver (drops 
> the Hive 1.2 profile, which makes it use the default Hive 2.3 profile)
> 4. Repeat Step (2) above. This will trigger Hive 2.3.x to load the Derby 
> database created in Step (2), which triggers an upgrade step, and that's 
> where the following error will be reported.
> 5. Delete the ./metastore_db and re-run Step (4). The error is no longer 
> reported.
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS 
> ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN 
> ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has 
> been specified as NOT NULL and either the DEFAULT clause was not specified or 
> was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>   at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>   at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>   at 
> org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>   at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>   at 
> org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>   at org.datanucleus.store.query.Query.execute(Query.java:1726)
>   at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)

[jira] [Commented] (SPARK-30565) Regression in the ORC benchmark

2020-03-11 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057224#comment-17057224
 ] 

Peter Toth commented on SPARK-30565:


I looked into this and the performance drop is due to the 1.2.1 -> 2.3.6 Hive 
version change we introduced in Spark 3. I measured that 
{{org.apache.hadoop.hive.ql.io.orc.ReaderImpl}} in {{hive-exec-2.3.6-core.jar}} 
is ~3-5 times slower than in {{hive-exec-1.2.1.spark2.jar}}.
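For comparison when chasing this regression, the ORC scan can be routed through Spark's native reader instead of hive-exec's ReaderImpl via existing configs; a sketch, not a fix for the Hive 2.3.6 reader itself:
{code:java}
// Use the native ORC implementation for ORC data sources...
spark.conf.set("spark.sql.orc.impl", "native")
// ...and let Hive ORC tables be converted to the data source read path.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.sql("SELECT count(*) FROM orc_table").show()  // orc_table is illustrative
{code}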

> Regression in the ORC benchmark
> ---
>
> Key: SPARK-30565
> URL: https://issues.apache.org/jira/browse/SPARK-30565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> New benchmark results generated in the PR 
> [https://github.com/apache/spark/pull/27078] show regression ~3 times.
> Before:
> {code}
>                    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> Hive built-in ORC            520           531          8        2.0        495.8      0.6X
> {code}
> https://github.com/apache/spark/pull/27078/files#diff-42fe5f1ef10d8f9f274fc89b2c8d140dL138
> After:
> {code}
>                    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> Hive built-in ORC           1761          1792         43        0.6       1679.3      0.1X
> {code}
> https://github.com/apache/spark/pull/27078/files#diff-42fe5f1ef10d8f9f274fc89b2c8d140dR138



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31099) Create migration script for metastore_db

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31099:
--
Parent: SPARK-30034
Issue Type: Sub-task  (was: Improvement)

> Create migration script for metastore_db
> 
>
> Key: SPARK-31099
> URL: https://issues.apache.org/jira/browse/SPARK-31099
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
> When a Derby database (in ./metastore_db) created by the Hive 1.2.x profile 
> already exists, it'll fail to upgrade itself to the Hive 2.3.x profile.
> Repro steps:
> 1. Build OSS or DBR master with SBT with -Phive-1.2 -Phive 
> -Phive-thriftserver. Make sure there's no existing ./metastore_db directory 
> in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This will 
> populate the ./metastore_db directory, where the Derby-based metastore 
> database is hosted. This database is populated from Hive 1.2.x.
> 3. Re-build OSS or DBR master with SBT with -Phive -Phive-thriftserver (drops 
> the Hive 1.2 profile, which makes it use the default Hive 2.3 profile)
> 4. Repeat Step (2) above. This will trigger Hive 2.3.x to load the Derby 
> database created in Step (2), which triggers an upgrade step, and that's 
> where the following error will be reported.
> 5. Delete the ./metastore_db and re-run Step (4). The error is no longer 
> reported.
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS 
> ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN 
> ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has 
> been specified as NOT NULL and either the DEFAULT clause was not specified or 
> was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 
> 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT 
> clause was not specified or was specified as DEFAULT NULL.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>   at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>   at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>   at 
> org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>   at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>   at 
> org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>   at org.datanucleus.store.query.Query.execute(Query.java:1726)
>   at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
>   at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql.java:144)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setC

[jira] [Updated] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25193:
--
Labels: correctness  (was: bulk-closed)

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.
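A hedged sanity check for the situation described above: after the overwrite, list what is actually left in the partition directory; the path below is taken from the report and only illustrative:
{code:java}
import org.apache.hadoop.fs.{FileSystem, Path}

val partitionDir = new Path("hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2")
val fs = partitionDir.getFileSystem(spark.sparkContext.hadoopConfiguration)
// Leftover part files from the previous write (the second "part-0..." entry above)
// indicate that the old data was not dropped before the new data was written.
fs.listStatus(partitionDir).map(_.getPath.getName).foreach(println)
{code}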



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057251#comment-17057251
 ] 

Dongjoon Hyun commented on SPARK-25193:
---

I marked this as a correctness issue because the result after insertion will be 
incorrect due to the old data.

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-25193:
---

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25193.
---
Resolution: Duplicate

This is fixed at 3.0.0 via SPARK-23710 

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25193:
--
Parent: SPARK-30034
Issue Type: Sub-task  (was: Bug)

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25193) insert overwrite doesn't throw exception when drop old data fails

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057252#comment-17057252
 ] 

Dongjoon Hyun edited comment on SPARK-25193 at 3/11/20, 5:37 PM:
-

This is fixed at 3.0.0 via SPARK-30034 after SPARK-23710 


was (Author: dongjoon):
This is fixed at 3.0.0 via SPARK-23710 

> insert overwrite doesn't throw exception when drop old data fails
> -
>
> Key: SPARK-25193
> URL: https://issues.apache.org/jira/browse/SPARK-25193
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: chen xiao
>Priority: Major
>  Labels: correctness
>
> dataframe.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")
> Insert overwrite mode will drop the old data in the Hive table if old data exists.
> But if the data deletion fails, no exception will be thrown and the data folder 
> will end up like:
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-0
> hdfs://uxs_nbp/nba_score/dt=2018-08-15/seq_num=2/part-01534916642513.
> Two copies of data will be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30989) TABLE.COLUMN reference doesn't work with new columns created by UDF

2020-03-11 Thread hemanth meka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057255#comment-17057255
 ] 

hemanth meka commented on SPARK-30989:
--

The alias "cat" is defined as a dataframe having 2 columns "x" and "y". The 
column "z" is generated from "cat" into a new dataframe "df2" but below code 
works and hence this exception looks like it should be the expected behaviour. 
is it not?
df2.select("z")
 

> TABLE.COLUMN reference doesn't work with new columns created by UDF
> ---
>
> Key: SPARK-30989
> URL: https://issues.apache.org/jira/browse/SPARK-30989
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Chris Suchanek
>Priority: Major
>
> When a dataframe is created with an alias (`.as("...")`), its columns can be 
> referred to as `TABLE.COLUMN`, but this doesn't work for columns newly created 
> with a UDF.
> {code:java}
> // code placeholder
> val df1 = sc.parallelize(l).toDF("x","y").as("cat") // `l` is a collection of pairs defined earlier (not shown)
> val squared = udf((s: Int) => s * s)
> val df2 = df1.withColumn("z", squared(col("y")))
> df2.columns //Array[String] = Array(x, y, z)
> df2.select("cat.x") // works
> df2.select("cat.z") // Doesn't work
> // org.apache.spark.sql.AnalysisException: cannot resolve '`cat.z`' given 
> input 
> // columns: [cat.x, cat.y, z];;
> {code}
> Might be related to: https://issues.apache.org/jira/browse/SPARK-30532
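Two ways around the failing reference, sketched from the snippet above (they assume `import org.apache.spark.sql.functions.col` and make no claim about the root cause): select the UDF column without the alias qualifier, or re-alias the DataFrame after adding it:
{code:java}
df2.select(col("z"))              // "z" is a top-level column of df2, not part of "cat"
df2.as("cat2").select("cat2.z")   // re-aliasing after withColumn makes the qualifier resolvable
{code}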



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29295:
--
Labels: correctness  (was: )

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057261#comment-17057261
 ] 

Dongjoon Hyun commented on SPARK-29295:
---

Hi, [~viirya]. Could you make a backport against branch-2.4?

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057260#comment-17057260
 ] 

Dongjoon Hyun commented on SPARK-29295:
---

I marked this as a `correctness` issue.

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29295:
--
Affects Version/s: 2.3.4

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057280#comment-17057280
 ] 

L. C. Hsieh commented on SPARK-25987:
-

Looks like Janino was upgraded; is this still an issue in 3.0?

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.
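One mitigation that is sometimes suggested for this kind of codegen StackOverflowError, sketched against the reproducer above and not verified here, is to cut the plan lineage between the chained projections so each generated code unit stays small:
{code:java}
// localCheckpoint() truncates the logical plan, so every addSuffix() round is
// planned and compiled on its own rather than as one deeply nested expression tree.
df.addSuffix().localCheckpoint()
  .addSuffix().localCheckpoint()
  .addSuffix()
  .show()
{code}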



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29295:
--
Affects Version/s: 2.2.3

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29295:
--
Description: 
When we drop a partition of an external table and then overwrite it, if we set 
CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this partition.
But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate result.

Here is reproduction code (you can add it to SQLQuerySuite in the hive 
module):

{code:java}
  test("spark gives duplicate result when dropping a partition of an external 
partitioned table" +
" firstly and they overwrite it") {
withTable("test") {
  withTempDir { f =>
sql("create external table test(id int) partitioned by (name string) 
stored as " +
  s"parquet location '${f.getAbsolutePath}'")

withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> false.toString) {
  sql("insert overwrite table test partition(name='n1') select 1")
  sql("ALTER TABLE test DROP PARTITION(name='n1')")
  sql("insert overwrite table test partition(name='n1') select 2")
  checkAnswer( sql("select id from test where name = 'n1' order by id"),
Array(Row(1), Row(2)))
}

withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) {
  sql("insert overwrite table test partition(name='n1') select 1")
  sql("ALTER TABLE test DROP PARTITION(name='n1')")
  sql("insert overwrite table test partition(name='n1') select 2")
  checkAnswer( sql("select id from test where name = 'n1' order by id"),
Array(Row(2)))
}
  }
}
  }
{code}

{code}
create external table test(id int) partitioned by (name string) stored as 
parquet location '/tmp/p';
set spark.sql.hive.convertMetastoreParquet=false;
insert overwrite table test partition(name='n1') select 1;
ALTER TABLE test DROP PARTITION(name='n1');
insert overwrite table test partition(name='n1') select 2;
select id from test where name = 'n1' order by id;
{code}

  was:
When we drop a partition of an external table and then overwrite it, if we set 
CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this partition.
But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate result.

Here is reproduction code (you can add it to SQLQuerySuite in the hive 
module):

{code:java}
  test("spark gives duplicate result when dropping a partition of an external 
partitioned table" +
" firstly and they overwrite it") {
withTable("test") {
  withTempDir { f =>
sql("create external table test(id int) partitioned by (name string) 
stored as " +
  s"parquet location '${f.getAbsolutePath}'")

withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> false.toString) {
  sql("insert overwrite table test partition(name='n1') select 1")
  sql("ALTER TABLE test DROP PARTITION(name='n1')")
  sql("insert overwrite table test partition(name='n1') select 2")
  checkAnswer( sql("select id from test where name = 'n1' order by id"),
Array(Row(1), Row(2)))
}

withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) {
  sql("insert overwrite table test partition(name='n1') select 1")
  sql("ALTER TABLE test DROP PARTITION(name='n1')")
  sql("insert overwrite table test partition(name='n1') select 2")
  checkAnswer( sql("select id from test where name = 'n1' order by id"),
Array(Row(2)))
}
  }
}
  }
{code}



> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 

[jira] [Updated] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29295:
--
Affects Version/s: (was: 2.4.4)
   2.4.5

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.5
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it will overwrite this 
> partition.
> But when we set CONVERT_METASTORE_PARQUET=false, it will give a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}
> {code}
> create external table test(id int) partitioned by (name string) stored as 
> parquet location '/tmp/p';
> set spark.sql.hive.convertMetastoreParquet=false;
> insert overwrite table test partition(name='n1') select 1;
> ALTER TABLE test DROP PARTITION(name='n1');
> insert overwrite table test partition(name='n1') select 2;
> select id from test where name = 'n1' order by id;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057283#comment-17057283
 ] 

Dongjoon Hyun commented on SPARK-29295:
---

I confirmed that Apache Spark 2.1.3 and older versions have no problem.

> Duplicate result when dropping partition of an external table and then 
> overwriting
> --
>
> Key: SPARK-29295
> URL: https://issues.apache.org/jira/browse/SPARK-29295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.4
>Reporter: feiwang
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> When we drop a partition of an external table and then overwrite it, if we set 
> CONVERT_METASTORE_PARQUET=true (the default value), it overwrites this 
> partition as expected.
> But when we set CONVERT_METASTORE_PARQUET=false, it gives a duplicate 
> result.
> Here is reproduction code (you can add it to SQLQuerySuite in the hive 
> module):
> {code:java}
>   test("spark gives duplicate result when dropping a partition of an external 
> partitioned table" +
> " firstly and they overwrite it") {
> withTable("test") {
>   withTempDir { f =>
> sql("create external table test(id int) partitioned by (name string) 
> stored as " +
>   s"parquet location '${f.getAbsolutePath}'")
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> 
> false.toString) {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(1), Row(2)))
> }
> withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) 
> {
>   sql("insert overwrite table test partition(name='n1') select 1")
>   sql("ALTER TABLE test DROP PARTITION(name='n1')")
>   sql("insert overwrite table test partition(name='n1') select 2")
>   checkAnswer( sql("select id from test where name = 'n1' order by 
> id"),
> Array(Row(2)))
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057301#comment-17057301
 ] 

Dongjoon Hyun commented on SPARK-25987:
---

Thank you for commenting, [~viirya].
I confirmed that this is fixed in 3.0.0-preview2, while 2.4.5 still has this bug.

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.
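
A hedged workaround sketch for anyone blocked on this, not from the ticket: materializing and truncating the plan between the repeated projections keeps the optimizer from collapsing all three addSuffix layers into one huge generated method. It reuses df and addSuffix from the reproduction above, relies on Dataset.localCheckpoint() (available since Spark 2.3), and has not been verified against this exact case; spark.sql.codegen.wholeStage=false is another commonly tried, equally unverified knob.

{code:java}
// Hedged workaround sketch, reusing df and addSuffix from the reproduction
// above.  localCheckpoint() materializes the intermediate result and truncates
// the logical plan, so each addSuffix layer is compiled on its own instead of
// as one deeply nested expression tree.  Not verified against this exact case.
val step1 = df.addSuffix().localCheckpoint()
val step2 = step1.addSuffix().localCheckpoint()
step2.addSuffix().show()
{code}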



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25987:
--
Affects Version/s: (was: 3.0.0)
   2.4.5

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25987.
---
Resolution: Duplicate

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057301#comment-17057301
 ] 

Dongjoon Hyun edited comment on SPARK-25987 at 3/11/20, 6:29 PM:
-

Thank you for commenting, [~viirya].
I confirmed that the above example is fixed in 3.0.0-preview2, while 2.4.5 still 
has this bug.


was (Author: dongjoon):
Thank you for commenting, [~viirya].
I confirmed that this is fixed in 3.0.0-preview2, while 2.4.5 still has this bug.

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057309#comment-17057309
 ] 

L. C. Hsieh commented on SPARK-25987:
-

Thanks [~dongjoon]. So upgrading Janino can fix this, right?

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31095) Upgrade netty-all to 4.1.47.Final

2020-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31095:
--
Fix Version/s: 2.4.6

> Upgrade netty-all to 4.1.47.Final
> -
>
> Key: SPARK-31095
> URL: https://issues.apache.org/jira/browse/SPARK-31095
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Vishwas Vijaya Kumar
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: security
> Fix For: 3.0.0, 2.4.6
>
>
> Upgrade the version of io.netty_netty-all to 4.1.44.Final to address 
> [CVE-2019-20445|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-20445].
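
As a side note, not part of this ticket: a quick way to confirm which Netty build a running application actually loaded, before and after the bump, is Netty's own version registry. The snippet below is a sketch for spark-shell and assumes only that netty-all is on the classpath.

{code:java}
// Hedged diagnostic sketch: print the Netty artifacts and versions visible to
// the driver, e.g. in spark-shell, to confirm the upgrade actually took effect.
import scala.collection.JavaConverters._

io.netty.util.Version.identify().asScala.foreach { case (artifact, version) =>
  println(s"$artifact -> ${version.artifactVersion()}")
}
{code}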



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31077) Remove ChiSqSelector dependency on mllib.ChiSqSelectorModel

2020-03-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-31077:


Assignee: Huaxin Gao

> Remove ChiSqSelector dependency on mllib.ChiSqSelectorModel
> ---
>
> Key: SPARK-31077
> URL: https://issues.apache.org/jira/browse/SPARK-31077
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>
> Currently, ChiSqSelector depends on mllib.ChiSqSelectorModel. Remove this 
> dependency. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31077) Remove ChiSqSelector dependency on mllib.ChiSqSelectorModel

2020-03-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31077.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 27841
[https://github.com/apache/spark/pull/27841]

> Remove ChiSqSelector dependency on mllib.ChiSqSelectorModel
> ---
>
> Key: SPARK-31077
> URL: https://issues.apache.org/jira/browse/SPARK-31077
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, ChiSqSelector depends on mllib.ChiSqSelectorModel. Remove this 
> dependency. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057327#comment-17057327
 ] 

Dongjoon Hyun commented on SPARK-25987:
---

Do you mean in `branch-2.4`? Let me check that quickly.

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057330#comment-17057330
 ] 

L. C. Hsieh commented on SPARK-25987:
-

Because I'm not sure how this got fixed. I can only see that it was superseded by 
"SPARK-26298 Upgrade Janino version to 3.0.11", so I'm wondering whether upgrading 
Janino alone can fix this.
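
For experimentation only, a hedged build.sbt sketch of how a downstream project could pin a newer Janino and test whether the bump alone changes the behaviour; the coordinates are the standard Janino artifacts, and 3.0.11 merely mirrors SPARK-26298. This is an assumption, not how Spark itself manages the dependency.

{code:java}
// build.sbt fragment (hedged sketch): force a specific Janino version in a
// downstream project to test whether the bump alone changes the behaviour.
// Keep both artifacts in lockstep.
dependencyOverrides += "org.codehaus.janino" % "janino" % "3.0.11"
dependencyOverrides += "org.codehaus.janino" % "commons-compiler" % "3.0.11"
{code}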

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2020-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057334#comment-17057334
 ] 

Dongjoon Hyun commented on SPARK-25987:
---

Unfortunately, it seems that we need more patches from `branch-3.0`. With only 
Janino 3.0.11 on `branch-2.4`, it fails.

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce the number of columns (to 10, for example) or call `addSuffix` only 
> once, it works fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29183) Upgrade JDK 11 Installation to 11.0.6

2020-03-11 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057335#comment-17057335
 ] 

Shane Knapp commented on SPARK-29183:
-

I'll get to this later this week or early next.

> Upgrade JDK 11 Installation to 11.0.6
> -
>
> Key: SPARK-29183
> URL: https://issues.apache.org/jira/browse/SPARK-29183
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> Every JDK 11.0.x release has many fixes, including a performance regression 
> fix. We had better upgrade to the latest release, 11.0.4.
> - https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8221760



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29183) Upgrade JDK 11 Installation to 11.0.6

2020-03-11 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp reassigned SPARK-29183:
---

Assignee: Shane Knapp

> Upgrade JDK 11 Installation to 11.0.6
> -
>
> Key: SPARK-29183
> URL: https://issues.apache.org/jira/browse/SPARK-29183
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> Every JDK 11.0.x release has many fixes, including a performance regression 
> fix. We had better upgrade to the latest release, 11.0.4.
> - https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8221760



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


