[jira] [Commented] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2021-01-04 Thread Marc de Lignie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258046#comment-17258046
 ] 

Marc de Lignie commented on SPARK-33952:


@[~hyukjin.kwon] Thanks for asking. When you write a pyspark UDF or collect() the 
rows of a pyspark DataFrame, it is much easier to recognize a column datatype written 
as "[Row(x:[Row(x1:string, x2:string)], y:string, z:string)]" than as 
"array<struct<x:array<struct<x1:string, x2:string>>, y:string, z:string>>". Of course, 
this remains a matter of taste. Also, the original dtypes in terms of array, struct 
and map remain useful when applying push-down functions, for which the documentation 
and naming use these terms.
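
As a rough illustration of how such python-friendly dtype strings could be derived 
from the existing schema (a minimal sketch with a hypothetical helper name, not the 
code from the gist):

{code:python}
from pyspark.sql import types as T

def friendly_dtype(dt):
    # Recursively render array<> as [...], map<> as {...} and struct<> as Row(...).
    if isinstance(dt, T.ArrayType):
        return "[%s]" % friendly_dtype(dt.elementType)
    if isinstance(dt, T.MapType):
        return "{%s: %s}" % (friendly_dtype(dt.keyType), friendly_dtype(dt.valueType))
    if isinstance(dt, T.StructType):
        return "Row(%s)" % ", ".join(
            "%s:%s" % (f.name, friendly_dtype(f.dataType)) for f in dt.fields)
    return dt.simpleString()

# e.g.: [(f.name, friendly_dtype(f.dataType)) for f in df.schema.fields]
{code}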

> Python-friendly dtypes for pyspark dataframes
> -
>
> Key: SPARK-33952
> URL: https://issues.apache.org/jira/browse/SPARK-33952
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Marc de Lignie
>Priority: Minor
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
> the column datatypes in terms of JVM datatypes. However, for a python user it 
> is a significant mental step to translate these to the corresponding python 
> types encountered in UDFs and collected dataframes. This holds in particular 
> for nested composite datatypes (array, map and struct). It is proposed to 
> provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as [gist on 
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
> explanation is provided at: 
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258047#comment-17258047
 ] 

Dongjoon Hyun commented on SPARK-33005:
---

Sure, [~hyukjin.kwon].

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33276) Fix K8s IT Flakiness

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33276:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> Fix K8s IT Flakiness
> 
>
> Key: SPARK-33276
> URL: https://issues.apache.org/jira/browse/SPARK-33276
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> The following two consecutive runs are using the same git hash, 
> a744fea3be12f1a53ab553040b95da730210bc88 .
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/646/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/647/
> However, the second one fails while the first one succeeds.
> {code}
> KubernetesSuite:
> - Run SparkPi with no resources *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 190 times 
> over 3.00269949337 minutes. Last failure message: false was not true. 
> (KubernetesSuite.scala:383)
> - Run SparkPi with a very long application name.
> - Use SparkLauncher.NO_RESOURCE
> - Run SparkPi with a master URL without a scheme.
> - Run SparkPi with an argument.
> - Run SparkPi with custom labels, annotations, and environment variables.
> - All pods have the same service account by default
> - Run extraJVMOptions check on driver
> - Run SparkRemoteFileTest using a remote data file
> - Run SparkPi with env and mount secrets.
> - Run PySpark on simple pi.py example
> - Run PySpark to test a pyfiles example
> - Run PySpark with memory customization
> - Run in client mode.
> - Start pod creation from template
> - PVs with local storage
> - Launcher client dependencies
> - Test basic decommissioning
> - Test basic decommissioning with shuffle cleanup *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 184 times 
> over 3.017213349366 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning_cleanup.py
>   20/10/28 19:47:28 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/10/28 19:47:29 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/10/28 19:47:29 INFO ResourceUtils: 
> ==
>   20/10/28 19:47:29 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/10/28 19:47:29 INFO ResourceUtils: 
> ==
>   20/10/28 19:47:29 INFO SparkContext: Submitted application: DecomTest
>   20/10/28 19:47:29 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/10/28 19:47:29 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/10/28 19:47:29 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/10/28 19:47:29 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/10/28 19:47:29 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/10/28 19:47:29 INFO SecurityManager: Changing view acls groups to: 
>   20/10/28 19:47:29 INFO SecurityManager: Changing modify acls groups to: 
>   20/10/28 19:47:29 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permission

[jira] [Updated] (SPARK-28992) Support update dependencies from hdfs when task run on executor pods

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28992:
--
Parent: (was: SPARK-33005)
Issue Type: Improvement  (was: Sub-task)

> Support update dependencies from hdfs when task run on executor pods
> 
>
> Key: SPARK-28992
> URL: https://issues.apache.org/jira/browse/SPARK-28992
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> Here is a case: 
> {code:java}
> bin/spark-submit  --class com.github.ehiggs.spark.terasort.TeraSort 
> hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>  hdfs://hz-cluster10/user/kyuubi/terasort/1000g 
> hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}
> Spark supports ADD JAR and an application jar located on HDFS, see 
> [http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit]
> Take Spark on YARN for example: it creates a __spark_hadoop_conf__.xml file and 
> uploads it to the Hadoop distributed cache, so the executor processes can use it 
> to identify where their dependencies are located.
> But on K8s, I tried and failed to fetch the dependencies:
> {code:java}
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 
> (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted 
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent 
> failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): 
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hz-cluster10
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 
> (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted 
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent 
> failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): 
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hz-cluster10 at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) 
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:678) at 
> org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:619) at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at 
> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at 
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at 
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at 
> org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881) at 
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737) at 
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:522) at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
>  at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
>  at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at 
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at 
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at 
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at 
> scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
>  at 
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28895) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28895:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> Spark client process is unable to upload jars to hdfs while using ConfigMap 
> not HADOOP_CONF_DIR
> ---
>
> Key: SPARK-28895
> URL: https://issues.apache.org/jira/browse/SPARK-28895
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> The *BasicDriverFeatureStep* for Spark on Kubernetes will upload the 
> files/jars specified by --files/--jars to a Hadoop-compatible file system 
> configured by spark.kubernetes.file.upload.path. When using HADOOP_CONF_DIR, 
> the spark-submit process can recognize that file system, but when using 
> spark.kubernetes.hadoop.configMapName, the Hadoop configuration is only mounted 
> on the Pods and is not applied back to the client process. 
>  
> ||Hadoop configuration method||Result||
> |HADOOP_CONF_DIR=/path/to/etc/hadoop|OK|
> |spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir |FAILED|
>  
> {code:java}
>  Kent@KentsMacBookPro  
> ~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3  bin/spark-submit 
> --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf 
> --jars 
> /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar
>  --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab --conf 
> spark.kerberos.principal=kyuubi/d...@hadoop.hz.netease.com --conf  
> spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  --name hehe --deploy-mode 
> cluster --class org.apache.spark.examples.HdfsTest   
> local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar 
> hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl
> Listening for transport dt_socket at address: 50014
> # spark.master=k8s://https://10.120.238.100:7443
> 19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> Listening for transport dt_socket at address: 50014
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar
>  failed...
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287)
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245)
>   at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatur#
>  spark.master=k8s://https://10.120.238.100:7443
> eStep.scala:165)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>   at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:101)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10(KubernetesClientApplication.scala:236)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10$adapted(KubernetesClientApplication.scala:229)
>   at

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33349:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33711) Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33711:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

>  Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns
> --
>
> Key: SPARK-33711
> URL: https://issues.apache.org/jira/browse/SPARK-33711
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.4, 2.4.7, 3.0.0, 3.1.0, 3.2.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> Watching a POD (ExecutorPodsWatchSnapshotSource) reports single POD changes, 
> which could wrongly lead the executor POD lifecycle manager to detect missing 
> PODs (PODs known by the scheduler backend but missing from the POD snapshots).
> A key indicator of this is seeing this log message:
> "The executor with ID [some_id] was not found in the cluster but we didn't 
> get a reason why. Marking the executor as failed. The executor may have been 
> deleted but the driver missed the deletion event."
> So one of the problems is that the missing-POD detection runs even when only a 
> single pod has changed, without a full, consistent snapshot of all the PODs 
> (see ExecutorPodsPollingSnapshotSource). The other could be a race between 
> the executor POD lifecycle manager and the scheduler backend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33005.
---
Resolution: Done

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258051#comment-17258051
 ] 

Hyukjin Kwon commented on SPARK-33005:
--

Awesome!

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33005:
-
Fix Version/s: 3.1.0

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)
hao created SPARK-33982:
---

 Summary: Sparksql does not support when the inserted table is a 
read table
 Key: SPARK-33982
 URL: https://issues.apache.org/jira/browse/SPARK-33982
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: hao


When the inserted table is also a table being read from in the same query, Spark SQL 
will throw an error: org.apache.spark.sql.AnalysisException: Cannot overwrite a path 
that is also being read from.
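
A hypothetical reproduction of the kind of statement that hits this check (table and 
column names are made up for illustration, not the reporter's exact query):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE TABLE demo_src (id INT) USING parquet")
spark.sql("INSERT INTO demo_src VALUES (1)")

# Overwriting a table while also reading from it in the same statement fails with:
# org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.
spark.sql("INSERT OVERWRITE TABLE demo_src SELECT id + 1 FROM demo_src")
{code}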



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258052#comment-17258052
 ] 

hao commented on SPARK-33982:
-

I think Spark SQL should support this.

> Sparksql does not support when the inserted table is a read table
> -
>
> Key: SPARK-33982
> URL: https://issues.apache.org/jira/browse/SPARK-33982
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> When the inserted table is also a table being read from in the same query, Spark SQL 
> will throw an error: org.apache.spark.sql.AnalysisException: Cannot overwrite a path 
> that is also being read from.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33965.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30997
[https://github.com/apache/spark/pull/30997]

> CACHE TABLE does not support `spark_catalog` in Hive table names
> 
>
> Key: SPARK-33965
> URL: https://issues.apache.org/jira/browse/SPARK-33965
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The test fails:
> {code:scala}
>   test("SPARK-X: cache table in spark_catalog") {
> withNamespace("spark_catalog.ns") {
>   sql("CREATE NAMESPACE spark_catalog.ns")
>   val t = "spark_catalog.ns.tbl"
>   withTable(t) {
> sql(s"CREATE TABLE $t (col int)")
> assert(!spark.catalog.isCached(t))
> sql(s"CACHE TABLE $t")
> assert(spark.catalog.isCached(t))
>   }
> }
>   }
> {code}
> with the exception:
> {code:java}
> [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not 
> a valid TableIdentifier as it has more than 2 name parts.
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130)
> [info]   at 
> org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33965:
---

Assignee: Maxim Gekk

> CACHE TABLE does not support `spark_catalog` in Hive table names
> 
>
> Key: SPARK-33965
> URL: https://issues.apache.org/jira/browse/SPARK-33965
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: cache table in spark_catalog") {
> withNamespace("spark_catalog.ns") {
>   sql("CREATE NAMESPACE spark_catalog.ns")
>   val t = "spark_catalog.ns.tbl"
>   withTable(t) {
> sql(s"CREATE TABLE $t (col int)")
> assert(!spark.catalog.isCached(t))
> sql(s"CACHE TABLE $t")
> assert(spark.catalog.isCached(t))
>   }
> }
>   }
> {code}
> with the exception:
> {code:java}
> [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not 
> a valid TableIdentifier as it has more than 2 name parts.
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130)
> [info]   at 
> org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33978) Support ZSTD compression in ORC data source

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33978.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31002
[https://github.com/apache/spark/pull/31002]

> Support ZSTD compression in ORC data source
> ---
>
> Key: SPARK-33978
> URL: https://issues.apache.org/jira/browse/SPARK-33978
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> h3. What changes were proposed in this pull request?
> This PR aims to support ZSTD compression in ORC data source.
> h3. Why are the changes needed?
> Apache ORC 1.6 supports ZSTD compression to generate more compact files and 
> save the storage cost.
> *BEFORE*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
>  java.lang.IllegalArgumentException: Codec [zstd] is not available. Available 
> codecs are uncompressed, lzo, snappy, zlib, none. {code}
> *AFTER*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") 
> {code}
> {code:java}
>  $ orc-tools meta /tmp/zstd 
>  Processing data file 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc 
> [length: 230]
>  Structure for 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
>  File Version: 0.12 with ORC_14
>  Rows: 1
>  Compression: ZSTD
>  Compression size: 262144
>  Calendar: Julian/Gregorian
>  Type: struct<id:bigint>
> Stripe Statistics:
>  Stripe 1:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> File Statistics:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> Stripes:
>  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
>  Stream: column 0 section ROW_INDEX start: 3 length 11
>  Stream: column 1 section ROW_INDEX start: 14 length 24
>  Stream: column 1 section DATA start: 38 length 6
>  Encoding column 0: DIRECT
>  Encoding column 1: DIRECT_V2
> File length: 230 bytes
>  Padding length: 0 bytes
>  Padding ratio: 0%
> User Metadata:
>  org.apache.spark.version=3.2.0{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33978) Support ZSTD compression in ORC data source

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33978:
-

Assignee: Dongjoon Hyun

> Support ZSTD compression in ORC data source
> ---
>
> Key: SPARK-33978
> URL: https://issues.apache.org/jira/browse/SPARK-33978
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> h3. What changes were proposed in this pull request?
> This PR aims to support ZSTD compression in ORC data source.
> h3. Why are the changes needed?
> Apache ORC 1.6 supports ZSTD compression to generate more compact files and 
> save the storage cost.
> *BEFORE*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
>  java.lang.IllegalArgumentException: Codec [zstd] is not available. Available 
> codecs are uncompressed, lzo, snappy, zlib, none. {code}
> *AFTER*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") 
> {code}
> {code:java}
>  $ orc-tools meta /tmp/zstd 
>  Processing data file 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc 
> [length: 230]
>  Structure for 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
>  File Version: 0.12 with ORC_14
>  Rows: 1
>  Compression: ZSTD
>  Compression size: 262144
>  Calendar: Julian/Gregorian
>  Type: struct<id:bigint>
> Stripe Statistics:
>  Stripe 1:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> File Statistics:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> Stripes:
>  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
>  Stream: column 0 section ROW_INDEX start: 3 length 11
>  Stream: column 1 section ROW_INDEX start: 14 length 24
>  Stream: column 1 section DATA start: 38 length 6
>  Encoding column 0: DIRECT
>  Encoding column 1: DIRECT_V2
> File length: 230 bytes
>  Padding length: 0 bytes
>  Padding ratio: 0%
> User Metadata:
>  org.apache.spark.version=3.2.0{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33949:


Assignee: (was: Apache Spark)

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether or not the Optimizer rule is applied.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258074#comment-17258074
 ] 

Apache Spark commented on SPARK-33949:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/31005

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether or not the Optimizer rule is applied.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33949:


Assignee: Apache Spark

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether or not the Optimizer rule is applied.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258075#comment-17258075
 ] 

Apache Spark commented on SPARK-33950:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31006

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33983:


 Summary: Update cloudpickle to v1.6.0
 Key: SPARK-33983
 URL: https://issues.apache.org/jira/browse/SPARK-33983
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


Cloudpickle 1.6.0 has been released. We should update PySpark to match the latest 
version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258104#comment-17258104
 ] 

Apache Spark commented on SPARK-33983:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31007

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should update PySpark to match the latest 
> version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33983:


Assignee: (was: Apache Spark)

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should update PySpark to match the latest 
> version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33983:


Assignee: Apache Spark

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should update PySpark to match the latest 
> version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33844) InsertIntoDir failed since query column name contains ',' cause column type and column names size not equal

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33844.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30850
[https://github.com/apache/spark/pull/30850]

> InsertIntoDir failed since query column name contains ',' cause column type 
> and column names size not equal
> ---
>
> Key: SPARK-33844
> URL: https://issues.apache.org/jira/browse/SPARK-33844
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
>  
> Since hive-2.3, COLUMN_NAME_DELIMITER is set to a special char when a column 
> name contains ',', because the column list and the column types in the serde are 
> split by COLUMN_NAME_DELIMITER.
>  In spark-2.4.0 + hive-1.2.1, INSERT OVERWRITE DIR fails when the query result 
> schema contains a column name with ',', as:
> {code:java}
>  org.apache.hadoop.hive.serde2.SerDeException: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 14 elements 
> while columns.types has 11 elements! at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:146)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.(LazySerDeParameters.java:85)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:119)
>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:108)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:287)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:219)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:218)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:121) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:461)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  This problem has been solved in Hive by 
> [https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1044-L1075]
>  but I think we can also handle it on the Spark side so that all versions work well.
>  
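
For illustration, a hypothetical way to end up with a comma in a result column name 
is an unaliased expression such as struct(1, 2), whose auto-generated column name 
contains commas. This sketch assumes a Hive-enabled SparkSession and a made-up output 
path; it is not the reporter's exact query:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# The unaliased struct(1, 2) column gets a generated name containing commas,
# which conflicts with the comma-delimited column list handed to the serde.
spark.sql("""
    INSERT OVERWRITE DIRECTORY '/tmp/insert_dir_demo'
    STORED AS TEXTFILE
    SELECT 1, 2, struct(1, 2)
""")
{code}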



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33844) InsertIntoDir failed since query column name contains ',' cause column type and column names size not equal

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33844:
---

Assignee: angerszhu

> InsertIntoDir failed since query column name contains ',' cause column type 
> and column names size not equal
> ---
>
> Key: SPARK-33844
> URL: https://issues.apache.org/jira/browse/SPARK-33844
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
>  
> Since hive-2.3, COLUMN_NAME_DELIMITER is set to a special char when a column 
> name contains ',', because the column list and the column types in the serde are 
> split by COLUMN_NAME_DELIMITER.
>  In spark-2.4.0 + hive-1.2.1, INSERT OVERWRITE DIR fails when the query result 
> schema contains a column name with ',', as:
> {code:java}
>  org.apache.hadoop.hive.serde2.SerDeException: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 14 elements 
> while columns.types has 11 elements! at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:146)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.(LazySerDeParameters.java:85)
>  at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:119)
>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:108)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:287)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:219)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:218)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:121) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:461)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  This problem has been solved in Hive by 
> [https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1044-L1075]
>  but I think we can also handle it on the Spark side so that all versions work well.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"

2021-01-04 Thread Zhang Jianguo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258109#comment-17258109
 ] 

Zhang Jianguo commented on SPARK-33958:
---

[~yumwang]

Gauss and Oracle return 0, and that matches the traditional SQL standard better.

My solution is as follows: add 0.0 to every returned FloatType and DoubleType value.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

I can provide a pull request later.
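
For illustration only, plain IEEE-754 arithmetic (shown here in Python) already 
normalizes the sign when a positive zero is added, which is what this fix would 
rely on:

{code:python}
# Multiplying by zero can produce a negative zero...
print(-1.0 * 0.0)        # -0.0
# ...and adding a positive zero normalizes it back to +0.0 under the default
# (round-to-nearest) IEEE-754 rounding mode.
print(-1.0 * 0.0 + 0.0)  # 0.0
print(0.0 + 0.0)         # 0.0
print(-0.0 + 0.0)        # 0.0
{code}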

> spark sql DoubleType(0 * (-1))  return "-0.0"
> -
>
> Key: SPARK-33958
> URL: https://issues.apache.org/jira/browse/SPARK-33958
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.5, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
>  After the select operation, *{color:#de350b}we will get -0.0 where 0.0 is 
> expected:{color}*
> {code:java}
> +-----------------------+
> |(a * CAST(0 AS DOUBLE))|
> +-----------------------+
> |-0.0                   |
> +-----------------------+
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"

2021-01-04 Thread Zhang Jianguo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258109#comment-17258109
 ] 

Zhang Jianguo edited comment on SPARK-33958 at 1/4/21, 9:50 AM:


[~yumwang]

Gauss and Oracle return 0, and that matches the SQL standard better.

My solution is as follows: add 0.0 to every returned FloatType and DoubleType value.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

I can provide a pull request later.


was (Author: alberyzjg):
[~yumwang]

Gauss and Oracle return 0. And it looks mathe troditional SQL standard better.

My solution as following, plus 0.0 at every return of FloatType and DoubleType.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

 

I can provide pull request later.

> spark sql DoubleType(0 * (-1))  return "-0.0"
> -
>
> Key: SPARK-33958
> URL: https://issues.apache.org/jira/browse/SPARK-33958
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.5, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
>  After the select operation, *{color:#de350b}we will get -0.0 where 0.0 is 
> expected:{color}*
> {code:java}
> +-----------------------+
> |(a * CAST(0 AS DOUBLE))|
> +-----------------------+
> |-0.0                   |
> +-----------------------+
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33977:


Assignee: Apache Spark

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  
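
For context, a rough sketch of what the two new predicates do (illustrative examples 
only, not the proposed documentation text; requires Spark 3.1+):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([("Alice",), ("Bob",), ("Grace",)], ["name"]) \
    .createOrReplaceTempView("people")

# LIKE ANY: true if the value matches at least one of the patterns.
spark.sql("SELECT name FROM people WHERE name LIKE ANY ('Al%', '%ce')").show()

# LIKE ALL: true only if the value matches every pattern.
spark.sql("SELECT name FROM people WHERE name LIKE ALL ('%a%', '%ce')").show()
{code}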



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258121#comment-17258121
 ] 

Apache Spark commented on SPARK-33977:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31008

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33977:


Assignee: (was: Apache Spark)

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33984:


 Summary: Upgrade to Py4J 0.10.9.1
 Key: SPARK-33984
 URL: https://issues.apache.org/jira/browse/SPARK-33984
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258134#comment-17258134
 ] 

Apache Spark commented on SPARK-33984:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31009

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33984:


Assignee: (was: Apache Spark)

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33984:


Assignee: Apache Spark

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258135#comment-17258135
 ] 

Apache Spark commented on SPARK-33984:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31009

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33985) Support transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)
angerszhu created SPARK-33985:
-

 Summary: Support transform with clusterby/orderby/sortby
 Key: SPARK-33985
 URL: https://issues.apache.org/jira/browse/SPARK-33985
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33976:


Assignee: Apache Spark

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher

2021-01-04 Thread ZhongyuWang (Jira)
ZhongyuWang created SPARK-33986:
---

 Summary: Spark handle always return LOST status in standalone 
cluster mode with Spark launcher
 Key: SPARK-33986
 URL: https://issues.apache.org/jira/browse/SPARK-33986
 Project: Spark
  Issue Type: Question
  Components: Spark Submit
Affects Versions: 2.4.4
 Environment: apache hadoop 2.6.5

apache spark 2.4.4
Reporter: ZhongyuWang


I can submit a Spark app successfully in standalone client / YARN client / YARN
cluster mode and get the correct app status, but when I submit the app in
standalone cluster mode, the Spark handle only ever returns the LOST state
(once), while the app keeps running normally until FINISHED (the handle never
receives any further state-change information). I noticed that, when I submit
the app from code, the SparkSubmit process suddenly stops after a while. The
SparkSubmit log (the launcher's redirected log) doesn't contain any useful
information.

This is my pseudo code:
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() 
{
@Override
public void stateChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
@Override
public void infoChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
});{code}
Any ideas? Thanks.
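A self-contained sketch of the submission path being described (the jar path, main class and master URL below are placeholders, not taken from the report); it blocks until the handle reaches a final state so the reported transitions can be observed:

{code:scala}
import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LauncherStateDemo {
  def main(args: Array[String]): Unit = {
    val done = new CountDownLatch(1)
    new SparkLauncher()
      .setAppResource("/path/to/app.jar")           // placeholder
      .setMainClass("com.example.Main")             // placeholder
      .setMaster("spark://master:7077")             // placeholder standalone master
      .setDeployMode("cluster")                     // the mode where only LOST is observed
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit = {
          println(s"state=${h.getState} appId=${h.getAppId}")
          if (h.getState.isFinal) done.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })
    done.await()
  }
}
{code}

One possible reading of the symptom (an interpretation, not something stated in the report): in standalone cluster mode the local spark-submit process exits once the driver has been handed off to the cluster, so the launcher's connection to it disappears and the handle can only report LOST.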

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33976:


Assignee: (was: Apache Spark)

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher

2021-01-04 Thread ZhongyuWang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhongyuWang updated SPARK-33986:

Description: 
I can submit a Spark app successfully in standalone client / YARN client / YARN
cluster mode and get the correct app status, but when I submit the app in
standalone cluster mode, the Spark handle only ever returns the LOST state
(once), while the app keeps running normally until FINISHED (the handle never
receives any further state-change information). I noticed that, when I submit
the app from code, the SparkSubmit process suddenly stops after a while. The
SparkSubmit log (the launcher's redirected log) doesn't contain any useful
information.

This is my pseudo code:
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() 
{
@Override
public void stateChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
@Override
public void infoChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
});{code}
Any ideas? Thanks.

  was:
I can submit a Spark app successfully in standalone client / YARN client / YARN
cluster mode and get the correct app status, but when I submit the app in
standalone cluster mode, the Spark handle only ever returns the LOST state
(once), while the app keeps running normally until FINISHED (the handle never
receives any further state-change information). I noticed that, when I submit
the app from code, the SparkSubmit process suddenly stops after a while. The
SparkSubmit log (the launcher's redirected log) doesn't contain any useful
information.

This is my pseudo code:
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() 
{
@Override
public void stateChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
@Override
public void infoChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
});{code}
Any ideas? Thanks.

 

 

 

 


> Spark handle always return LOST status in standalone cluster mode with Spark 
> launcher
> -
>
> Key: SPARK-33986
> URL: https://issues.apache.org/jira/browse/SPARK-33986
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 2.4.4
> Environment: apache hadoop 2.6.5
> apache spark 2.4.4
>Reporter: ZhongyuWang
>Priority: Major
>
> I can submit a Spark app successfully in standalone client / YARN client / YARN
> cluster mode and get the correct app status, but when I submit the app in
> standalone cluster mode, the Spark handle only ever returns the LOST state
> (once), while the app keeps running normally until FINISHED (the handle never
> receives any further state-change information). I noticed that, when I submit
> the app from code, the SparkSubmit process suddenly stops after a while. The
> SparkSubmit log (the launcher's redirected log) doesn't contain any useful
> information.
> This is my pseudo code:
> {code:java}
> SparkAppHandle handle = launcher.startApplication(new 
> SparkAppHandle.Listener() {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> stateChangedHandle(handle.getAppId(), jobId, code, execId, 
> handle.getState(), driverInfo, request, infoLog, errorLog);
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> stateChangedHandle(handle.getAppId(), jobId, code, execId, 
> handle.getState(), driverInfo, request, infoLog, errorLog);
> }
> });{code}
> Any ideas? Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258141#comment-17258141
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31010

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258143#comment-17258143
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31010

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-33985:
--
Summary: Transform with clusterby/orderby/sortby  (was: Support transform 
with clusterby/orderby/sortby)

> Transform with clusterby/orderby/sortby
> ---
>
> Key: SPARK-33985
> URL: https://issues.apache.org/jira/browse/SPARK-33985
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-33985:
--
Description: Need to add a UT to make sure the data is the same as with Hive.
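For context, the Hive-documented shape of a script transform whose input is redistributed before being piped to the script looks roughly like the query below (table name and the trivial '/bin/cat' script are illustrative; whether Spark accepts this form and produces the same data as Hive is exactly what such a UT would check):

{code:scala}
// A sketch only: '/bin/cat' simply echoes its input back.
spark.sql("""
  FROM (
    FROM src
    SELECT TRANSFORM(src.key, src.value)
    USING '/bin/cat' AS (tkey, tvalue)
    CLUSTER BY tkey
  ) tmap
  SELECT tmap.tkey, tmap.tvalue
""").show()
{code}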

> Transform with clusterby/orderby/sortby
> ---
>
> Key: SPARK-33985
> URL: https://issues.apache.org/jira/browse/SPARK-33985
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Need to add a UT to make sure the data is the same as with Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33987:
--

 Summary: v2 ALTER TABLE .. DROP PARTITION does not refresh cached 
table
 Key: SPARK-33987
 URL: https://issues.apache.org/jira/browse/SPARK-33987
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test below demonstrates the issue:
{code:scala}
  test("SPARK-33950: refresh cache after partition dropping") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
(part)")
  sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
  sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
  assert(!spark.catalog.isCached(t))
  sql(s"CACHE TABLE $t")
  assert(spark.catalog.isCached(t))
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1)))
  sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
  assert(spark.catalog.isCached(t))
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
}
  }
{code}
The last check fails:
{code}
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 2 ==
!struct<>   struct
![1,1]  [0,0]
!   [1,1]
{code}
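A possible user-side workaround (only a sketch in the spirit of the test above, not something verified as part of this ticket) is to refresh the table explicitly after the partition drop so the stale cache entry is invalidated before the next scan:

{code:scala}
sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
sql(s"REFRESH TABLE $t")  // drop the cached entry; it is repopulated lazily on next access
QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
{code}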
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258052#comment-17258052
 ] 

hao edited comment on SPARK-33982 at 1/4/21, 11:19 AM:
---

I think Spark SQL should support insert overwrite into a table that is also being read from.


was (Author: hao.duan):
I think Spark SQL should support this.

> Sparksql does not support when the inserted table is a read table
> -
>
> Key: SPARK-33982
> URL: https://issues.apache.org/jira/browse/SPARK-33982
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> When the table being inserted into is also being read from, Spark SQL throws an
> error: org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is
> also being read from.
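A minimal way to reproduce the error being discussed (table name and schema are illustrative):

{code:scala}
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("INSERT INTO t VALUES (1)")
// Reading from and overwriting the same table in one statement triggers the
// AnalysisException quoted above.
spark.sql("INSERT OVERWRITE TABLE t SELECT * FROM t")
{code}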



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-33988:


 Summary: Add an option to enable CBO in TPCDSQueryBenchmark
 Key: SPARK-33988
 URL: https://issues.apache.org/jira/browse/SPARK-33988
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.2.0
Reporter: Takeshi Yamamuro


This ticket aims at adding a new option {{--cbo}} to enable CBO in 
TPCDSQueryBenchmark. I think this option is useful so as to monitor performance 
changes with CBO enabled.
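For reference, a sketch of the SQL configurations such a flag would presumably toggle (these configs already exist; only the wiring into the benchmark is new in this ticket):

{code:scala}
// Enable cost-based optimization and CBO-driven join reordering before running the queries.
spark.conf.set("spark.sql.cbo.enabled", "true")
spark.conf.set("spark.sql.cbo.joinReorder.enabled", "true")
// CBO relies on column statistics, e.g. (store_sales is one of the TPC-DS tables):
spark.sql("ANALYZE TABLE store_sales COMPUTE STATISTICS FOR ALL COLUMNS")
{code}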



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33988:


Assignee: Apache Spark

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful so as to monitor 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258155#comment-17258155
 ] 

Apache Spark commented on SPARK-33988:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/31011

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful so as to monitor 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33988:


Assignee: (was: Apache Spark)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful so as to monitor 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33989) Strip auto-generated cast when resolving UnresolvedAlias

2021-01-04 Thread ulysses you (Jira)
ulysses you created SPARK-33989:
---

 Summary: Strip auto-generated cast when resolving UnresolvedAlias
 Key: SPARK-33989
 URL: https://issues.apache.org/jira/browse/SPARK-33989
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you


During analysis we may implicitly introduce a Cast when a type cast is required. That
makes the auto-assigned column name unclear.

Let's say we have the SQL `select id == null` where `id` is of int type; the output
field name will then be `(id = CAST(null as int))`.
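A small illustration of the naming behavior (the exact generated string may differ slightly between versions; the alias name here is just an example):

{code:scala}
import spark.implicits._

val df = Seq(1).toDF("id")
// Without an explicit alias, the implicit cast leaks into the auto-generated column
// name, e.g. something like "(id = CAST(NULL AS INT))".
df.selectExpr("id = null").columns.foreach(println)
// An explicit alias sidesteps the noise; the ticket proposes stripping the
// auto-generated cast from the name instead.
df.selectExpr("id = null AS id_is_null").columns.foreach(println)
{code}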



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258179#comment-17258179
 ] 

Maxim Gekk commented on SPARK-33987:


I am working on this

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below demonstrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258183#comment-17258183
 ] 

Maxim Gekk commented on SPARK-33990:


I am working on the issue.

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33990:
--

 Summary: v2 ALTER TABLE .. DROP PARTITION does not remove data 
from dropped partition
 Key: SPARK-33990
 URL: https://issues.apache.org/jira/browse/SPARK-33990
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test fails:
{code:scala}
  test("SPARK-X: do not return data from dropped partition") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
(part)")
  sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
  sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1)))
  sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
}
  }
{code}
on the last check with:
{code}
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 2 ==
!struct<>   struct
![1,1]  [0,0]
!   [1,1]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33991) Repair enumeration conversion error for page showing list

2021-01-04 Thread kaif Yi (Jira)
kaif Yi created SPARK-33991:
---

 Summary: Repair enumeration conversion error for page showing list
 Key: SPARK-33991
 URL: https://issues.apache.org/jira/browse/SPARK-33991
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0
 Environment: For AllJobsPage class, AllJobsPage gets the 
schedulingMode of enumerated type by loading the spark.scheduler.mode 
configuration from Sparkconf, but an enumeration conversion error occurs when I 
set the value of this configuration to lowercase.
Reporter: kaif Yi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Description: For AllJobsPage class, AllJobsPage gets the schedulingMode of 
enumerated type by loading the spark.scheduler.mode configuration from 
Sparkconf, but an enumeration conversion error occurs when I set the value of 
this configuration to lowercase.
Environment: (was: For AllJobsPage class, AllJobsPage gets the 
schedulingMode of enumerated type by loading the spark.scheduler.mode 
configuration from Sparkconf, but an enumeration conversion error occurs when I 
set the value of this configuration to lowercase.)

> Repair enumeration conversion error for page showing list
> -
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Summary: Repair enumeration conversion error for AllJobsPage  (was: Repair 
enumeration conversion error for page showing list of all ongoing and recently 
finished jobs)

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list of all ongoing and recently finished jobs

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Summary: Repair enumeration conversion error for page showing list of all 
ongoing and recently finished jobs  (was: Repair enumeration conversion error 
for page showing list)

> Repair enumeration conversion error for page showing list of all ongoing and 
> recently finished jobs
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Kent Yao (Jira)
Kent Yao created SPARK-33992:


 Summary: resolveOperatorsUpWithNewOutput should wrap 
allowInvokingTransformsInAnalyzer
 Key: SPARK-33992
 URL: https://issues.apache.org/jira/browse/SPARK-33992
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.1.0
Reporter: Kent Yao


PaddingAndLengthCheckForCharVarchar could fail a query when it is run through
resolveOperatorsUpWithNewOutput, with:


{code:java}
[info] - char/varchar resolution in sub query  *** FAILED *** (367 milliseconds)
[info]   java.lang.RuntimeException: This method should not be called in the 
analyzer
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
{code}
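A minimal sketch of the wrapping the title suggests (assuming the existing AnalysisHelper API; this is not the actual patch): any rewrite performed on behalf of resolveOperatorsUpWithNewOutput would run inside allowInvokingTransformsInAnalyzer, which lifts the assertion seen in the stack trace above.

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.{AnalysisHelper, LogicalPlan}

// Runs an analyzer-time rewrite with transformDown/transformUp calls permitted.
def rewriteUnderAnalyzer(plan: LogicalPlan)(rewrite: LogicalPlan => LogicalPlan): LogicalPlan =
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    rewrite(plan)
  }
{code}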




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258192#comment-17258192
 ] 

Apache Spark commented on SPARK-33992:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31013

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail a query when it is run through
> resolveOperatorsUpWithNewOutput, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33992:


Assignee: Apache Spark

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail a query when it is run through
> resolveOperatorsUpWithNewOutput, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33992:


Assignee: (was: Apache Spark)

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail a query when it is run through
> resolveOperatorsUpWithNewOutput, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258195#comment-17258195
 ] 

Apache Spark commented on SPARK-33990:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31014

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33990:


Assignee: Apache Spark

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33990:


Assignee: (was: Apache Spark)

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Yi updated SPARK-33991:
-
Component/s: (was: Web UI)
 Spark Core

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Yi updated SPARK-33991:
-
Description: 
For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
by loading the spark.scheduler.mode configuration from Sparkconf, but an 
enumeration conversion error occurs when I set the value of this configuration 
to lowercase.

The reason for this problem is that the values of the SchedulingMode enumeration are
uppercase, so the error occurs when spark.scheduler.mode is configured with a
lowercase value.

  was:For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated 
type by loading the spark.scheduler.mode configuration from Sparkconf, but an 
enumeration conversion error occurs when I set the value of this configuration 
to lowercase.


> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode enumeration
> are uppercase, so the error occurs when spark.scheduler.mode is configured with a
> lowercase value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258201#comment-17258201
 ] 

Felix Yi commented on SPARK-33991:
--

I saw that the org.apache.spark.scheduler.TaskSchedulerImpl class converts the
spark.scheduler.mode value to uppercase, so I think it should be converted in
AllJobsPage as well.
{code:java}
val schedulingMode: SchedulingMode =
  try {
SchedulingMode.withName(schedulingModeConf.toUpperCase(Locale.ROOT))
  } catch {
case e: java.util.NoSuchElementException =>
  throw new SparkException(s"Unrecognized $SCHEDULER_MODE_PROPERTY: 
$schedulingModeConf")
  }
{code}
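The conversion failure itself is easy to reproduce in isolation (a quick sketch):

{code:scala}
import java.util.Locale
import org.apache.spark.scheduler.SchedulingMode

SchedulingMode.withName("fair")                           // throws java.util.NoSuchElementException
SchedulingMode.withName("fair".toUpperCase(Locale.ROOT))  // returns SchedulingMode.FAIR
{code}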

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type 
> by loading the spark.scheduler.mode configuration from Sparkconf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode enumeration
> are uppercase, so the error occurs when spark.scheduler.mode is configured with a
> lowercase value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13

2021-01-04 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258218#comment-17258218
 ] 

Guillaume Martres commented on SPARK-25075:
---

Now that 2.13 support is basically complete, would it be possible to publish a 
preview release of spark 3.1 built against scala 2.13 on maven for testing 
purposes? Thanks!

> Build and test Spark against Scala 2.13
> ---
>
> Key: SPARK-25075
> URL: https://issues.apache.org/jira/browse/SPARK-25075
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, MLlib, Project Infra, Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Guillaume Massé
>Priority: Major
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.13 milestone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258217#comment-17258217
 ] 

Yang Jie commented on SPARK-33948:
--

*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619, there are 16 failed tests in branch-3.1; no further
investigation yet.

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark

[jira] [Comment Edited] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258217#comment-17258217
 ] 

Yang Jie edited comment on SPARK-33948 at 1/4/21, 2:00 PM:
---

*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619, there are 16 failed tests in branch-3.1. I have not 
investigated further yet, and I'm not sure why the master branch build was 
successful; I need more time to analyze this.

 


was (Author: luciferyang):
*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619, there are 16 failed tests in branch-3.1; no further 
investigation yet.

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/

[jira] [Commented] (SPARK-33736) Handle MERGE in ReplaceNullWithFalseInPredicate

2021-01-04 Thread Anton Okolnychyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258222#comment-17258222
 ] 

Anton Okolnychyi commented on SPARK-33736:
--

Sorry, I was on holiday. I will get back to the PR this week.

> Handle MERGE in ReplaceNullWithFalseInPredicate
> ---
>
> Key: SPARK-33736
> URL: https://issues.apache.org/jira/browse/SPARK-33736
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We need to handle merge statements in {{ReplaceNullWithFalseInPredicate}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258228#comment-17258228
 ] 

Apache Spark commented on SPARK-33991:
--

User 'FelixYik' has created a pull request for this issue:
https://github.com/apache/spark/pull/31015

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> The AllJobsPage class gets the schedulingMode enumeration value by loading 
> the spark.scheduler.mode configuration from SparkConf, but an enumeration 
> conversion error occurs when the value of this configuration is set to 
> lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when 
> spark.scheduler.mode is configured in lowercase.
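
A self-contained sketch of the failure mode and a case-insensitive fix. The 
{{Mode}} enumeration below is only a stand-in for Spark's internal 
{{SchedulingMode}} (FAIR, FIFO, NONE), and the {{parse}} helper is illustrative, 
not the actual patch:
{code:scala}
import java.util.Locale

object SchedulingModeSketch {
  // Stand-in for Spark's internal SchedulingMode enumeration.
  object Mode extends Enumeration { val FAIR, FIFO, NONE = Value }

  // Enumeration.withName is case-sensitive, so withName("fair") throws
  // NoSuchElementException; upper-casing the configured value first accepts
  // spark.scheduler.mode=fair as well as FAIR.
  def parse(raw: String): Mode.Value =
    Mode.withName(raw.trim.toUpperCase(Locale.ROOT))
}

// SchedulingModeSketch.parse("fair") == SchedulingModeSketch.Mode.FAIR
{code}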



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33991:


Assignee: Apache Spark

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Assignee: Apache Spark
>Priority: Critical
>
> The AllJobsPage class gets the schedulingMode enumeration value by loading 
> the spark.scheduler.mode configuration from SparkConf, but an enumeration 
> conversion error occurs when the value of this configuration is set to 
> lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when 
> spark.scheduler.mode is configured in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33991:


Assignee: (was: Apache Spark)

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> The AllJobsPage class gets the schedulingMode enumeration value by loading 
> the spark.scheduler.mode configuration from SparkConf, but an enumeration 
> conversion error occurs when the value of this configuration is set to 
> lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when 
> spark.scheduler.mode is configured in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33993) Parquet encryption interop

2021-01-04 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created SPARK-33993:


 Summary: Parquet encryption interop
 Key: SPARK-33993
 URL: https://issues.apache.org/jira/browse/SPARK-33993
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gidon Gershinsky


Test interoperability between stand-alone Parquet encryption and Spark-managed 
Parquet encryption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33994) ORC encryption interop

2021-01-04 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created SPARK-33994:


 Summary: ORC encryption interop
 Key: SPARK-33994
 URL: https://issues.apache.org/jira/browse/SPARK-33994
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gidon Gershinsky


Test interoperability between stand-alone ORC encryption and Spark-managed ORC 
encryption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33875:
---

Assignee: Terry Kim

> Implement DESCRIBE COLUMN for v2 catalog
> 
>
> Key: SPARK-33875
> URL: https://issues.apache.org/jira/browse/SPARK-33875
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> Implement DESCRIBE COLUMN for v2 catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33875.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30881
[https://github.com/apache/spark/pull/30881]

> Implement DESCRIBE COLUMN for v2 catalog
> 
>
> Key: SPARK-33875
> URL: https://issues.apache.org/jira/browse/SPARK-33875
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> Implement DESCRIBE COLUMN for v2 catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258345#comment-17258345
 ] 

Sachit Murarka commented on SPARK-31786:


[~maver1ck] / [~dongjoon]:
I am facing this issue. I am using Spark 2.4.7.

I have tried the setting mentioned in the comments above:
spark.kubernetes.driverEnv.HTTP2_DISABLE=true

The following exception is thrown:


Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for 
kind: [Pod] with name: [null] in namespace: [spark-test] failed.
 at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
 at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
 at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
 at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
 at 
org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
 at 
org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
 at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
 at 
org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
 at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
 at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: connect timed out
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
 at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at 
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
 at 
okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246)
 at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166)
 at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
 at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
 at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
 at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at 
io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at 
io.fabric8.kubernetes.client.utils.Imper

[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258383#comment-17258383
 ] 

Dongjoon Hyun commented on SPARK-31786:
---

Did you do `export HTTP2_DISABLE=true` before `spark-submit`? HTTP2_DISABLE is 
required in every place where the K8s client is used, and technically there are 
two such places:
 # Your Mac (outside the K8s cluster): `spark-submit`
 # The Spark driver pod (inside the K8s cluster): 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
> at okio.Rea

[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33984:
-

Assignee: Hyukjin Kwon

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33984.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31009
[https://github.com/apache/spark/pull/31009]

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade it in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33990.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31014
[https://github.com/apache/spark/pull/31014]

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The test fails:
> {code:scala}
>   test("SPARK-X: don not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33990:
-

Assignee: Maxim Gekk

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: don not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33988:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Test)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.
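
As a rough sketch of what such a switch would effectively toggle (the {{--cbo}} 
flag name and the wiring below are assumptions; the real option lives in 
TPCDSQueryBenchmark, while {{spark.sql.cbo.enabled}} and 
{{spark.sql.cbo.joinReorder.enabled}} are existing SQL configs):
{code:scala}
import org.apache.spark.sql.SparkSession

object TpcdsCboSketch {
  def main(args: Array[String]): Unit = {
    val cboEnabled = args.contains("--cbo")  // hypothetical flag parsing
    val spark = SparkSession.builder()
      .appName("tpcds-cbo-sketch")
      .config("spark.sql.cbo.enabled", cboEnabled.toString)
      .config("spark.sql.cbo.joinReorder.enabled", cboEnabled.toString)
      .getOrCreate()
    // ... run the TPCDS queries here and compare plans/timings with and without CBO
    spark.stop()
  }
}
{code}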



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33988:
--
Parent: (was: SPARK-33828)
Issue Type: Improvement  (was: Sub-task)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33988.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31011
[https://github.com/apache/spark/pull/31011]

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.2.0
>
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33988:
-

Assignee: Takeshi Yamamuro

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33983:
-

Assignee: Hyukjin Kwon

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should update to the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33983.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31007
[https://github.com/apache/spark/pull/31007]

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> Cloudpickle 1.6.0 has been released. We should update to the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258411#comment-17258411
 ] 

Sachit Murarka commented on SPARK-31786:


[~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my 
machine.
Should it be set on all nodes of the Kubernetes cluster?

Also, regarding your second point, 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit 
in the form of --conf.

Please let me know if my understanding is correct.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at okio.AsyncTimeout$1.write(AsyncTimeout.java:1

[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258411#comment-17258411
 ] 

Sachit Murarka edited comment on SPARK-31786 at 1/4/21, 6:41 PM:
-

[~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my 
machine.
 Should it be set on all nodes of the Kubernetes cluster?

Also, regarding your second point, 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit 
in the form of --conf.

Please let me know if my understanding is correct.

Also, since this is a workaround, what would the long-term solution be? Should 
I consider Spark 3 instead of Spark 2.4.7?


was (Author: smurarka):
[~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my 
machine.
Should it be set on all nodes of the Kubernetes cluster?

Also, regarding your second point, 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit 
in the form of --conf.

Please let me know if my understanding is correct.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(Soc

[jira] [Created] (SPARK-33995) Make datetime addition easier for years, weeks, hours, minutes, and seconds

2021-01-04 Thread Matthew Powers (Jira)
Matthew Powers created SPARK-33995:
--

 Summary: Make datetime addition easier for years, weeks, hours, 
minutes, and seconds
 Key: SPARK-33995
 URL: https://issues.apache.org/jira/browse/SPARK-33995
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: Matthew Powers


There are add_months and date_add functions that make it easy to perform 
datetime addition with months and days, but there isn't an easy way to perform 
datetime addition with years, weeks, hours, minutes, or seconds with the 
Scala/Python/R APIs.

Users need to write code like expr("first_datetime + INTERVAL 2 hours") to add 
two hours to a timestamp with the Scala API, which isn't desirable.  We don't 
want to make Scala users manipulate SQL strings.

We can expose the [make_interval SQL 
function|https://github.com/apache/spark/pull/26446/files] to make any 
combination of datetime addition possible.  That'll make tons of different 
datetime addition operations possible and will be valuable for a wide array of 
users.

make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and 
secs.

There are different ways to expose the make_interval functionality to 
Scala/Python/R users:
 * Option 1: Single make_interval function that takes 7 arguments
 * Option 2: expose a few interval functions
 ** make_date_interval function that takes years, months, days
 ** make_time_interval function that takes hours, minutes, seconds
 ** make_datetime_interval function that takes years, months, days, hours, 
minutes, seconds
 * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, 
add_minutes, and add_seconds as Column methods.  
 * Option 4: Expose the add_years, add_hours, etc. as column functions.  
add_months and date_add have already been exposed in this manner.  

Option 1 is nice from a maintenance perspective because it's a single function, 
but it's not standard from a user perspective.  Most languages support datetime 
instantiation with these arguments: years, months, days, hours, minutes, 
seconds.  Mixing weeks into the equation is not standard.

As a user, Option 3 would be my preference.  
col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember and 
type.  col("first_datetime") + make_time_interval(lit(2), lit(0), lit(30)) 
isn't as nice.  col("first_datetime") + make_interval(lit(0), lit(0), lit(0), 
lit(0), lit(2), lit(0), lit(30)) is harder still.

Any of these options is an improvement to the status quo.  Let me know what 
option you think is best and then I'll make a PR to implement it, building off 
of Max's foundational work of course ;)
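
For illustration, a small sketch of today's workarounds next to the proposed 
Option 3. It assumes a Spark 3.x local session and a DataFrame with a timestamp 
column named first_datetime; the addHours/addSeconds methods are hypothetical 
and do not exist yet:
{code:scala}
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder()
  .master("local[*]").appName("interval-sketch").getOrCreate()
import spark.implicits._

val df = Seq(Timestamp.valueOf("2021-01-04 12:00:00")).toDF("first_datetime")

// Status quo: datetime addition beyond days/months goes through a SQL string.
df.withColumn("later", expr("first_datetime + INTERVAL 2 hours")).show(false)

// Also possible today: the make_interval SQL function
// (years, months, weeks, days, hours, mins, secs).
df.withColumn("later",
  col("first_datetime") + expr("make_interval(0, 0, 0, 0, 2, 0, 30)")).show(false)

// Proposed Option 3 (hypothetical API, shown for comparison only):
// df.withColumn("later", col("first_datetime").addHours(2).addSeconds(30))
{code}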



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33908) Refact SparkSubmitUtils.resolveMavenCoordinates return parameter

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258454#comment-17258454
 ] 

Apache Spark commented on SPARK-33908:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31016

> Refact SparkSubmitUtils.resolveMavenCoordinates return parameter
> 
>
> Key: SPARK-33908
> URL: https://issues.apache.org/jira/browse/SPARK-33908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Per talk in https://github.com/apache/spark/pull/29966#discussion_r531917374



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258479#comment-17258479
 ] 

Apache Spark commented on SPARK-33987:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31017

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below illustrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33987:


Assignee: Apache Spark

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The test below illustrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33987:


Assignee: (was: Apache Spark)

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below illustrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33894:


Assignee: (was: Apache Spark)

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Priority: Major
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2VecSuite.scala:28)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.$anonfun
