[jira] [Created] (SPARK-29018) Spark ThriftServer change to its own API
angerszhu created SPARK-29018: - Summary: Spark ThriftServer change to its own API Key: SPARK-29018 URL: https://issues.apache.org/jira/browse/SPARK-29018 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29018) Spark ThriftServer change to its own API
[ https://issues.apache.org/jira/browse/SPARK-29018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-29018: -- Description: Currently SparkThriftServer relies too heavily on HiveServer2: whenever the Hive version changes, we have to change a lot of code to keep up with Hive's changes. It would be best to use only Hive's Thrift interface and implement SparkThriftServer's own API, and to remove the code paths that are unused for Spark Thrift Server. > Spark ThriftServer change to its own API > - > > Key: SPARK-29018 > URL: https://issues.apache.org/jira/browse/SPARK-29018 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > Currently SparkThriftServer relies too heavily on HiveServer2: whenever the Hive > version changes, we have to change a lot of code to keep up with Hive's changes. > It would be best to use only Hive's Thrift interface and implement > SparkThriftServer's own API, and to remove the code paths that are unused for > Spark Thrift Server. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29019) Improve tooltip information in JDBC/ODBC Server tab
Pablo Langa Blanco created SPARK-29019: -- Summary: Improve tooltip information in JDBC/ODBC Server tab Key: SPARK-29019 URL: https://issues.apache.org/jira/browse/SPARK-29019 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0 Reporter: Pablo Langa Blanco Some of the columns of the JDBC/ODBC Server tab in the Web UI are hard to understand. We have documented them in SPARK-28373, but I think it is better to have tooltips in the SQL statistics table to explain the columns. More information is in the pull request. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-20765) Cannot load persisted PySpark ML Pipeline that includes 3rd party stage (Transformer or Estimator) if the package name of stage is not "org.apache.spark" and "pyspark"
[ https://issues.apache.org/jira/browse/SPARK-20765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Matiach reopened SPARK-20765: -- I'm still seeing this issue, please see this thread: [https://github.com/Azure/mmlspark/issues/614] the user is forced to do an import of the spark package in order for PipelineModel.load to work Maybe there is some way to configure this to make it work for spark packages? > Cannot load persisted PySpark ML Pipeline that includes 3rd party stage > (Transformer or Estimator) if the package name of stage is not > "org.apache.spark" and "pyspark" > --- > > Key: SPARK-20765 > URL: https://issues.apache.org/jira/browse/SPARK-20765 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: APeng Zhang >Priority: Major > Labels: bulk-closed > > When load persisted PySpark ML Pipeline instance, Pipeline._from_java() will > invoke JavaParams._from_java() to create Python instance of persisted stage. > In JavaParams._from_java(), the name of python class is derived from java > class name by replace string "pyspark" with "org.apache.spark". This is OK > for ML Transformer and Estimator inside PySpark, but for 3rd party > Transformer and Estimator if package name is not org.apache.spark and > pyspark, there will be an error: > File "/Users/azhang/Work/apyspark/lib/pyspark.zip/pyspark/ml/util.py", line > 228, in load > return cls.read().load(path) > File "/Users/azhang/Work/apyspark/lib/pyspark.zip/pyspark/ml/util.py", line > 180, in load > return self._clazz._from_java(java_obj) > File "/Users/azhang/Work/apyspark/lib/pyspark.zip/pyspark/ml/pipeline.py", > line 160, in _from_java > py_stages = [JavaParams._from_java(s) for s in java_stage.getStages()] > File "/Users/azhang/Work/apyspark/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 169, in _from_java > py_type = __get_class(stage_name) > File "/Users/azhang/Work/apyspark/lib/pyspark.zip/pyspark/ml/wrapper.py", > line 163, in __get_class > m = __import__(module) > ImportError: No module named com.abc.xyz.ml.testclass > Related code in PySpark: > In pyspark/ml/pipeline.py > class Pipeline(Estimator, MLReadable, MLWritable): > @classmethod > def _from_java(cls, java_stage): > # Create a new instance of this stage. > py_stage = cls() > # Load information from java_stage to the instance. > py_stages = [JavaParams._from_java(s) for s in java_stage.getStages()] > class JavaParams(JavaWrapper, Params): > @staticmethod > def _from_java(java_stage): > def __get_class(clazz): > """ > Loads Python class from its name. > """ > parts = clazz.split('.') > module = ".".join(parts[:-1]) > m = __import__(module) > for comp in parts[1:]: > m = getattr(m, comp) > return m > stage_name = > java_stage.getClass().getName().replace("org.apache.spark", "pyspark") > # Generate a default new instance from the stage_name class. > py_type = __get_class(stage_name) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28042) Support mapping spark.local.dir to hostPath volume
[ https://issues.apache.org/jira/browse/SPARK-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925265#comment-16925265 ] Jiaxin Shan commented on SPARK-28042: - [~vanzin] Do we usually backport these improvements to 2.4.x? > Support mapping spark.local.dir to hostPath volume > -- > > Key: SPARK-28042 > URL: https://issues.apache.org/jira/browse/SPARK-28042 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Junjie Chen >Assignee: Junjie Chen >Priority: Minor > Fix For: 3.0.0 > > > Currently, the k8s executor builder mount spark.local.dir as emptyDir or > memory, it should satisfy some small workload, while in some heavily workload > like TPCDS, both of them can have some problem, such as pods are evicted due > to disk pressure when using emptyDir, and OOM when using tmpfs. > In particular on cloud environment, users may allocate cluster with minimum > configuration and add cloud storage when running workload. In this case, we > can specify multiple elastic storage as spark.local.dir to accelerate the > spilling. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28042) Support mapping spark.local.dir to hostPath volume
[ https://issues.apache.org/jira/browse/SPARK-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925274#comment-16925274 ] Dongjoon Hyun commented on SPARK-28042: --- No, we usually backport bug fixes only. > Support mapping spark.local.dir to hostPath volume > -- > > Key: SPARK-28042 > URL: https://issues.apache.org/jira/browse/SPARK-28042 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Junjie Chen >Assignee: Junjie Chen >Priority: Minor > Fix For: 3.0.0 > > > Currently, the k8s executor builder mount spark.local.dir as emptyDir or > memory, it should satisfy some small workload, while in some heavily workload > like TPCDS, both of them can have some problem, such as pods are evicted due > to disk pressure when using emptyDir, and OOM when using tmpfs. > In particular on cloud environment, users may allocate cluster with minimum > configuration and add cloud storage when running workload. In this case, we > can specify multiple elastic storage as spark.local.dir to accelerate the > spilling. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
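To make the discussion above more concrete, here is a minimal sketch of the kind of setup this improvement targets: mounting a hostPath volume into the executor pods and pointing spark.local.dir at the mount instead of relying on emptyDir or tmpfs. The volume name, paths, and the exact way the merged change wires this up are illustrative assumptions, not the final implementation.

{code:scala}
// Minimal sketch, not the merged implementation: mount a node-local disk into
// the executors via the documented hostPath volume settings, then point
// spark.local.dir at the mount so spills land on that disk rather than emptyDir.
// The volume name ("spark-local-1") and both paths below are illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("local-dir-on-hostpath")
  .config("spark.kubernetes.executor.volumes.hostPath.spark-local-1.mount.path", "/tmp/spark-local")
  .config("spark.kubernetes.executor.volumes.hostPath.spark-local-1.options.path", "/mnt/fast-disk")
  .config("spark.local.dir", "/tmp/spark-local")
  .getOrCreate()
{code}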
[jira] [Updated] (SPARK-28942) [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page
[ https://issues.apache.org/jira/browse/SPARK-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-28942: -- Issue Type: Improvement (was: Bug) > [Spark][WEB UI]Spark in local mode hostname display localhost in the Host > Column of Task Summary Page > - > > Key: SPARK-28942 > URL: https://issues.apache.org/jira/browse/SPARK-28942 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > In the stage page under Task Summary Page Host Column shows 'localhost' > instead of showing host IP or host name mentioned against the Driver Host Name > Steps: > spark-shell --master local > create table emp(id int); > insert into emp values(100); > select * from emp; > Go to Stage UI page and check the Task Summary Page. > Host column will display 'localhost' instead the driver host. > > Note in case of spark-shell --master yarn mode UI display correct host name > under the column. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28942) [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page
[ https://issues.apache.org/jira/browse/SPARK-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28942. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25645 [https://github.com/apache/spark/pull/25645] > [Spark][WEB UI]Spark in local mode hostname display localhost in the Host > Column of Task Summary Page > - > > Key: SPARK-28942 > URL: https://issues.apache.org/jira/browse/SPARK-28942 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: Shivu Sondur >Priority: Minor > Fix For: 3.0.0 > > > In the stage page under Task Summary Page Host Column shows 'localhost' > instead of showing host IP or host name mentioned against the Driver Host Name > Steps: > spark-shell --master local > create table emp(id int); > insert into emp values(100); > select * from emp; > Go to Stage UI page and check the Task Summary Page. > Host column will display 'localhost' instead the driver host. > > Note in case of spark-shell --master yarn mode UI display correct host name > under the column. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28942) [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page
[ https://issues.apache.org/jira/browse/SPARK-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28942: - Assignee: Shivu Sondur > [Spark][WEB UI]Spark in local mode hostname display localhost in the Host > Column of Task Summary Page > - > > Key: SPARK-28942 > URL: https://issues.apache.org/jira/browse/SPARK-28942 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: Shivu Sondur >Priority: Minor > > In the stage page under Task Summary Page Host Column shows 'localhost' > instead of showing host IP or host name mentioned against the Driver Host Name > Steps: > spark-shell --master local > create table emp(id int); > insert into emp values(100); > select * from emp; > Go to Stage UI page and check the Task Summary Page. > Host column will display 'localhost' instead the driver host. > > Note in case of spark-shell --master yarn mode UI display correct host name > under the column. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27420) KinesisInputDStream should expose a way to configure CloudWatch metrics
[ https://issues.apache.org/jira/browse/SPARK-27420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-27420. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 24651 [https://github.com/apache/spark/pull/24651] > KinesisInputDStream should expose a way to configure CloudWatch metrics > --- > > Key: SPARK-27420 > URL: https://issues.apache.org/jira/browse/SPARK-27420 > Project: Spark > Issue Type: Improvement > Components: DStreams, Input/Output >Affects Versions: 2.3.3 >Reporter: Jerome Gagnon >Assignee: Kengo Seki >Priority: Major > Fix For: 3.0.0 > > > KinesisInputDStream currently does not provide a way to disable CloudWatch > metrics push. Kinesis client library (KCL) which is used under the hood > provide the ability through `withMetrics` methods. > To make things worse the default level is "DETAILED" which pushes 10s of > metrics every 10 seconds. When dealing with multiple streaming jobs this add > up pretty quickly, leading to thousands of dollar in cost. > Exposing a way to disable/set the proper level of monitoring is critical to > us. We had to send invalid credentials and suppress log as a less-than-ideal > workaround : see > [https://stackoverflow.com/questions/41811039/disable-cloudwatch-for-aws-kinesis-at-spark-streaming/55599002#55599002] > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27420) KinesisInputDStream should expose a way to configure CloudWatch metrics
[ https://issues.apache.org/jira/browse/SPARK-27420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-27420: - Assignee: Kengo Seki > KinesisInputDStream should expose a way to configure CloudWatch metrics > --- > > Key: SPARK-27420 > URL: https://issues.apache.org/jira/browse/SPARK-27420 > Project: Spark > Issue Type: Improvement > Components: DStreams, Input/Output >Affects Versions: 2.3.3 >Reporter: Jerome Gagnon >Assignee: Kengo Seki >Priority: Major > > KinesisInputDStream currently does not provide a way to disable CloudWatch > metrics push. Kinesis client library (KCL) which is used under the hood > provide the ability through `withMetrics` methods. > To make things worse the default level is "DETAILED" which pushes 10s of > metrics every 10 seconds. When dealing with multiple streaming jobs this add > up pretty quickly, leading to thousands of dollar in cost. > Exposing a way to disable/set the proper level of monitoring is critical to > us. We had to send invalid credentials and suppress log as a less-than-ideal > workaround : see > [https://stackoverflow.com/questions/41811039/disable-cloudwatch-for-aws-kinesis-at-spark-streaming/55599002#55599002] > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
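For readers following this ticket, a hedged sketch of what the requested knob might look like on the existing KinesisInputDStream builder. The metricsLevel(...) setter is the piece this issue asks for, so its exact name and signature are assumptions here; the other builder calls are the standard ones.

{code:scala}
// Sketch only: assumes the resolved change exposes a metricsLevel(...) setter on
// the builder, mirroring KCL's withMetrics configuration. `ssc` is an existing
// StreamingContext; the stream and app names are illustrative.
import com.amazonaws.services.kinesis.metrics.interfaces.MetricsLevel
import org.apache.spark.streaming.kinesis.KinesisInputDStream

val stream = KinesisInputDStream.builder
  .streamingContext(ssc)
  .streamName("my-stream")
  .checkpointAppName("my-app")
  .regionName("us-east-1")
  .endpointUrl("https://kinesis.us-east-1.amazonaws.com")
  .metricsLevel(MetricsLevel.NONE)   // assumed new setter: stop pushing CloudWatch metrics
  .build()
{code}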
[jira] [Resolved] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28953. --- Resolution: Duplicate > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. > > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28886) Kubernetes DepsTestsSuite fails on OSX with minikube 1.3.1 due to formatting
[ https://issues.apache.org/jira/browse/SPARK-28886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28886. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25599 [https://github.com/apache/spark/pull/25599] > Kubernetes DepsTestsSuite fails on OSX with minikube 1.3.1 due to formatting > > > Key: SPARK-28886 > URL: https://issues.apache.org/jira/browse/SPARK-28886 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.0.0 >Reporter: holdenk >Assignee: holdenk >Priority: Minor > Fix For: 3.0.0 > > > With minikube 1.3.1 on OSX the service discovery command returns an extra "* > " which doesn't parse into a URL causing the DepsTestsSuite to fail. > > I've got a fix just need to double check some stuff. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
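As an aside for anyone hitting the same failure, a small illustration of the clean-up the fix needs: newer minikube prefixes the service URL with "* ", which java.net.URL rejects. This is a sketch of the idea, not the actual patch from pull request 25599.

{code:scala}
// Sketch: strip the decorative "* " prefix that minikube 1.3.x prepends before
// handing the string to java.net.URL, which otherwise throws MalformedURLException.
import java.net.URL

val rawOutput = "* http://172.31.46.91:30706\n"   // what `minikube service ... --url` printed

val serviceUrl: URL = rawOutput
  .split("\n")
  .map(_.trim.stripPrefix("*").trim)   // drop the leading asterisk and whitespace
  .filter(_.nonEmpty)
  .map(new URL(_))
  .head
{code}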
[jira] [Assigned] (SPARK-28542) Document Stages page
[ https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-28542: --- Assignee: Pablo Langa Blanco > Document Stages page > > > Key: SPARK-28542 > URL: https://issues.apache.org/jira/browse/SPARK-28542 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Pablo Langa Blanco >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24285) Flaky test: ContinuousSuite.query without test harness
[ https://issues.apache.org/jira/browse/SPARK-24285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925303#comment-16925303 ] Jungtaek Lim commented on SPARK-24285: -- Somehow I found I filed duplicated issue SPARK-28247 for this, and SPARK-28247 was resolved. > Flaky test: ContinuousSuite.query without test harness > -- > > Key: SPARK-24285 > URL: https://issues.apache.org/jira/browse/SPARK-24285 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Major > > *2.5.0-SNAPSHOT* > - [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96640] > {code:java} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > scala.this.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.this.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false{code} > *2.3.x* > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/370/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/373/] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24285) Flaky test: ContinuousSuite.query without test harness
[ https://issues.apache.org/jira/browse/SPARK-24285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-24285. -- Fix Version/s: 2.4.4 3.0.0 Resolution: Duplicate I've just marked it as duplicated, but please correct this if we prefer marking this as resolved. > Flaky test: ContinuousSuite.query without test harness > -- > > Key: SPARK-24285 > URL: https://issues.apache.org/jira/browse/SPARK-24285 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0, 2.4.4 > > > *2.5.0-SNAPSHOT* > - [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96640] > {code:java} > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > scala.this.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.this.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false{code} > *2.3.x* > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/370/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/373/] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28984) csv files should support multi-character separators
[ https://issues.apache.org/jira/browse/SPARK-28984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925316#comment-16925316 ] cheng commented on SPARK-28984: --- My current workaround causes the CSV file to be read twice, which hurts performance. If it is a Univocity issue, I think it can still be optimized. > csv files should support multi-character separators > --- > > Key: SPARK-28984 > URL: https://issues.apache.org/jira/browse/SPARK-28984 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: cheng >Priority: Major > Attachments: multi_csv2.csv > > Original Estimate: 72h > Remaining Estimate: 72h > > The current version supports reading CSV files with single-character delimiters, but > many files are now generated with multi-character delimiters instead. For now I work > around this by reading the file as text and splitting on the multi-character delimiter > myself, which reads the CSV successfully. I think Spark should support splitting CSV > files on multi-character separators. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
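A minimal sketch of the workaround the reporter describes (reading the file as text and splitting manually); the file path, delimiter, and column names are illustrative.

{code:scala}
// Sketch of the described workaround: read the file as plain text and split each
// line on the multi-character delimiter. Path, delimiter and column names are
// illustrative; the delimiter must be regex-quoted because split() takes a regex.
import java.util.regex.Pattern
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("multi-char-delimiter").getOrCreate()
import spark.implicits._

val delimiter = "|||"   // multi-character separator the CSV reader cannot use directly

val df = spark.read.textFile("/path/to/multi_csv2.csv")
  .map { line =>
    val cols = line.split(Pattern.quote(delimiter), -1)
    (cols(0), cols(1), cols(2))    // assumes three columns per line
  }
  .toDF("c0", "c1", "c2")
{code}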
[jira] [Commented] (SPARK-28970) implement USE CATALOG/NAMESPACE for Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-28970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925327#comment-16925327 ] Wenchen Fan commented on SPARK-28970: - {{USE foo.bar}} can be ambiguous, but I think it's fine as we have {{USE foo.bar IN cat}}. I forgot that we already have a proposal in the SPIP. Let's use that proposal. > implement USE CATALOG/NAMESPACE for Data Source V2 > -- > > Key: SPARK-28970 > URL: https://issues.apache.org/jira/browse/SPARK-28970 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > > Currently Spark has a `USE abc` command to switch the current database. > We should have something similar for Data Source V2, to switch the current > catalog and/or current namespace. > We can introduce 2 new command: `USE CATALOG abc` and `USE NAMESPACE abc` -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
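For illustration only, the two commands proposed in the description as they might be issued from a Spark session; the final syntax is still being settled (the comment above defers to the SPIP proposal), so treat these as examples of the intent rather than final SQL.

{code:scala}
// Proposed commands from the description, shown as illustrative spark.sql calls.
// Catalog and namespace names are placeholders; the final grammar may differ.
spark.sql("USE CATALOG testcat")       // switch the current catalog
spark.sql("USE NAMESPACE ns1.ns2")     // switch the current namespace
{code}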
[jira] [Commented] (SPARK-29015) Can not support "add jar" on JDK 11
[ https://issues.apache.org/jira/browse/SPARK-29015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925348#comment-16925348 ] angerszhu commented on SPARK-29015: --- I have tried in Java8, the same error, maybe not jdk's problem. > Can not support "add jar" on JDK 11 > --- > > Key: SPARK-29015 > URL: https://issues.apache.org/jira/browse/SPARK-29015 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > Case 1: > {code:bash} > export JAVA_HOME=/usr/lib/jdk-11.0.3 > export PATH=$JAVA_HOME/bin:$PATH > build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver > export SPARK_PREPEND_CLASSES=true > sbin/start-thriftserver.sh > bin/beeline -u jdbc:hive2://localhost:1 > {code} > {noformat} > 0: jdbc:hive2://localhost:1> add jar > /root/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar; > INFO : Added > [/root/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar] > to class path > INFO : Added resources: > [/root/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar] > +-+ > | result | > +-+ > +-+ > No rows selected (0.381 seconds) > 0: jdbc:hive2://localhost:1> CREATE TABLE addJar(key string) ROW FORMAT > SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; > +-+ > | Result | > +-+ > +-+ > No rows selected (0.613 seconds) > 0: jdbc:hive2://localhost:1> select * from addJar; > Error: Error running query: java.lang.RuntimeException: > java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe > (state=,code=0) > {noformat} > Case 2: > {noformat} > spark-sql> add jar > /root/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar; > ADD JAR > /root/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar > spark-sql> CREATE TABLE addJar(key string) ROW FORMAT SERDE > 'org.apache.hive.hcatalog.data.JsonSerDe'; > spark-sql> select * from addJar; > 19/09/07 03:06:54 ERROR SparkSQLDriver: Failed in [select * from addJar] > java.lang.RuntimeException: java.lang.ClassNotFoundException: > org.apache.hive.hcatalog.data.JsonSerDe > at > org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:79) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.addColumnMetadataToConf(HiveTableScanExec.scala:123) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.hadoopConf$lzycompute(HiveTableScanExec.scala:101) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.hadoopConf(HiveTableScanExec.scala:98) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.hadoopReader$lzycompute(HiveTableScanExec.scala:110) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.hadoopReader(HiveTableScanExec.scala:105) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.$anonfun$doExecute$1(HiveTableScanExec.scala:188) > at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2488) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:188) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:189) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:227) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:224) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:185) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:329) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:378) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:408) > at > org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:52) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:367) > at org.ap
[jira] [Commented] (SPARK-28042) Support mapping spark.local.dir to hostPath volume
[ https://issues.apache.org/jira/browse/SPARK-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925349#comment-16925349 ] Jiaxin Shan commented on SPARK-28042: - [~dongjoon] Thanks! I will try to cherry-pick changes and build a customized version for now. > Support mapping spark.local.dir to hostPath volume > -- > > Key: SPARK-28042 > URL: https://issues.apache.org/jira/browse/SPARK-28042 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Junjie Chen >Assignee: Junjie Chen >Priority: Minor > Fix For: 3.0.0 > > > Currently, the k8s executor builder mount spark.local.dir as emptyDir or > memory, it should satisfy some small workload, while in some heavily workload > like TPCDS, both of them can have some problem, such as pods are evicted due > to disk pressure when using emptyDir, and OOM when using tmpfs. > In particular on cloud environment, users may allocate cluster with minimum > configuration and add cloud storage when running workload. In this case, we > can specify multiple elastic storage as spark.local.dir to accelerate the > spilling. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925361#comment-16925361 ] Wenchen Fan commented on SPARK-26205: - I just noticed that the new codegen of `InSet` fails to compile if the input set is empty. This is not a real problem now because the optimizer converts `In` with an empty list to a literal, but it is a potential bug. [~aokolnychyi] can you fix it? > Optimize InSet expression for bytes, shorts, ints, dates > > > Key: SPARK-26205 > URL: https://issues.apache.org/jira/browse/SPARK-26205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.0.0 > > > {{In}} expressions are compiled into a sequence of if-else statements, which > results in O\(n\) time complexity. {{InSet}} is an optimized version of > {{In}}, which is supposed to improve the performance if the number of > elements is big enough. However, {{InSet}} actually degrades the performance > in many cases due to various reasons (benchmarks were created in SPARK-26203 > and solutions to the boxing problem are discussed in SPARK-26204). > The main idea of this JIRA is to use Java {{switch}} statements to > significantly improve the performance of {{InSet}} expressions for bytes, > shorts, ints, dates. All {{switch}} statements are compiled into > {{tableswitch}} and {{lookupswitch}} bytecode instructions. We will have > O\(1\) time complexity if our case values are compact and {{tableswitch}} can > be used. Otherwise, {{lookupswitch}} will give us O\(log n\). Our local > benchmarks show that this logic is more than two times faster even on 500+ > elements than using primitive collections in {{InSet}} expressions. As Spark > is using Scala {{HashSet}} right now, the performance gain will be even > bigger. > See > [here|https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.10] > and > [here|https://stackoverflow.com/questions/10287700/difference-between-jvms-lookupswitch-and-tableswitch] > for more information. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
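To illustrate the switch-based idea discussed in this issue: Spark's codegen emits Java source, but the same JVM mechanism can be sketched in Scala, where a match on dense Int literals compiles to a tableswitch and a sparse one to a lookupswitch, instead of the chained comparisons an In expression produces. This is an illustration of the bytecode-level idea only, not the generated code itself.

{code:scala}
// Illustration only: membership tests as JVM switch instructions rather than
// chained if-else comparisons. @switch asks the compiler to verify a switch is emitted.
import scala.annotation.switch

def inSetCompact(v: Int): Boolean = (v: @switch) match {
  case 1 | 2 | 3 | 4 | 5 => true   // compact case values -> tableswitch, O(1)
  case _                 => false
}

def inSetSparse(v: Int): Boolean = (v: @switch) match {
  case 10 | 400 | 9000 => true     // sparse case values -> lookupswitch, O(log n)
  case _               => false
}
{code}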
[jira] [Assigned] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when use SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28916: --- Assignee: Marco Gaido > Generated SpecificSafeProjection.apply method grows beyond 64 KB when use > SparkSQL > --- > > Key: SPARK-28916 > URL: https://issues.apache.org/jira/browse/SPARK-28916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1, 2.4.3 >Reporter: MOBIN >Assignee: Marco Gaido >Priority: Major > > Can be reproduced by the following steps: > 1. Create a table with 5000 fields > 2. val data=spark.sql("select * from spark64kb limit 10"); > 3. data.describe() > Then,The following error occurred > {code:java} > WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, > executor 1): org.codehaus.janino.InternalCompilerException: failed to > compile: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" > grows beyond 64 KB > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) > at > org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199) > at > org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expre
[jira] [Resolved] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when use SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28916. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25642 [https://github.com/apache/spark/pull/25642] > Generated SpecificSafeProjection.apply method grows beyond 64 KB when use > SparkSQL > --- > > Key: SPARK-28916 > URL: https://issues.apache.org/jira/browse/SPARK-28916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1, 2.4.3 >Reporter: MOBIN >Assignee: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > Can be reproduced by the following steps: > 1. Create a table with 5000 fields > 2. val data=spark.sql("select * from spark64kb limit 10"); > 3. data.describe() > Then,The following error occurred > {code:java} > WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, > executor 1): org.codehaus.janino.InternalCompilerException: failed to > compile: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" > grows beyond 64 KB > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) > at > org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199) > at > org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86) > at > 
org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.codehaus.janino.InternalCompilerException: Compiling
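To make the reported reproduction easier to try without a pre-existing 5000-field table, here is a hedged, self-contained sketch; the column count and names are illustrative, and the original report goes through a Hive table and spark.sql instead.

{code:scala}
// Sketch approximating the report: build a very wide DataFrame and run describe(),
// which can push the generated projection past the JVM's 64 KB method limit on
// affected versions (2.3.x/2.4.x). Column count and names are illustrative.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("wide-describe").getOrCreate()

val wide = spark.range(10).select((0 until 5000).map(i => lit(i).as(s"c$i")): _*)
wide.describe().show()   // reported to fail with "grows beyond 64 KB" before the fix
{code}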
[jira] [Resolved] (SPARK-29000) [SQL] Decimal precision overflow when don't allow precision loss
[ https://issues.apache.org/jira/browse/SPARK-29000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29000. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25701 [https://github.com/apache/spark/pull/25701] > [SQL] Decimal precision overflow when don't allow precision loss > > > Key: SPARK-29000 > URL: https://issues.apache.org/jira/browse/SPARK-29000 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Fix For: 3.0.0 > > Attachments: screenshot-1.png > > > When we set spark.sql.decimalOperations.allowPrecisionLoss=false. > For the sql below, the result will overflow and return null. > {code:java} > // Some comments here > select case when 1=2 then 1 else 100. end * 1 > {code} > However, this sql will return correct result. > {code:java} > // Some comments here > select case when 1=2 then 1 else 100. end * 1.0 > {code} > The reason is that, there are some issues for the binaryOperator between > nonDecimal and decimal. > In fact, there is a nondecimalAndDecimal method in DecimalPrecision class. > I copy its implementation into the body of ImplicitTypeCasts.coerceTypes() > method. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29000) [SQL] Decimal precision overflow when don't allow precision loss
[ https://issues.apache.org/jira/browse/SPARK-29000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29000: --- Assignee: feiwang > [SQL] Decimal precision overflow when don't allow precision loss > > > Key: SPARK-29000 > URL: https://issues.apache.org/jira/browse/SPARK-29000 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Assignee: feiwang >Priority: Major > Fix For: 3.0.0 > > Attachments: screenshot-1.png > > > When we set spark.sql.decimalOperations.allowPrecisionLoss=false. > For the sql below, the result will overflow and return null. > {code:java} > // Some comments here > select case when 1=2 then 1 else 100. end * 1 > {code} > However, this sql will return correct result. > {code:java} > // Some comments here > select case when 1=2 then 1 else 100. end * 1.0 > {code} > The reason is that, there are some issues for the binaryOperator between > nonDecimal and decimal. > In fact, there is a nondecimalAndDecimal method in DecimalPrecision class. > I copy its implementation into the body of ImplicitTypeCasts.coerceTypes() > method. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
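A hedged sketch of the reported behaviour for readers who want to try it; the decimal expression is illustrative because the literal in the report appears truncated, so this may not reproduce the failure byte-for-byte.

{code:scala}
// Sketch of the report's shape: a high-precision decimal combined with a
// non-decimal operand while precision loss is disallowed. Assumes a spark-shell
// style `spark` session; the cast stands in for the truncated literal in the report.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

// Shape reported to overflow and return null (decimal * integral constant):
spark.sql("select case when 1=2 then 1 else cast(100 as decimal(38,18)) end * 1").show()

// Shape reported to return the correct result (decimal * decimal literal):
spark.sql("select case when 1=2 then 1 else cast(100 as decimal(38,18)) end * 1.0").show()
{code}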
[jira] [Resolved] (SPARK-28637) Thriftserver can not support interval type
[ https://issues.apache.org/jira/browse/SPARK-28637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28637. - Fix Version/s: 3.0.0 Assignee: Yuming Wang Resolution: Fixed > Thriftserver can not support interval type > -- > > Key: SPARK-28637 > URL: https://issues.apache.org/jira/browse/SPARK-28637 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > {code:sql} > 0: jdbc:hive2://localhost:1> select interval '10-11' year to month; > Error: java.lang.IllegalArgumentException: Unrecognized type name: interval > (state=,code=0) > {code} > {code:sql} > spark-sql> select interval '10-11' year to month; > interval 10 years 11 months > {code} > Thriftserver log: > {noformat} > java.lang.RuntimeException: java.lang.IllegalArgumentException: Unrecognized > type name: interval > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy26.getResultSetMetadata(Unknown Source) > at > org.apache.hive.service.cli.CLIService.getResultSetMetadata(CLIService.java:436) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.GetResultSetMetadata(ThriftCLIService.java:607) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1533) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1518) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.IllegalArgumentException: Unrecognized type name: > interval > at org.apache.hive.service.cli.Type.getType(Type.java:169) > at > org.apache.hive.service.cli.TypeDescriptor.(TypeDescriptor.java:53) > at > org.apache.hive.service.cli.ColumnDescriptor.(ColumnDescriptor.java:53) > at org.apache.hive.service.cli.TableSchema.(TableSchema.java:52) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$.getTableSchema(SparkExecuteStatementOperation.scala:314) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema$lzycompute(SparkExecuteStatementOperation.scala:69) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema(SparkExecuteStatementOperation.scala:64) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getResultSetSchema(SparkExecuteStatementOperation.scala:158) > at > 
org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:209) > at > org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:773) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org