[jira] [Resolved] (SPARK-28759) Upgrade scala-maven-plugin to 4.1.1

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28759.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25476
[https://github.com/apache/spark/pull/25476]

> Upgrade scala-maven-plugin to 4.1.1
> ---
>
> Key: SPARK-28759
> URL: https://issues.apache.org/jira/browse/SPARK-28759
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Updated] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28765:
--
Description: 
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

The other example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


*JDK8*
{code}
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
{code}

*JDK11*
{code}
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]  jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[DEBUG] jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}

  was:
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

The other example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


*JDK8*
{code}
$ cd resource-managers/kubernetes/core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
{code}

*JDK11*
{code}
$ cd resource-managers/kubernetes/core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]  jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[DEBUG] jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


> Dependency generation for JDK8/JDK11
> 
>
> Key: SPARK-28765
> URL: https://issues.apache.org/jira/browse/SPARK-28765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28737 removes 

[jira] [Updated] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28765:
--
Description: 
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

The other example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


*JDK8*
{code}
$ cd resource-managers/kubernetes/core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
{code}

*JDK11*
{code}
$ cd resource-managers/kubernetes/core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]   org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version 
managed from 2.22.2)
[DEBUG]  org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]  javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]  jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[DEBUG] jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}

  was:
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

The other example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


> Dependency generation for JDK8/JDK11
> 
>
> Key: SPARK-28765
> URL: https://issues.apache.org/jira/browse/SPARK-28765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
> dependency. However, it still occurs in the JDK11 environment, which breaks 
> our dependency manifest testing on JDK11.
> {code:java}
> $ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
> spark-kubernetes_2.12 ---
> [INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
> [INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
> [INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
> {code}
> The other example is the following.
> {code}
> $ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] 

[jira] [Updated] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28765:
--
Description: 
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

The other example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}

  was:
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

Another example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}


> Dependency generation for JDK8/JDK11
> 
>
> Key: SPARK-28765
> URL: https://issues.apache.org/jira/browse/SPARK-28765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
> dependency. However, it still occurs in the JDK11 environment, which breaks 
> our dependency manifest testing on JDK11.
> {code:java}
> $ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
> spark-kubernetes_2.12 ---
> [INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
> [INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
> [INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
> {code}
> The other example is the following.
> {code}
> $ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
> [INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
> [INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
> [INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
> {code}






[jira] [Updated] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28765:
--
Description: 
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

Another example is the following.
{code}
$ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
[INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
{code}

  was:
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}


> Dependency generation for JDK8/JDK11
> 
>
> Key: SPARK-28765
> URL: https://issues.apache.org/jira/browse/SPARK-28765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
> dependency. However, it still occurs in the JDK11 environment, which breaks 
> our dependency manifest testing on JDK11.
> {code:java}
> $ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
> spark-kubernetes_2.12 ---
> [INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
> [INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
> [INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
> {code}
> Another example is the following.
> {code}
> $ mvn dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
> [INFO]\- org.glassfish.jersey.core:jersey-server:jar:2.29:compile
> [INFO]   \- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
> [INFO]  \- jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
> {code}






[jira] [Updated] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28765:
--
Description: 
SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
dependency. However, it still occurs in the JDK11 environment, which breaks our 
dependency manifest testing on JDK11.
{code:java}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}

  was:
The following dependency occurs only in the JDK11 environment. This breaks our 
dependency manifest.
{code}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}


> Dependency generation for JDK8/JDK11
> 
>
> Key: SPARK-28765
> URL: https://issues.apache.org/jira/browse/SPARK-28765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28737 removes the `javax.annotation:javax.annotation-api:jar:1.2:compile` 
> dependency. However, it still occurs in the JDK11 environment, which breaks 
> our dependency manifest testing on JDK11.
> {code:java}
> $ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
> -Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
> ...
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
> spark-kubernetes_2.12 ---
> [INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
> [INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
> [INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
> [INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
> {code}






[jira] [Created] (SPARK-28765) Dependency generation for JDK8/JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28765:
-

 Summary: Dependency generation for JDK8/JDK11
 Key: SPARK-28765
 URL: https://issues.apache.org/jira/browse/SPARK-28765
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


The following dependency occurs only in the JDK11 environment. This breaks our 
dependency manifest.
{code}
$ mvn dependency:tree -Dincludes=javax.annotation:javax.annotation-api 
-Phadoop-3.2 -Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive
...
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ 
spark-kubernetes_2.12 ---
[INFO] org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[INFO] \- io.fabric8:kubernetes-client:jar:4.1.2:compile
[INFO]\- io.fabric8:kubernetes-model:jar:4.1.2:compile
[INFO]   \- javax.annotation:javax.annotation-api:jar:1.2:compile
{code}






[jira] [Commented] (SPARK-27659) Allow PySpark toLocalIterator to prefetch data

2019-08-16 Thread holdenk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909509#comment-16909509
 ] 

holdenk commented on SPARK-27659:
-

I'm working on this.

> Allow PySpark toLocalIterator to prefetch data
> --
>
> Key: SPARK-27659
> URL: https://issues.apache.org/jira/browse/SPARK-27659
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Priority: Minor
>
> In SPARK-23961, data was no longer prefetched so that the local iterator 
> could close cleanly in the case of not consuming all of the data. If the user 
> intends to iterate over all elements, then prefetching data could bring back 
> any lost performance. We would need to run some tests to see if the 
> performance gain is worth the additional complexity and extra memory usage. 
> The option to prefetch could be added as a user conf, and it's possible this 
> could improve the Scala toLocalIterator as well.
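
For context, a minimal usage sketch of the Scala toLocalIterator mentioned 
above (the prefetch option discussed in this issue does not exist yet, so no 
conf name is shown; the session and data here are stand-ins):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(0, 1000000).toDF("id")   // stand-in for any DataFrame

// toLocalIterator() pulls one partition at a time to the driver; today each
// partition is fetched on demand, which is the step prefetching would overlap.
val it = df.toLocalIterator()
while (it.hasNext) {
  val row = it.next()
  // process one row at a time without collect()-ing the whole result
}
{code}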






[jira] [Assigned] (SPARK-27763) Port test cases from PostgreSQL to Spark SQL

2019-08-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-27763:


Assignee: Yuming Wang

> Port test cases from PostgreSQL to Spark SQL
> 
>
> Key: SPARK-27763
> URL: https://issues.apache.org/jira/browse/SPARK-27763
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Yuming Wang
>Priority: Major
>
> To improve test coverage, we can port the regression tests from other popular 
> open source projects to Spark SQL. PostgreSQL is one of the best SQL systems. 
> Below are the links to the test cases and expected results.
>  * Regression test cases: 
> [https://github.com/postgres/postgres/tree/master/src/test/regress/sql]
>  * Expected results: 
> [https://github.com/postgres/postgres/tree/master/src/test/regress/expected]
> Spark SQL does not support all the feature sets of PostgreSQL. At the current 
> stage, we should first comment out those test cases and create the 
> corresponding JIRAs in SPARK-27764. We can then discuss and prioritize which 
> features we should support. These PostgreSQL regression tests could also 
> expose existing bugs in Spark SQL; we should create JIRAs for those as well 
> and track them in SPARK-27764.






[jira] [Commented] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909446#comment-16909446
 ] 

Dongjoon Hyun commented on SPARK-28758:
---

Thanks for the advice. Yep, this is not necessary for Apache Spark. I moved 
this out of the umbrella.

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Janino to bring in its bug fixes. Please note that 
> Janino 3.1.0 is a major refactoring rather than a bug-fix release, so we had 
> better use 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler






[jira] [Updated] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28758:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-24417)

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Janino to bring in its bug fixes. Please note that 
> Janino 3.1.0 is a major refactoring rather than a bug-fix release, so we had 
> better use 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler






[jira] [Resolved] (SPARK-28737) Update jersey to 2.27+ (2.29)

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28737.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25455
[https://github.com/apache/spark/pull/25455]

> Update jersey to 2.27+ (2.29)
> -
>
> Key: SPARK-28737
> URL: https://issues.apache.org/jira/browse/SPARK-28737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 3.0.0
>
>
> Looks like we might need to update Jersey after all, from recent JDK 11 
> testing: 
> {code}
> Caused by: java.lang.IllegalArgumentException
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:170)
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:153)
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:424)
>   at 
> org.glassfish.jersey.server.internal.scanning.AnnotationAcceptingListener.process(AnnotationAcceptingListener.java:170)
> {code}
> It looks like 2.27+ may solve the issue, so worth trying 2.29. 
> I'm not 100% sure this is an issue as the JDK 11 testing process is still 
> undergoing change, but will work on it to see how viable it is anyway, as it 
> may be worthwhile to update for 3.0 in any event.






[jira] [Commented] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909420#comment-16909420
 ] 

Liang-Chi Hsieh commented on SPARK-28761:
-

If you do it at SparkPlan.scala#L344, isn't it just for SQL? 
{{spark.driver.maxResultSize}} covers RDD, right?

> spark.driver.maxResultSize only applies to compressed data
> --
>
> Key: SPARK-28761
> URL: https://issues.apache.org/jira/browse/SPARK-28761
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: David Vogelbacher
>Priority: Major
>
> Spark has a setting {{spark.driver.maxResultSize}}, see 
> https://spark.apache.org/docs/latest/configuration.html#application-properties
>  :
> {noformat}
> Limit of total size of serialized results of all partitions for each Spark 
> action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. 
> Jobs will be aborted if the total size is above this limit. Having a high 
> limit may cause out-of-memory errors in driver (depends on 
> spark.driver.memory and memory overhead of objects in JVM). 
> Setting a proper limit can protect the driver from out-of-memory errors.
> {noformat}
> This setting can be very useful for constraining the memory that the Spark 
> driver needs for a specific Spark action. However, this limit is checked 
> before decompressing data in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662
> Even if the compressed data is below the limit, the uncompressed data can 
> still be far above it. In order to protect the driver, we should also impose 
> a limit on the uncompressed data. We could do this in 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
> I propose adding a new config option 
> {{spark.driver.maxUncompressedResultSize}}.
> A simple repro of this with spark shell:
> {noformat}
> > printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> > ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
> scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> scala> val results = df.collect()
> results: Array[org.apache.spark.sql.Row] = 
> Array([a...
> scala> results(0).getString(0).size
> res0: Int = 10
> {noformat}
> Even though we set maxResultSize to 10 MB, we collect a result that is 100MB 
> uncompressed.
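
For context, a minimal sketch of how the existing limit is configured today 
(the proposed {{spark.driver.maxUncompressedResultSize}} is only a suggested 
name in this issue, not an existing Spark conf; the CSV path is a placeholder):
{code:scala}
import org.apache.spark.sql.SparkSession

// spark.driver.maxResultSize bounds the total size of *serialized* task
// results per action; the point of this issue is that the decompressed rows
// can be much larger than that bound.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.driver.maxResultSize", "10m")
  .getOrCreate()

val rows = spark.read.format("csv").load("/path/to/test.csv").collect()
{code}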






[jira] [Commented] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when storing

2019-08-16 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909409#comment-16909409
 ] 

Liang-Chi Hsieh commented on SPARK-28732:
-

As the return type of {{count}} is LongType, I think it is reasonable that it 
can't fit into an Int column. The problem here might be that the error is not 
friendly.

Normally, if we want to map a dataset to a specified type, an exception like 
this should be thrown when the types are incompatible:
{code}
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:2801)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2821)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2812)
{code}
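
A minimal sketch of the workaround implied above, not taken from the reporter's 
project (the session, sample data, and column names are stand-ins): either keep 
the target bean field as Long, or cast the aggregate down to int before mapping 
it.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the reporter's dataset; only the column names matter here.
val ds = Seq(("FRANCE", "1995"), ("FRANCE", "1995"), ("PERU", "1996"))
  .toDF("n_name", "o_year")

// count() produces a LongType column; casting it down explicitly (or keeping
// the bean field as Long) avoids relying on codegen to narrow the value.
val aggregated = ds
  .groupBy(col("n_name").as("n_nameN"), col("o_year").as("o_yearN"))
  .agg(count("n_name").cast("int").as("countN"))

aggregated.show()
{code}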

> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java' when storing the result of a count aggregation in an integer
> ---
>
> Key: SPARK-28732
> URL: https://issues.apache.org/jira/browse/SPARK-28732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Alix Métivier
>Priority: Major
>
> I am using the agg function on a dataset, and I want to count the number of 
> lines per combination of grouping columns. I would like to store the result 
> of this count in an integer, but it fails with this output:
> {code}
> [ERROR]: org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 89, Column 53: No applicable constructor/method found 
> for actual parameters "long"; candidates are: "java.lang.Integer(int)", 
> "java.lang.Integer(java.lang.String)"
> {code}
> Here is line 89 and a few lines around it, to understand:
> {code}
> /* 085 */ long value13 = i.getLong(5);
> /* 086 */ argValue4 = value13;
> /* 087 */
> /* 088 */
> /* 089 */ final java.lang.Integer value12 = false ? null : new 
> java.lang.Integer(argValue4);
> {code}
> As per the Integer documentation, there is no constructor that takes a long, 
> which is why the generated code fails.
> Here is my code:
> {code}
> org.apache.spark.sql.Dataset ds_row2 = 
> ds_conntAggregateRow_1_Out_1
>  .groupBy(org.apache.spark.sql.functions.col("n_name").as("n_nameN"),
>  org.apache.spark.sql.functions.col("o_year").as("o_yearN"))
>  .agg(org.apache.spark.sql.functions.count("n_name").as("countN"))
>  .as(org.apache.spark.sql.Encoders.bean(row2Struct.class));
> {code}
> The row2Struct class is composed of n_nameN: String, o_yearN: String, countN: Int.
> If countN is a Long, the code above won't fail.
> If it is an Int, it works in 1.6 and 2.0, but fails on version 2.1+.
>  






[jira] [Comment Edited] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when st

2019-08-16 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909409#comment-16909409
 ] 

Liang-Chi Hsieh edited comment on SPARK-28732 at 8/16/19 9:19 PM:
--

As the return type of {{count}} is LongType, I think it is reasonable that it 
can't fit into an Int column. The problem here might be that the error is not 
friendly.

Normally, if we want to map a dataset to a specified type, an exception like 
this should be thrown when the types are incompatible:
{code}
org.apache.spark.sql.AnalysisException: Cannot up cast `b` from bigint to int.
The type path of the target object is:
- field (class: "scala.Int", name: "b")
- root class: "Test"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:2801)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2821)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2812)
{code}


was (Author: viirya):
As the return type of {{count}} is LongType, I think it is reasonable that it 
can't fit into an Int column. The problem here might be that the error is not 
friendly.

Normally, if we want to map a dataset to a specified type, an exception like 
this should be thrown when the types are incompatible:
{code}
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:2801)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2821)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$32$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:2812)
{code}

> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java' when storing the result of a count aggregation in an integer
> ---
>
> Key: SPARK-28732
> URL: https://issues.apache.org/jira/browse/SPARK-28732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Alix Métivier
>Priority: Major
>
> I am using the agg function on a dataset, and I want to count the number of 
> lines per combination of grouping columns. I would like to store the result 
> of this count in an integer, but it fails with this output:
> {code}
> [ERROR]: org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 89, Column 53: No applicable constructor/method found 
> for actual parameters "long"; candidates are: "java.lang.Integer(int)", 
> "java.lang.Integer(java.lang.String)"
> {code}
> Here is line 89 and a few lines around it, to understand:
> {code}
> /* 085 */ long value13 = i.getLong(5);
> /* 086 */ argValue4 = value13;
> /* 087 */
> /* 088 */
> /* 089 */ final java.lang.Integer value12 = false ? null : new 
> java.lang.Integer(argValue4);
> {code}
> As per the Integer documentation, there is no constructor that takes a long, 
> which is why the generated code fails.
> Here is my code:
> {code}
> org.apache.spark.sql.Dataset 

[jira] [Created] (SPARK-28764) Remove unnecessary writePartitionedFile method from ExternalSorter

2019-08-16 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28764:
--

 Summary: Remove unnecessary writePartitionedFile method from 
ExternalSorter
 Key: SPARK-28764
 URL: https://issues.apache.org/jira/browse/SPARK-28764
 Project: Spark
  Issue Type: Task
  Components: Shuffle, Tests
Affects Versions: 3.0.0
Reporter: Matt Cheah


Following SPARK-28571, we now use {{ExternalSorter#writePartitionedData}} in 
{{SortShuffleWriter}} when persisting the shuffle data via the shuffle writer 
plugin. However, we left the {{writePartitionedFile}} method on 
{{ExternalSorter}} strictly for tests. We should figure out how to refactor 
those tests to use {{writePartitionedData}} instead of 
{{writePartitionedFile}}.






[jira] [Updated] (SPARK-28701) add java11 support for spark pull request builds

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28701:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-24417

> add java11 support for spark pull request builds
> 
>
> Key: SPARK-28701
> URL: https://issues.apache.org/jira/browse/SPARK-28701
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, jenkins
>Affects Versions: 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
>
> from https://github.com/apache/spark/pull/25405
> add a PRB subject check for [test-java11] and update JAVA_HOME env var to 
> point to /usr/java/jdk-11.0.1






[jira] [Comment Edited] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909379#comment-16909379
 ] 

Dongjoon Hyun edited comment on SPARK-28763 at 8/16/19 8:41 PM:


Hi, [~yumwang]. This test suite seems to fail very frequently. As you see, it 
fails 4 times consecutively.
Could you take a look? If the fix is not trivial, we had better revert this 
first.
 
cc [~srowen]


was (Author: dongjoon):
Hi, [~yumwang] and [~smilegator]. This test suite seems to fail very 
frequently. As you see, it fails 4 times consecutively.
Could you take a look? If the fix is not trivial, we had better revert this 
first.
 
cc [~srowen]

> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: Screen Shot 2019-08-16 at 1.34.23 PM.png
>
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}
>  !Screen Shot 2019-08-16 at 1.34.23 PM.png|width=100%! 






[jira] [Commented] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909379#comment-16909379
 ] 

Dongjoon Hyun commented on SPARK-28763:
---

Hi, [~yumwang] and [~smilegator]. This test suite seems to fail very 
frequently. As you see, it fails 4 times consecutively.
Could you take a look? If the fix is not trivial, we had better revert this 
first.
 
cc [~srowen]

> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: Screen Shot 2019-08-16 at 1.34.23 PM.png
>
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}
>  !Screen Shot 2019-08-16 at 1.34.23 PM.png|width=100%! 






[jira] [Updated] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28763:
--
Description: 
{code}
org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
 get binary type

org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
{code}

 !Screen Shot 2019-08-16 at 1.34.23 PM.png|width=100%! 



  was:
{code}
org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
 get binary type

org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
{code}

 !Screen Shot 2019-08-16 at 1.34.23 PM.png! 




> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: Screen Shot 2019-08-16 at 1.34.23 PM.png
>
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}
>  !Screen Shot 2019-08-16 at 1.34.23 PM.png|width=100%! 






[jira] [Updated] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28763:
--
Attachment: Screen Shot 2019-08-16 at 1.34.23 PM.png

> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: Screen Shot 2019-08-16 at 1.34.23 PM.png
>
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}






[jira] [Updated] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28763:
--
Description: 
{code}
org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
 get binary type

org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
{code}

 !Screen Shot 2019-08-16 at 1.34.23 PM.png! 



  was:
{code}
org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
 get binary type

org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
{code}




> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: Screen Shot 2019-08-16 at 1.34.23 PM.png
>
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}
>  !Screen Shot 2019-08-16 at 1.34.23 PM.png! 






[jira] [Created] (SPARK-28763) Flaky Test

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28763:
-

 Summary: Flaky Test
 Key: SPARK-28763
 URL: https://issues.apache.org/jira/browse/SPARK-28763
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


{code}
org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
 get binary type

org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
{code}








[jira] [Updated] (SPARK-28763) Flaky Tests: SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28763:
--
Summary: Flaky Tests: 
SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary 
type  (was: Flaky Test)

> Flaky Tests: 
> SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get 
> binary type
> 
>
> Key: SPARK-28763
> URL: https://issues.apache.org/jira/browse/SPARK-28763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1
>  get binary type
> org.scalatest.exceptions.TestFailedException: "[?](" did not equal "[�]("
> {code}






[jira] [Commented] (SPARK-21097) Dynamic allocation will preserve cached data

2019-08-16 Thread Adam Kennedy (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909342#comment-16909342
 ] 

Adam Kennedy commented on SPARK-21097:
--

Another reason for supporting the transfer of memory cache blocks in general: 
in an application where executors spin up and down all the time, we are likely 
to see situations in which the memory cache blocks are heavily skewed towards a 
small percentage of the total executors. If we had support for replicating 
cache blocks around, we could also look at a more general cache balancer, which 
could redistribute the cache blocks over time (perhaps during idle periods, or 
using long-tail resources when no new tasks exist to distribute) to make 
subsequent stages or jobs more likely to get all tasks distributed evenly.

> Dynamic allocation will preserve cached data
> 
>
> Key: SPARK-21097
> URL: https://issues.apache.org/jira/browse/SPARK-21097
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Scheduler, Spark Core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Brad
>Priority: Major
> Attachments: Preserving Cached Data with Dynamic Allocation.pdf
>
>
> We want to use dynamic allocation to distribute resources among many notebook 
> users on our spark clusters. One difficulty is that if a user has cached data 
> then we are either prevented from de-allocating any of their executors, or we 
> are forced to drop their cached data, which can lead to a bad user experience.
> We propose adding a feature to preserve cached data by copying it to other 
> executors before de-allocation. This behavior would be enabled by a simple 
> spark config. Now when an executor reaches its configured idle timeout, 
> instead of just killing it on the spot, we will stop sending it new tasks, 
> replicate all of its rdd blocks onto other executors, and then kill it. If 
> there is an issue while we replicate the data, like an error, it takes too 
> long, or there isn't enough space, then we will fall back to the original 
> behavior and drop the data and kill the executor.
> This feature should allow anyone with notebook users to use their cluster 
> resources more efficiently. Also, since it will be completely opt-in, it is 
> unlikely to cause problems for other use cases.
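
For reference, a minimal sketch of the dynamic-allocation knobs that exist 
today (the proposed "replicate cached blocks before de-allocation" switch does 
not exist yet, so no conf name is shown for it):
{code:scala}
import org.apache.spark.SparkConf

// Today the only protection for cached data is a separate, longer idle timeout
// for executors that hold cached blocks; the proposal above would migrate
// those blocks to surviving executors instead of keeping the idle executor
// alive.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
{code}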






[jira] [Created] (SPARK-28762) Read JAR main class if JAR is not located in local file system

2019-08-16 Thread Ivan Gozali (JIRA)
Ivan Gozali created SPARK-28762:
---

 Summary: Read JAR main class if JAR is not located in local file 
system
 Key: SPARK-28762
 URL: https://issues.apache.org/jira/browse/SPARK-28762
 Project: Spark
  Issue Type: New Feature
  Components: Deploy, Spark Core, Spark Submit
Affects Versions: 2.4.3
Reporter: Ivan Gozali


Currently, {{spark-submit}} doesn't attempt to read the main class from a Spark 
app JAR file if the scheme of the primary resource URI is not {{file}}. In 
other words, if the JAR is not in the local file system, it will barf.

It would be useful to have this feature if I deploy my Spark app JARs in S3 or 
HDFS.

If it makes sense to maintainers, I can take a stab at this - I think I know 
which files to look at.
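
A minimal sketch of what reading the main class from a non-local JAR could look 
like, assuming the Hadoop FileSystem API is used (this is an illustration only, 
not the actual spark-submit implementation, and the helper name is 
hypothetical):
{code:scala}
import java.util.jar.JarInputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: resolve the Main-Class attribute of a JAR that lives on
// any Hadoop-compatible file system (hdfs://, s3a://, ...), not just file://.
def readMainClass(jarUri: String): Option[String] = {
  val path = new Path(jarUri)
  val fs = path.getFileSystem(new Configuration())
  val in = new JarInputStream(fs.open(path))
  try {
    Option(in.getManifest).flatMap { m =>
      Option(m.getMainAttributes.getValue("Main-Class"))
    }
  } finally {
    in.close()
  }
}

// e.g. readMainClass("hdfs:///apps/my-app.jar")
{code}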






[jira] [Updated] (SPARK-28762) Read JAR main class if JAR is not located in local file system

2019-08-16 Thread Ivan Gozali (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gozali updated SPARK-28762:

Priority: Minor  (was: Major)

> Read JAR main class if JAR is not located in local file system
> --
>
> Key: SPARK-28762
> URL: https://issues.apache.org/jira/browse/SPARK-28762
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, Spark Core, Spark Submit
>Affects Versions: 2.4.3
>Reporter: Ivan Gozali
>Priority: Minor
>
> Currently, {{spark-submit}} doesn't attempt to read the main class from a 
> Spark app JAR file if the scheme of the primary resource URI is not {{file}}. 
> In other words, if the JAR is not in the local file system, it simply fails.
> It would be useful to support this when Spark app JARs are deployed in S3 
> or HDFS.
> If it makes sense to maintainers, I can take a stab at this - I think I know 
> which files to look at.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28758.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25474
[https://github.com/apache/spark/pull/25474]

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Janino to pick up its recent bug fixes. Note that 
> Janino 3.1.0 is a major refactoring rather than a bug-fix release, so we 
> should stay on 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28722.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25442
[https://github.com/apache/spark/pull/25442]

> Change sequential label sorting in StringIndexer fit to parallel
> 
>
> Key: SPARK-28722
> URL: https://issues.apache.org/jira/browse/SPARK-28722
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 3.0.0
>
>
> The fit method in StringIndexer sorts the given labels sequentially when 
> there are multiple input columns. As the number of input columns increases, 
> the label-sorting time increases dramatically, so it is hard to use in 
> practice with hundreds of input columns.
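A minimal sketch of the idea behind the change, assuming string-typed input 
columns; it only illustrates sorting labels per column in parallel and is not 
the code of the actual patch.
{code:scala}
import org.apache.spark.sql.DataFrame

// Count and sort the labels of every input column in parallel (Scala parallel
// collections) instead of handling one column after another.
def sortedLabels(df: DataFrame, inputCols: Seq[String]): Seq[Array[String]] = {
  inputCols.par.map { colName =>
    df.select(colName).na.drop()
      .rdd.map(_.getString(0))
      .countByValue()                                     // label -> frequency
      .toSeq
      .sortBy { case (label, count) => (-count, label) }  // frequencyDesc, ties broken alphabetically
      .map(_._1)
      .toArray
  }.seq
}
{code}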



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28722:
-

Assignee: Liang-Chi Hsieh

> Change sequential label sorting in StringIndexer fit to parallel
> 
>
> Key: SPARK-28722
> URL: https://issues.apache.org/jira/browse/SPARK-28722
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
>
> The fit method in StringIndexer sorts the given labels sequentially when 
> there are multiple input columns. As the number of input columns increases, 
> the label-sorting time increases dramatically, so it is hard to use in 
> practice with hundreds of input columns.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28722:
--
Priority: Minor  (was: Major)

> Change sequential label sorting in StringIndexer fit to parallel
> 
>
> Key: SPARK-28722
> URL: https://issues.apache.org/jira/browse/SPARK-28722
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> The fit method in StringIndexer sorts the given labels sequentially when 
> there are multiple input columns. As the number of input columns increases, 
> the label-sorting time increases dramatically, so it is hard to use in 
> practice with hundreds of input columns.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28755) test_mllib_classification fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28755:
-

Assignee: Hyukjin Kwon

> test_mllib_classification fails on JDK11
> 
>
> Key: SPARK-28755
> URL: https://issues.apache.org/jira/browse/SPARK-28755
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> - https://github.com/apache/spark/pull/25443
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109111/consoleFull
> {code}
> ...
> 1. Failure: spark.mlp (@test_mllib_classification.R#310) 
> ---
> head(summary$weights, 5) not equal to list(-24.28415, 107.8701, 16.86376, 
> 1.103736, 9.244488).
> Component 1: Mean relative difference: 0.002250183
> Component 2: Mean relative difference: 0.001494751
> Component 3: Mean relative difference: 0.001602342
> Component 4: Mean relative difference: 0.01193038
> Component 5: Mean relative difference: 0.001732629
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28755) test_mllib_classification fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28755.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25478
[https://github.com/apache/spark/pull/25478]

> test_mllib_classification fails on JDK11
> 
>
> Key: SPARK-28755
> URL: https://issues.apache.org/jira/browse/SPARK-28755
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> - https://github.com/apache/spark/pull/25443
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109111/consoleFull
> {code}
> ...
> 1. Failure: spark.mlp (@test_mllib_classification.R#310) 
> ---
> head(summary$weights, 5) not equal to list(-24.28415, 107.8701, 16.86376, 
> 1.103736, 9.244488).
> Component 1: Mean relative difference: 0.002250183
> Component 2: Mean relative difference: 0.001494751
> Component 3: Mean relative difference: 0.001602342
> Component 4: Mean relative difference: 0.01193038
> Component 5: Mean relative difference: 0.001732629
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28756) Fix checkJavaVersion to accept JDK8+

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28756.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25472
[https://github.com/apache/spark/pull/25472]

> Fix checkJavaVersion to accept JDK8+
> 
>
> Key: SPARK-28756
> URL: https://issues.apache.org/jira/browse/SPARK-28756
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> {code}
> build/mvn -Phadoop-3.2 -Psparkr -DskipTests package
> R/install-dev.sh
> R/run-tests.sh
> {code}
> {code}
> Skipped 
> 
> 1. create DataFrame from list or data.frame (@test_basic.R#21) - error on 
> Java check
> 2. spark.glm and predict (@test_basic.R#57) - error on Java check
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909163#comment-16909163
 ] 

Sean Owen commented on SPARK-28758:
---

It doesn't really matter, but I don't know if this is strictly related to JDK 
11? 3.0.13 seemed to work. Just to keep the umbrella JIRA sort of specific to 
what had to change. (Or maybe you found there was a bug fix we needed for 11)

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino to pick up its recent bug fixes. Note that 
> Janino 3.1.0 is a major refactoring rather than a bug-fix release, so we 
> should stay on 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28757) File table location should include both values of option `path` and `paths`

2019-08-16 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28757.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25473
[https://github.com/apache/spark/pull/25473]

> File table location should include both values of option `path` and `paths`
> ---
>
> Key: SPARK-28757
> URL: https://issues.apache.org/jira/browse/SPARK-28757
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In the V1 implementation, a file table's location includes both the value 
> of the `path` option and the values of the `paths` option.
> In the refactoring of https://github.com/apache/spark/pull/24025, the value 
> of the `path` option is ignored if `paths` is specified. We should make 
> this consistent with V1.
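For reference, a minimal sketch of the scenario (the paths are placeholders): 
in V1 both the {{path}} option and the paths passed to {{load()}} contribute 
to the scanned location, and this fix makes V2 behave the same way.
{code:scala}
val df = spark.read
  .format("parquet")
  .option("path", "/data/events/2019-08-01")
  // With V1 semantics, the `path` option and both load() arguments together
  // make up the table location.
  .load("/data/events/2019-08-02", "/data/events/2019-08-03")
{code}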



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28757) File table location should include both values of option `path` and `paths`

2019-08-16 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-28757:
---

Assignee: Gengliang Wang

> File table location should include both values of option `path` and `paths`
> ---
>
> Key: SPARK-28757
> URL: https://issues.apache.org/jira/browse/SPARK-28757
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In the V1 implementation, a file table's location includes both the value 
> of the `path` option and the values of the `paths` option.
> In the refactoring of https://github.com/apache/spark/pull/24025, the value 
> of the `path` option is ignored if `paths` is specified. We should make 
> this consistent with V1.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread David Vogelbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-28761:
--
Description: 
Spark has a setting {{spark.driver.maxResultSize}}, see 
https://spark.apache.org/docs/latest/configuration.html#application-properties :
{noformat}
Limit of total size of serialized results of all partitions for each Spark 
action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. 
Jobs will be aborted if the total size is above this limit. Having a high limit 
may cause out-of-memory errors in driver (depends on spark.driver.memory and 
memory overhead of objects in JVM). 
Setting a proper limit can protect the driver from out-of-memory errors.
{noformat}
This setting can be very useful in constraining the memory that the spark 
driver needs for a specific spark action. However, this limit is checked before 
decompressing data in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662

Even if the compressed data is below the limit the uncompressed data can still 
be far above. In order to protect the driver we should also impose a limit on 
the uncompressed data. We could do this in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
I propose adding a new config option {{spark.driver.maxUncompressedResultSize}}.

A simple repro of this with spark shell:
{noformat}
> printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]

scala> val results = df.collect()
results: Array[org.apache.spark.sql.Row] = 
Array([a...

scala> results(0).getString(0).size
res0: Int = 10
{noformat}

Even though we set maxResultSize to 10 MB, we collect a result that is 100MB 
uncompressed.

  was:
Spark has a setting {{spark.driver.maxResultSize}}, see 
https://spark.apache.org/docs/latest/configuration.html#application-properties :
{noformat}
Limit of total size of serialized results of all partitions for each Spark 
action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs 
will be aborted if the total size is above this limit. Having a high limit may 
cause out-of-memory errors in driver (depends on spark.driver.memory and memory 
overhead of objects in JVM). Setting a proper limit can protect the driver from 
out-of-memory errors.
{noformat}
This setting can be very useful in constraining the memory that the spark 
driver needs for a specific spark action. However, this limit is checked before 
decompressing data in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662

Even if the compressed data is below the limit the uncompressed data can still 
be far above. In order to protect the driver we should also impose a limit on 
the uncompressed data. We could do this in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
I propose adding a new config option {{spark.driver.maxUncompressedResultSize}}.

A simple repro of this with spark shell:
{noformat}
> printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]

scala> val results = df.collect()
results: Array[org.apache.spark.sql.Row] = 

[jira] [Updated] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread David Vogelbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-28761:
--
Description: 
Spark has a setting {{spark.driver.maxResultSize}}, see 
https://spark.apache.org/docs/latest/configuration.html#application-properties :
{noformat}
Limit of total size of serialized results of all partitions for each Spark 
action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs 
will be aborted if the total size is above this limit. Having a high limit may 
cause out-of-memory errors in driver (depends on spark.driver.memory and memory 
overhead of objects in JVM). Setting a proper limit can protect the driver from 
out-of-memory errors.
{noformat}
This setting can be very useful in constraining the memory that the spark 
driver needs for a specific spark action. However, this limit is checked before 
decompressing data in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662

Even if the compressed data is below the limit the uncompressed data can still 
be far above. In order to protect the driver we should also impose a limit on 
the uncompressed data. We could do this in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
I propose adding a new config option {{spark.driver.maxUncompressedResultSize}}.

A simple repro of this with spark shell:
{noformat}
> printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]

scala> val results = df.collect()
results: Array[org.apache.spark.sql.Row] = 
Array([a...

scala> results(0).getString(0).size
res0: Int = 10
{noformat}

Even though we set maxResultSize to 10 MB, we collect a result that is 100MB 
uncompressed.

  was:
Spark has a setting `spark.driver.maxResultSize`, see 
https://spark.apache.org/docs/latest/configuration.html#application-properties :
{noformat}
Limit of total size of serialized results of all partitions for each Spark 
action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs 
will be aborted if the total size is above this limit. Having a high limit may 
cause out-of-memory errors in driver (depends on spark.driver.memory and memory 
overhead of objects in JVM). Setting a proper limit can protect the driver from 
out-of-memory errors.
{noformat}
This setting can be very useful in constraining the memory that the spark 
driver needs for a specific spark action. However, this limit is checked before 
decompressing data in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662

Even if the compressed data is below the limit the uncompressed data can still 
be far above. In order to protect the driver we should also impose a limit on 
the uncompressed data. We could do this in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
I propose adding a new config option {{spark.driver.maxUncompressedResultSize}}.

A simple repro of this with spark shell:
{noformat}
> printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]

scala> val results = df.collect()
results: Array[org.apache.spark.sql.Row] = 

[jira] [Resolved] (SPARK-28671) [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again

2019-08-16 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-28671.
--
   Resolution: Fixed
 Assignee: pavithra ramachandran
Fix Version/s: 3.0.0

Resolved by [https://github.com/apache/spark/pull/25394]

> [UDF] dropping permanent function when a temporary function with the same 
> name already exists giving wrong msg on dropping it again
> ---
>
> Key: SPARK-28671
> URL: https://issues.apache.org/jira/browse/SPARK-28671
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: pavithra ramachandran
>Priority: Minor
> Fix For: 3.0.0
>
>
> Created jar and uploaded at hdfs path
> 1../hdfs dfs -put /opt/trash1/AddDoublesUDF.jar /user/user1/
> 2.Launch beeline and created permanent function
> CREATE FUNCTION addDoubles AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 3.Perform select operation
> jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +--+--+
> | default.addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
> +--+--+
> 1 row selected (0.111 seconds)
> 4.Created temporary function as below
> jdbc:hive2://100.100.208.125:23040/default> CREATE temporary FUNCTION 
> addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 5.jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +--+--+
> | addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
> +--+--+
> 1 row selected (0.088 seconds)
> 6.Drop function
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> 7.jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3); 
> -- It is success
> 8.Drop again Error thrown
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> Error: org.apache.spark.sql.catalyst.analysis.NoSuchFunctionException: 
> Undefined function: 'default.addDoubles'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'default'.; (state=,code=0)
> 9.Perform again select 
> jdbc:hive2://100.100.208.125:23040/default>  SELECT addDoubles(1,2,3);
> +--+--+
> | addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
>   
> The issue is that the error message in step 8 says the function is neither a 
> registered temporary function nor a permanent function, whereas it was 
> registered as a temporary function in step 4, which is why the SELECT in 
> step 9 still returns a result.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread David Vogelbacher (JIRA)
David Vogelbacher created SPARK-28761:
-

 Summary: spark.driver.maxResultSize only applies to compressed data
 Key: SPARK-28761
 URL: https://issues.apache.org/jira/browse/SPARK-28761
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: David Vogelbacher


Spark has a setting `spark.driver.maxResultSize`, see 
https://spark.apache.org/docs/latest/configuration.html#application-properties :
{noformat}
Limit of total size of serialized results of all partitions for each Spark 
action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs 
will be aborted if the total size is above this limit. Having a high limit may 
cause out-of-memory errors in driver (depends on spark.driver.memory and memory 
overhead of objects in JVM). Setting a proper limit can protect the driver from 
out-of-memory errors.
{noformat}
This setting can be very useful in constraining the memory that the spark 
driver needs for a specific spark action. However, this limit is checked before 
decompressing data in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662

Even if the compressed data is below the limit the uncompressed data can still 
be far above. In order to protect the driver we should also impose a limit on 
the uncompressed data. We could do this in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
I propose adding a new config option {{spark.driver.maxUncompressedResultSize}}.

A simple repro of this with spark shell:
{noformat}
> printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]

scala> val results = df.collect()
results: Array[org.apache.spark.sql.Row] = 
Array([a...

scala> results(0).getString(0).size
res0: Int = 10
{noformat}

Even though we set maxResultSize to 10 MB, we collect a result that is 100MB 
uncompressed.
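
A rough sketch of what the proposed check could look like on the driver side. 
Both the helper and {{spark.driver.maxUncompressedResultSize}} are assumptions 
taken from this proposal, not an existing Spark API.
{code:scala}
import org.apache.spark.SparkException

// Hypothetical guard, applied after the fetched result blocks have been
// decompressed and before the rows are handed back to the caller.
def checkUncompressedResultSize(uncompressedBytes: Long, maxUncompressedBytes: Long): Unit = {
  if (maxUncompressedBytes > 0 && uncompressedBytes > maxUncompressedBytes) {
    throw new SparkException(
      s"Total size of uncompressed results ($uncompressedBytes bytes) is bigger than " +
        s"spark.driver.maxUncompressedResultSize ($maxUncompressedBytes bytes)")
  }
}
{code}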



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28121) String Functions: decode can not accept 'escape' and 'hex' as charset

2019-08-16 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28121:

Summary: String Functions: decode can not accept 'escape' and 'hex' as 
charset  (was: String Functions: decode can not accept 'escape' as charset)

> String Functions: decode can not accept 'escape' and 'hex' as charset
> -
>
> Key: SPARK-28121
> URL: https://issues.apache.org/jira/browse/SPARK-28121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> postgres=# select decode('1234567890','escape');
> decode
> 
> \x31323334353637383930
> (1 row)
> {noformat}
> {noformat}
> spark-sql> select decode('1234567890','escape');
> 19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select 
> decode('1234567890','escape')]
> java.io.UnsupportedEncodingException: escape
>   at java.lang.StringCoding.decode(StringCoding.java:190)
>   at java.lang.String.(String.java:426)
>   at java.lang.String.(String.java:491)
> ...
> spark-sql> select decode('ff','hex');
> 19/08/16 21:44:55 ERROR SparkSQLDriver: Failed in [select decode('ff','hex')]
> java.io.UnsupportedEncodingException: hex
>   at java.lang.StringCoding.decode(StringCoding.java:190)
>   at java.lang.String.(String.java:426)
>   at java.lang.String.(String.java:491)
> {noformat}
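The stack trace shows that Spark's {{decode}} hands its second argument to JVM 
charset decoding, so PostgreSQL format names such as 'escape' and 'hex' are 
rejected as unknown charsets. A minimal illustration of that plain JVM 
behavior (not Spark code):
{code:scala}
val bytes = "1234567890".getBytes("UTF-8")

new String(bytes, "UTF-8")   // ok: "1234567890"
new String(bytes, "escape")  // throws java.io.UnsupportedEncodingException: escape
new String(bytes, "hex")     // throws java.io.UnsupportedEncodingException: hex
{code}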



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28121) String Functions: decode can not accept 'escape' as charset

2019-08-16 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28121:

Description: 
{noformat}
postgres=# select decode('1234567890','escape');
decode

\x31323334353637383930
(1 row)
{noformat}
{noformat}
spark-sql> select decode('1234567890','escape');
19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select 
decode('1234567890','escape')]
java.io.UnsupportedEncodingException: escape
at java.lang.StringCoding.decode(StringCoding.java:190)
at java.lang.String.(String.java:426)
at java.lang.String.(String.java:491)
...


spark-sql> select decode('ff','hex');
19/08/16 21:44:55 ERROR SparkSQLDriver: Failed in [select decode('ff','hex')]
java.io.UnsupportedEncodingException: hex
at java.lang.StringCoding.decode(StringCoding.java:190)
at java.lang.String.(String.java:426)
at java.lang.String.(String.java:491)
{noformat}



  was:
{noformat}
postgres=# select decode('1234567890','escape');
decode

\x31323334353637383930
(1 row)
{noformat}
{noformat}
spark-sql> select decode('1234567890','escape');
19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select 
decode('1234567890','escape')]
java.io.UnsupportedEncodingException: escape
at java.lang.StringCoding.decode(StringCoding.java:190)
at java.lang.String.(String.java:426)
at java.lang.String.(String.java:491)
{noformat}




> String Functions: decode can not accept 'escape' as charset
> ---
>
> Key: SPARK-28121
> URL: https://issues.apache.org/jira/browse/SPARK-28121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> postgres=# select decode('1234567890','escape');
> decode
> 
> \x31323334353637383930
> (1 row)
> {noformat}
> {noformat}
> spark-sql> select decode('1234567890','escape');
> 19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select 
> decode('1234567890','escape')]
> java.io.UnsupportedEncodingException: escape
>   at java.lang.StringCoding.decode(StringCoding.java:190)
>   at java.lang.String.(String.java:426)
>   at java.lang.String.(String.java:491)
> ...
> spark-sql> select decode('ff','hex');
> 19/08/16 21:44:55 ERROR SparkSQLDriver: Failed in [select decode('ff','hex')]
> java.io.UnsupportedEncodingException: hex
>   at java.lang.StringCoding.decode(StringCoding.java:190)
>   at java.lang.String.(String.java:426)
>   at java.lang.String.(String.java:491)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28742) StackOverflowError when using otherwise(col()) in a loop

2019-08-16 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909052#comment-16909052
 ] 

eugen yushin commented on SPARK-28742:
--

Looks like the issue is a difference in logic between LocalRelation (used for 
DataFrames built from local data) and LogicalRDD (used for DataFrames created 
from an RDD). Comparing the plans of the two makes this visible:

```
val df2 = Seq("1").toDF("c1")

df.explain(true)
df2.explain(true)
```

> StackOverflowError when using otherwise(col()) in a loop
> 
>
> Key: SPARK-28742
> URL: https://issues.apache.org/jira/browse/SPARK-28742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.3
>Reporter: Ivan Tsukanov
>Priority: Major
>
> The following code
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> (1 to 9).foldLeft(df) { case (acc, _) =>
>   val res = acc.withColumn("c1", column)
>   res.take(1)
>   res
> }
> {code}
> falls with
> {code:java}
> java.lang.StackOverflowError
>at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:395)
>...{code}
> Probably, the problem is that Spark generates an extremely large physical plan:
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> val result = (1 to 9).foldLeft(df) { case (acc, _) =>
>   acc.withColumn("c1", column)
> }
> result.explain()
> {code}
> it shows a plan that is 18936 characters long
> {code:java}
> == Physical Plan ==
> *(1) Project [CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE 
> WHEN (CASE  18936 symbols
> +- Scan ExistingRDD[c1#1]  {code}
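To see why the plan explodes, note that each iteration re-applies the same 
{{when/otherwise}} column on top of the previous result, so the CASE WHEN 
expression gets nested once more per iteration. A small sketch reusing {{df}} 
from the snippet above:
{code:scala}
import org.apache.spark.sql.functions.{col, when}

val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))

// After only a few iterations the projected expression is already deeply nested:
// CASE WHEN ... ELSE (CASE WHEN ... ELSE (CASE WHEN ... ELSE c1 END) END) END
val grown = (1 to 3).foldLeft(df) { case (acc, _) => acc.withColumn("c1", column) }
grown.explain()
{code}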



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-16 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908995#comment-16908995
 ] 

Takeshi Yamamuro commented on SPARK-28587:
--

oh, I see. Nice catch! Before we decide to use the Dialect approach, I'd like 
to look for a more general one that works well on most databases. For example, 
does the query below (converting the timestamp to a unix timestamp via 
extract) work well?
{code}
postgres=# select * from t;
a 
-
2019-01-01 00:00:00
(1 row)

postgres=# select * from t where extract(epoch from a) > extract(epoch from 
timestamp '2014-01-28 00:00:00');
a 
-
2019-01-01 00:00:00
(1 row)
{code}
 

> JDBC data source's partition whereClause should support jdbc dialect
> 
>
> Key: SPARK-28587
> URL: https://issues.apache.org/jira/browse/SPARK-28587
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: wyp
>Priority: Minor
>
> When we use the JDBC data source to read data from Phoenix and use a 
> timestamp column as the partitionColumn, e.g.
> {code:java}
> val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
> val driver = "org.apache.phoenix.queryserver.client.Driver"
> val df = spark.read.format("jdbc")
> .option("url", url)
> .option("driver", driver)
> .option("fetchsize", "1000")
> .option("numPartitions", "6")
> .option("partitionColumn", "times")
> .option("lowerBound", "2019-07-31 00:00:00")
> .option("upperBound", "2019-08-01 00:00:00")
> .option("dbtable", "search_info_test")
> .load().select("id")
> println(df.count())
> {code}
> Phoenix throws an AvaticaSqlException:
> {code:java}
> org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while 
> preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 
> 04:00:00' or "TIMES" is null
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
> ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
> '2019-07-31 04:00:00'
>   at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
>   at 
> org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
>   at 
> org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
>   at 
> org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
>   at 
> org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
>   at 
> org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
>   at 
> org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
>   at 
> org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
>   at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> 

[jira] [Closed] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-28748.
-

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28748.
---
   Resolution: Duplicate
Fix Version/s: (was: 3.0.0)

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-28748:
---

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28748.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Thank you for reporting, [~RohitSindhu].
As [~yumwang] mentioned in the above, this is already fixed at Apache Spark 
3.0.0 .

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908973#comment-16908973
 ] 

Dongjoon Hyun commented on SPARK-28748:
---

Thanks! Then, I'll resolve this as an issue superseded by SPARK-23710 .

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908970#comment-16908970
 ] 

Yuming Wang commented on SPARK-28748:
-

Yes

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908969#comment-16908969
 ] 

Dongjoon Hyun commented on SPARK-28748:
---

Thank you for pinging me, [~yumwang]. So, the fix is only available in Apache 
Spark 3.0.0 with Hadoop 3.2 profile build. Did I understand correctly?

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> ++-+                                                                  
>   
> |name|   id|
> ++-+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> ++-+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28735) MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction fails on JDK11

2019-08-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28735.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25475
[https://github.com/apache/spark/pull/25475]

> MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction fails 
> on JDK11
> -
>
> Key: SPARK-28735
> URL: https://issues.apache.org/jira/browse/SPARK-28735
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Build Spark and run the PySpark UTs with JDK11. The last `assertTrue`, now 
> commented out in the snippet below, failed.
> {code}
> $ build/sbt -Phadoop-3.2 test:package
> $ python/run-tests --testnames 'pyspark.ml.tests.test_algorithms' 
> --python-executables python
> ...
> ==
> FAIL: test_raw_and_probability_prediction 
> (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest)
> --
> Traceback (most recent call last):
>   File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/ml/tests/test_algorithms.py",
>  line 89, in test_raw_and_probability_prediction
> self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, 
> atol=1E-4))
> AssertionError: False is not true
> {code}
> {code:python}
> class MultilayerPerceptronClassifierTest(SparkSessionTestCase):
> def test_raw_and_probability_prediction(self):
> data_path = "data/mllib/sample_multiclass_classification_data.txt"
> df = self.spark.read.format("libsvm").load(data_path)
> mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[4, 5, 4, 3],
>  blockSize=128, seed=123)
> model = mlp.fit(df)
> test = self.sc.parallelize([Row(features=Vectors.dense(0.1, 0.1, 
> 0.25, 0.25))]).toDF()
> result = model.transform(test).head()
> expected_prediction = 2.0
> expected_probability = [0.0, 0.0, 1.0]
>   expected_rawPrediction = [-11.6081922998, -8.15827998691, 
> 22.17757045]
>   self.assertTrue(result.prediction, expected_prediction)
>   self.assertTrue(np.allclose(result.probability, 
> expected_probability, atol=1E-4))
>   self.assertTrue(np.allclose(result.rawPrediction, 
> expected_rawPrediction, atol=1E-4))
>   # self.assertTrue(np.allclose(result.rawPrediction, 
> expected_rawPrediction, atol=1E-4))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28736) pyspark.mllib.clustering fails on JDK11

2019-08-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28736.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25475
[https://github.com/apache/spark/pull/25475]

> pyspark.mllib.clustering fails on JDK11
> ---
>
> Key: SPARK-28736
> URL: https://issues.apache.org/jira/browse/SPARK-28736
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Build Spark and run PySpark UT with JDK11.
> {code}
> $ build/sbt -Phadoop-3.2 test:package
> $ python/run-tests --testnames 'pyspark.mllib.clustering' 
> --python-executables python
> ...
> File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", 
> line 386, in __main__.GaussianMixtureModel
> Failed example:
> abs(softPredicted[0] - 1.0) < 0.001
> Expected:
> True
> Got:
> False
> **
> File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", 
> line 388, in __main__.GaussianMixtureModel
> Failed example:
> abs(softPredicted[1] - 0.0) < 0.001
> Expected:
> True
> Got:
> False
> **
>2 of  31 in __main__.GaussianMixtureModel
> ***Test Failed*** 2 failures.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28756) Fix checkJavaVersion to accept JDK8+

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28756:
--
Summary: Fix checkJavaVersion to accept JDK8+  (was: checkJavaVersion fails 
on JDK11)

> Fix checkJavaVersion to accept JDK8+
> 
>
> Key: SPARK-28756
> URL: https://issues.apache.org/jira/browse/SPARK-28756
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> {code}
> build/mvn -Phadoop-3.2 -Psparkr -DskipTests package
> R/install-dev.sh
> R/run-tests.sh
> {code}
> {code}
> Skipped 
> 
> 1. create DataFrame from list or data.frame (@test_basic.R#21) - error on 
> Java check
> 2. spark.glm and predict (@test_basic.R#57) - error on Java check
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28728) Bump Jackson Databind to 2.9.9.3

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28728.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25451
[https://github.com/apache/spark/pull/25451]

> Bump Jackson Databind to 2.9.9.3
> 
>
> Key: SPARK-28728
> URL: https://issues.apache.org/jira/browse/SPARK-28728
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 3.0.0
>
>
> Needs to be upgraded due to issues.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28728) Bump Jackson Databind to 2.9.9.3

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28728:
-

Assignee: Fokko Driesprong

> Bump Jackson Databind to 2.9.9.3
> 
>
> Key: SPARK-28728
> URL: https://issues.apache.org/jira/browse/SPARK-28728
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>
> Needs to be upgraded due to issues.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28760) Add end-to-end Kafka delegation token test

2019-08-16 Thread Gabor Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi updated SPARK-28760:
--
Description: 
At the moment no end-to-end Kafka delegation token test exists, mainly because 
a KDC is missing. Since a KDC is missing from the test infrastructure in 
general, I've looked into what options are available. The most obvious choice 
is the MiniKDC inside the Hadoop library, where Apache Kerby runs in the 
background. In this JIRA I would like to add Kerby to the testing area and use 
it to cover security-related features.


  was:
At the moment no end-to-end Kafka delegation token test not exists which was 
mainly because of missing KDC. KDC is missing in general from the testing side 
so I've discovered what kind of possibilities are there. The most obvious 
choice is the MiniKDC inside the Hadoop library where Apache Kerby runs in the 
background. In this jira I would like to add Kerby to the testing area and use 
it to cover security related features.



> Add end-to-end Kafka delegation token test
> --
>
> Key: SPARK-28760
> URL: https://issues.apache.org/jira/browse/SPARK-28760
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> At the moment no end-to-end Kafka delegation token test exists, mainly 
> because a KDC has been missing. A KDC is missing from the test infrastructure 
> in general, so I've looked into what possibilities there are. The most 
> obvious choice is the MiniKDC inside the Hadoop library, where Apache Kerby 
> runs in the background. In this jira I would like to add Kerby to the testing 
> area and use it to cover security-related features.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908940#comment-16908940
 ] 

Yuming Wang commented on SPARK-28748:
-

cc [~dongjoon] We really hit this issue.

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show up as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> +----+-----+
> |name|   id|
> +----+-----+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> +----+-----+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908937#comment-16908937
 ] 

Yuming Wang commented on SPARK-28748:
-

We fixed it by upgrading the built-in Hive to 2.3.5, and we have ported the 
test: SPARK-28460

!image-2019-08-16-18-18-19-279.png!
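
For reference, a minimal re-check of the repro after the upgrade could look 
like the sketch below (it assumes the {{test_dec}} table from the description 
and an active {{spark}} session; it is not the ported SPARK-28460 test itself):

{code:scala}
// After the built-in Hive upgrade, the zero rows should come back as 0.000
// rather than null.
val rows = spark.sql("SELECT name, id FROM test_dec ORDER BY name").collect()
assert(rows.forall(r => !r.isNullAt(1)),
  s"unexpected null decimals: ${rows.mkString(", ")}")
{code}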

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show up as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> +----+-----+
> |name|   id|
> +----+-----+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> +----+-----+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28760) Add end-to-end Kafka delegation token test

2019-08-16 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-28760:
-

 Summary: Add end-to-end Kafka delegation token test
 Key: SPARK-28760
 URL: https://issues.apache.org/jira/browse/SPARK-28760
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming, Tests
Affects Versions: 3.0.0
Reporter: Gabor Somogyi


At the moment no end-to-end Kafka delegation token test exists, mainly because 
a KDC has been missing. A KDC is missing from the test infrastructure in 
general, so I've looked into what possibilities there are. The most obvious 
choice is the MiniKDC inside the Hadoop library, where Apache Kerby runs in the 
background. In this jira I would like to add Kerby to the testing area and use 
it to cover security-related features.
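
As a rough sketch of the direction (illustrative only; it assumes the 
hadoop-minikdc test artifact is on the classpath, and the helper name, work 
directory and principal below are made up):

{code:scala}
import java.io.File
import java.nio.file.Files

import org.apache.hadoop.minikdc.MiniKdc

// Start an embedded KDC (Apache Kerby under the hood), hand the keytab to the
// test body, and always shut the KDC down afterwards.
object KdcTestHelper {
  def withMiniKdc(testBody: (MiniKdc, File) => Unit): Unit = {
    val workDir = Files.createTempDirectory("spark-minikdc").toFile
    val kdc = new MiniKdc(MiniKdc.createConf(), workDir)
    kdc.start()
    try {
      val keytab = new File(workDir, "kafka.keytab")
      // A service principal that an embedded Kafka broker could authenticate as.
      kdc.createPrincipal(keytab, "kafka/localhost")
      testBody(kdc, keytab)
    } finally {
      kdc.stop()
    }
  }
}
{code}

A test could then point an embedded Kafka broker and the delegation token 
provider at the keytab and realm exposed by the KDC.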




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28748) 0 as decimal (n , n) in Hive tables shows as NULL in Spark

2019-08-16 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28748:

Attachment: image-2019-08-16-18-18-19-279.png

> 0 as decimal (n , n) in Hive tables shows as NULL in Spark
> --
>
> Key: SPARK-28748
> URL: https://issues.apache.org/jira/browse/SPARK-28748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.1, 2.3.1, 2.3.2, 2.4.3
>Reporter: Rohit Sindhu
>Priority: Minor
> Attachments: image-2019-08-16-18-18-19-279.png
>
>
> Zeros (0) inserted as decimal(n, n) in Hive tables show up as NULL in Spark 
> SQL.
> Repro Steps
> *Hive Shell*
> {code}
> create table test_dec (name string , id decimal(3,3));
> insert into test_dec values ('c1' , 0) , ('c2' , 0.0) , ('c3' , 0.1);
> select * from test_dec;
> {code}
> {code}
> c1 0.000
> c2 0.000
> c3 0.100
> {code} 
> *Spark* Shell
> {code}
> spark.sqlContext.sql("select * from test_dec").show;
> {code}
> {code}
> +----+-----+
> |name|   id|
> +----+-----+
> |  c1| null|
> |  c2| null|
> |  c3|0.100|
> +----+-----+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28759) Upgrade scala-maven-plugin to 4.1.1

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28759:
-

Assignee: Dongjoon Hyun

> Upgrade scala-maven-plugin to 4.1.1
> ---
>
> Key: SPARK-28759
> URL: https://issues.apache.org/jira/browse/SPARK-28759
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-27704) Change default class loader to ParallelGC

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-27704.
-

> Change default class loader to ParallelGC
> -
>
> Key: SPARK-27704
> URL: https://issues.apache.org/jira/browse/SPARK-27704
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Mihaly Toth
>Priority: Major
>
> In JDK 11 the default garbage collector changed from ParallelGC to G1GC. Even 
> though this GC performs better on pause times and interactivity, most of the 
> tasks that need to be processed are more sensitive to throughput and to the 
> amount of memory. G1 sacrifices these to some extent to avoid the big pauses. 
> As a result the user may perceive a regression compared to JDK 8. Even worse, 
> the regression may not be limited to performance: some jobs may start failing 
> if they no longer fit into the memory they used to be happy with when running 
> on the previous JDK.
> Some other kinds of apps, like streaming ones, may prefer G1 because of their 
> more interactive, more real-time needs.
> With this jira it is proposed to have a configurable default GC for all Spark 
> applications. This could be overridden by the user through command line 
> parameters. The default value of the default GC (in case it is not provided 
> in spark-defaults.conf) could be ParallelGC.
> I do not see this change as strictly required, but I think it would benefit 
> the user experience.
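
For reference, a user can already pick the collector per application through 
the standard extra-Java-options confs; a minimal spark-defaults.conf sketch 
(standard conf keys, values shown only to illustrate the override path the 
description mentions):

{code}
spark.driver.extraJavaOptions    -XX:+UseParallelGC
spark.executor.extraJavaOptions  -XX:+UseParallelGC
{code}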



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28736) pyspark.mllib.clustering fails on JDK11

2019-08-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28736:


Assignee: Hyukjin Kwon

> pyspark.mllib.clustering fails on JDK11
> ---
>
> Key: SPARK-28736
> URL: https://issues.apache.org/jira/browse/SPARK-28736
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Build Spark and run PySpark UT with JDK11.
> {code}
> $ build/sbt -Phadoop-3.2 test:package
> $ python/run-tests --testnames 'pyspark.mllib.clustering' 
> --python-executables python
> ...
> File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", 
> line 386, in __main__.GaussianMixtureModel
> Failed example:
> abs(softPredicted[0] - 1.0) < 0.001
> Expected:
> True
> Got:
> False
> **
> File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", 
> line 388, in __main__.GaussianMixtureModel
> Failed example:
> abs(softPredicted[1] - 0.0) < 0.001
> Expected:
> True
> Got:
> False
> **
>2 of  31 in __main__.GaussianMixtureModel
> ***Test Failed*** 2 failures.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28759) Upgrade scala-maven-plugin to 4.1.1

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28759:
-

 Summary: Upgrade scala-maven-plugin to 4.1.1
 Key: SPARK-28759
 URL: https://issues.apache.org/jira/browse/SPARK-28759
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28735) MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction fails on JDK11

2019-08-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28735:


Assignee: Hyukjin Kwon

> MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction fails 
> on JDK11
> -
>
> Key: SPARK-28735
> URL: https://issues.apache.org/jira/browse/SPARK-28735
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Build Spark and run PySpark UT with JDK11. The last commented `assertTrue` 
> failed.
> {code}
> $ build/sbt -Phadoop-3.2 test:package
> $ python/run-tests --testnames 'pyspark.ml.tests.test_algorithms' 
> --python-executables python
> ...
> ==
> FAIL: test_raw_and_probability_prediction 
> (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest)
> --
> Traceback (most recent call last):
>   File 
> "/Users/dongjoon/APACHE/spark-master/python/pyspark/ml/tests/test_algorithms.py",
>  line 89, in test_raw_and_probability_prediction
> self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, 
> atol=1E-4))
> AssertionError: False is not true
> {code}
> {code:python}
> class MultilayerPerceptronClassifierTest(SparkSessionTestCase):
> def test_raw_and_probability_prediction(self):
> data_path = "data/mllib/sample_multiclass_classification_data.txt"
> df = self.spark.read.format("libsvm").load(data_path)
> mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[4, 5, 4, 3],
>  blockSize=128, seed=123)
> model = mlp.fit(df)
> test = self.sc.parallelize([Row(features=Vectors.dense(0.1, 0.1, 
> 0.25, 0.25))]).toDF()
> result = model.transform(test).head()
> expected_prediction = 2.0
> expected_probability = [0.0, 0.0, 1.0]
>   expected_rawPrediction = [-11.6081922998, -8.15827998691, 
> 22.17757045]
>   self.assertTrue(result.prediction, expected_prediction)
>   self.assertTrue(np.allclose(result.probability, 
> expected_probability, atol=1E-4))
>   self.assertTrue(np.allclose(result.rawPrediction, 
> expected_rawPrediction, atol=1E-4))
>   # self.assertTrue(np.allclose(result.rawPrediction, 
> expected_rawPrediction, atol=1E-4))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-28110) on JDK11, IsolatedClientLoader must be able to load java.sql classes

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-28110.
-

> on JDK11, IsolatedClientLoader must be able to load java.sql classes
> 
>
> Key: SPARK-28110
> URL: https://issues.apache.org/jira/browse/SPARK-28110
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> This might be very specific to my fork and the somewhat unusual system setup 
> I'm working on; I haven't completely confirmed that yet, but I wanted to 
> report it anyway in case anybody else sees this.
> When I try to do anything that touches the metastore on Java 11, I immediately 
> get errors from IsolatedClientLoader saying that it can't load anything in 
> java.sql, e.g.
> {noformat}
> scala> spark.sql("show tables").show()
> java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: 
> java/sql/SQLTransientException when creating Hive client using classpath: 
> file:/home/systest/jdk-11.0.2/, ...
> ...
> Caused by: java.lang.ClassNotFoundException: java.sql.SQLTransientException
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:230)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:219)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> {noformat}
> After a bit of debugging, I also discovered that the {{rootClassLoader}} is 
> {{null}} in {{IsolatedClientLoader}}.  I think this would work if either 
> {{rootClassLoader}} could load those classes, or if {{isShared()}} was 
> changed to allow any class starting with "java."  (I'm not sure why it only 
> allows "java.lang" and "java.net" currently.)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-27587) No such method error (sun.nio.ch.DirectBuffer.cleaner()) when reading big table from JDBC (with one slow query)

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-27587.
-

> No such method error (sun.nio.ch.DirectBuffer.cleaner()) when reading big 
> table from JDBC (with one slow query)
> ---
>
> Key: SPARK-27587
> URL: https://issues.apache.org/jira/browse/SPARK-27587
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.1, 2.4.2
>Reporter: Mohsen Taheri
>Priority: Major
>
> It throws the error while reading big tables from JDBC data source:
> > Code:
> sparkSession.read()
>  .option("numPartitions", data.numPartitions)
>  .option("partitionColumn", data.pk)
>  .option("lowerBound", data.min)
>  .option("upperBound", data.max)
>  .option("queryTimeout", 180).
>  format("jdbc").
>  jdbc(dbURL, tableName, props).
>  
> repartition(10).write().mode(SaveMode.Overwrite).parquet(tableF.getAbsolutePath());
>  
> > Stacktrace:
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor 
> driver): java.lang.NoSuchMethodError: 
> sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner; +details
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor 
> driver): java.lang.NoSuchMethodError: 
> sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;
>  at 
> org.apache.spark.storage.StorageUtils$.cleanDirectBuffer(StorageUtils.scala:212)
>  at org.apache.spark.storage.StorageUtils$.dispose(StorageUtils.scala:207)
>  at org.apache.spark.storage.StorageUtils.dispose(StorageUtils.scala)
>  at 
> org.apache.spark.io.NioBufferedFileInputStream.close(NioBufferedFileInputStream.java:130)
>  at java.base/java.io.FilterInputStream.close(FilterInputStream.java:180)
>  at 
> org.apache.spark.io.ReadAheadInputStream.close(ReadAheadInputStream.java:400)
>  at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.close(UnsafeSorterSpillReader.java:151)
>  at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:123)
>  at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$1.loadNext(UnsafeSorterSpillMerger.java:82)
>  at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:187)
>  at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:174)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:121)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)
>  
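
The failing frame is the direct-buffer cleanup, which was compiled against a 
JDK 8-only cleaner signature. As a hedged sketch (not Spark's actual fix), a 
version-tolerant disposal path could use {{Unsafe.invokeCleaner}} where it 
exists and fall back to reflective {{cleaner().clean()}} on JDK 8:

{code:scala}
import java.nio.ByteBuffer

object DirectBufferDisposerSketch {
  private val unsafe: sun.misc.Unsafe = {
    val f = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    f.setAccessible(true)
    f.get(null).asInstanceOf[sun.misc.Unsafe]
  }

  def dispose(buffer: ByteBuffer): Unit = {
    if (buffer != null && buffer.isDirect) {
      try {
        // JDK 9+: sun.misc.Unsafe exposes invokeCleaner(ByteBuffer).
        val invokeCleaner =
          classOf[sun.misc.Unsafe].getMethod("invokeCleaner", classOf[ByteBuffer])
        invokeCleaner.invoke(unsafe, buffer)
      } catch {
        case _: NoSuchMethodException =>
          // JDK 8: call DirectByteBuffer.cleaner().clean() reflectively.
          val cleanerMethod = buffer.getClass.getMethod("cleaner")
          cleanerMethod.setAccessible(true)
          val cleaner = cleanerMethod.invoke(buffer)
          val cleanMethod = cleaner.getClass.getMethod("clean")
          cleanMethod.setAccessible(true)
          cleanMethod.invoke(cleaner)
      }
    }
  }
}
{code}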



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-27585) No such method error (sun.nio.ch.DirectBuffer.cleaner())

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-27585.
-

> No such method error (sun.nio.ch.DirectBuffer.cleaner())
> 
>
> Key: SPARK-27585
> URL: https://issues.apache.org/jira/browse/SPARK-27585
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.1, 2.4.2
>Reporter: Mohsen Taheri
>Priority: Major
>
> The error appears when a JDBC read executes and runs for a while (partitioned 
> queries with null conditions take longer than usual to execute, and this then 
> appears to trigger the error).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-26896) Add maven profiles for running tests with JDK 11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-26896.
-

> Add maven profiles for running tests with JDK 11
> 
>
> Key: SPARK-26896
> URL: https://issues.apache.org/jira/browse/SPARK-26896
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> Running unit tests w/ JDK 11 trips over some issues w/ the new module system. 
>  These can be worked around with the new {{--add-opens}} etc. commands.  I 
> think we need to add a build profile for JDK 11 to add some extra args to the 
> test runners.
> In particular:
> 1) removal of jaxb from java itself (used in pmml export in mllib)
> 2) Some reflective access which results in failures, eg. 
> {noformat}
> Unable to make field jdk.internal.ref.PhantomCleanable
> jdk.internal.ref.PhantomCleanable.prev accessible: module java.base does
> not "opens jdk.internal.ref" to unnamed module
> {noformat}
> 3) Some reflective access which results in warnings (you can add 
> {{--illegal-access=warn}} to see all of these).
> All I'm proposing we do here is put in the required handling to make these 
> problems go away, not necessarily do the "right" thing by no longer 
> referencing these unexposed internals.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18466) Missing withFilter method causes errors when using for comprehensions in Scala 2.12

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-18466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-18466:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-24417)

> Missing withFilter method causes errors when using for comprehensions in 
> Scala 2.12
> ---
>
> Key: SPARK-18466
> URL: https://issues.apache.org/jira/browse/SPARK-18466
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Richard W. Eggert II
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The fact that the RDD class has a {{filter}} method but not a {{withFilter}} 
> method results in compiler warnings when using RDDs in {{for}} 
> comprehensions. As of Scala 2.12, falling back to use of {{filter}} is no 
> longer supported, so {{for}} comprehensions that use filters will no longer 
> compile. Semantically, the only difference between {{withFilter}} and 
> {{filter}} is that {{withFilter}} is lazy, and since RDDs are lazy by nature, 
> one can simply be aliased to the other.
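
A minimal sketch of the proposed alias, written here as an external enrichment 
purely for illustration (the jira itself proposes adding the method to RDD):

{code:scala}
import org.apache.spark.rdd.RDD

object RddWithFilterSketch {
  // Guards in for comprehensions desugar to withFilter; since RDDs are lazy,
  // delegating straight to filter gives the same semantics.
  implicit class RichRdd[T](val rdd: RDD[T]) extends AnyVal {
    def withFilter(f: T => Boolean): RDD[T] = rdd.filter(f)
  }

  // With the alias in scope this compiles cleanly on Scala 2.12:
  //   for (x <- sc.parallelize(1 to 10) if x % 2 == 0) yield x * x
}
{code}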



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25939) Spark 'jshell' support on JDK 11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25939:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SPARK-24417)

> Spark 'jshell' support on JDK 11
> 
>
> Key: SPARK-25939
> URL: https://issues.apache.org/jira/browse/SPARK-25939
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Shell
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Starting with JDK 9, jshell 
> (https://docs.oracle.com/en/java/javase/11/jshell/introduction-jshell.html#GUID-630F27C8-1195-4989-9F6B-2C51D46F52C8)
> is available.
> It would be great if Spark supported JShell so that Java users could use it 
> as a REPL.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25596) TLS1.3 support

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25596:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-24417)

> TLS1.3 support
> --
>
> Key: SPARK-25596
> URL: https://issues.apache.org/jira/browse/SPARK-25596
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: t oo
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-16 Thread wyp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908881#comment-16908881
 ] 

wyp edited comment on SPARK-28587 at 8/16/19 9:19 AM:
--

[~maropu], Thank you for your reply. 

If the type of the TIMES field is timestamp, then 'SELECT 1 FROM 
search_info_test WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null' will 
fail, e.g.:
{code:java}
// CREATE TABLE SEARCH_INFO_TEST (ID BIGINT primary key, TIMES TIMESTAMP);
// CREATE INDEX TIMES_INDEX ON SEARCH_INFO_TEST(TIMES);

0: jdbc:phoenix:thin:url=http://192.168.0.1> SELECT 1 FROM SEARCH_INFO_TEST 
WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null;
Error: Error -1 (0) : Error while executing SQL "SELECT 1 FROM 
SEARCH_INFO_TEST WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null": 
Remote driver error: RuntimeException: 
org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type 
mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00' -> 
TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR 
for "TIMES" < '2019-07-31 04:00:00' (state=0,code=-1)
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : Error while 
executing SQL "SELECT 1 FROM SEARCH_INFO_TEST WHERE "TIMES" < '2019-07-31 
04:00:00' or "TIMES" is null": Remote driver error: RuntimeException: 
org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type 
mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00' -> 
TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR 
for "TIMES" < '2019-07-31 04:00:00'
at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
at 
org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:813)
at sqlline.SqlLine.begin(SqlLine.java:686)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:291)
at 
org.apache.phoenix.queryserver.client.SqllineWrapper.main(SqllineWrapper.java:93)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
'2019-07-31 04:00:00'
at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
at 
org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepareAndExecute(PhoenixJdbcMeta.java:101)
at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:206)
at 
org.apache.calcite.avatica.remote.Service$PrepareAndExecuteRequest.accept(Service.java:927)
at 
org.apache.calcite.avatica.remote.Service$PrepareAndExecuteRequest.accept(Service.java:879)
at 
org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
at 
org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
at 
org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): 
Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00'
at 
org.apache.phoenix.schema.TypeMismatchException.newException(TypeMismatchException.java:53)
at 
org.apache.phoenix.expression.ComparisonExpression.create(ComparisonExpression.java:149)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:234)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:146)
at 

[jira] [Commented] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-16 Thread wyp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908881#comment-16908881
 ] 

wyp commented on SPARK-28587:
-

[~maropu], Thank you for your reply. 

If the type of the TIMES field is timestamp, then 'SELECT 1 FROM 
search_info_test WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null' will 
fail, e.g.:

 
{code:java}
// CREATE TABLE SEARCH_INFO_TEST (ID BIGINT primary key, TIMES TIMESTAMP);
// CREATE INDEX TIMES_INDEX ON SEARCH_INFO_TEST(TIMES);

0: jdbc:phoenix:thin:url=http://192.168.0.1> SELECT 1 FROM SEARCH_INFO_TEST 
WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null;
Error: Error -1 (0) : Error while executing SQL "SELECT 1 FROM 
SEARCH_INFO_TEST WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null": 
Remote driver error: RuntimeException: 
org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type 
mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00' -> 
TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR 
for "TIMES" < '2019-07-31 04:00:00' (state=0,code=-1)
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : Error while 
executing SQL "SELECT 1 FROM SEARCH_INFO_TEST WHERE "TIMES" < '2019-07-31 
04:00:00' or "TIMES" is null": Remote driver error: RuntimeException: 
org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type 
mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00' -> 
TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR 
for "TIMES" < '2019-07-31 04:00:00'
at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
at 
org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:813)
at sqlline.SqlLine.begin(SqlLine.java:686)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:291)
at 
org.apache.phoenix.queryserver.client.SqllineWrapper.main(SqllineWrapper.java:93)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
'2019-07-31 04:00:00'
at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
at 
org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepareAndExecute(PhoenixJdbcMeta.java:101)
at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:206)
at 
org.apache.calcite.avatica.remote.Service$PrepareAndExecuteRequest.accept(Service.java:927)
at 
org.apache.calcite.avatica.remote.Service$PrepareAndExecuteRequest.accept(Service.java:879)
at 
org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
at 
org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
at 
org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): 
Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00'
at 
org.apache.phoenix.schema.TypeMismatchException.newException(TypeMismatchException.java:53)
at 
org.apache.phoenix.expression.ComparisonExpression.create(ComparisonExpression.java:149)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:234)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:146)
at 
org.apache.phoenix.parse.ComparisonParseNode.accept(ComparisonParseNode.java:47)
at 

[jira] [Updated] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-16 Thread wyp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wyp updated SPARK-28587:

Description: 
When we use the JDBC data source to read data from Phoenix and use a timestamp 
column as the partitionColumn, e.g.
{code:java}
val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
val driver = "org.apache.phoenix.queryserver.client.Driver"

val df = spark.read.format("jdbc")
.option("url", url)
.option("driver", driver)
.option("fetchsize", "1000")
.option("numPartitions", "6")
.option("partitionColumn", "times")
.option("lowerBound", "2019-07-31 00:00:00")
.option("upperBound", "2019-08-01 00:00:00")
.option("dbtable", "search_info_test")
.load().select("id")

println(df.count())
{code}
Phoenix will throw an AvaticaSqlException:
{code:java}
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while 
preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 
04:00:00' or "TIMES" is null
  at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
  at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
  at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
  at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
'2019-07-31 04:00:00'
  at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
  at 
org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
  at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
  at 
org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
  at 
org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
  at 
org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
  at 
org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
  at 
org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
  at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
  at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
  at org.eclipse.jetty.server.Server.handle(Server.java:534)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
  at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
  at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
  at java.lang.Thread.run(Thread.java:834)
{code}
The reason is that the JDBC data source's partition whereClause does not go 
through the JDBC dialect. We should use the JDBC dialect to compile 
'2019-07-31 04:00:00' into to_timestamp('2019-07-31 04:00:00').

  was:
When we use JDBC data 
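
To make the dialect idea concrete, a sketch of the dialect-side rendering 
(hypothetical code, not part of Spark, relying on the compileValue hook that 
JdbcDialect exposes in recent versions; note that today the partition WHERE 
clause is built without consulting the dialect, which is exactly the gap 
described above):

{code:scala}
import java.sql.Timestamp

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Render timestamp literals through TO_TIMESTAMP so Phoenix does not compare
// a TIMESTAMP column against a VARCHAR literal.
object PhoenixDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:phoenix")

  override def compileValue(value: Any): Any = value match {
    case ts: Timestamp => s"TO_TIMESTAMP('$ts')"
    case other         => super.compileValue(other)
  }
}

// Registered before running the read, e.g.:
// JdbcDialects.registerDialect(PhoenixDialectSketch)
{code}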

[jira] [Assigned] (SPARK-28737) Update jersey to 2.27+ (2.29)

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28737:
-

Assignee: Sean Owen

> Update jersey to 2.27+ (2.29)
> -
>
> Key: SPARK-28737
> URL: https://issues.apache.org/jira/browse/SPARK-28737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
>
> Looks like we might need to update Jersey after all, from recent JDK 11 
> testing: 
> {code}
> Caused by: java.lang.IllegalArgumentException
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:170)
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:153)
>   at 
> jersey.repackaged.org.objectweb.asm.ClassReader.init(ClassReader.java:424)
>   at 
> org.glassfish.jersey.server.internal.scanning.AnnotationAcceptingListener.process(AnnotationAcceptingListener.java:170)
> {code}
> It looks like 2.27+ may solve the issue, so worth trying 2.29. 
> I'm not 100% sure this is an issue as the JDK 11 testing process is still 
> undergoing change, but will work on it to see how viable it is anyway, as it 
> may be worthwhile to update for 3.0 in any event.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28750) Use `--release 8` for javac

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28750.
---
Resolution: Later

> Use `--release 8` for javac
> ---
>
> Key: SPARK-28750
> URL: https://issues.apache.org/jira/browse/SPARK-28750
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-16 Thread wyp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wyp updated SPARK-28587:

Description: 
When we use the JDBC data source to read data from Phoenix and use a timestamp 
column as the partitionColumn, e.g.
{code:java}
val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
val driver = "org.apache.phoenix.queryserver.client.Driver"

val df = spark.read.format("jdbc")
.option("url", url)
.option("driver", driver)
.option("fetchsize", "1000")
.option("numPartitions", "6")
.option("partitionColumn", "search_info_test")
.option("lowerBound", "2019-07-31 00:00:00")
.option("upperBound", "2019-08-01 00:00:00")
.option("dbtable", "test")
.load().select("id")

println(df.count())
{code}
Phoenix will throw an AvaticaSqlException:
{code:java}
org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while 
preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 
04:00:00' or "TIMES" is null
  at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
  at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
  at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
  at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
'2019-07-31 04:00:00'
  at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
  at 
org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
  at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
  at 
org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
  at 
org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
  at 
org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
  at 
org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
  at 
org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
  at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
  at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
  at org.eclipse.jetty.server.Server.handle(Server.java:534)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
  at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
  at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
  at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
  at java.lang.Thread.run(Thread.java:834)
{code}
The reason is that the JDBC data source's partition whereClause does not go 
through the JDBC dialect. We should use the JDBC dialect to compile 
'2019-07-31 04:00:00' into to_timestamp('2019-07-31 04:00:00').

  was:
When we use JDBC data 

[jira] [Assigned] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28758:
-

Assignee: Dongjoon Hyun

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino to pick up its recent bug fixes. Please 
> note that Janino 3.1.0 is a major refactoring rather than a bug-fix release, 
> so we had better stay on 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28758:
--
Component/s: Build

> Upgrade Janino to 3.0.15
> 
>
> Key: SPARK-28758
> URL: https://issues.apache.org/jira/browse/SPARK-28758
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino to pick up its recent bug fixes. Please 
> note that Janino 3.1.0 is a major refactoring rather than a bug-fix release, 
> so we had better stay on 3.0.15.
> *3.0.15 (2019-07-28)*
> - Fix overloaded single static method import
> *3.0.14 (2019-07-05)*
> - Conflict in sbt-assembly
> - Overloaded static on-demand imported methods cause a CompileException: 
> Ambiguous static method import
> - Handle overloaded static on-demand imports
> - Major refactoring of the Java 8 and Java 9 retrofit mechanism
> - Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
> Initializers"
> - Local variables in instance initializers don't work
> - Provide an option to keep generated code files
> - Added compile error handler and warning handler to ICompiler



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24417) Build and Run Spark on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24417:
--
Target Version/s: 3.0.0

> Build and Run Spark on JDK11
> 
>
> Key: SPARK-24417
> URL: https://issues.apache.org/jira/browse/SPARK-24417
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> This is an umbrella JIRA for Apache Spark to support JDK11
> As JDK8 is reaching EOL, and JDK9 and 10 are already end of life, per 
> community discussion, we will skip JDK9 and 10 to support JDK 11 directly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28758) Upgrade Janino to 3.0.15

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28758:
-

 Summary: Upgrade Janino to 3.0.15
 Key: SPARK-28758
 URL: https://issues.apache.org/jira/browse/SPARK-28758
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This issue aims to upgrade Janino to pick up its recent bug fixes. Please note 
that Janino 3.1.0 is a major refactoring rather than a bug-fix release, so we 
had better stay on 3.0.15.

*3.0.15 (2019-07-28)*

- Fix overloaded single static method import

*3.0.14 (2019-07-05)*

- Conflict in sbt-assembly
- Overloaded static on-demand imported methods cause a CompileException: 
Ambiguous static method import
- Handle overloaded static on-demand imports
- Major refactoring of the Java 8 and Java 9 retrofit mechanism
- Added tests for "JLS8 8.6 Instance Initializers" and "JLS8 8.7 Static 
Initializers"
- Local variables in instance initializers don't work
- Provide an option to keep generated code files
- Added compile error handler and warning handler to ICompiler



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28757) File table location should include both values of option `path` and `paths`

2019-08-16 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-28757:
--

 Summary: File table location should include both values of option 
`path` and `paths`
 Key: SPARK-28757
 URL: https://issues.apache.org/jira/browse/SPARK-28757
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Gengliang Wang


In the V1 implementation, the file table location includes both the value of 
the option `path` and the values of `paths`.
In the refactoring of https://github.com/apache/spark/pull/24025, the value of 
the option `path` is ignored if `paths` is specified. We should make the new 
implementation consistent with V1.
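
A small illustration of the V1 behaviour to preserve (the paths are made up):

{code:scala}
// With V1, the resulting relation covers /data/a (the "path" option) as well
// as /data/b and /data/c (the load paths); the new implementation should do
// the same.
val df = spark.read
  .format("csv")
  .option("path", "/data/a")
  .load("/data/b", "/data/c")
{code}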



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28756) checkJavaVersion fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28756:
-

Assignee: Dongjoon Hyun

> checkJavaVersion fails on JDK11
> ---
>
> Key: SPARK-28756
> URL: https://issues.apache.org/jira/browse/SPARK-28756
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> {code}
> build/mvn -Phadoop-3.2 -Psparkr -DskipTests package
> R/install-dev.sh
> R/run-tests.sh
> {code}
> {code}
> Skipped 
> 
> 1. create DataFrame from list or data.frame (@test_basic.R#21) - error on 
> Java check
> 2. spark.glm and predict (@test_basic.R#57) - error on Java check
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28743) YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has too many entries

2019-08-16 Thread Jiandan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated SPARK-28743:
--
Summary: YarnShuffleService leads to NodeManager OOM because 
ChannelOutboundBuffer has too many entries  (was: YarnShuffleService leads to 
NodeManager OOM because ChannelOutboundBuffer has t0o many entries)

> YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has 
> too many entries
> --
>
> Key: SPARK-28743
> URL: https://issues.apache.org/jira/browse/SPARK-28743
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.3.0
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: dominator.jpg, histo.jpg
>
>
> The NodeManager heap size is 4G. Looking at the MAT histogram, 
> io.netty.channel.ChannelOutboundBuffer$Entry objects occupied about 2.8G, and 
> the MAT dominator tree shows those entries were held by ChannelOutboundBuffer. 
> Analyzing one ChannelOutboundBuffer object, I found 248867 entries in it 
> (ChannelOutboundBuffer#flushed=248867), and 
> ChannelOutboundBuffer#totalPendingSize=23891232, which is more than the high 
> water mark (64K), with unwritable=1 meaning the send buffer was full. But the 
> ChannelHandler does not seem to check the unwritable flag when writing 
> messages, and eventually the NodeManager hits OOM.
> Histogram:
> !histo.jpg!
> dominator_tree:
> !dominator.jpg!
>  
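
Not Spark's actual shuffle handler, but a minimal Netty sketch of the kind of 
back-pressure check the report says is missing: consult the channel's writability 
before flushing more responses, so ChannelOutboundBuffer entries cannot pile up far 
beyond the high water mark. The handler name and the process() step are hypothetical.

{code:scala}
import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}

// Hypothetical handler illustrating write-side back pressure: only flush while the
// channel is writable; otherwise hold responses back instead of letting
// ChannelOutboundBuffer grow without bound.
class BackPressureAwareHandler extends ChannelInboundHandlerAdapter {
  private val pending = new java.util.ArrayDeque[AnyRef]()

  override def channelRead(ctx: ChannelHandlerContext, msg: AnyRef): Unit = {
    val response = process(msg)          // hypothetical application logic
    if (ctx.channel().isWritable) {
      ctx.writeAndFlush(response)
    } else {
      pending.add(response)              // do not grow the outbound buffer further
    }
  }

  // Netty fires this when writability flips, e.g. once buffered bytes drop below
  // the low water mark; drain what was held back.
  override def channelWritabilityChanged(ctx: ChannelHandlerContext): Unit = {
    while (ctx.channel().isWritable && !pending.isEmpty) {
      ctx.writeAndFlush(pending.poll())
    }
    ctx.fireChannelWritabilityChanged()
  }

  private def process(msg: AnyRef): AnyRef = msg  // placeholder
}
{code}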



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28756) checkJavaVersion fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28756:
-

 Summary: checkJavaVersion fails on JDK11
 Key: SPARK-28756
 URL: https://issues.apache.org/jira/browse/SPARK-28756
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


{code}
build/mvn -Phadoop-3.2 -Psparkr -DskipTests package
R/install-dev.sh
R/run-tests.sh
{code}

{code}
Skipped 
1. create DataFrame from list or data.frame (@test_basic.R#21) - error on Java 
check
2. spark.glm and predict (@test_basic.R#57) - error on Java check
{code}
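
The check presumably trips over the JDK 9+ version scheme: pre-9 JVMs report strings 
like 1.8.0_222, while JDK 11 reports 11.0.4. A minimal Scala sketch of version 
parsing that tolerates both formats, purely for illustration (not the actual SparkR 
checkJavaVersion code):

{code:scala}
// Hypothetical parser: returns the major Java version for both the legacy
// 1.x scheme (1.8.0_222 -> 8) and the JDK 9+ scheme (11.0.4 -> 11).
def majorJavaVersion(versionString: String): Int = {
  val parts = versionString.split("[.\\-_+]")
  if (parts.head == "1") parts(1).toInt else parts.head.toInt
}

assert(majorJavaVersion("1.8.0_222") == 8)
assert(majorJavaVersion("11.0.4") == 11)
{code}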



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28755) test_mllib_classification fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28755:
-

 Summary: test_mllib_classification fails on JDK11
 Key: SPARK-28755
 URL: https://issues.apache.org/jira/browse/SPARK-28755
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


- https://github.com/apache/spark/pull/25443
- 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109111/consoleFull

{code}
...
1. Failure: spark.mlp (@test_mllib_classification.R#310) ---
head(summary$weights, 5) not equal to list(-24.28415, 107.8701, 16.86376, 
1.103736, 9.244488).
Component 1: Mean relative difference: 0.002250183
Component 2: Mean relative difference: 0.001494751
Component 3: Mean relative difference: 0.001602342
Component 4: Mean relative difference: 0.01193038
Component 5: Mean relative difference: 0.001732629
{code}
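
These are tolerance failures rather than crashes: the fitted MLP weights on JDK11 
differ from the hard-coded expectations by a small mean relative difference. A 
standalone sketch of that metric, roughly as R's all.equal() reports it, written in 
Scala for illustration only (the sample values below are made up, not taken from the 
JDK11 run):

{code:scala}
// Mean relative difference: mean(|expected - actual|) / mean(|expected|).
def meanRelativeDifference(actual: Seq[Double], expected: Seq[Double]): Double = {
  require(actual.length == expected.length, "length mismatch")
  val absDiff = actual.zip(expected).map { case (a, e) => math.abs(e - a) }
  absDiff.sum / expected.map(math.abs).sum
}

// A JDK-tolerant test would compare against a tolerance rather than exact values.
val ok = meanRelativeDifference(Seq(-24.33, 107.71), Seq(-24.28415, 107.8701)) < 0.05
{code}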



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28755) test_mllib_classification fails on JDK11

2019-08-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28755:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-24417

> test_mllib_classification fails on JDK11
> 
>
> Key: SPARK-28755
> URL: https://issues.apache.org/jira/browse/SPARK-28755
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - https://github.com/apache/spark/pull/25443
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109111/consoleFull
> {code}
> ...
> 1. Failure: spark.mlp (@test_mllib_classification.R#310) 
> ---
> head(summary$weights, 5) not equal to list(-24.28415, 107.8701, 16.86376, 
> 1.103736, 9.244488).
> Component 1: Mean relative difference: 0.002250183
> Component 2: Mean relative difference: 0.001494751
> Component 3: Mean relative difference: 0.001602342
> Component 4: Mean relative difference: 0.01193038
> Component 5: Mean relative difference: 0.001732629
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always gets "Connection reset by peer"

2019-08-16 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908714#comment-16908714
 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 6:03 AM:


[~hyukjin.kwon]

Just the Spark Thrift Server running SQL with dynamic allocation; the config is as in the last reply.


was (Author: angerszhuuu):
[~hyukjin.kwon]

Just the Spark Thrift Server running SQL with dynamic allocation; the config is as below.

> Spark with DynamicAllocation always gets "Connection reset by peer"
> -
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When using Spark with dynamic allocation, we set the executor idle timeout to 5s.
> We always get Netty 'Connection reset by peer' exceptions.
>  
> I suspect the 5s idle timeout is too small: when the BlockManager issues Netty 
> I/O, the executor may already have been removed because of the timeout, but the 
> driver's BlockManager is not notified in time.
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed 
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}
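
For reference, the settings the report is describing, sketched with illustrative 
values (the 60s timeout is an assumption that merely narrows the race window, not a 
verified fix; the report itself used 5s):

{code:scala}
import org.apache.spark.SparkConf

// Illustrative only: a longer idle timeout reduces how often an executor is torn
// down while BlockManager RPCs to it are still in flight.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")               // external shuffle service, required with dynamic allocation
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")  // the report used 5s
{code}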



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org