[jira] [Created] (SPARK-46091) [KUBERNETES] Respect the existing kubernetes container SPARK_LOCAL_DIRS env
Fei Wang created SPARK-46091:

Summary: [KUBERNETES] Respect the existing kubernetes container SPARK_LOCAL_DIRS env
Key: SPARK-46091
URL: https://issues.apache.org/jira/browse/SPARK-46091
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Fei Wang

Respect the user-defined SPARK_LOCAL_DIRS container env when setting up local dirs.

For example, we use hostPath for the Spark local dir, but we do not mount the sub-disks directly into the pod; instead we mount a root path, e.g. `/hadoop`, into the driver/executor pod. There are sub-disks under that root, like `/hadoop/1, /hadoop/2, /hadoop/3, /hadoop/4`, and we want to define SPARK_LOCAL_DIRS in the driver/executor pod env ourselves. But currently the user-specified SPARK_LOCAL_DIRS does not take effect.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
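The requested behavior amounts to a small fallback when resolving local dirs: if the container env already defines SPARK_LOCAL_DIRS, split and use it; otherwise keep the generated defaults. The sketch below is illustrative only, with names of our own choosing, and is not Spark's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- not Spark's actual code.
// resolveLocalDirs prefers a SPARK_LOCAL_DIRS value already present in the
// container env over the dirs Spark would otherwise generate.
public class LocalDirsSketch {
    static List<String> resolveLocalDirs(Map<String, String> containerEnv,
                                         List<String> generatedDefaults) {
        String userDefined = containerEnv.get("SPARK_LOCAL_DIRS");
        if (userDefined != null && !userDefined.isEmpty()) {
            // e.g. "/hadoop/1,/hadoop/2,/hadoop/3,/hadoop/4" set in the pod spec
            return Arrays.asList(userDefined.split(","));
        }
        return generatedDefaults;
    }

    public static void main(String[] args) {
        System.out.println(resolveLocalDirs(
                Map.of("SPARK_LOCAL_DIRS", "/hadoop/1,/hadoop/2"),
                List.of("/var/data/spark-local")));  // prints [/hadoop/1, /hadoop/2]
    }
}
```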
[jira] [Updated] (SPARK-43540) Add working directory into classpath on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43540:
-
Summary: Add working directory into classpath on the driver in K8S cluster mode (was: Add current working directory into classpath on the driver in K8S cluster mode)

> Add working directory into classpath on the driver in K8S cluster mode
> --
>
> Key: SPARK-43540
> URL: https://issues.apache.org/jira/browse/SPARK-43540
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.
> After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader.
> We need to add the current working directory into the classpath.
[jira] [Updated] (SPARK-43540) Add current working directory into classpath on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43540:
-
Description:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode. After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader. We need to add the current working directory into the classpath.

was:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode. After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader. We need to add the current working directory to the classpath.

> Add current working directory into classpath on the driver in K8S cluster mode
> --
>
> Key: SPARK-43540
> URL: https://issues.apache.org/jira/browse/SPARK-43540
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.
> After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader.
> We need to add the current working directory into the classpath.
[jira] [Updated] (SPARK-43540) Add current working directory into classpath on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43540:
-
Description:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode. After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader. We need to add the current working directory to the classpath.

was:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode. After SPARK-33782, for Kubernetes cluster mode, it places

> Add current working directory into classpath on the driver in K8S cluster mode
> --
>
> Key: SPARK-43540
> URL: https://issues.apache.org/jira/browse/SPARK-43540
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.
> After SPARK-33782, spark.files and spark.jars are placed under the current working directory on the driver in K8S cluster mode, but they do not seem to be accessible via the classloader.
> We need to add the current working directory to the classpath.
[jira] [Updated] (SPARK-43540) Add current working directory into classpath on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43540:
-
Description:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode. After SPARK-33782, for Kubernetes cluster mode, it places

was:
In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.

> Add current working directory into classpath on the driver in K8S cluster mode
> --
>
> Key: SPARK-43540
> URL: https://issues.apache.org/jira/browse/SPARK-43540
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.
> After SPARK-33782, for Kubernetes cluster mode, it places
[jira] [Created] (SPARK-43540) Add current working directory into classpath on the driver in K8S cluster mode
Fei Wang created SPARK-43540:

Summary: Add current working directory into classpath on the driver in K8S cluster mode
Key: SPARK-43540
URL: https://issues.apache.org/jira/browse/SPARK-43540
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Fei Wang

In YARN cluster mode, the files/jars passed to the application are accessible via the classloader; this does not appear to be the case in Kubernetes cluster mode.
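The fix proposed here boils down to putting the driver's working directory on the classpath, so that files spark-submit has placed there become loadable. A minimal sketch, using our own names rather than anything from the Spark codebase:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: wrap a parent classloader so the current working directory is also
// searched for classes and resources. Note that URLClassLoader treats a URL
// as a directory only when it ends with "/", which File.toURI() ensures here.
public class CwdClasspathSketch {
    static URLClassLoader withWorkingDir(ClassLoader parent) throws Exception {
        URL cwd = new File(System.getProperty("user.dir")).toURI().toURL();
        return new URLClassLoader(new URL[] { cwd }, parent);
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader cl = withWorkingDir(CwdClasspathSketch.class.getClassLoader())) {
            // The single extra URL is the working directory as a file: URL.
            System.out.println(cl.getURLs()[0].getProtocol());  // prints "file"
        }
    }
}
```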
[jira] [Updated] (SPARK-43504) [K8S] Mounts the hadoop config map on the executor pod
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43504:
-
Summary: [K8S] Mounts the hadoop config map on the executor pod (was: [K8S] Mount hadoop config map in executor side)

> [K8S] Mounts the hadoop config map on the executor pod
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
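What the reporter is asking for is, in effect, that the executor pod spec carry the same Hadoop config-map volume the driver gets, so the executor's HADOOP_CONF_DIR contains the hdfs-site.xml that defines nameservices such as `hdfs://zeus`. A hand-written illustration of the shape of such a mount; the config map name and mount path are placeholders, not what Spark generates:

```yaml
# Illustrative pod-spec fragment only; names are placeholders.
spec:
  containers:
    - name: spark-kubernetes-executor
      env:
        - name: HADOOP_CONF_DIR
          value: /opt/hadoop/conf
      volumeMounts:
        - name: hadoop-conf
          mountPath: /opt/hadoop/conf
  volumes:
    - name: hadoop-conf
      configMap:
        name: hadoop-conf   # holds core-site.xml / hdfs-site.xml
```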
[jira] [Updated] (SPARK-43504) [K8S] Mounts the hadoop config map on the executor pod
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43504:
-
Description:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map will not be mounted on the executor pod.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

was:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

> [K8S] Mounts the hadoop config map on the executor pod
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map will not be mounted on the executor pod.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
[jira] [Updated] (SPARK-43504) [K8S] Mount hadoop config map in executor side
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43504:
-
Description:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

was:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not mounted on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

> [K8S] Mount hadoop config map in executor side
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
[jira] [Updated] (SPARK-43504) [K8S] Mount hadoop config map in executor side
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43504:
-
Description:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not mounted on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

was:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

> [K8S] Mount hadoop config map in executor side
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not mounted on the executor side.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
[jira] [Commented] (SPARK-43504) [K8S] Mount hadoop config map in executor side
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722674#comment-17722674 ]

Fei Wang commented on SPARK-43504:
--
gentle ping [~vanzin] [~dongjoon] [~ifilonenko]

> [K8S] Mount hadoop config map in executor side
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
[jira] [Updated] (SPARK-43504) [K8S] Mount hadoop config map in executor side
[ https://issues.apache.org/jira/browse/SPARK-43504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43504:
-
Description:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot. So we still need to mount the hadoop config map on the executor side.

was:
Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.

> [K8S] Mount hadoop config map in executor side
> --
>
> Key: SPARK-43504
> URL: https://issues.apache.org/jira/browse/SPARK-43504
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
> Per the [https://github.com/apache/spark/pull/22911] description:
> {code:java}
> The main two things that don't need to happen in executors anymore are:
> 1. adding the Hadoop config to the executor pods: this is not needed
> since the Spark driver will serialize the Hadoop config and send
> it to executors when running tasks. {code}
> But in fact, the executor still needs the hadoop configuration.
>
> !https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
>
> As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
> So we still need to mount the hadoop config map on the executor side.
[jira] [Created] (SPARK-43504) [K8S] Mount hadoop config map in executor side
Fei Wang created SPARK-43504:

Summary: [K8S] Mount hadoop config map in executor side
Key: SPARK-43504
URL: https://issues.apache.org/jira/browse/SPARK-43504
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Fei Wang

Since SPARK-25815 ([https://github.com/apache/spark/pull/22911]), the hadoop config map is not on the executor side.
Per the [https://github.com/apache/spark/pull/22911] description:
{code:java}
The main two things that don't need to happen in executors anymore are:
1. adding the Hadoop config to the executor pods: this is not needed
since the Spark driver will serialize the Hadoop config and send
it to executors when running tasks. {code}
But in fact, the executor still needs the hadoop configuration.
!https://user-images.githubusercontent.com/6757692/238268640-8ff41144-5812-4232-b572-2de2408348ed.png!
As shown in the picture above, the driver can resolve `hdfs://zeus`, but the executor cannot.
[jira] [Updated] (SPARK-43419) [K8S] Make limit.cores able to fall back to request.cores
[ https://issues.apache.org/jira/browse/SPARK-43419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43419:
-
Description:
Allow limit.cores to fall back to request.cores.
Currently, without limit.cores, we hit the issue below:
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/hadooptessns/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "" is forbidden: failed quota: high-qos-limit-requests: must specify limits.cpu. {code}
If spark.kubernetes.executor/driver.limit.cores is not specified, how about treating request.cores as limit.cores?

was:
Allow limit.cores to fall back to request.cores.
Currently, without limit.cores, we hit the issue below:
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/hadooptessns/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "" is forbidden: failed quota: high-qos-limit-requests: must specify limits.cpu. {code}
If spark.kubernetes.executor/driver.limit.cores is not specified, how about treating request.cores as limit.cores?

> [K8S] Make limit.cores able to fall back to request.cores
> -
>
> Key: SPARK-43419
> URL: https://issues.apache.org/jira/browse/SPARK-43419
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Allow limit.cores to fall back to request.cores.
> Currently, without limit.cores, we hit the issue below:
> {code:java}
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
> POST at: https:///api/v1/namespaces/hadooptessns/pods. Message:
> Forbidden!Configured service account doesn't have access. Service account may
> have been revoked. pods "" is forbidden: failed quota:
> high-qos-limit-requests: must specify limits.cpu. {code}
> If spark.kubernetes.executor/driver.limit.cores is not specified, how about
> treating request.cores as limit.cores?
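The proposal boils down to a simple fallback when building the container's CPU limit. A sketch under our own names (not the actual Spark feature-step code):

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the proposed fallback: when no explicit limit.cores is configured,
// reuse request.cores so quotas that require limits.cpu can still be satisfied.
public class CoresLimitSketch {
    static Optional<String> effectiveLimitCores(Map<String, String> conf, String role) {
        String limit = conf.get("spark.kubernetes." + role + ".limit.cores");
        if (limit != null) {
            return Optional.of(limit);
        }
        // Fall back to the request value; empty if neither is configured.
        return Optional.ofNullable(conf.get("spark.kubernetes." + role + ".request.cores"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("spark.kubernetes.executor.request.cores", "2");
        System.out.println(effectiveLimitCores(conf, "executor"));  // prints Optional[2]
    }
}
```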
[jira] [Updated] (SPARK-43419) [K8S] Make limit.cores able to fall back to request.cores
[ https://issues.apache.org/jira/browse/SPARK-43419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43419:
-
Description:
Allow limit.cores to fall back to request.cores.
Currently, without limit.cores, we hit the issue below:
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/hadooptessns/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "" is forbidden: failed quota: high-qos-limit-requests: must specify limits.cpu. {code}
If spark.kubernetes.executor/driver.limit.cores is not specified, how about treating request.cores as limit.cores?

was:
Allow limit.cores to fall back to request.cores.
Currently, without limit.cores, we hit the issue below:
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/hadooptessns/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "" is forbidden: failed quota: high-qos-limit-requests: must specify limits.cpu. {code}
If spark.kubernetes.executor/driver.limit.cores is not specified, treat request.cores as limit.cores.

> [K8S] Make limit.cores able to fall back to request.cores
> -
>
> Key: SPARK-43419
> URL: https://issues.apache.org/jira/browse/SPARK-43419
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Allow limit.cores to fall back to request.cores.
> Currently, without limit.cores, we hit the issue below:
> {code:java}
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
> POST at: https:///api/v1/namespaces/hadooptessns/pods. Message:
> Forbidden!Configured service account doesn't have access. Service account may
> have been revoked. pods "" is forbidden: failed quota:
> high-qos-limit-requests: must specify limits.cpu. {code}
> If spark.kubernetes.executor/driver.limit.cores is not specified, how about
> treating request.cores as limit.cores?
[jira] [Updated] (SPARK-43419) [K8S] Make limit.cores able to fall back to request.cores
[ https://issues.apache.org/jira/browse/SPARK-43419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Wang updated SPARK-43419:
-
Description:
Allow limit.cores to fall back to request.cores.
Currently, without limit.cores, we hit the issue below:
{code:java}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/hadooptessns/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "" is forbidden: failed quota: high-qos-limit-requests: must specify limits.cpu. {code}
If spark.kubernetes.executor/driver.limit.cores is not specified, treat request.cores as limit.cores.

was:
Allow limit.cores to fall back to request.cores.
If spark.kubernetes.executor/driver.limit.cores is not specified, treat request.cores as limit.cores.

> [K8S] Make limit.cores able to fall back to request.cores
> -
>
> Key: SPARK-43419
> URL: https://issues.apache.org/jira/browse/SPARK-43419
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.4.0
> Reporter: Fei Wang
> Priority: Major
>
> Allow limit.cores to fall back to request.cores.
> Currently, without limit.cores, we hit the issue below:
> {code:java}
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
> POST at: https:///api/v1/namespaces/hadooptessns/pods. Message:
> Forbidden!Configured service account doesn't have access. Service account may
> have been revoked. pods "" is forbidden: failed quota:
> high-qos-limit-requests: must specify limits.cpu. {code}
> If spark.kubernetes.executor/driver.limit.cores is not specified, treat
> request.cores as limit.cores.
[jira] [Created] (SPARK-43419) [K8S] Make limit.cores be able to be fallen back to request.cores
Fei Wang created SPARK-43419: Summary: [K8S] Make limit.cores be able to be fallen back to request.cores Key: SPARK-43419 URL: https://issues.apache.org/jira/browse/SPARK-43419 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.0 Reporter: Fei Wang Allow limit.cores to fall back to request.cores. If spark.kubernetes.executor/driver.limit.cores is not set, treat request.cores as limit.cores.
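The proposed fallback can be sketched roughly as follows. This is an illustrative Python sketch, not Spark's actual Scala implementation; the helper name and dict shape are hypothetical stand-ins for the pod resource-requirements construction:

```python
# Sketch of the proposed fallback: when limit.cores is not configured,
# reuse request.cores as the container CPU limit, so that resource quotas
# requiring limits.cpu (like the error quoted above) are satisfied.
def build_cpu_resources(request_cores, limit_cores=None):
    effective_limit = limit_cores if limit_cores is not None else request_cores
    resources = {"requests": {"cpu": request_cores}}
    if effective_limit is not None:
        resources["limits"] = {"cpu": effective_limit}
    return resources

# Without an explicit limit, the request value is reused as the limit.
fallback = build_cpu_resources("2")
# An explicit limit is kept as-is.
explicit = build_cpu_resources("2", "4")
```

In Spark itself this logic would presumably live where the driver/executor pod resource requirements are built from spark.kubernetes.{driver,executor}.request.cores and limit.cores.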
[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965735#comment-15965735 ] Fei Wang commented on SPARK-20184: -- I also ran my test case against the master branch: 1. Java version: 192:spark wangfei$ java -version java version "1.8.0_65" Java(TM) SE Runtime Environment (build 1.8.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) 2. Spark start command: 192:spark wangfei$ bin/spark-sql --master local[4] --driver-memory 16g 3. Test result: SQL: {code} select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100; {code} codegen on: about 1.4s codegen off: about 0.6s > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500.
> whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s).
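For anyone reproducing this, the JVM flag mentioned above can be passed straight to the driver JVM. This invocation mirrors the benchmark setup from the comment above and is illustrative, not a definitive configuration:

```shell
# Re-run the benchmark with HotSpot's huge-method bailout disabled.
# By default HotSpot refuses to JIT-compile methods larger than roughly
# 8000 bytes of bytecode, which a whole-stage-codegen method can exceed.
bin/spark-sql --master local[4] --driver-memory 16g \
  --driver-java-options "-XX:-DontCompileHugeMethods"
```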
[jira] [Comment Edited] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965597#comment-15965597 ] Fei Wang edited comment on SPARK-20184 at 4/12/17 9:21 AM: --- try this : 1. create table {code} val df = (1 to 50).map(x => (x.toString, x.toString, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x)).toDF("dim_1", "dim_2", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10","c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20") df.write.saveAsTable("sum_table_50w_3") {code} 2. query the table {code} select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100 {code} was (Author: scwf): try this : 1. create table {code} val df = (1 to 50).map(x => (x.toString, x.toString, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x)).toDF("dim_1", "dim_2", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10","c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20") df.write.saveAsTable("sum_table_50w_3") {code} 2. query the table select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100 > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. 
> SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s).
[jira] [Comment Edited] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965597#comment-15965597 ] Fei Wang edited comment on SPARK-20184 at 4/12/17 9:21 AM: --- try this : 1. create table {code} val df = (1 to 50).map(x => (x.toString, x.toString, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x)).toDF("dim_1", "dim_2", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10","c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20") df.write.saveAsTable("sum_table_50w_3") {code} 2. query the table select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100 was (Author: scwf): try this : 1. create table [code] val df = (1 to 50).map(x => (x.toString, x.toString, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x)).toDF("dim_1", "dim_2", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10","c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20") df.write.saveAsTable("sum_table_50w_3") df.write.format("csv").saveAsTable("sum_table_50w_1") [code] 2. query the table select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100 > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. 
> SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s).
[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965597#comment-15965597 ] Fei Wang commented on SPARK-20184: -- Try this: 1. Create the table: {code} val df = (1 to 50).map(x => (x.toString, x.toString, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x)).toDF("dim_1", "dim_2", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10", "c11", "c12", "c13", "c14", "c15", "c16", "c17", "c18", "c19", "c20") df.write.saveAsTable("sum_table_50w_3") df.write.format("csv").saveAsTable("sum_table_50w_1") {code} 2. Query the table: {code} select dim_1, dim_2, sum(c1), sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8), sum(c9), sum(c10), sum(c11), sum(c12), sum(c13), sum(c14), sum(c15), sum(c16), sum(c17), sum(c18), sum(c19), sum(c20) from sum_table_50w_3 group by dim_1, dim_2 limit 100 {code} > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500.
> whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s).
[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965308#comment-15965308 ] Fei Wang commented on SPARK-20184: -- Tested with a smaller table of 100,000 rows: codegen on: 2.6s; codegen off: 1.5s. > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s).
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Summary: performance regression for complex/long sql when enable whole stage codegen (was: performance regression for complex/long sql when enable codegen) > performance regression for complex/long sql when enable whole stage codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codegen off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codege off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codegen off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. whole stage codegen on(spark.sql.codegen.wholeStage = true):40s whole stage codege off(spark.sql.codegen.wholeStage = false):6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on:40s codegen off:6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > whole stage codegen on(spark.sql.codegen.wholeStage = true):40s > whole stage codege off(spark.sql.codegen.wholeStage = false):6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on:40s codegen off:6s After some analysis i think this is related to the huge java method(a java method of thousand lines) which generated by codegen. And If i config -XX:-DontCompileHugeMethods the performance get much better(about 7s). was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on:40s codegen off:6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > codegen on:40s > codegen off:6s > After some analysis i think this is related to the huge java method(a java > method of thousand lines) which generated by codegen. > And If i config -XX:-DontCompileHugeMethods the performance get much > better(about 7s). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute following sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > codegen on: 40s > codegen off: 6s > after some analysis, i think this is related to the huge java method(a java > method thousand of lines) which generated when codegen on. And If i config > -XX:-DontCompileHugeMethods the performance get much better. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on:40s codegen off:6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off:6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
> performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > The performance of following SQL get much worse in spark 2.x in contrast > with codegen off. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > Num of rows of aggtable is about 3500. > codegen on:40s > codegen off:6s > after some analysis, i think this is related to the huge java method(a java > method thousand of lines) which generated when codegen on. And If i config > -XX:-DontCompileHugeMethods the performance get much better. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off:6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: The performance of following SQL get much worse in spark 2.x in contrast with codegen off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952123#comment-15952123 ] Fei Wang commented on SPARK-20184: -- [~r...@databricks.com] [~davies] Maybe we need to split the huge generated method into smaller ones for codegen.
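To see the generated method being discussed, Spark 2.x can print the Java source that whole-stage codegen emits. A sketch only: the `EXPLAIN CODEGEN` syntax is from the Spark 2.x `spark-sql` shell, and the query here is a simplified stand-in for the reporter's full aggregation.

```shell
# Sketch: dump the Java source that whole-stage codegen generates (Spark 2.x).
# With ~20 sum() aggregates, the generated aggregation method can grow past
# HotSpot's HugeMethodLimit (8000 bytecodes by default), after which the JIT
# refuses to compile it and it runs interpreted.
spark-sql -e "EXPLAIN CODEGEN SELECT sum(COUNTER_57), DIM_1 FROM aggtable GROUP BY DIM_1"
```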
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute following sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; Num of rows of aggtable is about 3500. codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Summary: performance regression for complex/long sql when enable codegen (was: performance regression for complex sql when enable codegen) > performance regression for complex/long sql when enable codegen > --- > > Key: SPARK-20184 > URL: https://issues.apache.org/jira/browse/SPARK-20184 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0, 2.1.0 >Reporter: Fei Wang > > Execute flowing sql with spark 2.x when codegen enabled, the performance is > much worse than the case when turn off codegen. > SELECT >sum(COUNTER_57) > ,sum(COUNTER_71) > ,sum(COUNTER_3) > ,sum(COUNTER_70) > ,sum(COUNTER_66) > ,sum(COUNTER_75) > ,sum(COUNTER_69) > ,sum(COUNTER_55) > ,sum(COUNTER_63) > ,sum(COUNTER_68) > ,sum(COUNTER_56) > ,sum(COUNTER_37) > ,sum(COUNTER_51) > ,sum(COUNTER_42) > ,sum(COUNTER_43) > ,sum(COUNTER_1) > ,sum(COUNTER_76) > ,sum(COUNTER_54) > ,sum(COUNTER_44) > ,sum(COUNTER_46) > ,DIM_1 > ,DIM_2 > ,DIM_3 > FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; > codegen on: 40s > codegen off: 6s > after some analysis, i think this is related to the huge java method(a java > method thousand of lines) which generated when codegen on. And If i config > -XX:-DontCompileHugeMethods the performance get much better. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20184) performance regression for complex sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method(a java method thousand of lines) which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. 
[jira] [Updated] (SPARK-20184) performance regression for complex sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance of codegen on get much better. 
[jira] [Updated] (SPARK-20184) performance regression for complex sql when enable codegen
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-20184: - Description: Execute flowing sql with spark 2.x when codegen enabled, the performance is much worse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance of codegen on get much better. was: Execute flowing sql with spark 2.x when codegen enabled, the performance is muchworse than the case when turn off codegen. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s after some analysis, i think this is related to the huge java method which generated when codegen on. And If i config -XX:-DontCompileHugeMethods the performance of codegen on get much better. 
[jira] [Created] (SPARK-20184) performance regression for complex sql when enable codegen
Fei Wang created SPARK-20184: Summary: performance regression for complex sql when enable codegen Key: SPARK-20184 URL: https://issues.apache.org/jira/browse/SPARK-20184 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0, 1.6.0 Reporter: Fei Wang Executing the following SQL with Spark 2.x when codegen is enabled, the performance is much worse than with codegen turned off. SELECT sum(COUNTER_57) ,sum(COUNTER_71) ,sum(COUNTER_3) ,sum(COUNTER_70) ,sum(COUNTER_66) ,sum(COUNTER_75) ,sum(COUNTER_69) ,sum(COUNTER_55) ,sum(COUNTER_63) ,sum(COUNTER_68) ,sum(COUNTER_56) ,sum(COUNTER_37) ,sum(COUNTER_51) ,sum(COUNTER_42) ,sum(COUNTER_43) ,sum(COUNTER_1) ,sum(COUNTER_76) ,sum(COUNTER_54) ,sum(COUNTER_44) ,sum(COUNTER_46) ,DIM_1 ,DIM_2 ,DIM_3 FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100; codegen on: 40s codegen off: 6s After some analysis, I think this is related to the huge Java method generated when codegen is on. If I configure -XX:-DontCompileHugeMethods, the performance of codegen on gets much better.
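The comparison in the report can be sketched as two Spark invocations. This is a hedged sketch, not from the report: the `spark.sql.codegen.wholeStage` switch applies to Spark 2.x, and `agg_query.sql` is a placeholder for the aggregation query above.

```shell
# Sketch only: flag names are from Spark 2.x; the query file is a placeholder.

# Case 1: whole-stage codegen off (the fast case in the report)
spark-sql --conf spark.sql.codegen.wholeStage=false -f agg_query.sql

# Case 2: codegen on, but let the JIT compile huge generated methods,
# as the reporter suggests with -XX:-DontCompileHugeMethods
spark-sql \
  --conf "spark.driver.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  --conf "spark.executor.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  -f agg_query.sql
```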
[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Attachment: (was: executor broadcast.pdf) > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf, executor-side-broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Attachment: executor broadcast.pdf
[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Attachment: executor broadcast.pdf (updated design doc)
[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Attachment: (was: executor broadcast.pdf)
[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516772#comment-15516772 ] Fei Wang commented on SPARK-17556: -- [~viirya] In this case, how about notifying the driver to re-persist the RDD?
[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516732#comment-15516732 ] Fei Wang commented on SPARK-17556: -- That's a good point! In your solution, the broadcast RDD must be persisted first, right? How do you handle the case of lost executors (where all replicas of a piece are lost)?
[jira] [Comment Edited] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516639#comment-15516639 ] Fei Wang edited comment on SPARK-17556 at 9/23/16 2:50 PM: --- Yes, the main difference is that it does not introduce overhead on the driver. For the broadcast, the executor does need all the results of an RDD. I took a look at your PR; I think you also collect all the results of that RDD to the executor, right? was (Author: scwf): Yes, the main different is is does not introduce overhead to driver, for broadcast the executor do need all the result of an RDD, i task a look of your PR, i think you also collect all the result of that rdd to executor, right?
[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516639#comment-15516639 ] Fei Wang commented on SPARK-17556: -- Yes, the main difference is that it does not introduce overhead on the driver. For the broadcast, the executor does need the full result of the RDD. I took a look at your PR; I think you also collect the full result of that RDD to an executor, right? > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Comment: was deleted (was: Not correct, I just collect the broadcast ref to the driver but not the data:) . ) > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516545#comment-15516545 ] Fei Wang commented on SPARK-17556: -- Not correct; I only collect the broadcast reference to the driver, not the data :) . > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516386#comment-15516386 ] Fei Wang edited comment on SPARK-17556 at 9/23/16 1:15 PM: --- [~rxin] I attached a design doc for the executor-based broadcast and will file a PR for this soon. [~viirya] We have an executor-based broadcast implementation in our internal product, based on the attached design doc. We are now contributing it to open source; can you help review it? Thanks. was (Author: scwf): [~rxin] attached a design doc for the executor based broadcast. Will soon file a PR for this. [~viirya] We have a executor based broadcast implementation in our inner produce system which is based on the design doc i attached. Now we are contributing it to opensource, Can you help to review this, thanks. > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516386#comment-15516386 ] Fei Wang edited comment on SPARK-17556 at 9/23/16 1:05 PM: --- [~rxin] I attached a design doc for the executor-based broadcast and will file a PR for this soon. [~viirya] We have an executor-based broadcast implementation in our internal product, based on the attached design doc. We are now contributing it to open source; can you help review it? Thanks. was (Author: scwf): [~rxin] attached a design doc for the executor based broadcast. Will soon file a PR for this. [~viirya] We have a executor based broadcast implementation in our inner produce system which is based on the design doc i attached. Can you help to review this, thanks. > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516386#comment-15516386 ] Fei Wang commented on SPARK-17556: -- [~rxin] I attached a design doc for the executor-based broadcast and will file a PR for this soon. [~viirya] We have an executor-based broadcast implementation in our internal product, based on the attached design doc. Can you help review it? Thanks. > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-17556: - Attachment: executor broadcast.pdf > Executor side broadcast for broadcast joins > --- > > Key: SPARK-17556 > URL: https://issues.apache.org/jira/browse/SPARK-17556 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Reynold Xin > Attachments: executor broadcast.pdf > > > Currently in Spark SQL, in order to perform a broadcast join, the driver must > collect the result of an RDD and then broadcast it. This introduces some > extra latency. It might be possible to broadcast directly from executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17644) The failed stage never resubmitted due to abort stage in another thread
Fei Wang created SPARK-17644: Summary: The failed stage never resubmitted due to abort stage in another thread Key: SPARK-17644 URL: https://issues.apache.org/jira/browse/SPARK-17644 Project: Spark Issue Type: Bug Components: Scheduler, Spark Core Affects Versions: 2.0.0, 1.6.0 Reporter: Fei Wang There is a race condition between FetchFailed handling and resubmitting a failed stage. job1 and job2 run in different threads: if job1 fails 4 times due to FetchFailed and is aborted, then job2 cannot post ResubmitFailedStages, because the failedStages set in DAGScheduler is no longer empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
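The race described in this report can be sketched with a minimal model (a hypothetical Python illustration of the guard logic, not Spark's actual DAGScheduler code): the resubmit event is only posted when the shared failedStages set transitions from empty to non-empty, so stages left behind by an aborted job suppress the event for later jobs.

```python
failed_stages = set()   # stands in for the DAGScheduler's shared failedStages
event_queue = []        # stands in for the scheduler's event loop

def handle_fetch_failure(stage):
    # the resubmit event is only posted when failedStages transitions
    # from empty to non-empty (mirrors the guard described in the report)
    if not failed_stages:
        event_queue.append("ResubmitFailedStages")
    failed_stages.add(stage)

def abort_job(stage):
    # the aborted job drops its pending resubmit event, but its stages
    # are left behind in failed_stages
    event_queue.remove("ResubmitFailedStages")

handle_fetch_failure("job1-stage")  # posts the resubmit event
abort_job("job1-stage")             # job1 gives up after 4 FetchFailed rounds
handle_fetch_failure("job2-stage")  # guard sees a non-empty set: no event posted
print(event_queue)                  # empty -> job2's stage is never resubmitted
```

The sketch shows why clearing (or ignoring) the aborted job's leftover entries in failedStages would let job2's failure post its own ResubmitFailedStages event.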
[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to
[ https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-12742: - Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to (was: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer) > org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to > --- > > Key: SPARK-12742 > URL: https://issues.apache.org/jira/browse/SPARK-12742 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Fei Wang > > [info] Exception encountered when attempting to run a suite with class name: > org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 > milliseconds) > [info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.; > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) > [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [info] at 
java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists
[ https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-12742: - Due Date: 11/Jan/16 Component/s: SQL > org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already > exists > --- > > Key: SPARK-12742 > URL: https://issues.apache.org/jira/browse/SPARK-12742 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Fei Wang > > [info] Exception encountered when attempting to run a suite with class name: > org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 > milliseconds) > [info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.; > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) > [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [info] at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA 
(v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists
[ https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-12742: - Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists (was: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to ) > org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already > exists > --- > > Key: SPARK-12742 > URL: https://issues.apache.org/jira/browse/SPARK-12742 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Fei Wang > > [info] Exception encountered when attempting to run a suite with class name: > org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 > milliseconds) > [info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.; > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) > [info] at > org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) > [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) > [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [info] at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [info] at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer
Fei Wang created SPARK-12742: Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer Key: SPARK-12742 URL: https://issues.apache.org/jira/browse/SPARK-12742 Project: Spark Issue Type: Bug Reporter: Fei Wang [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds) [info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.; [info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296) [info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33) [info] at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23) [info] at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [info] at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12222) deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
Fei Wang created SPARK-12222: Summary: deserialize RoaringBitmap using Kryo serializer throws Buffer underflow exception Key: SPARK-12222 URL: https://issues.apache.org/jira/browse/SPARK-12222 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Fei Wang There are some problems when deserializing RoaringBitmap. See the example below: run this piece of code
```
import com.esotericsoftware.kryo.io.{Input => KryoInput, Output => KryoOutput}
import java.io.{DataInput, DataOutput, FileInputStream, FileOutputStream}
import org.roaringbitmap.RoaringBitmap

class KryoInputDataInputBridge(input: KryoInput) extends DataInput {
  override def readLong(): Long = input.readLong()
  override def readChar(): Char = input.readChar()
  override def readFloat(): Float = input.readFloat()
  override def readByte(): Byte = input.readByte()
  override def readShort(): Short = input.readShort()
  override def readUTF(): String = input.readString() // readString in kryo does utf8
  override def readInt(): Int = input.readInt()
  override def readUnsignedShort(): Int = input.readShortUnsigned()
  override def skipBytes(n: Int): Int = input.skip(n.toLong).toInt
  override def readFully(b: Array[Byte]): Unit = input.read(b)
  override def readFully(b: Array[Byte], off: Int, len: Int): Unit = input.read(b, off, len)
  override def readLine(): String = throw new UnsupportedOperationException("readLine")
  override def readBoolean(): Boolean = input.readBoolean()
  override def readUnsignedByte(): Int = input.readByteUnsigned()
  override def readDouble(): Double = input.readDouble()
}

class KryoOutputDataOutputBridge(output: KryoOutput) extends DataOutput {
  override def writeFloat(v: Float): Unit = output.writeFloat(v)
  // There is no "readChars" counterpart, except maybe "readLine", which is not supported
  override def writeChars(s: String): Unit = throw new UnsupportedOperationException("writeChars")
  override def writeDouble(v: Double): Unit = output.writeDouble(v)
  override def writeUTF(s: String): Unit = output.writeString(s) // writeString in kryo does UTF8
  override def writeShort(v: Int): Unit = output.writeShort(v)
  override def writeInt(v: Int): Unit = output.writeInt(v)
  override def writeBoolean(v: Boolean): Unit = output.writeBoolean(v)
  override def write(b: Int): Unit = output.write(b)
  override def write(b: Array[Byte]): Unit = output.write(b)
  override def write(b: Array[Byte], off: Int, len: Int): Unit = output.write(b, off, len)
  override def writeBytes(s: String): Unit = output.writeString(s)
  override def writeChar(v: Int): Unit = output.writeChar(v.toChar)
  override def writeLong(v: Long): Unit = output.writeLong(v)
  override def writeByte(v: Int): Unit = output.writeByte(v)
}

val outStream = new FileOutputStream("D:\\wfserde")
val output = new KryoOutput(outStream)
val bitmap = new RoaringBitmap
bitmap.add(1)
bitmap.add(3)
bitmap.add(5)
bitmap.serialize(new KryoOutputDataOutputBridge(output))
output.flush()
output.close()

val inStream = new FileInputStream("D:\\wfserde")
val input = new KryoInput(inStream)
val ret = new RoaringBitmap
ret.deserialize(new KryoInputDataInputBridge(input))
println(ret)
```
this will throw a `Buffer underflow` error:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
    at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
    at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
    at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
    at org.apache.spark.sql.SQLQuerySuite$$anonfun$6$KryoInputDataInputBridge$1.skipBytes
```
After some investigation, I found this is caused by a bug in Kryo's `Input.skip(long count)` (https://github.com/EsotericSoftware/kryo/issues/119), which we call in `KryoInputDataInputBridge`. So I think we can fix this issue in one of two ways:
1) upgrade the Kryo version to 2.23.0 or 2.24.0, which fixes this bug in Kryo (i am not sure the upgrade is safe in spark, can you check it? @davies )
2) bypass Kryo's `Input.skip(long count)` by directly calling the other `skip` method in Kryo's Input.java (https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. write the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method:
```
class KryoInputDataInputBridge(input: KryoInput) extends DataInput {
  ...
  override def skipBytes(n: Int): Int = {
    var remaining: Long = n
    while (remaining > 0) {
      val skip = Math.min(Integer.MAX_VALUE, remaining).asInstanceOf[Int]
      input.skip(skip)
      remaining -= skip
    }
    n
  }
  ...
}
```
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
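The second workaround follows a generic stream-handling pattern: skipping a large count must loop, since a single underlying skip or read may consume fewer bytes than requested. A minimal Python sketch of the same chunked-skip logic over a byte stream (illustrative only; `safe_skip` is a hypothetical name, not part of Kryo or Spark):

```python
import io

def safe_skip(stream, count):
    """Skip exactly `count` bytes, looping in bounded chunks because a
    single read may return fewer bytes than requested; raise on a short
    stream instead of silently underflowing."""
    remaining = count
    while remaining > 0:
        chunk = stream.read(min(remaining, 8192))  # bounded chunk size
        if not chunk:
            raise EOFError("buffer underflow: stream ended early")
        remaining -= len(chunk)
    return count

buf = io.BytesIO(b"\x00" * 10)
safe_skip(buf, 7)      # skips the first 7 bytes
print(buf.read())      # the 3 bytes left after skipping
```

The loop-until-done shape is exactly what the patched `skipBytes` above does; the Kryo bug was that `Input.skip(long)` stopped after a single internal chunk.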
[jira] [Commented] (SPARK-4131) Support "Writing data into the filesystem from queries"
[ https://issues.apache.org/jira/browse/SPARK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735995#comment-14735995 ] Fei Wang commented on SPARK-4131: - Can you try the example in my test suite to see if it works? https://github.com/apache/spark/pull/4380/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R535 > Support "Writing data into the filesystem from queries" > --- > > Key: SPARK-4131 > URL: https://issues.apache.org/jira/browse/SPARK-4131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.1.0 >Reporter: XiaoJing wang >Assignee: Fei Wang >Priority: Critical > Original Estimate: 0.05h > Remaining Estimate: 0.05h > > Writing data into the filesystem from queries is not supported by Spark SQL. > eg: > {code}insert overwrite LOCAL DIRECTORY '/data1/wangxj/sql_spark' select * > from page_views; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10030) Managed memory leak detected when cache table
[ https://issues.apache.org/jira/browse/SPARK-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699075#comment-14699075 ] Fei Wang commented on SPARK-10030: -- [~hyukjin.kwon] this is definitely a Spark SQL bug, so we opened a JIRA. Managed memory leak detected when cache table - Key: SPARK-10030 URL: https://issues.apache.org/jira/browse/SPARK-10030 Project: Spark Issue Type: Bug Affects Versions: 1.5.0 Reporter: wangwei Priority: Blocker I tested the latest spark-1.5.0 in local, standalone, and yarn modes, followed the steps below, and then these errors occurred. 1. create table cache_test(id int, name string) stored as textfile ; 2. load data local inpath 'SparkSource/sql/hive/src/test/resources/data/files/kv1.txt' into table cache_test; 3. cache table test as select * from cache_test distribute by id; configuration: spark.driver.memory 5g spark.executor.memory 28g spark.cores.max 21 15/08/16 17:14:47 ERROR Executor: Managed memory leak detected; size = 67108864 bytes, TID = 434 15/08/16 17:14:47 ERROR Executor: Exception in task 23.0 in stage 9.0 (TID 434) java.util.NoSuchElementException: key not found: val_54 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64) at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258) at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110) at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:153) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:153) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:153) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120) at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:46) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message was sent 
by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10030) Managed memory leak detected when cache table
[ https://issues.apache.org/jira/browse/SPARK-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698974#comment-14698974 ] Fei Wang commented on SPARK-10030: -- [~liancheng] can you take a look at this issue? Managed memory leak detected when cache table - Key: SPARK-10030 URL: https://issues.apache.org/jira/browse/SPARK-10030 Project: Spark Issue Type: Bug Affects Versions: 1.5.0 Reporter: wangwei Priority: Blocker I tested the latest spark-1.5.0 in local, standalone, and yarn modes, followed the steps below, and then these errors occurred. 1. create table cache_test(id int, name string) stored as textfile ; 2. load data local inpath 'SparkSource/sql/hive/src/test/resources/data/files/kv1.txt' into table cache_test; 3. cache table test as select * from cache_test distribute by id; configuration: spark.driver.memory 5g spark.executor.memory 28g spark.cores.max 21 15/08/16 17:14:47 ERROR Executor: Managed memory leak detected; size = 67108864 bytes, TID = 434 15/08/16 17:14:47 ERROR Executor: Exception in task 23.0 in stage 9.0 (TID 434) java.util.NoSuchElementException: key not found: val_54 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64) at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258) at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110) at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:153) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:153) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:153) at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120) at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:46) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message was sent 
by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698124#comment-14698124 ] Fei Wang commented on SPARK-9725: good job :)
> spark sql query string field return empty/garbled string
> Key: SPARK-9725
> URL: https://issues.apache.org/jira/browse/SPARK-9725
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Fei Wang
> Assignee: Davies Liu
> Priority: Blocker
> Fix For: 1.5.0
> To reproduce it:
> 1. Deploy a Spark cluster; standalone mode locally works.
> 2. Set executor memory to 32g or more, e.g. with the following line in spark-defaults.conf: spark.executor.memory 36g
> 3. Run spark-sql.sh and execute "show tables"; it returns empty/garbled strings.
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696323#comment-14696323 ] Fei Wang commented on SPARK-9725: In some of our cases (long table names) it also shows garbled strings.
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696324#comment-14696324 ] Fei Wang commented on SPARK-9725: We tested on SUSE and Red Hat; both have this issue. We have found some clues and will post them here later.
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696449#comment-14696449 ] Fei Wang commented on SPARK-9725: [~davies] please see the comment from peizhongshuai.
[jira] [Commented] (SPARK-8890) Reduce memory consumption for dynamic partition insert
[ https://issues.apache.org/jira/browse/SPARK-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679447#comment-14679447 ] Fei Wang commented on SPARK-8890: Should this be part of Tungsten?
> Reduce memory consumption for dynamic partition insert
> Key: SPARK-8890
> URL: https://issues.apache.org/jira/browse/SPARK-8890
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Michael Armbrust
> Priority: Critical
> Fix For: 1.5.0
> Currently, InsertIntoHadoopFsRelation can run out of memory if the number of table partitions is large. The problem is that we open one output writer for each partition, and when data are randomized and the number of partitions is large, we open a large number of output writers, leading to OOM. The solution here is to inject a sorting operation once the number of active partitions is beyond a certain point (e.g. 50?)
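The sort-before-write fix described above can be illustrated with a small, self-contained sketch. This is only a model of the idea, not Spark's actual InsertIntoHadoopFsRelation code, and the function name is made up:

```python
from collections import Counter

def max_open_writers(ordered_rows, key):
    """Model of a dynamic-partition write task: a writer opens when a
    partition value is first seen and closes once no more rows for that
    partition can appear. Returns the peak number of open writers."""
    remaining = Counter(key(r) for r in ordered_rows)
    open_writers, high_water = set(), 0
    for r in ordered_rows:
        k = key(r)
        open_writers.add(k)
        high_water = max(high_water, len(open_writers))
        remaining[k] -= 1
        if remaining[k] == 0:
            open_writers.discard(k)  # safe to close: no more rows for k
    return high_water

rows = [("p1", 1), ("p2", 2), ("p3", 3), ("p1", 4), ("p2", 5)]
# Randomized input: writers pile up, one per partition seen so far.
unsorted_peak = max_open_writers(rows, lambda r: r[0])
# After sorting by partition key, partitions are contiguous, so the
# previous writer can always close before the next one opens.
sorted_peak = max_open_writers(sorted(rows, key=lambda r: r[0]), lambda r: r[0])
```

With the sample rows above, the unsorted ordering peaks at 3 simultaneous writers while the sorted ordering peaks at 1, which is exactly the memory bound the proposed sort buys.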
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662847#comment-14662847 ] Fei Wang commented on SPARK-9725: Also, I did not set SPARK_PREPEND_CLASSES.
[jira] [Commented] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661383#comment-14661383 ] Fei Wang commented on SPARK-9725: How much memory did you set for the executor?
[jira] [Comment Edited] (SPARK-9725) spark sql query string field return empty/garbled string
[ https://issues.apache.org/jira/browse/SPARK-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661389#comment-14661389 ] Fei Wang edited comment on SPARK-9725 at 8/7/15 6:21 AM: I tried master again and reproduced this issue:
{code}
M151:/home/wf/spark # bin/spark-sql
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
[INFO] Unable to bind key for unsupported operation: backward-delete-word
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
(repeated jline key-binding warnings omitted)
spark-sql> CREATE TABLE IF NOT EXISTS srcbasdfasdfasdf (key INT, value STRING);
OK
Time taken: 3.493 seconds
spark-sql> CREATE TABLE IF NOT EXISTS src (key INT, value STRING);
OK
Time taken: 0.181 seconds
spark-sql> show tables;
false
srcbasdffalse
Time taken: 0.211 seconds, Fetched 2 row(s)
spark-sql>
{code}
was (Author: scwf): the same text without the {code} formatting
[jira] [Created] (SPARK-9725) spark sql query string field return empty/garbled string
Fei Wang created SPARK-9725: --- Summary: spark sql query string field return empty/garbled string Key: SPARK-9725 URL: https://issues.apache.org/jira/browse/SPARK-9725 Project: Spark Issue Type: Bug Components: SQL Reporter: Fei Wang Priority: Blocker
To reproduce it:
1. Deploy a Spark cluster; I use standalone mode locally.
2. Set executor memory to 32g or more, e.g. with the following line in spark-defaults.conf: spark.executor.memory 36g
3. Run spark-sql.sh and execute "show tables"; it returns empty/garbled strings.
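A note on step 2 of the repro: Spark reads this setting from conf/spark-defaults.conf, which is a plain whitespace-separated properties file, not XML (the "spark-default.xml" mentioned in some copies of this report appears to be a typo). The setting as it would look there:

```
# conf/spark-defaults.conf -- key and value separated by whitespace
spark.executor.memory    36g
```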
[jira] [Updated] (SPARK-8968) dynamic partitioning in spark sql performance issue due to the high GC overhead
[ https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-8968: Summary: dynamic partitioning in spark sql performance issue due to the high GC overhead (was: shuffled by the partition clomns when dynamic partitioning to optimize the memory overhead)
> dynamic partitioning in spark sql performance issue due to the high GC overhead
> Key: SPARK-8968
> URL: https://issues.apache.org/jira/browse/SPARK-8968
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Fei Wang
> Dynamic partitioning currently shows poor performance on big data due to GC/memory overhead: each task opens one writer per partition, which produces many small files and heavy GC. We can shuffle the data by the partition columns so that each partition ends up with only one file, which also reduces the GC overhead.
[jira] [Commented] (SPARK-8968) dynamic partitioning in spark sql performance issue due to the high GC overhead
[ https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621935#comment-14621935 ] Fei Wang commented on SPARK-8968: Changed the title; how about this?
[jira] [Created] (SPARK-8968) shuffled by the partition clomns when dynamic partitioning to optimize the memory overhead
Fei Wang created SPARK-8968: --- Summary: shuffled by the partition clomns when dynamic partitioning to optimize the memory overhead Key: SPARK-8968 URL: https://issues.apache.org/jira/browse/SPARK-8968 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.4.0 Reporter: Fei Wang
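The shuffle-by-partition-columns proposal in SPARK-8968 can be sketched in a few lines of plain Python. The helper names are hypothetical and this only models the data movement, not Spark's shuffle machinery:

```python
from collections import defaultdict

def shuffle_by_partition_columns(rows, part_key, num_tasks):
    # Hash-partition rows on the dynamic-partition column so that all rows
    # with the same partition value land in the same write task.
    tasks = defaultdict(list)
    for row in rows:
        tasks[hash(part_key(row)) % num_tasks].append(row)
    return tasks

def total_files_written(tasks, part_key):
    # Each task writes one file per distinct partition value it holds; after
    # the shuffle, the total equals the number of distinct partitions.
    return sum(len({part_key(r) for r in rows}) for rows in tasks.values())
```

Without the shuffle, every task can hold rows of every partition, so the file count grows as tasks times partitions; with it, each partition value lives in exactly one task and produces exactly one file.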
[jira] [Commented] (SPARK-7173) Support YARN node label expressions for the application master
[ https://issues.apache.org/jira/browse/SPARK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576547#comment-14576547 ] Fei Wang commented on SPARK-7173: Is this resolved by SPARK-6470?
> Support YARN node label expressions for the application master
> Key: SPARK-7173
> URL: https://issues.apache.org/jira/browse/SPARK-7173
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.3.1
> Reporter: Sandy Ryza
[jira] [Closed] (SPARK-7866) print the format string in dataframe explain
[ https://issues.apache.org/jira/browse/SPARK-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang closed SPARK-7866. --- Resolution: Won't Fix. This was a misleading printout in IntelliJ IDEA; the behavior is actually fine, not a problem.
> print the format string in dataframe explain
> Key: SPARK-7866
> URL: https://issues.apache.org/jira/browse/SPARK-7866
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Fei Wang
> Priority: Trivial
> QueryExecution.toString gives a formatted, clear string, so we should print it in the DataFrame.explain method
[jira] [Created] (SPARK-7866) print the format string in dataframe explain
Fei Wang created SPARK-7866: --- Summary: print the format string in dataframe explain Key: SPARK-7866 URL: https://issues.apache.org/jira/browse/SPARK-7866 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang
[jira] [Commented] (SPARK-7866) print the format string in dataframe explain
[ https://issues.apache.org/jira/browse/SPARK-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559048#comment-14559048 ] Fei Wang commented on SPARK-7866: Got it, thanks Owen.
[jira] [Resolved] (SPARK-3323) yarn website's Tracking UI links to the Standby RM
[ https://issues.apache.org/jira/browse/SPARK-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang resolved SPARK-3323. --- Resolution: Fixed
> yarn website's Tracking UI links to the Standby RM
> Key: SPARK-3323
> URL: https://issues.apache.org/jira/browse/SPARK-3323
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.1.0
> Reporter: Fei Wang
> When running a big application, this situation sometimes occurs: clicking the Tracking UI link of the running application leads to the standby RM, with a message like "This is standby RM. Redirecting to the current active RM: some address". But the address of that page is actually the same as that "some address".
[jira] [Closed] (SPARK-6997) Convert StringType in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang closed SPARK-6997. --- Resolution: Won't Fix
> Convert StringType in LocalTableScan
> Key: SPARK-6997
> URL: https://issues.apache.org/jira/browse/SPARK-6997
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: Fei Wang
[jira] [Commented] (SPARK-6997) Convert StringType in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549924#comment-14549924 ] Fei Wang commented on SPARK-6997: This is not necessary, since we already convert the data from Scala objects to Catalyst rows/types when constructing a LocalRelation. I am closing this.
[jira] [Closed] (SPARK-7289) Combine Limit and Sort to avoid total ordering
[ https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang closed SPARK-7289. --- Resolution: Won't Fix. Users usually do not write SQL like this.
> Combine Limit and Sort to avoid total ordering
> Key: SPARK-7289
> URL: https://issues.apache.org/jira/browse/SPARK-7289
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Fei Wang
> Optimize the following SQL: select key from (select * from testData order by key) t limit 5
> From:
> == Parsed Logical Plan ==
> 'Limit 5
>  'Project ['key]
>   'Subquery t
>    'Sort ['key ASC], true
>     'Project [*]
>      'UnresolvedRelation [testData], None
> == Analyzed Logical Plan ==
> Limit 5
>  Project [key#0]
>   Subquery t
>    Sort [key#0 ASC], true
>     Project [key#0,value#1]
>      Subquery testData
>       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
> == Optimized Logical Plan ==
> Limit 5
>  Project [key#0]
>   Sort [key#0 ASC], true
>    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
> == Physical Plan ==
> Limit 5
>  Project [key#0]
>   Sort [key#0 ASC], true
>    Exchange (RangePartitioning [key#0 ASC], 5), []
>     PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
> To (Parsed and Analyzed logical plans unchanged):
> == Optimized Logical Plan ==
> Project [key#0]
>  Limit 5
>   Sort [key#0 ASC], true
>    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
> == Physical Plan ==
> Project [key#0]
>  TakeOrdered 5, [key#0 ASC]
>   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
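The optimization this issue describes, replacing Sort + Limit with a top-k operator like TakeOrdered so no total ordering is needed, can be modeled in a few lines of plain Python (heapq stands in for the physical operator; this is a sketch, not Spark code):

```python
import heapq

def limit_after_full_sort(rows, k):
    # What the unoptimized plan does: total ordering, then take k.
    return sorted(rows)[:k]          # O(n log n)

def take_ordered(rows, k):
    # What a TakeOrdered-style operator does: track only the k smallest
    # elements seen so far.
    return heapq.nsmallest(k, rows)  # O(n log k)
```

Both return the same k rows; the top-k version just avoids sorting the whole input, which is the point of pushing Limit through Sort.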
[jira] [Closed] (SPARK-5240) Adding `createDataSourceTable` interface to Catalog
[ https://issues.apache.org/jira/browse/SPARK-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang closed SPARK-5240. --- Resolution: Won't Fix
> Adding `createDataSourceTable` interface to Catalog
> Key: SPARK-5240
> URL: https://issues.apache.org/jira/browse/SPARK-5240
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: Fei Wang
> Adding `createDataSourceTable` interface to Catalog.
[jira] [Created] (SPARK-7656) use CatalystConf in FunctionRegistry
Fei Wang created SPARK-7656: --- Summary: use CatalystConf in FunctionRegistry Key: SPARK-7656 URL: https://issues.apache.org/jira/browse/SPARK-7656 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang
We should use CatalystConf in FunctionRegistry.
[jira] [Created] (SPARK-7631) treenode argString should not print children
Fei Wang created SPARK-7631: --- Summary: treenode argString should not print children Key: SPARK-7631 URL: https://issues.apache.org/jira/browse/SPARK-7631 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang
For: spark-sql> explain extended select * from ( select key from src union all select key from src) t;
the Spark plan prints its children inside argString:
== Physical Plan ==
Union[ HiveTableScan [key#1], (MetastoreRelation default, src, None), None, HiveTableScan [key#3], (MetastoreRelation default, src, None), None]
 HiveTableScan [key#1], (MetastoreRelation default, src, None), None
 HiveTableScan [key#3], (MetastoreRelation default, src, None), None
[jira] [Created] (SPARK-7659) Sort by attributes that are not present in the SELECT clause when there is windowfunction analysis error
Fei Wang created SPARK-7659: --- Summary: Sort by attributes that are not present in the SELECT clause when there is windowfunction analysis error Key: SPARK-7659 URL: https://issues.apache.org/jira/browse/SPARK-7659 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang
The following SQL gets an analysis error: select month, sum(product) over (partition by month) from windowData order by area
[jira] [Commented] (SPARK-6929) Alias for more complex expression causes attribute not been able to resolve
[ https://issues.apache.org/jira/browse/SPARK-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536627#comment-14536627 ] Fei Wang commented on SPARK-6929: Spark SQL uses c_0 as an internal alias name; I think you can try another alias name that is not c_${number}.
> Alias for more complex expression causes attribute not been able to resolve
> Key: SPARK-6929
> URL: https://issues.apache.org/jira/browse/SPARK-6929
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: Michał Warecki
> Priority: Critical
> I've extracted a minimal query that doesn't work with aliases. You can remove the tstudent expression ((tstudent((COUNT(g_0.test2_value) - 1)) from that query and the result will be the same. In the exception you can see that c_0 is not resolved, but c_1 causes that problem.
> {code}
> SELECT g_0.test1 AS c_0, (AVG(g_0.test2) - ((tstudent((COUNT(g_0.test2_value) - 1)) * stddev(g_0.test2_value)) / sqrt(convert(COUNT(g_0.test2), long AS c_1 FROM sometable AS g_0 GROUP BY g_0.test1 ORDER BY c_0 LIMIT 502
> {code}
> causes this exception:
> {code}
> Remote org.apache.spark.sql.AnalysisException: cannot resolve 'c_0' given input columns c_0, c_1; line 1 pos 246
> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:48)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:45)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:263)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:292)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:247)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:116)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at
[jira] [Created] (SPARK-7303) push down project if possible when the child is sort
Fei Wang created SPARK-7303: --- Summary: push down project if possible when the child is sort Key: SPARK-7303 URL: https://issues.apache.org/jira/browse/SPARK-7303 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Optimize the case of `project(_, sort)`; an example is `select key from (select * from testData order by key) t`. Optimize it from
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Sort [key#0 ASC], true
 Project [key#0]
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Sort [key#0 ASC], true
 Exchange (RangePartitioning [key#0 ASC], 5), []
  Project [key#0]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
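Outside Spark, the rewrite can be sketched with a toy plan tree; the classes and the rule below are illustrative stand-ins for Catalyst's nodes, not its real API. Note the rule is only safe when the sort keys are covered by the projected columns, as they are in the example above:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    columns: List[str]


@dataclass
class Sort:
    order_by: List[str]
    child: object


@dataclass
class Project:
    columns: List[str]
    child: object


def push_project_below_sort(plan):
    """Rewrite Project(Sort(child)) into Sort(Project(child)) when the
    sort keys are a subset of the projected columns, so the narrower
    rows flow through the expensive sort (and, physically, the exchange)."""
    if (isinstance(plan, Project) and isinstance(plan.child, Sort)
            and set(plan.child.order_by) <= set(plan.columns)):
        sort = plan.child
        return Sort(sort.order_by, Project(plan.columns, sort.child))
    return plan


# select key from (select * from testData order by key) t
plan = Project(["key"], Sort(["key"], Relation(["key", "value"])))
optimized = push_project_below_sort(plan)
# The Sort now sits on top of the Project, as in the optimized plan above.
```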
[jira] [Updated] (SPARK-7289) Combine Limit and Sort to avoid total ordering
[ https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-7289: Description: Optimize the following SQL: `select key from (select * from testData order by key) t limit 5`, from
```
== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
    PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Limit 5
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Project [key#0]
 TakeOrdered 5, [key#0 ASC]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
was: Optimize the following SQL: `select key from (select * from testData limit 5) t order by key limit 5`, optimizing it from
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Limit 5
    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  Limit 5
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
Summary: Combine Limit and Sort to avoid total ordering (was: push down sort when its child is Limit) Combine Limit and Sort to avoid total ordering -- Key: SPARK-7289 URL: https://issues.apache.org/jira/browse/SPARK-7289 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang
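The physical payoff of the combined rewrite, TakeOrdered instead of a full sort followed by a limit, can be sketched in plain Python; `heapq.nsmallest` plays the role of TakeOrdered here (illustrative names, not Spark code):

```python
import heapq
import random


def sort_then_limit(rows, key, n):
    # Naive physical plan: totally order every row, then keep the first n.
    return sorted(rows, key=key)[:n]


def take_ordered(rows, key, n):
    # Combined plan: a bounded top-k pass. heapq.nsmallest is documented
    # to be equivalent to sorted(rows, key=key)[:n], but it avoids a
    # total ordering of the whole input.
    return heapq.nsmallest(n, rows, key=key)


rows = [(random.randrange(1000), i) for i in range(10_000)]
assert sort_then_limit(rows, lambda r: r[0], 5) == take_ordered(rows, lambda r: r[0], 5)
```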
[jira] [Created] (SPARK-7289) push down sort when its child is Limit
Fei Wang created SPARK-7289: --- Summary: push down sort when its child is Limit Key: SPARK-7289 URL: https://issues.apache.org/jira/browse/SPARK-7289 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Optimize the following SQL: `select key from (select * from testData limit 5) t order by key limit 5`, optimizing it from
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Limit 5
    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  Limit 5
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
[jira] [Created] (SPARK-7232) Add a Substitution batch for spark sql analyzer
Fei Wang created SPARK-7232: --- Summary: Add a Substitution batch for spark sql analyzer Key: SPARK-7232 URL: https://issues.apache.org/jira/browse/SPARK-7232 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Add a new batch named `Substitution` before the Resolution batch. The motivation is that there are cases where we want to perform substitutions on the parsed logical plan before resolving it. Consider these two cases:
1. CTE: for a CTE we first build a raw logical plan
'With Map(q1 -> 'Subquery q1
                 'Project ['key]
                  'UnresolvedRelation [src], None)
 'Project [*]
  'Filter ('key = 5)
   'UnresolvedRelation [q1], None
The `With` logical plan holds a map of (q1 -> subquery); we first want to take off the With node and substitute the UnresolvedRelation q1 with that subquery.
2. Another example is window functions: a user may define named windows, and we need to substitute the window name in the child with the concrete window definition. This should also be done in the Substitution batch.
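The CTE case can be sketched as a plain tree substitution; the classes below are a toy model of the plan nodes, not Catalyst's actual implementation:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnresolvedRelation:
    name: str


@dataclass
class Filter:
    condition: str
    child: object


@dataclass
class Project:
    columns: List[str]
    child: object


@dataclass
class With:
    child: object
    cte: Dict[str, object]  # CTE name -> subquery plan


def substitute_cte(plan, cte=None):
    """Take off the With node and replace every UnresolvedRelation that
    names a CTE with the corresponding subquery plan."""
    if isinstance(plan, With):
        return substitute_cte(plan.child, plan.cte)
    if cte and isinstance(plan, UnresolvedRelation) and plan.name in cte:
        return cte[plan.name]
    if hasattr(plan, "child"):
        plan.child = substitute_cte(plan.child, cte)
    return plan


# WITH q1 AS (SELECT key FROM src) SELECT * FROM q1 WHERE key = 5
q1 = Project(["key"], UnresolvedRelation("src"))
plan = With(Project(["*"], Filter("key = 5", UnresolvedRelation("q1"))),
            {"q1": q1})
resolved = substitute_cte(plan)
# resolved is the Project/Filter subtree with q1 replaced by its subquery,
# and the With node gone -- ready for the Resolution batch.
```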
[jira] [Created] (SPARK-7163) minor refactoring for HiveQl
Fei Wang created SPARK-7163: --- Summary: minor refactoring for HiveQl Key: SPARK-7163 URL: https://issues.apache.org/jira/browse/SPARK-7163 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang HiveQl has grown into a rather large object; refactor it to make it cleaner and more readable:
1. move the ASTNode-related util methods/objects to a new object named HiveASTNodeUtil
2. delete unused methods in HiveQl
3. override `sqlParser` in HiveContext with `ExtendedHiveQlParser`, instead of making a new `ddlParserWithHiveQL` and calling `HiveQl.parseSql` in HiveContext
4. rename HiveQl to HiveQlConverter
[jira] [Created] (SPARK-7123) support table.star in sqlcontext
Fei Wang created SPARK-7123: --- Summary: support table.star in sqlcontext Key: SPARK-7123 URL: https://issues.apache.org/jira/browse/SPARK-7123 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Support this SQL: SELECT r.* FROM testData l JOIN testData2 r ON (l.key = r.a)
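Resolving a qualified star amounts to expanding `r.*` against the schema bound to the alias r; a minimal sketch (hypothetical helper, not SqlContext's parser):

```python
def expand_star(select_item, schemas):
    """Expand a qualified star like 'r.*' into the columns of the
    relation bound to that alias; other items pass through unchanged.
    'schemas' maps alias -> column names (a toy catalog)."""
    if select_item.endswith(".*"):
        alias = select_item[:-2]
        return [f"{alias}.{col}" for col in schemas[alias]]
    return [select_item]


schemas = {"l": ["key", "value"], "r": ["a", "b"]}
print(expand_star("r.*", schemas))    # ['r.a', 'r.b']
print(expand_star("l.key", schemas))  # ['l.key']
```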
[jira] [Created] (SPARK-7093) Using newPredicate in NestedLoopJoin to enable code generation
Fei Wang created SPARK-7093: --- Summary: Using newPredicate in NestedLoopJoin to enable code generation Key: SPARK-7093 URL: https://issues.apache.org/jira/browse/SPARK-7093 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Fei Wang
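The idea behind newPredicate is to compile the join condition once and evaluate the compiled form per row pair, instead of re-interpreting the expression tree inside the nested loop. A rough Python analogue (Spark generates Java bytecode; here Python's `compile` and a closure stand in, and all names are illustrative):

```python
def new_predicate(condition_src):
    # Compile the condition once, up front; the returned closure is
    # then cheap to evaluate for every (left, right) row pair.
    code = compile(condition_src, "<join-condition>", "eval")
    return lambda left, right: eval(code, {"left": left, "right": right})


def nested_loop_join(left_rows, right_rows, predicate):
    # Nested loop join: test the compiled predicate on every pair.
    return [(l, r) for l in left_rows for r in right_rows if predicate(l, r)]


pred = new_predicate("left['key'] == right['a']")
out = nested_loop_join([{"key": 1}, {"key": 2}],
                       [{"a": 2, "b": "x"}, {"a": 3, "b": "y"}],
                       pred)
print(out)  # [({'key': 2}, {'a': 2, 'b': 'x'})]
```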
[jira] [Created] (SPARK-7109) Push down left side filter for left semi join
Fei Wang created SPARK-7109: --- Summary: Push down left side filter for left semi join Key: SPARK-7109 URL: https://issues.apache.org/jira/browse/SPARK-7109 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Currently the Spark SQL optimizer only pushes down the right-side filter for a left semi join; we can push down the left-side filter as well.
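Why the left-side filter is safe to push: a left semi join only emits (unchanged) rows of the left side, so filtering the left input before the join cannot change the result. A toy check (illustrative names):

```python
def left_semi_join(left, right, left_key, right_key):
    # Emit each left row at most once if any right row matches its key.
    right_keys = {r[right_key] for r in right}
    return [l for l in left if l[left_key] in right_keys]


def filter_on_top(left, right, pred):
    # Filter evaluated above the join.
    return [l for l in left_semi_join(left, right, "key", "a") if pred(l)]


def filter_pushed_down(left, right, pred):
    # Left-side filter pushed below the join: smaller join input, same output.
    return left_semi_join([l for l in left if pred(l)], right, "key", "a")


left = [{"key": 1}, {"key": 2}, {"key": 3}]
right = [{"a": 2}, {"a": 3}]
pred = lambda row: row["key"] > 2
assert filter_on_top(left, right, pred) == filter_pushed_down(left, right, pred)
```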
[jira] [Commented] (SPARK-5659) Flaky test: o.a.s.streaming.ReceiverSuite.block
[ https://issues.apache.org/jira/browse/SPARK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508302#comment-14508302 ] Fei Wang commented on SPARK-5659: - My local run of dev/run-tests also hits this issue. org.apache.spark.streaming.ReceiverSuite.block generator throttling Failing for the past 2 builds (Since Aborted#3), elapsed time: 2.2 s Error Message 126 was greater than or equal to 95.0, but 126 was not less than or equal to 105.0 # records in received blocks = [91,369,294,39,100,100,101,99,100,100,100,100,100,101,100,99,5], not between 95.0 and 105.0, on average Stacktrace sbt.ForkMain$ForkError: 126 was greater than or equal to 95.0, but 126 was not less than or equal to 105.0 # records in received blocks = [91,369,294,39,100,100,101,99,100,100,100,100,100,101,100,99,5], not between 95.0 and 105.0, on average at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply$mcV$sp(ReceiverSuite.scala:207) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$runTest(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.ReceiverSuite.runTest(ReceiverSuite.scala:39) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$run(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.ReceiverSuite.run(ReceiverSuite.scala:39) at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Flaky test:
[jira] [Comment Edited] (SPARK-5659) Flaky test: o.a.s.streaming.ReceiverSuite.block
[ https://issues.apache.org/jira/browse/SPARK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508302#comment-14508302 ] Fei Wang edited comment on SPARK-5659 at 4/23/15 2:02 AM: -- My local run of dev/run-tests also hits this issue. org.apache.spark.streaming.ReceiverSuite.block generator throttling {code} (same error message and stack trace as in the original comment above) {code} was (Author: scwf): my locally
[jira] [Comment Edited] (SPARK-5659) Flaky test: o.a.s.streaming.ReceiverSuite.block
[ https://issues.apache.org/jira/browse/SPARK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508302#comment-14508302 ] Fei Wang edited comment on SPARK-5659 at 4/23/15 2:03 AM: -- My local run of dev/run-tests also hits this issue. {code} org.apache.spark.streaming.ReceiverSuite.block generator throttling (same error message and stack trace as in the original comment above) {code} was (Author: scwf): my locally