[jira] [Resolved] (SPARK-48517) PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#main()
[ https://issues.apache.org/jira/browse/SPARK-48517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph resolved SPARK-48517. --- Resolution: Invalid. Driver logs have correctly captured the error stream. Apologies for the spam. > PythonWorkerFactory does not print error stream in case the daemon fails > before the main daemon.py#main() > - > > Key: SPARK-48517 > URL: https://issues.apache.org/jira/browse/SPARK-48517 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 3.3.2 > Reporter: Prabhu Joseph > Priority: Major > > PythonWorkerFactory does not print the error stream in case the daemon fails > before executing the main daemon.py#main(). It throws a SparkException like > below but does not print the error stream, which explains why > pyspark.daemon failed to start. > {code:java} > org.apache.spark.SparkException: > 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard > output. Invalid port number: > 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} > The error stream is being [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] after throwing the SparkException. It has > to be captured during the exception as well. > > *Simple Repro:* > 1. Run a sample pyspark job after setting a wrong > spark.python.daemon.module, like pyspark.wrongdaemon instead of the default > pyspark.daemon. > > 2. The forked python process will fail with > *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48517) PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#main()
[ https://issues.apache.org/jira/browse/SPARK-48517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-48517: -- Description: PythonWorkerFactory does not print the error stream in case the daemon fails before executing the main daemon.py#main(). It throws a SparkException like below but does not print the error stream, which explains why pyspark.daemon failed to start. {code:java} org.apache.spark.SparkException: 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard output. Invalid port number: 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} The error stream is being [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] after throwing the SparkException. It has to be captured during the exception as well. *Simple Repro:* 1. Run a sample pyspark job after setting a wrong spark.python.daemon.module, like pyspark.wrongdaemon instead of the default pyspark.daemon. 2. The forked python process will fail with *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory. was: PythonWorkerFactory does not print the error stream in case the daemon fails before calling the main daemon.py#method. It throws a SparkException like below but does not print the error stream, which explains why pyspark.daemon failed to start. {code:java} org.apache.spark.SparkException: 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard output. Invalid port number: 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} The error stream is being [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] after throwing the SparkException. It has to be captured during the exception as well. *Simple Repro:* 1. Run a sample pyspark job after setting a wrong spark.python.daemon.module, like pyspark.wrongdaemon instead of the default pyspark.daemon. 2.
The forked python process will fail with *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory. > PythonWorkerFactory does not print error stream in case the daemon fails > before the main daemon.py#main() > - > > Key: SPARK-48517 > URL: https://issues.apache.org/jira/browse/SPARK-48517 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 3.3.2 > Reporter: Prabhu Joseph > Priority: Major > > PythonWorkerFactory does not print the error stream in case the daemon fails > before executing the main daemon.py#main(). It throws a SparkException like > below but does not print the error stream, which explains why > pyspark.daemon failed to start. > {code:java} > org.apache.spark.SparkException: > 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard > output. Invalid port number: > 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} > The error stream is being [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] after throwing the SparkException. It has > to be captured during the exception as well. > > *Simple Repro:* > 1. Run a sample pyspark job after setting a wrong > spark.python.daemon.module, like pyspark.wrongdaemon instead of the default > pyspark.daemon. > > 2. The forked python process will fail with > *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory.
[jira] [Updated] (SPARK-48517) PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#main()
[ https://issues.apache.org/jira/browse/SPARK-48517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-48517: -- Summary: PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#main() (was: PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#method) > PythonWorkerFactory does not print error stream in case the daemon fails > before the main daemon.py#main() > - > > Key: SPARK-48517 > URL: https://issues.apache.org/jira/browse/SPARK-48517 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 3.3.2 > Reporter: Prabhu Joseph > Priority: Major > > PythonWorkerFactory does not print the error stream in case the daemon fails > before calling the main daemon.py#method. > It throws a SparkException like below but does not print the error stream > which explains why pyspark.daemon failed to start. > {code:java} > org.apache.spark.SparkException: > 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard > output. Invalid port number: > 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} > The error stream is being > [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] > after throwing the SparkException. It has to be captured during the exception > as well. > > *Simple Repro:* > 1. Run a sample pyspark job after setting a wrong > spark.python.daemon.module, like pyspark.wrongdaemon instead of the default > pyspark.daemon. > > 2. The forked python process will fail with > *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory.
[jira] [Created] (SPARK-48517) PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#method
Prabhu Joseph created SPARK-48517: - Summary: PythonWorkerFactory does not print error stream in case the daemon fails before the main daemon.py#method Key: SPARK-48517 URL: https://issues.apache.org/jira/browse/SPARK-48517 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.2 Reporter: Prabhu Joseph PythonWorkerFactory does not print the error stream in case the daemon fails before calling the main daemon.py#method. It throws a SparkException like below but does not print the error stream, which explains why pyspark.daemon failed to start. {code:java} org.apache.spark.SparkException: 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard output. Invalid port number: 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code} The error stream is being [read|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L303] after throwing the SparkException. It has to be captured during the exception as well. *Simple Repro:* 1. Run a sample pyspark job after setting a wrong spark.python.daemon.module, like pyspark.wrongdaemon instead of the default pyspark.daemon. 2. The forked python process will fail with *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"* but this is not captured by PythonWorkerFactory.
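The capture-before-raise behavior requested above can be sketched outside Spark. This is a hypothetical illustration, not PythonWorkerFactory's actual code (which is Scala): a parent process launches a daemon module, and when the daemon dies before writing its handshake port, the parent reads the child's stderr and includes it in the raised exception instead of discarding it.

```python
import struct
import subprocess
import sys

def launch_daemon(module):
    """Launch `module` as a daemon and return the port it reports on stdout.

    If the daemon dies before the 4-byte port handshake (e.g. the module does
    not exist), raise with the child's stderr attached so the real failure
    ("No module named ...") stays visible -- the behavior SPARK-48517 asks for.
    """
    proc = subprocess.Popen(
        [sys.executable, "-m", module],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    port_bytes = proc.stdout.read(4)  # daemon is expected to write its port first
    if len(port_bytes) < 4:
        # Capture stderr *before* raising, so the underlying error is not lost.
        err = proc.stderr.read().decode(errors="replace")
        proc.wait()
        raise RuntimeError(
            "Bad data in daemon's standard output. Stderr:\n" + err)
    return struct.unpack(">i", port_bytes)[0]
```

Running this with a bogus module name, as in the repro's step 1, surfaces the interpreter's "No module named ..." message in the exception rather than only "Invalid port number".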
[jira] [Updated] (SPARK-36328) HadoopRDD#getPartitions fetches FileSystem Delegation Token for every partition
[ https://issues.apache.org/jira/browse/SPARK-36328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-36328: -- Description: A Spark job creates a separate JobConf for every RDD (every Hive table partition) in HadoopRDD#getPartitions. {code} override def getPartitions: Array[Partition] = { val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} Hadoop FileSystem fetches a FileSystem Delegation Token and sets it into the Credentials, which is part of the JobConf. On further requests, it will reuse the token from the Credentials if one already exists. {code} if (serviceName != null) { // fs has token, grab it final Text service = new Text(serviceName); Token token = credentials.getToken(service); if (token == null) { token = getDelegationToken(renewer); if (token != null) { tokens.add(token); credentials.addToken(service, token); } } } {code} But since the Spark job creates a new JobConf (which will have new Credentials) for every Hive table partition, the token is not reused and gets fetched for every partition. This slows down the query, as each delegation token has to go through the KDC and an SSL handshake on secure clusters. *Improvement:* Spark can add the credentials from the previous JobConf into the new JobConf to reuse the FileSystem Delegation Token, similar to how the user credentials are added into the JobConf after construction.
{code} val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} *Repro* {code} beeline> create table parttable (key char(1), value int) partitioned by (p int); insert into table parttable partition(p=100) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=200) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=300) values ('d', 1), ('e', 2), ('f', 3); spark-sql> select value, count(*) from parttable group by value {code} was: A Spark job creates a separate JobConf for every RDD (every Hive table partition) in HadoopRDD#getPartitions. {code} override def getPartitions: Array[Partition] = { val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} Hadoop FileSystem fetches a FileSystem Delegation Token and sets it into the Credentials, which is part of the JobConf. On further calls, it will reuse the token from the Credentials if one already exists. {code} if (serviceName != null) { // fs has token, grab it final Text service = new Text(serviceName); Token token = credentials.getToken(service); if (token == null) { token = getDelegationToken(renewer); if (token != null) { tokens.add(token); credentials.addToken(service, token); } } } {code} But since the Spark job creates a new JobConf (which will have new Credentials) for every Hive table partition, the token is not reused and gets fetched for every partition. This slows down the query, as each delegation token has to go through the KDC and an SSL handshake on secure clusters. *Improvement:* Spark can add the credentials from the previous JobConf into the new JobConf to reuse the FileSystem Delegation Token, similar to how the user credentials are added into the JobConf after construction.
{code} val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} *Repro* {code} beeline> create table parttable (key char(1), value int) partitioned by (p int); insert into table parttable partition(p=100) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=200) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=300) values ('d', 1), ('e', 2), ('f', 3); spark-sql> select value, count(*) from parttable group by value {code} > HadoopRDD#getPartitions fetches FileSystem Delegation Token for every > partition > --- > > Key: SPARK-36328 > URL: https://issues.apache.org/jira/browse/SPARK-36328 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.1.2 > Reporter: Prabhu Joseph > Priority: Major > > A Spark job creates a separate JobConf for every RDD (every Hive table > partition) in HadoopRDD#getPartitions. > {code} > override def getPartitions: Array[Partition] = { > val jobConf = getJobConf() > // add the credentials here as this can be
[jira] [Updated] (SPARK-36328) HadoopRDD#getPartitions fetches FileSystem Delegation Token for every partition
[ https://issues.apache.org/jira/browse/SPARK-36328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-36328: -- Summary: HadoopRDD#getPartitions fetches FileSystem Delegation Token for every partition (was: HadoopRDD#getPartitions fetches FileSystem Delegation Token for evert partition) > HadoopRDD#getPartitions fetches FileSystem Delegation Token for every > partition > --- > > Key: SPARK-36328 > URL: https://issues.apache.org/jira/browse/SPARK-36328 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.1.2 > Reporter: Prabhu Joseph > Priority: Major > > A Spark job creates a separate JobConf for every RDD (every Hive table > partition) in HadoopRDD#getPartitions. > {code} > override def getPartitions: Array[Partition] = { > val jobConf = getJobConf() > // add the credentials here as this can be called before SparkContext > initialized > SparkHadoopUtil.get.addCredentials(jobConf) > {code} > Hadoop FileSystem fetches a FileSystem Delegation Token and sets it into the > Credentials, which is part of the JobConf. On further calls, it will reuse the token > from the Credentials if one already exists. > {code} > if (serviceName != null) { // fs has token, grab it > final Text service = new Text(serviceName); > Token token = credentials.getToken(service); > if (token == null) { > token = getDelegationToken(renewer); > if (token != null) { > tokens.add(token); > credentials.addToken(service, token); > } > } > } > {code} > But since the Spark job creates a new JobConf (which will have new > Credentials) for every Hive table partition, the token is not reused and gets > fetched for every partition. This slows down the query, as each > delegation token has to go through the KDC and an SSL handshake on secure clusters.
> *Improvement:* > Spark can add the credentials from the previous JobConf into the new JobConf to > reuse the FileSystem Delegation Token, similar to how the user credentials are > added into the JobConf after construction. > {code} > val jobConf = getJobConf() > // add the credentials here as this can be called before SparkContext > initialized > SparkHadoopUtil.get.addCredentials(jobConf) > {code} > *Repro* > {code} > beeline> > create table parttable (key char(1), value int) partitioned by (p int); > insert into table parttable partition(p=100) values ('d', 1), ('e', 2), ('f', > 3); > insert into table parttable partition(p=200) values ('d', 1), ('e', 2), ('f', > 3); > insert into table parttable partition(p=300) values ('d', 1), ('e', 2), ('f', > 3); > spark-sql> > select value, count(*) from parttable group by value > {code}
[jira] [Created] (SPARK-36328) HadoopRDD#getPartitions fetches FileSystem Delegation Token for evert partition
Prabhu Joseph created SPARK-36328: - Summary: HadoopRDD#getPartitions fetches FileSystem Delegation Token for evert partition Key: SPARK-36328 URL: https://issues.apache.org/jira/browse/SPARK-36328 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.2 Reporter: Prabhu Joseph A Spark job creates a separate JobConf for every RDD (every Hive table partition) in HadoopRDD#getPartitions. {code} override def getPartitions: Array[Partition] = { val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} Hadoop FileSystem fetches a FileSystem Delegation Token and sets it into the Credentials, which is part of the JobConf. On further calls, it will reuse the token from the Credentials if one already exists. {code} if (serviceName != null) { // fs has token, grab it final Text service = new Text(serviceName); Token token = credentials.getToken(service); if (token == null) { token = getDelegationToken(renewer); if (token != null) { tokens.add(token); credentials.addToken(service, token); } } } {code} But since the Spark job creates a new JobConf (which will have new Credentials) for every Hive table partition, the token is not reused and gets fetched for every partition. This slows down the query, as each delegation token has to go through the KDC and an SSL handshake on secure clusters. *Improvement:* Spark can add the credentials from the previous JobConf into the new JobConf to reuse the FileSystem Delegation Token, similar to how the user credentials are added into the JobConf after construction.
{code} val jobConf = getJobConf() // add the credentials here as this can be called before SparkContext initialized SparkHadoopUtil.get.addCredentials(jobConf) {code} *Repro* {code} beeline> create table parttable (key char(1), value int) partitioned by (p int); insert into table parttable partition(p=100) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=200) values ('d', 1), ('e', 2), ('f', 3); insert into table parttable partition(p=300) values ('d', 1), ('e', 2), ('f', 3); spark-sql> select value, count(*) from parttable group by value {code}
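The improvement described in SPARK-36328 amounts to memoizing the token fetch across JobConfs. A minimal sketch of that idea follows; names like `TokenCache` and `fetch_token` are made up for illustration, while the real types on the Hadoop/Spark side are JobConf, Credentials, and Token:

```python
class TokenCache:
    """Toy model of reusing one delegation token across many partitions."""

    def __init__(self, fetch_token):
        self._fetch = fetch_token  # expensive: KDC round trip + SSL handshake
        self._tokens = {}          # service name -> token

    def get(self, service):
        # Mirrors the Hadoop snippet above: return the cached token if
        # present, otherwise fetch it once and store it.
        if service not in self._tokens:
            self._tokens[service] = self._fetch(service)
        return self._tokens[service]

fetches = []
cache = TokenCache(lambda svc: fetches.append(svc) or "token-for-" + svc)
for _ in range(300):  # e.g. 300 Hive table partitions on one filesystem
    cache.get("hdfs://namenode:8020")
# The expensive fetch ran once, not 300 times.
```

Carrying the Credentials from the previous JobConf into each new one, as the Improvement section proposes, gives the same effect: the first partition pays for the token and every later partition reuses it.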
[jira] [Comment Edited] (SPARK-26867) Spark Support of YARN Placement Constraint
[ https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782270#comment-16782270 ] Prabhu Joseph edited comment on SPARK-26867 at 3/2/19 3:52 AM: --- [~srowen] Spark can allow users to configure the Placement Constraint so that users have more control over where the executors get placed. For example: 1. A Spark job wants to run on machines where the Python version is x or the Java version is y (Node Attributes) 2. A Spark job needs / does not need executors to be placed on machines where an HBase RegionServer / Zookeeper / any other service is running (Affinity / Anti-Affinity) 3. A Spark job wants no more than 2 of its executors on the same node (Cardinality) 4. Spark Job A's executors want / do not want to run where Spark Job B's / any other job's containers run (Application_Tag Namespace) was (Author: prabhu joseph): Spark can allow users to configure the Placement Constraint so that users have more control over where the executors get placed. For example: 1. A Spark job wants to run on machines where the Python version is x or the Java version is y (Node Attributes) 2. A Spark job needs / does not need executors to be placed on machines where an HBase RegionServer / Zookeeper / any other service is running (Affinity / Anti-Affinity) 3. A Spark job wants no more than 2 of its executors on the same node (Cardinality) 4.
Spark Job A's executors want / do not want to run where Spark Job B's / any other job's containers run (Application_Tag Namespace) > Spark Support of YARN Placement Constraint > -- > > Key: SPARK-26867 > URL: https://issues.apache.org/jira/browse/SPARK-26867 > Project: Spark > Issue Type: New Feature > Components: Spark Core, YARN > Affects Versions: 3.0.0 > Reporter: Prabhu Joseph > Priority: Major > > YARN provides Placement Constraint features, where an application can request > containers based on affinity / anti-affinity / cardinality to services or > other application containers / node attributes. This is a useful feature for > Spark jobs.
[jira] [Commented] (SPARK-26867) Spark Support of YARN Placement Constraint
[ https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782270#comment-16782270 ] Prabhu Joseph commented on SPARK-26867: --- Spark can allow users to configure the Placement Constraint so that users have more control over where the executors get placed. For example: 1. A Spark job wants to run on machines where the Python version is x or the Java version is y (Node Attributes) 2. A Spark job needs / does not need executors to be placed on machines where an HBase RegionServer / Zookeeper / any other service is running (Affinity / Anti-Affinity) 3. A Spark job wants no more than 2 of its executors on the same node (Cardinality) 4. Spark Job A's executors want / do not want to run where Spark Job B's / any other job's containers run (Application_Tag Namespace) > Spark Support of YARN Placement Constraint > -- > > Key: SPARK-26867 > URL: https://issues.apache.org/jira/browse/SPARK-26867 > Project: Spark > Issue Type: New Feature > Components: Spark Core, YARN > Affects Versions: 3.0.0 > Reporter: Prabhu Joseph > Priority: Major > > YARN provides Placement Constraint features, where an application can request > containers based on affinity / anti-affinity / cardinality to services or > other application containers / node attributes. This is a useful feature for > Spark jobs.
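The constraint kinds enumerated in the comment on SPARK-26867 can be modeled declaratively. The following is a hypothetical sketch, not YARN's actual PlacementConstraint API; every name here is illustrative, showing how a node description might be matched against node-attribute, anti-affinity, and cardinality constraints:

```python
def satisfies(node, constraint):
    """Check one toy placement constraint against a node description."""
    kind = constraint["kind"]
    if kind == "node_attribute":
        # e.g. "Python version is x" (example 1 in the comment)
        return node["attrs"].get(constraint["attr"]) == constraint["value"]
    if kind == "anti_affinity":
        # avoid nodes running a given service, e.g. HBase (example 2)
        return constraint["service"] not in node["services"]
    if kind == "max_cardinality":
        # allow placing another executor only if under the per-node limit
        # (example 3: no more than 2 executors on the same node)
        return node["executors"] < constraint["limit"]
    raise ValueError("unknown constraint kind: " + kind)

# A node running Python 3.9 and an HBase RegionServer, with 1 executor placed.
node = {"attrs": {"python": "3.9"}, "services": ["hbase"], "executors": 1}
```

Exposing something like this through Spark configuration, so it can be translated into YARN's scheduling requests, is the substance of the feature request.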
[jira] [Updated] (SPARK-26867) Spark Support of YARN Placement Constraint
[ https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-26867: -- Summary: Spark Support of YARN Placement Constraint (was: Spark Support for YARN Placement Constraint) > Spark Support of YARN Placement Constraint > -- > > Key: SPARK-26867 > URL: https://issues.apache.org/jira/browse/SPARK-26867 > Project: Spark > Issue Type: New Feature > Components: Spark Core > Affects Versions: 2.4.0 > Reporter: Prabhu Joseph > Priority: Major > > YARN provides Placement Constraint features, where an application can request > containers based on affinity / anti-affinity / cardinality to services or > other application containers / node attributes. This is a useful feature for > Spark jobs.
[jira] [Updated] (SPARK-26867) Spark Support of YARN Placement Constraint
[ https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-26867: -- Component/s: YARN > Spark Support of YARN Placement Constraint > -- > > Key: SPARK-26867 > URL: https://issues.apache.org/jira/browse/SPARK-26867 > Project: Spark > Issue Type: New Feature > Components: Spark Core, YARN > Affects Versions: 2.4.0 > Reporter: Prabhu Joseph > Priority: Major > > YARN provides Placement Constraint features, where an application can request > containers based on affinity / anti-affinity / cardinality to services or > other application containers / node attributes. This is a useful feature for > Spark jobs.
[jira] [Created] (SPARK-26867) Spark Support for YARN Placement Constraint
Prabhu Joseph created SPARK-26867: - Summary: Spark Support for YARN Placement Constraint Key: SPARK-26867 URL: https://issues.apache.org/jira/browse/SPARK-26867 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 2.4.0 Reporter: Prabhu Joseph YARN provides Placement Constraint features, where an application can request containers based on affinity / anti-affinity / cardinality to services or other application containers / node attributes. This is a useful feature for Spark jobs.
[jira] [Commented] (SPARK-24192) Invalid Spark URL in local spark session since upgrading from org.apache.spark:spark-sql_2.11:2.2.1 to org.apache.spark:spark-sql_2.11:2.3.0
[ https://issues.apache.org/jira/browse/SPARK-24192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549707#comment-16549707 ] Prabhu Joseph commented on SPARK-24192: --- We have faced this issue; it happens when the hostname has an underscore in it. > Invalid Spark URL in local spark session since upgrading from > org.apache.spark:spark-sql_2.11:2.2.1 to org.apache.spark:spark-sql_2.11:2.3.0 > > > Key: SPARK-24192 > URL: https://issues.apache.org/jira/browse/SPARK-24192 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 2.3.0 > Reporter: Tal Barda > Priority: Major > Labels: HeartbeatReceiver, config, session, spark, spark-conf, > spark-session > Original Estimate: 24h > Remaining Estimate: 24h > > Since updating to Spark 2.3.0, tests run in my CI (Codeship) fail > due to an allegedly invalid Spark URL when creating the (local) Spark context. > Here's a log from my _*mvn clean install*_ command execution: > {quote}{{2018-05-03 13:18:47.668 ERROR 5533 --- [ main] > org.apache.spark.SparkContext : Error initializing SparkContext.
> org.apache.spark.SparkException: Invalid Spark URL: > spark://HeartbeatReceiver@railsonfire_61eb1c99-232b-49d0-abb5-a1eb9693516b_52bcc09bb48b:44284 > at > org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.executor.Executor.<init>(Executor.scala:155) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.SparkContext.<init>(SparkContext.scala:500) > ~[spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486) > [spark-core_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930) > [spark-sql_2.11-2.3.0.jar:2.3.0] at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921) > [spark-sql_2.11-2.3.0.jar:2.3.0] at scala.Option.getOrElse(Option.scala:121) > [scala-library-2.11.8.jar:na] at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921) > [spark-sql_2.11-2.3.0.jar:2.3.0] at > com.planck.spark_data_features_extractors.utils.SparkConfig.sparkSession(SparkConfig.java:43) > [classes/:na] at > 
com.planck.spark_data_features_extractors.utils.SparkConfig$$EnhancerBySpringCGLIB$$66dd1f72.CGLIB$sparkSession$0() > [classes/:na] at > com.planck.spark_data_features_extractors.utils.SparkConfig$$EnhancerBySpringCGLIB$$66dd1f72$$FastClassBySpringCGLIB$$a213b647.invoke() > [classes/:na] at > org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228) > [spring-core-5.0.2.RELEASE.jar:5.0.2.RELEASE] at > org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:361) > [spring-context-5.0.2.RELEASE.jar:5.0.2.RELEASE] at > com.planck.spark_data_features_extractors.utils.SparkConfig$$EnhancerBySpringCGLIB$$66dd1f72.sparkSession() > [classes/:na] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_171] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_171] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_171] at java.lang.reflect.Method.invoke(Method.java:498) > ~[na:1.8.0_171] at > org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154) > [spring-beans-5.0.2.RELEASE.jar:5.0.2.RELEASE] at > org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:579) > [spring-beans-5.0.2.RELEASE.jar:5.0.2.RELEASE] at >
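The underscore observation above has a simple basis: RFC 952/1123 hostnames may contain only letters, digits, and hyphens, so URI parsers that enforce this reject container hostnames like the Codeship one in the log. A hypothetical validity check (not Spark's actual code) illustrating why that hostname fails:

```python
import re

# RFC 1123 label: letters/digits/hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_rfc1123_hostname(host):
    """Return True if every dot-separated label of `host` is RFC 1123 valid."""
    return all(_LABEL.match(label) for label in host.split("."))

# The CI hostname from the stack trace contains underscores, so it is invalid:
is_rfc1123_hostname(
    "railsonfire_61eb1c99-232b-49d0-abb5-a1eb9693516b_52bcc09bb48b")  # False
```

Renaming the host, or pointing Spark at a conforming hostname (e.g. via the driver host/bind-address settings), avoids the "Invalid Spark URL" failure.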
[jira] [Updated] (SPARK-24008) SQL/Hive Context fails with NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-24008: -- Description: SQL / Hive Context fails with a NullPointerException while getting configuration from SQLConf. This happens when the MemoryStore is filled with a lot of broadcasts and starts dropping blocks, and then a SQL / Hive Context is created and broadcast. Using this Context to access a table then fails with the NullPointerException below. A repro is attached - a Spark example which fills the MemoryStore with broadcasts and then creates and accesses a SQL Context. {code} java.lang.NullPointerException at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) at org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558) at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:362) at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623) at SparkHiveExample$.main(SparkHiveExample.scala:76) at SparkHiveExample.main(SparkHiveExample.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) 18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException java.lang.NullPointerException at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153) at org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166) at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258) at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255) at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:475) at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475) at
org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474) at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90) at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831) at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) {code} MemoryStore got filled and started dropping the blocks. {code} 18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 78.1 MB, free 64.4 MB) 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 1522.0 B, free 64.4 MB) 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 350.9 KB, free 64.1 MB) 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 29.9 KB, free 64.0 MB) 18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 78.1 MB, free 64.7 MB) 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 1522.0 B, free 64.7 MB) 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 136.0 B, free 64.7 MB) 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared 18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared 18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB 18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared {code} Fix is to remove broadcasting SQL/Hive Context or Increasing the Driver memory. was: SQL / Hive Context fails with NullPointerException while getting configuration from SQLConf. This happens when the MemoryStore is filled with lot of broadcast and started dropping and then SQL / Hive Context is created and broadcast. 
When using this Context to access a table fails with below NullPointerException. Repro is attached - the Spark Example which fills the MemoryStore with broadcasts and then creates and accesses a SQL Context. {code} java.lang.NullPointerException at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) at org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558) at org.apache.spark.sql.DataFrameReader.(DataFrameReader.scala:362) at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623) at SparkHiveExample$.main(SparkHiveExample.scala:76) at SparkHiveExample.main(SparkHiveExample.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
[jira] [Commented] (SPARK-24008) SQL/Hive Context fails with NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443484#comment-16443484 ] Prabhu Joseph commented on SPARK-24008:
---
Yes, it's driver-specific, but it is better to handle this case instead of failing.
> SQL/Hive Context fails with NullPointerException
> -
>
> Key: SPARK-24008
> URL: https://issues.apache.org/jira/browse/SPARK-24008
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: Repro
>
>
> SQL/Hive Context fails with a NullPointerException while reading configuration
> from SQLConf. This happens when the MemoryStore fills up with broadcast
> variables and starts dropping blocks, and a SQL/Hive Context is then created
> and broadcast. Using this context to access a table fails with the
> NullPointerException below.
> A repro is attached: a Spark example that fills the MemoryStore with
> broadcasts and then creates and accesses a SQLContext.
> {code} > java.lang.NullPointerException > at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) > at > org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558) > at > org.apache.spark.sql.DataFrameReader.(DataFrameReader.scala:362) > at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623) > at SparkHiveExample$.main(SparkHiveExample.scala:76) > at SparkHiveExample.main(SparkHiveExample.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > 18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: > java.lang.NullPointerException > java.lang.NullPointerException > at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) > at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153) > at > org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166) > > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258) > > at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255) > at > org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:475) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475) > > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > {code} > MemoryStore got filled and started dropping the blocks. 
> {code} > 18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in > memory (estimated size 78.1 MB, free 64.4 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes > in memory (estimated size 1522.0 B, free 64.4 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in > memory (estimated size 350.9 KB, free 64.1 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes > in memory (estimated size 29.9 KB, free 64.0 MB) > 18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in > memory (estimated size 78.1 MB, free 64.7 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes > in memory (estimated size 1522.0 B, free 64.7 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in > memory (estimated size 136.0 B, free 64.7 MB) > 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared > 18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB > 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared > 18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB > 18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24008) SQL/Hive Context fails with NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-24008: -- Attachment: Repro > SQL/Hive Context fails with NullPointerException > - > > Key: SPARK-24008 > URL: https://issues.apache.org/jira/browse/SPARK-24008 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3 >Reporter: Prabhu Joseph >Priority: Major > Attachments: Repro > > > SQL / Hive Context fails with NullPointerException while getting > configuration from SQLConf. This happens when the MemoryStore is filled with > lot of broadcast and started dropping and then SQL / Hive Context is created > and broadcast. When using this Context to access a table fails with below > NullPointerException. > Repro is attached - the Spark Example which fills the MemoryStore with > broadcasts and then creates and accesses a SQL Context. > {code} > java.lang.NullPointerException > at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) > at > org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558) > at > org.apache.spark.sql.DataFrameReader.(DataFrameReader.scala:362) > at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623) > at SparkHiveExample$.main(SparkHiveExample.scala:76) > at SparkHiveExample.main(SparkHiveExample.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > 18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: > java.lang.NullPointerException > java.lang.NullPointerException > at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) > at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153) > at > org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166) > > at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258) > > at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255) > at > org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:475) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475) > > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > {code} > MemoryStore got filled and started dropping the blocks. > {code} > 18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in > memory (estimated size 78.1 MB, free 64.4 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes > in memory (estimated size 1522.0 B, free 64.4 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in > memory (estimated size 350.9 KB, free 64.1 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes > in memory (estimated size 29.9 KB, free 64.0 MB) > 18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in > memory (estimated size 78.1 MB, free 64.7 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes > in memory (estimated size 1522.0 B, free 64.7 MB) > 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in > memory (estimated size 136.0 B, free 64.7 MB) > 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared > 18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB > 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared > 18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB 
> 18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared
> {code}
[jira] [Created] (SPARK-24008) SQL/Hive Context fails with NullPointerException
Prabhu Joseph created SPARK-24008:
-
Summary: SQL/Hive Context fails with NullPointerException
Key: SPARK-24008
URL: https://issues.apache.org/jira/browse/SPARK-24008
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.3
Reporter: Prabhu Joseph

SQL/Hive Context fails with a NullPointerException while reading configuration from SQLConf. This happens when the MemoryStore fills up with broadcast variables and starts dropping blocks, and a SQL/Hive Context is then created and broadcast. Using this context to access a table fails with the NullPointerException below.

A repro is attached: a Spark example that fills the MemoryStore with broadcasts and then creates and accesses a SQLContext.

{code}
java.lang.NullPointerException
        at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638)
        at org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558)
        at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:362)
        at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623)
        at SparkHiveExample$.main(SparkHiveExample.scala:76)
        at SparkHiveExample.main(SparkHiveExample.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638)
        at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153)
        at org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
        at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:475)
        at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475)
        at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474)
        at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90)
        at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
        at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
{code}

The MemoryStore got filled and started dropping blocks:

{code}
18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 78.1 MB, free 64.4 MB)
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 1522.0 B, free 64.4 MB)
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 350.9 KB, free 64.1 MB)
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 29.9 KB, free 64.0 MB)
18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 78.1 MB, free 64.7 MB)
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 1522.0 B, free 64.7 MB)
18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 136.0 B, free 64.7 MB)
18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared
{code}
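A minimal sketch of a workaround consistent with the description above: create the HiveContext once on the driver and never wrap it in sc.broadcast, so it cannot be evicted from the MemoryStore together with the other broadcast blocks. The app and table names here are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DriverSideContext {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-side-context"))
    // Keep the context as a plain driver-side reference; broadcasting it lets
    // the MemoryStore drop it along with other broadcast blocks under pressure.
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.table("some_table") // hypothetical table name
    println(df.count())
    sc.stop()
  }
}
```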
[jira] [Created] (SPARK-22587) Spark job fails if fs.defaultFS and application jar are different url
Prabhu Joseph created SPARK-22587:
-
Summary: Spark job fails if fs.defaultFS and application jar are different url
Key: SPARK-22587
URL: https://issues.apache.org/jira/browse/SPARK-22587
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 1.6.3
Reporter: Prabhu Joseph

A Spark job fails if fs.defaultFS and the URL where the application jar resides have different authorities under the same scheme:

{code}
spark-submit --conf spark.master=yarn-cluster wasb://XXX/tmp/test.py
{code}

In core-site.xml, fs.defaultFS is set to wasb://YYY. A Hadoop listing (hadoop fs -ls) works for both the XXX and YYY URLs.

{code}
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: wasb://XXX/tmp/test.py, expected: wasb://YYY
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
        at org.apache.hadoop.fs.azure.NativeAzureFileSystem.checkPath(NativeAzureFileSystem.java:1251)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:485)
        at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:396)
        at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:507)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:660)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:912)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1248)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1307)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

Client.copyFileToRemote tries to resolve the path of the application jar (the XXX URL) using the FileSystem object created from the fs.defaultFS URL (YYY) instead of the jar's own URL:

{code}
val destFs = destDir.getFileSystem(hadoopConf)
val srcFs = srcPath.getFileSystem(hadoopConf)
{code}

getFileSystem creates the filesystem based on the URL of the path, so these two lines are fine. But the lines below qualify srcPath (the XXX URL) against destFs (the YYY URL), and that is what fails:

{code}
var destPath = srcPath
val qualifiedDestPath = destFs.makeQualified(destPath)
{code}
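The mismatch in the quoted Client code can be sketched in isolation (method and variable names here are illustrative, not the actual Spark patch): qualifying a path against the filesystem of a different authority is what raises the "Wrong FS" error, while qualifying each path against its own filesystem succeeds.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative sketch, not the actual Spark fix. src is the application jar
// (wasb://XXX/...), dest is derived from fs.defaultFS (wasb://YYY/...).
def qualifyBoth(src: Path, dest: Path, conf: Configuration): (Path, Path) = {
  val srcFs  = src.getFileSystem(conf)   // filesystem for wasb://XXX
  val destFs = dest.getFileSystem(conf)  // filesystem for wasb://YYY
  // destFs.makeQualified(src) is the failing pattern: FileSystem.checkPath
  // rejects a path whose authority differs and throws IllegalArgumentException.
  // Qualifying each path against its own filesystem avoids the Wrong FS error:
  (srcFs.makeQualified(src), destFs.makeQualified(dest))
}
```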
[jira] [Commented] (SPARK-22426) Spark AM launching containers on node where External spark shuffle service failed to initialize
[ https://issues.apache.org/jira/browse/SPARK-22426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237769#comment-16237769 ] Prabhu Joseph commented on SPARK-22426:
---
Thanks [~jerryshao], we can close this as a duplicate.
> Spark AM launching containers on node where External spark shuffle service
> failed to initialize
> ---
>
> Key: SPARK-22426
> URL: https://issues.apache.org/jira/browse/SPARK-22426
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, YARN
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
>
> When Spark External Shuffle Service on a NodeManager fails, the remote
> executors will fail while fetching the data from the executors launched on
> this Node. Spark ApplicationMaster should not launch containers on this Node
> or not use external shuffle service.
[jira] [Commented] (SPARK-22426) Spark AM launching containers on node where External spark shuffle service failed to initialize
[ https://issues.apache.org/jira/browse/SPARK-22426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235500#comment-16235500 ] Prabhu Joseph commented on SPARK-22426:
---
The node and the NodeManager process are fine; the External Spark Shuffle Service failed to initialize on that NodeManager for some reason, such as SPARK-17433 or SPARK-15519.
> Spark AM launching containers on node where External spark shuffle service
> failed to initialize
> ---
>
> Key: SPARK-22426
> URL: https://issues.apache.org/jira/browse/SPARK-22426
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, YARN
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
>
> When Spark External Shuffle Service on a NodeManager fails, the remote
> executors will fail while fetching the data from the executors launched on
> this Node. Spark ApplicationMaster should not launch containers on this Node
> or not use external shuffle service.
[jira] [Created] (SPARK-22426) Spark AM launching containers on node where External spark shuffle service failed to initialize
Prabhu Joseph created SPARK-22426:
-
Summary: Spark AM launching containers on node where External spark shuffle service failed to initialize
Key: SPARK-22426
URL: https://issues.apache.org/jira/browse/SPARK-22426
Project: Spark
Issue Type: Bug
Components: Shuffle
Affects Versions: 1.6.3
Reporter: Prabhu Joseph
Priority: Major

When the Spark External Shuffle Service on a NodeManager fails, remote executors will fail while fetching data from the executors launched on this node. The Spark ApplicationMaster should either not launch containers on this node or not use the external shuffle service.
[jira] [Updated] (SPARK-22426) Spark AM launching containers on node where External spark shuffle service failed to initialize
[ https://issues.apache.org/jira/browse/SPARK-22426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated SPARK-22426:
--
Component/s: YARN
> Spark AM launching containers on node where External spark shuffle service
> failed to initialize
> ---
>
> Key: SPARK-22426
> URL: https://issues.apache.org/jira/browse/SPARK-22426
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, YARN
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
>
> When Spark External Shuffle Service on a NodeManager fails, the remote
> executors will fail while fetching the data from the executors launched on
> this Node. Spark ApplicationMaster should not launch containers on this Node
> or not use external shuffle service.
[jira] [Created] (SPARK-22199) Spark Job on YARN fails with executors "Slave registration failed"
Prabhu Joseph created SPARK-22199:
-
Summary: Spark Job on YARN fails with executors "Slave registration failed"
Key: SPARK-22199
URL: https://issues.apache.org/jira/browse/SPARK-22199
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 1.6.3
Reporter: Prabhu Joseph

A Spark job on YARN failed after reaching the maximum number of executor failures.

ApplicationMaster logs:

{code}
17/09/28 04:18:27 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Max number of executor failures (3) reached)
{code}

The failed container logs show "Slave registration failed: Duplicate executor ID", whereas the driver logs show that the driver removed those executors because they were idle for spark.dynamicAllocation.executorIdleTimeout.

Executor logs:

{code}
17/09/28 04:18:26 ERROR CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID: 122
{code}

Driver logs:

{code}
17/09/28 04:18:21 INFO ExecutorAllocationManager: Removing executor 122 because it has been idle for 60 seconds (new desired total will be 133)
{code}

There are two issues here:
1. The error message in the executor, "Slave registration failed: Duplicate executor ID", is misleading; the actual reason is that the executor was removed for being idle.
2. The job failed because executors were idle for longer than spark.dynamicAllocation.executorIdleTimeout.
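The settings involved in the failure above can be seen together in a SparkConf sketch. The values are illustrative (60s is the documented default idle timeout), not taken from the failing job.

```scala
import org.apache.spark.SparkConf

// spark.dynamicAllocation.executorIdleTimeout controls when idle executors
// are released; spark.yarn.max.executor.failures controls how many executor
// failures the ApplicationMaster tolerates before failing the application.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // illustrative
  .set("spark.yarn.max.executor.failures", "10")             // illustrative
```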
[jira] [Comment Edited] (SPARK-16225) Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
[ https://issues.apache.org/jira/browse/SPARK-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350718#comment-15350718 ] Prabhu Joseph edited comment on SPARK-16225 at 6/27/16 10:00 AM:
-
Thanks [~srowen]. Changing the code to pass Integer.MAX_VALUE as the limit in split(regex, limit) worked:

val pRowRDD = pTel.map(_.split(delimiter,Integer.MAX_VALUE)).map(p => Row(p(0),p(1),p(2),p(3),p(4),p(5)))

was (Author: prabhu joseph): Thanks [~srowen]
> Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
> ---
>
> Key: SPARK-16225
> URL: https://issues.apache.org/jira/browse/SPARK-16225
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 1.5.2
>Reporter: Prabhu Joseph
>
> Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null value
> Repro:
> {code}
> [root@spark1 spark]# cat repro
> a,b,c,,,
> ./bin/spark-shell
> scala> val pTel = sc.textFile("file:///usr/hdp/2.3.4.7-4/spark/repro")
> scala> pTel.take(1)
> res0: Array[String] = Array(a,b,c,,,)
> scala> val delimiter = ","
> delimiter: String = ,
> scala> import org.apache.spark.sql._
> import org.apache.spark.sql._
> scala> val pRowRDD = pTel.map(_.split(delimiter)).map(p =>
> Row(p(0),p(1),p(2),p(3),p(4),p(5)))
> pRowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] =
> MapPartitionsRDD[3] at map at <console>:28
> scala> pRowRDD.take(1)
> 16/06/27 08:21:40 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ArrayIndexOutOfBoundsException: 3
> {code}
> It works fine when the data is a,b,c,d,e,f
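The behavior behind this fix is plain java.lang.String.split semantics, which Scala strings inherit: with the default limit of 0, trailing empty strings are removed from the result, so the empty columns at the end of the row disappear. A minimal sketch without Spark:

```scala
val line = "a,b,c,,,"

// Default limit (0): trailing empty strings are dropped, so only 3 fields
// survive and p(3) throws ArrayIndexOutOfBoundsException in the repro above.
val dropped = line.split(",")
assert(dropped.length == 3)

// A limit of Integer.MAX_VALUE (as in the comment above, or any negative
// limit) keeps trailing empty strings, so all 6 fields are present.
val kept = line.split(",", Integer.MAX_VALUE)
assert(kept.length == 6)
```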
[jira] [Commented] (SPARK-16225) Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
[ https://issues.apache.org/jira/browse/SPARK-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350718#comment-15350718 ] Prabhu Joseph commented on SPARK-16225:
---
Thanks [~srowen]
> Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
> ---
>
> Key: SPARK-16225
> URL: https://issues.apache.org/jira/browse/SPARK-16225
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 1.5.2
>Reporter: Prabhu Joseph
>
> Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null value
> Repro:
> {code}
> [root@spark1 spark]# cat repro
> a,b,c,,,
> ./bin/spark-shell
> scala> val pTel = sc.textFile("file:///usr/hdp/2.3.4.7-4/spark/repro")
> scala> pTel.take(1)
> res0: Array[String] = Array(a,b,c,,,)
> scala> val delimiter = ","
> delimiter: String = ,
> scala> import org.apache.spark.sql._
> import org.apache.spark.sql._
> scala> val pRowRDD = pTel.map(_.split(delimiter)).map(p =>
> Row(p(0),p(1),p(2),p(3),p(4),p(5)))
> pRowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] =
> MapPartitionsRDD[3] at map at <console>:28
> scala> pRowRDD.take(1)
> 16/06/27 08:21:40 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ArrayIndexOutOfBoundsException: 3
> {code}
> It works fine when the data is a,b,c,d,e,f
[jira] [Created] (SPARK-16225) Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
Prabhu Joseph created SPARK-16225:
-
Summary: Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null Value
Key: SPARK-16225
URL: https://issues.apache.org/jira/browse/SPARK-16225
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.2
Reporter: Prabhu Joseph

Spark Sql throws ArrayIndexOutOfBoundsException on accessing Null value

Repro:

{code}
[root@spark1 spark]# cat repro
a,b,c,,,
./bin/spark-shell
scala> val pTel = sc.textFile("file:///usr/hdp/2.3.4.7-4/spark/repro")
scala> pTel.take(1)
res0: Array[String] = Array(a,b,c,,,)
scala> val delimiter = ","
delimiter: String = ,
scala> import org.apache.spark.sql._
import org.apache.spark.sql._
scala> val pRowRDD = pTel.map(_.split(delimiter)).map(p => Row(p(0),p(1),p(2),p(3),p(4),p(5)))
pRowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[3] at map at <console>:28
scala> pRowRDD.take(1)
16/06/27 08:21:40 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: 3
{code}

It works fine when the data is a,b,c,d,e,f
[jira] [Created] (SPARK-15918) unionAll returns wrong result when two dataframes has schema in different order
Prabhu Joseph created SPARK-15918:
-
Summary: unionAll returns wrong result when two dataframes has schema in different order
Key: SPARK-15918
URL: https://issues.apache.org/jira/browse/SPARK-15918
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.1
Environment: CentOS
Reporter: Prabhu Joseph
Fix For: 1.6.1

On applying a unionAll operation between dataframes A and B, which have the same schema but with the columns in a different order, the result has the column-to-value mapping changed.

Repro:

{code}
A.show()
+---++---+--+--+-++---+--+---+---+-+
|tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
+---++---+--+--+-++---+--+---+---+-+
+---++---+--+--+-++---+--+---+---+-+

B.show()
+-+---+--+---+---+--+--+--+---+---+--++
|dtype|tag| time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
+-+---+--+---+---+--+--+--+---+---+--++
|F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
+-+---+--+---+---+--+--+--+---+---+--++

A = A.unionAll(B)
A.show()
+---+---+--+--+--+-++---+--+---+---+-+
|tag| year_day| tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
+---+---+--+--+--+-++---+--+---+---+-+
| F|C_FNHXUT701Z.CNSTLO|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F|C_FNHXUDP713.CNSTHI|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F| C_FNHXUT718.CNSTHI|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F|C_FNHXUT703Z.CNSTLO|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F|C_FNHXUR716A.CNSTLO|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F|C_FNHXUT803Z.CNSTHI|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F| C_FNHXUT728.CNSTHI|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
| F| C_FNHXUR806.CNSTHI|1443790800|13| 2|0| 10| 0| 275| 2015| 1.2345|2015275.0|
+---+---+--+--+--+-++---+--+---+---+-+
{code}

On changing the schema of A to match B's column order, unionAll works fine:

{code}
C = A.select("dtype","tag","time","tm_hour","tm_mday","tm_min","tm_mon","tm_sec","tm_yday","tm_year","value","year_day")
A = C.unionAll(B)
A.show()
+-+---+--+---+---+--+--+--+---+---+--++
|dtype|tag| time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
+-+---+--+---+---+--+--+--+---+---+--++
|F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0|10| 0| 275| 2015|1.2345| 2015275|
|F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0|
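The manual select above can be generalized into a small helper (name hypothetical) that aligns B's columns to A's schema order before unionAll, since unionAll in Spark 1.6 matches columns by position rather than by name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Align the right-hand dataframe's columns to the left-hand schema order
// before unionAll, which matches columns by position, not by name.
def unionByColumnName(a: DataFrame, b: DataFrame): DataFrame =
  a.unionAll(b.select(a.columns.map(col): _*))
```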
[jira] [Resolved] (SPARK-13181) Spark delay in task scheduling within executor
[ https://issues.apache.org/jira/browse/SPARK-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph resolved SPARK-13181.
-----------------------------------
    Resolution: Not A Problem

> Spark delay in task scheduling within executor
> ----------------------------------------------
>
>                 Key: SPARK-13181
>                 URL: https://issues.apache.org/jira/browse/SPARK-13181
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Prabhu Joseph
>             Fix For: 1.5.2
>
>         Attachments: ran3.JPG
>
>
> When a Spark job has some RDDs in memory and some in Hadoop, the tasks within the
> executor that read from memory start in parallel, but the task that reads from Hadoop
> starts after some delay.
>
> Repro:
> A logFile of 1.25 GB is given as input (5 RDDs of 256 MB each).
>
> val logData = sc.textFile(logFile, 2).cache()
> var numAs = logData.filter(line => line.contains("a")).count()
> var numBs = logData.filter(line => line.contains("b")).count()
>
> Run the Spark job with 1 executor, 6 GB memory, and 12 cores.
> Stage A (reading lines with "a"): the executor starts 5 tasks in parallel, all reading
> data from Hadoop.
> Stage B (reading lines with "b"): as the data is cached (4 RDDs in memory, 1 in
> Hadoop), the executor starts 4 tasks in parallel and, after a 4-second delay, starts
> the last task to read from Hadoop.
> On running the same Spark job with 12 GB memory, all 5 RDDs are in memory and the
> 5 tasks in Stage B start in parallel.
> On running the job with 2 GB memory, all 5 RDDs are in Hadoop and the 5 tasks in
> Stage B start in parallel.
> The task delay happens only when some RDDs are in memory and some in Hadoop.
> See the attached image.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13181) Spark delay in task scheduling within executor
[ https://issues.apache.org/jira/browse/SPARK-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132115#comment-15132115 ]

Prabhu Joseph commented on SPARK-13181:
---------------------------------------

Okay, the reason for the task delay within the executor when some RDDs are in memory and some in Hadoop (i.e., multiple locality levels, NODE_LOCAL and ANY) is that the scheduler waits for spark.locality.wait (3 seconds by default). During this period, the scheduler tries to launch a data-local task before giving up and launching it on a less-local node. After setting it to 0, all tasks started in parallel. But I learned that it is better not to reduce it to 0.

> Spark delay in task scheduling within executor
> ----------------------------------------------
>
>                 Key: SPARK-13181
>                 URL: https://issues.apache.org/jira/browse/SPARK-13181
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Prabhu Joseph
>             Fix For: 1.5.2
>
>         Attachments: ran3.JPG

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
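The behaviour described in the comment is Spark's delay scheduling: for each task set the scheduler starts at the best available locality level and falls back to a less local level only after spark.locality.wait elapses without launching a task at the current level. A simplified plain-Python sketch of that fallback logic (a teaching model, not Spark's actual implementation):

```python
import time

# Simplified locality ladder; Spark also has RACK_LOCAL and NO_PREF levels.
LOCALITY_LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "ANY"]

class DelayScheduler:
    def __init__(self, locality_wait_s=3.0):
        self.wait = locality_wait_s
        self.level = 0                      # index into LOCALITY_LEVELS
        self.last_launch = time.monotonic()

    def allowed_level(self, now=None):
        # Escalate one level each time locality_wait elapses without
        # a task launched at the current level.
        now = time.monotonic() if now is None else now
        while (self.level < len(LOCALITY_LEVELS) - 1
               and now - self.last_launch >= self.wait):
            self.level += 1
            self.last_launch += self.wait
        return LOCALITY_LEVELS[self.level]

    def record_launch(self, now=None):
        # Launching a task at the current level resets the timer.
        self.last_launch = time.monotonic() if now is None else now

sched = DelayScheduler(locality_wait_s=3.0)
sched.last_launch = 100.0                   # pretend "now" starts at t=100s
# Right away, only the most local tasks may run...
print(sched.allowed_level(now=101.0))       # PROCESS_LOCAL
# ...after two waits pass with nothing launched, ANY tasks (e.g. the
# Hadoop read in Stage B above) are finally allowed -- the observed delay.
print(sched.allowed_level(now=106.5))       # ANY
```

With locality_wait_s=0 the loop escalates straight to ANY, which matches the observation that setting spark.locality.wait=0 made all tasks start in parallel, at the cost of giving up data locality.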
[jira] [Updated] (SPARK-13181) Spark delay in task scheduling within executor
[ https://issues.apache.org/jira/browse/SPARK-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated SPARK-13181:
----------------------------------
    Attachment: ran3.JPG

> Spark delay in task scheduling within executor
> ----------------------------------------------
>
>                 Key: SPARK-13181
>                 URL: https://issues.apache.org/jira/browse/SPARK-13181
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Prabhu Joseph
>             Fix For: 1.5.2
>
>         Attachments: ran3.JPG

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13181) Spark delay in task scheduling within executor
Prabhu Joseph created SPARK-13181:
-------------------------------------

             Summary: Spark delay in task scheduling within executor
                 Key: SPARK-13181
                 URL: https://issues.apache.org/jira/browse/SPARK-13181
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.2
            Reporter: Prabhu Joseph
             Fix For: 1.5.2


When a Spark job has some RDDs in memory and some in Hadoop, the tasks within the executor that read from memory start in parallel, but the task that reads from Hadoop starts after some delay.

Repro:

A logFile of 1.25 GB is given as input (5 RDDs of 256 MB each).

val logData = sc.textFile(logFile, 2).cache()
var numAs = logData.filter(line => line.contains("a")).count()
var numBs = logData.filter(line => line.contains("b")).count()

Run the Spark job with 1 executor, 6 GB memory, and 12 cores.

Stage A (reading lines with "a"): the executor starts 5 tasks in parallel, all reading data from Hadoop.

Stage B (reading lines with "b"): as the data is cached (4 RDDs in memory, 1 in Hadoop), the executor starts 4 tasks in parallel and, after a 4-second delay, starts the last task to read from Hadoop.

On running the same Spark job with 12 GB memory, all 5 RDDs are in memory and the 5 tasks in Stage B start in parallel. On running the job with 2 GB memory, all 5 RDDs are in Hadoop and the 5 tasks in Stage B start in parallel.

The task delay happens only when some RDDs are in memory and some in Hadoop. See the attached image.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13182) Spark Executor retries infinitely
Prabhu Joseph created SPARK-13182:
-------------------------------------

             Summary: Spark Executor retries infinitely
                 Key: SPARK-13182
                 URL: https://issues.apache.org/jira/browse/SPARK-13182
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.2
            Reporter: Prabhu Joseph
            Priority: Minor
             Fix For: 1.5.2


When a Spark job (Spark 1.5.2) is submitted with a single executor and the user passes wrong JVM arguments via spark.executor.extraJavaOptions, the first executor fails. But the job keeps retrying, creating a new executor and failing every time, until CTRL-C is pressed.

./spark-submit --class SimpleApp --master "spark://10.10.72.145:7077" --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=16" /SPARK/SimpleApp.jar

Here the user submits the job with ConcGCThreads=16, which is greater than ParallelGCThreads, so the JVM crashes. But the job does not exit; it keeps creating executors and retrying.

..
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2846 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now RUNNING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2846 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2846
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2847 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2847 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2847 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2847
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2848 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2848 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now RUNNING

Spark should not get stuck in a retry loop on this kind of user error on a production cluster.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
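The runaway loop in the log above is what a bounded retry policy prevents; Spark's standalone master later gained spark.deploy.maxExecutorRetries for exactly this case. A minimal sketch of such a guard (hypothetical class and method names, not Spark's code):

```python
class ExecutorSupervisor:
    """Re-launches a failed executor, but aborts the application after
    max_retries consecutive failures instead of retrying forever."""

    def __init__(self, max_retries=10):
        self.max_retries = max_retries
        self.consecutive_failures = 0

    def on_executor_exit(self, exit_code):
        if exit_code == 0:
            self.consecutive_failures = 0   # a healthy exit resets the count
            return "OK"
        self.consecutive_failures += 1
        if self.consecutive_failures > self.max_retries:
            # A persistently crashing launch command (e.g. bad
            # spark.executor.extraJavaOptions) should fail the
            # application rather than spin forever.
            return "ABORT_APPLICATION"
        return "RELAUNCH"

sup = ExecutorSupervisor(max_retries=2)
print(sup.on_executor_exit(1))  # RELAUNCH
print(sup.on_executor_exit(1))  # RELAUNCH
print(sup.on_executor_exit(1))  # ABORT_APPLICATION
```

Counting only consecutive failures (and resetting on success) distinguishes a hopeless launch command, as in this report, from occasional executor crashes on an otherwise healthy cluster.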