[jira] [Commented] (SPARK-38058) Writing a spark dataframe to Azure Sql Server is causing duplicate records intermittently

2022-02-02 Thread john (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486263#comment-17486263
 ] 

john commented on SPARK-38058:
--

Since I am working in a production environment, I cannot disclose any documents here. This 
may be a bug in Spark. It happens roughly 3 out of 5 times: on 2 of the runs all the records 
are inserted correctly; the other times duplicates are inserted. We have tried all the 
workarounds and none of them works.

> Writing a spark dataframe to Azure Sql Server is causing duplicate records 
> intermittently
> -
>
> Key: SPARK-38058
> URL: https://issues.apache.org/jira/browse/SPARK-38058
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.1.0
>Reporter: john
>Priority: Major
>
> We are using the JDBC option to insert transformed data in a Spark DataFrame into a 
> table in Azure SQL Server. Below is the code snippet we are using for this 
> insert. However, we noticed on a few occasions that some records are being 
> duplicated in the destination table. This happens for large tables, e.g. 
> if a DataFrame has 600K records, after inserting the data into the table we get 
> around 620K records. We still want to understand why that is happening.
>  {{DataToLoad.write.jdbc(url = jdbcUrl, table = targetTable, mode = 
> "overwrite", properties = jdbcConnectionProperties)}}
>  
> The only reason we could think of is that while the inserts happen in a 
> distributed fashion, if one of the executors fails partway through, its tasks are 
> retried and could end up inserting duplicate records. This could be completely off 
> the mark, but we mention it in case it is the issue.
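As a rough way to confirm the duplication described above, the target table can be 
read back and its raw and distinct row counts compared with the source DataFrame. A 
sketch only, assuming the same spark session, DataToLoad, jdbcUrl, targetTable and 
jdbcConnectionProperties as in the snippet:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# DataToLoad, jdbcUrl, targetTable and jdbcConnectionProperties are assumed to be
# the same objects used in the write shown above.
source_count = DataToLoad.count()

written = spark.read.jdbc(url=jdbcUrl, table=targetTable,
                          properties=jdbcConnectionProperties)
written_count = written.count()
distinct_count = written.dropDuplicates().count()

print(f"source={source_count}, written={written_count}, distinct={distinct_count}")
# If written_count > distinct_count (e.g. ~620K vs 600K), duplicate rows were inserted.
{code}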






[jira] [Commented] (SPARK-38058) Writing a spark dataframe to Azure Sql Server is causing duplicate records intermittently

2022-01-30 Thread john (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17484318#comment-17484318
 ] 

john commented on SPARK-38058:
--

It seems this is not specific to SQL Server; it is a problem with Spark 
itself.
https://issues.apache.org/jira/browse/SPARK-16741 - this link suggests 
disabling spark.speculation, but in the latest Spark version it is disabled by 
default. I have tried that as well, and the duplicate rows were still there in SQL 
Server when using JDBC from Spark.

I have tried with a small amount of data, around 10K rows, and it works fine with 
no duplicates.
When I load millions of rows, the duplicates appear.
Because of this issue, we are using an intermediate staging-layer table to collect 
all the data, including duplicates, and then inserting into the landing zone with a 
DISTINCT clause (a sketch of this workaround follows below).
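A minimal PySpark sketch of that staging-table workaround, assuming the same 
DataToLoad, jdbcUrl and jdbcConnectionProperties as in the issue description; the 
table names are hypothetical, and dropDuplicates() stands in for the SQL-side 
DISTINCT insert:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stage_table = "dbo.stage_orders"      # hypothetical intermediate table (may hold duplicates)
landing_table = "dbo.landing_orders"  # hypothetical final table (should be duplicate-free)

# 1. Write everything, duplicates included, to the staging table.
DataToLoad.write.jdbc(url=jdbcUrl, table=stage_table,
                      mode="overwrite", properties=jdbcConnectionProperties)

# 2. Read the staging table back, drop duplicates, and load the landing table.
staged = spark.read.jdbc(url=jdbcUrl, table=stage_table,
                         properties=jdbcConnectionProperties)
(staged.dropDuplicates()
       .write.jdbc(url=jdbcUrl, table=landing_table,
                   mode="overwrite", properties=jdbcConnectionProperties))
{code}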

> Writing a spark dataframe to Azure Sql Server is causing duplicate records 
> intermittently
> -
>
> Key: SPARK-38058
> URL: https://issues.apache.org/jira/browse/SPARK-38058
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.1.0
>Reporter: john
>Priority: Major
>
> We are using the JDBC option to insert transformed data in a Spark DataFrame into a 
> table in Azure SQL Server. Below is the code snippet we are using for this 
> insert. However, we noticed on a few occasions that some records are being 
> duplicated in the destination table. This happens for large tables, e.g. 
> if a DataFrame has 600K records, after inserting the data into the table we get 
> around 620K records. We still want to understand why that is happening.
>  {{DataToLoad.write.jdbc(url = jdbcUrl, table = targetTable, mode = 
> "overwrite", properties = jdbcConnectionProperties)}}
>  
> The only reason we could think of is that while the inserts happen in a 
> distributed fashion, if one of the executors fails partway through, its tasks are 
> retried and could end up inserting duplicate records. This could be completely off 
> the mark, but we mention it in case it is the issue.






[jira] [Created] (SPARK-38058) Writing a spark dataframe to Azure Sql Server is causing duplicate records intermittently

2022-01-28 Thread john (Jira)
john created SPARK-38058:


 Summary: Writing a spark dataframe to Azure Sql Server is causing 
duplicate records intermittently
 Key: SPARK-38058
 URL: https://issues.apache.org/jira/browse/SPARK-38058
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Spark Core
Affects Versions: 3.1.0
Reporter: john


We are using the JDBC option to insert transformed data in a Spark DataFrame into a 
table in Azure SQL Server. Below is the code snippet we are using for this 
insert. However, we noticed on a few occasions that some records are being 
duplicated in the destination table. This happens for large tables, e.g. 
if a DataFrame has 600K records, after inserting the data into the table we get 
around 620K records. We still want to understand why that is happening.
 {{DataToLoad.write.jdbc(url = jdbcUrl, table = targetTable, mode = 
"overwrite", properties = jdbcConnectionProperties)}}
 
The only reason we could think of is that while the inserts happen in a 
distributed fashion, if one of the executors fails partway through, its tasks are 
retried and could end up inserting duplicate records. This could be completely off 
the mark, but we mention it in case it is the issue.
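One way to probe the retry hypothesis above is to run the same write with speculation 
off and only a single task attempt allowed, so a retried task cannot be the source of 
the extra rows. A sketch for diagnosis only, not a production setting; the 
configuration keys are standard Spark settings and DataToLoad, jdbcUrl, targetTable 
and jdbcConnectionProperties are the objects from the snippet above:
{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("jdbc-write-retry-probe")        # hypothetical app name
         .config("spark.speculation", "false")     # already the default in recent versions
         .config("spark.task.maxFailures", "1")    # fail fast instead of re-running tasks
         .getOrCreate())

# With retries disabled, the job either fails outright or loads exactly the
# source row count; if duplicates disappear, task retries are the prime suspect.
DataToLoad.write.jdbc(url=jdbcUrl, table=targetTable,
                      mode="overwrite", properties=jdbcConnectionProperties)
{code}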






[jira] [Updated] (SPARK-34513) Kubernetes Spark Driver Pod Name Length Limitation

2021-02-23 Thread John (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John updated SPARK-34513:
-
Description: 
Hi,

We are using Spark in Airflow with the k8s master. Airflow attaches a unique id to 
our spark-driver pod name using the k8s subdomain convention '.'

This creates rather long pod names. 

We noticed an issue when the total pod name (pod name + Airflow-attached uuid) 
exceeds 63 chars. Usually pod names can be up to 253 chars long. However, 
Spark seems to have an issue with driver pod names that are longer than 63 
characters.

In our case the driver pod name is exactly 65 chars long, but Spark omits 
the last 2 chars in its error message. I assume Spark is internally losing 
those two characters. Reducing our driver pod name to just 63 chars fixed the 
issue.

Here you can see the actual pod name (row 1) and the pod name from the Spark 
Error log (row 2)
{code:java}
ab-aa--cc-dd.3s092032c69f4639adff835a826e0120
ab-aa--cc-dd.3s092032c69f4639adff835a826e01{code}
{code:java}
[2021-02-20 00:30:06,289] {pod_launcher.py:136} INFO - Exception in thread 
"main" org.apache.spark.SparkException: No pod was found named 
Some(ab-aa--cc-dd.3s092032c69f4639adff835a826e01) in the 
cluster in the namespace airflow-ns (this was supposed to be the driver 
pod.).{code}
 

  was:
Hi,

We are using Spark in Airflow with the k8s-master. Airflow is attaching to our 
spark-driver pod a unique id utilizing the k8s-subdomain convention '.'

This creates rather long pod-names. 

We noticed an issue with pod names in total (pod name + airflow attached uuid) 
exceeding 63 chars. Usually pod names can be up to 253 chars long. However 
Spark seems to have an issue with driver pod names which are longer than 63 
characters.

In our case the driver pod name is exactly 65 chars long, but Spark is omitting 
the last 2 chars in its error message. I assume internally Spark is loosing 
those two characters. Reducing our Driver Pod Name to just 63 charts fixed the 
issue.

Here you can see the actual pod name (row 1) and the pod name from the Spark 
Error log (row 2)
ab-aa--cc-dd.3s092032c69f4639adff835a826e0120
ab-aa--cc-dd.3s092032c69f4639adff835a826e01
[2021-02-20 00:30:06,289] \{pod_launcher.py:136} INFO - Exception in thread 
"main" org.apache.spark.SparkException: No pod was found named 
Some(ab-aa--cc-dd.3s092032c69f4639adff835a826e01) in the 
cluster in the namespace airflow-ns (this was supposed to be the driver pod.).
 


> Kubernetes Spark Driver Pod Name Length Limitation
> --
>
> Key: SPARK-34513
> URL: https://issues.apache.org/jira/browse/SPARK-34513
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.0.1
>Reporter: John
>Priority: Major
>
> Hi,
> We are using Spark in Airflow with the k8s master. Airflow attaches a unique id to 
> our spark-driver pod name using the k8s subdomain convention '.'
> This creates rather long pod names. 
> We noticed an issue when the total pod name (pod name + Airflow-attached 
> uuid) exceeds 63 chars. Usually pod names can be up to 253 chars long. 
> However, Spark seems to have an issue with driver pod names that are longer 
> than 63 characters.
> In our case the driver pod name is exactly 65 chars long, but Spark 
> omits the last 2 chars in its error message. I assume Spark is internally 
> losing those two characters. Reducing our driver pod name to just 63 chars 
> fixed the issue.
> Here you can see the actual pod name (row 1) and the pod name from the Spark 
> Error log (row 2)
> {code:java}
> ab-aa--cc-dd.3s092032c69f4639adff835a826e0120
> ab-aa--cc-dd.3s092032c69f4639adff835a826e01{code}
> {code:java}
> [2021-02-20 00:30:06,289] {pod_launcher.py:136} INFO - Exception in thread 
> "main" org.apache.spark.SparkException: No pod was found named 
> Some(ab-aa--cc-dd.3s092032c69f4639adff835a826e01) in the 
> cluster in the namespace airflow-ns (this was supposed to be the driver 
> pod.).{code}
>  






[jira] [Created] (SPARK-34513) Kubernetes Spark Driver Pod Name Length Limitation

2021-02-23 Thread John (Jira)
John created SPARK-34513:


 Summary: Kubernetes Spark Driver Pod Name Length Limitation
 Key: SPARK-34513
 URL: https://issues.apache.org/jira/browse/SPARK-34513
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.1, 3.0.0
Reporter: John


Hi,

We are using Spark in Airflow with the k8s master. Airflow attaches a unique id to 
our spark-driver pod name using the k8s subdomain convention '.'

This creates rather long pod names. 

We noticed an issue when the total pod name (pod name + Airflow-attached uuid) 
exceeds 63 chars. Usually pod names can be up to 253 chars long. However, 
Spark seems to have an issue with driver pod names that are longer than 63 
characters.

In our case the driver pod name is exactly 65 chars long, but Spark omits 
the last 2 chars in its error message. I assume Spark is internally losing 
those two characters. Reducing our driver pod name to just 63 chars fixed the 
issue.

Here you can see the actual pod name (row 1) and the pod name from the Spark 
Error log (row 2)
ab-aa--cc-dd.3s092032c69f4639adff835a826e0120
ab-aa--cc-dd.3s092032c69f4639adff835a826e01
[2021-02-20 00:30:06,289] \{pod_launcher.py:136} INFO - Exception in thread 
"main" org.apache.spark.SparkException: No pod was found named 
Some(ab-aa--cc-dd.3s092032c69f4639adff835a826e01) in the 
cluster in the namespace airflow-ns (this was supposed to be the driver pod.).
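As an illustration of the workaround that fixed it (keeping the driver pod name at or 
below 63 characters), a small sketch that appends an Airflow-style unique suffix and 
trims the base name so the 63-character limit is never exceeded; the helper and the 
base name are hypothetical:
{code:python}
import uuid

MAX_POD_NAME_LEN = 63  # length beyond which Spark appeared to drop trailing characters

def build_driver_pod_name(base: str) -> str:
    """Append a unique suffix (as Airflow does via the '.' subdomain convention)
    and trim the base so the full name never exceeds 63 characters."""
    suffix = uuid.uuid4().hex                       # 32 hex characters
    budget = MAX_POD_NAME_LEN - len(suffix) - 1     # room left for the base and the '.'
    return f"{base[:budget]}.{suffix}"

name = build_driver_pod_name("ab-aa--cc-dd")
assert len(name) <= MAX_POD_NAME_LEN
print(name, len(name))
{code}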
 






[jira] [Commented] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread john (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682163#comment-16682163
 ] 

john commented on SPARK-26000:
--

I have Cloudera Manager in environment A, which has the HDFS component, and Spark in 
environment B. I am doing a very simple read and write to/from HDFS. Writing to the 
Cloudera Manager HDFS works as expected, but when reading back I am getting the 
issues below:

"java.lang.reflect.InvocationTargetException" Caused By: 
"org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It 
must be specified manually.;"

Caused By: "java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/SparkNode_IP_PORT_NoO 
remote=/NameNode:50010:"

Java sample code:

// writing: df is an existing Dataset<Row>; write it out as Parquet
df.write().mode("append").format("parquet").save(path_to_file);

// read the same path back
Dataset<Row> readBack = spark.read().parquet(path_to_file);

 

 

 

 

> Missing block when reading HDFS Data from Cloudera Manager
> --
>
> Key: SPARK-26000
> URL: https://issues.apache.org/jira/browse/SPARK-26000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2
>Reporter: john
>Priority: Major
>
> I am able to write to Cloudera Manager HDFS through open-source Spark, which 
> runs separately, but I am not able to read the Cloudera Manager HDFS data back.
>  
> I am getting missing block location and socket timeout errors.
>  
> spark.read().textFile(path_to_file)






[jira] [Created] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread john (JIRA)
john created SPARK-26000:


 Summary: Missing block when reading HDFS Data from Cloudera Manager
 Key: SPARK-26000
 URL: https://issues.apache.org/jira/browse/SPARK-26000
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.2
Reporter: john


I am able to write to Cloudera Manager HDFS through open-source Spark, which 
runs separately, but I am not able to read the Cloudera Manager HDFS data back.

 

I am getting missing block location and socket timeout errors.






[jira] [Updated] (SPARK-26000) Missing block when reading HDFS Data from Cloudera Manager

2018-11-09 Thread john (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

john updated SPARK-26000:
-
Description: 
I am able to write to Cloudera Manager HDFS through open-source Spark, which 
runs separately, but I am not able to read the Cloudera Manager HDFS data back.

 

I am getting missing block location and socket timeout errors.

 

spark.read().textFile(path_to_file)

  was:
I am able to write to Cloudera Manager HDFS through Open Source Spark which 
runs separately. but not able to read the Cloudera Manger HDFS data .

 

I am getting missing block location, socketTimeOut.


> Missing block when reading HDFS Data from Cloudera Manager
> --
>
> Key: SPARK-26000
> URL: https://issues.apache.org/jira/browse/SPARK-26000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2
>Reporter: john
>Priority: Major
>
> I am able to write to Cloudera Manager HDFS through open-source Spark, which 
> runs separately, but I am not able to read the Cloudera Manager HDFS data back.
>  
> I am getting missing block location and socket timeout errors.
>  
> spark.read().textFile(path_to_file)






[jira] [Commented] (SPARK-23982) NoSuchMethodException: There is no startCredentialUpdater method in the object YarnSparkHadoopUtil

2018-04-15 Thread John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438895#comment-16438895
 ] 

John commented on SPARK-23982:
--

see spark-core_2.11-2.3.0 and spark-yarn_2.11-2.3.0

org.apache.spark.executor.CoarseGrainedExecutorBackend:

if (driverConf.contains("spark.yarn.credentials.file")) {
  logInfo("Will periodically update credentials from: " +
    driverConf.get("spark.yarn.credentials.file"))
  Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
    .getMethod("startCredentialUpdater", classOf[SparkConf])
    .invoke(null, driverConf)
}

 

> NoSuchMethodException: There is no startCredentialUpdater method in the 
> object YarnSparkHadoopUtil
> --
>
> Key: SPARK-23982
> URL: https://issues.apache.org/jira/browse/SPARK-23982
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: John
>Priority: Major
>
>  At line 219 of the CoarseGrainedExecutorBackend class:
> Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil").getMethod("startCredentialUpdater",
>  classOf[SparkConf]).invoke(null, driverConf)
> But there is no startCredentialUpdater method in the object 
> YarnSparkHadoopUtil.
>  






[jira] [Created] (SPARK-23982) NoSuchMethodException: There is no startCredentialUpdater method in the object YarnSparkHadoopUtil

2018-04-14 Thread John (JIRA)
John created SPARK-23982:


 Summary: NoSuchMethodException: There is no startCredentialUpdater 
method in the object YarnSparkHadoopUtil
 Key: SPARK-23982
 URL: https://issues.apache.org/jira/browse/SPARK-23982
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: John


 At line 219 of the CoarseGrainedExecutorBackend class:
Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil").getMethod("startCredentialUpdater",
 classOf[SparkConf]).invoke(null, driverConf)
But there is no startCredentialUpdater method in the object 
YarnSparkHadoopUtil.
 






[jira] [Comment Edited] (SPARK-17885) Spark Streaming deletes checkpointed RDD then tries to load it after restart

2017-10-03 Thread Vishal John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189483#comment-16189483
 ] 

Vishal John edited comment on SPARK-17885 at 10/3/17 11:27 AM:
---

I can see that the checkpointed folder was explicitly deleted - 
INFO dstream.DStreamCheckpointData: Deleted checkpoint file 
'hdfs://nameservice1/user/my-user/checkpoints/my-application/8c683e77-33b9-42ee-80f7-167abb39c241/rdd-401

I was looking at the source code of the `cleanup` method in 
`DStreamCheckpointData`. I am curious to know what setting is causing this 
behaviour.

My StreamingContext batch duration is 30 seconds and I haven't provided any 
other time intervals. Should I provide any other intervals, such as a 
checkpoint interval?

-

UPDATE: I was able to get around this problem by setting 
"spark.streaming.stopGracefullyOnShutdown" to "true".




was (Author: vishaljohn):
I can see that the checkpointed folder was explicitly deleted - 
INFO dstream.DStreamCheckpointData: Deleted checkpoint file 
'hdfs://nameservice1/user/my-user/checkpoints/my-application/8c683e77-33b9-42ee-80f7-167abb39c241/rdd-401

I was looking at the source code of `cleanup` method in 
`DStreamCheckpointData`. I am curious to know what setting is causing this 
behaviour.

My StreamingContext batch duration is 30 seconds and I haven't provided any 
other time intervals. Should i need to provide any other intervals like 
checkpoint interval or something like that ?

> Spark Streaming deletes checkpointed RDD then tries to load it after restart
> 
>
> Key: SPARK-17885
> URL: https://issues.apache.org/jira/browse/SPARK-17885
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.5.1
>Reporter: Cosmin Ciobanu
>
> The issue is that the Spark driver checkpoints an RDD, deletes it, the job 
> restarts, and the new driver tries to load the deleted checkpoint RDD.
> The application is run in YARN, which attempts to restart the application a 
> number of times (100 in our case), all of which fail due to missing the 
> deleted RDD. 
> Here is a Splunk log which shows the inconsistency in checkpoint behaviour:
> *2016-10-09 02:48:43,533* [streaming-job-executor-0] INFO  
> org.apache.spark.rdd.ReliableRDDCheckpointData - Done checkpointing RDD 73847 
> to 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*, 
> new parent is RDD 73872
> host = ip-10-1-1-13.ec2.internal
> *2016-10-09 02:53:14,696* [JobGenerator] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Deleted checkpoint 
> file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*' 
> for time 147598131 ms
> host = ip-10-1-1-13.ec2.internal
> *Job restarts here, notice driver host change from ip-10-1-1-13.ec2.internal 
> to ip-10-1-1-25.ec2.internal.*
> *2016-10-09 02:53:30,175* [Driver] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Restoring 
> checkpointed RDD for time 147598131 ms from file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*'
> host = ip-10-1-1-25.ec2.internal
> *2016-10-09 02:53:30,491* [Driver] ERROR 
> org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: 
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> host = ip-10-1-1-25.ec2.internal
> Spark streaming is configured with a microbatch interval of 30 seconds, 
> checkpoint interval of 120 seconds, and cleaner.ttl of 28800 (8 hours), but 
> as far as I can tell, this TTL only affects metadata cleanup interval. RDDs 
> seem to be deleted every 4-5 minutes after being checkpointed.
> Running on top of Spark 1.5.1.
> There are at least two possible issues here:
> - In case of a driver restart the new driver tries to load checkpointed RDDs 
> which the previous driver had just deleted;
> - Spark loads stale checkpointed data - the logs show that the deleted RDD 
> was initially checkpointed 4 minutes and 31 seconds before deletion, and 4 
> minutes and 47 seconds before the new driver tries to load it. Given the fact 
> the checkpointing interval is 120 seconds, it makes no sense to load data 
> older than that.
> P.S. Looking at the source code with the event loop that handles checkpoint 
> updates and cleanup, nothing seems to have changed in more recent 

[jira] [Commented] (SPARK-17885) Spark Streaming deletes checkpointed RDD then tries to load it after restart

2017-10-03 Thread Vishal John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189483#comment-16189483
 ] 

Vishal John commented on SPARK-17885:
-

I can see that the checkpointed folder was explicitly deleted - 
INFO dstream.DStreamCheckpointData: Deleted checkpoint file 
'hdfs://nameservice1/user/my-user/checkpoints/my-application/8c683e77-33b9-42ee-80f7-167abb39c241/rdd-401

I was looking at the source code of the `cleanup` method in 
`DStreamCheckpointData`. I am curious to know what setting is causing this 
behaviour.

My StreamingContext batch duration is 30 seconds and I haven't provided any 
other time intervals. Should I provide any other intervals, such as a 
checkpoint interval?

> Spark Streaming deletes checkpointed RDD then tries to load it after restart
> 
>
> Key: SPARK-17885
> URL: https://issues.apache.org/jira/browse/SPARK-17885
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.5.1
>Reporter: Cosmin Ciobanu
>
> The issue is that the Spark driver checkpoints an RDD, deletes it, the job 
> restarts, and the new driver tries to load the deleted checkpoint RDD.
> The application is run in YARN, which attempts to restart the application a 
> number of times (100 in our case), all of which fail due to missing the 
> deleted RDD. 
> Here is a Splunk log which shows the inconsistency in checkpoint behaviour:
> *2016-10-09 02:48:43,533* [streaming-job-executor-0] INFO  
> org.apache.spark.rdd.ReliableRDDCheckpointData - Done checkpointing RDD 73847 
> to 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*, 
> new parent is RDD 73872
> host = ip-10-1-1-13.ec2.internal
> *2016-10-09 02:53:14,696* [JobGenerator] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Deleted checkpoint 
> file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*' 
> for time 147598131 ms
> host = ip-10-1-1-13.ec2.internal
> *Job restarts here, notice driver host change from ip-10-1-1-13.ec2.internal 
> to ip-10-1-1-25.ec2.internal.*
> *2016-10-09 02:53:30,175* [Driver] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Restoring 
> checkpointed RDD for time 147598131 ms from file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*'
> host = ip-10-1-1-25.ec2.internal
> *2016-10-09 02:53:30,491* [Driver] ERROR 
> org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: 
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> host = ip-10-1-1-25.ec2.internal
> Spark streaming is configured with a microbatch interval of 30 seconds, 
> checkpoint interval of 120 seconds, and cleaner.ttl of 28800 (8 hours), but 
> as far as I can tell, this TTL only affects metadata cleanup interval. RDDs 
> seem to be deleted every 4-5 minutes after being checkpointed.
> Running on top of Spark 1.5.1.
> There are at least two possible issues here:
> - In case of a driver restart the new driver tries to load checkpointed RDDs 
> which the previous driver had just deleted;
> - Spark loads stale checkpointed data - the logs show that the deleted RDD 
> was initially checkpointed 4 minutes and 31 seconds before deletion, and 4 
> minutes and 47 seconds before the new driver tries to load it. Given the fact 
> the checkpointing interval is 120 seconds, it makes no sense to load data 
> older than that.
> P.S. Looking at the source code with the event loop that handles checkpoint 
> updates and cleanup, nothing seems to have changed in more recent versions of 
> Spark, so the bug is likely present in 2.0.1 as well.
> P.P.S. The issue is difficult to reproduce - it only occurs once in every 10 
> or so restarts, and only in clusters with high-load.
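For context, a generic PySpark skeleton of the streaming setup described above 
(30-second batches, an explicit 120-second checkpoint interval, and the 
stopGracefullyOnShutdown workaround mentioned in the edited comment). The application 
name, input source and checkpoint path are illustrative, not the reporter's actual 
job:
{code:python}
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setAppName("checkpoint-demo")
        # Workaround reported above: shut the streaming job down gracefully so
        # checkpoints are not left in a half-deleted state.
        .set("spark.streaming.stopGracefullyOnShutdown", "true"))

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=30)   # 30 s microbatches, as in the report
ssc.checkpoint("hdfs://nameservice1/user/my-user/checkpoints/my-application")

stream = ssc.socketTextStream("localhost", 9999)   # placeholder input stream
stream.checkpoint(120)   # explicit per-DStream checkpoint interval (120 s in the report)
stream.pprint()

ssc.start()
ssc.awaitTermination()
{code}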






[jira] [Commented] (SPARK-17885) Spark Streaming deletes checkpointed RDD then tries to load it after restart

2017-10-03 Thread Vishal John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189403#comment-16189403
 ] 

Vishal John commented on SPARK-17885:
-


Hello all,

Our application also suffers from the same problem. Our application uses Spark 
state (mapWithState), and checkpointed RDDs are created in the specified 
checkpoint folder. But when the application is killed, the directory 
containing the checkpointed RDDs is cleared. 
When I launch the application again, it fails because it cannot find the 
checkpoint directory. 

This is the error: 'java.lang.IllegalArgumentException: requirement failed: 
Checkpoint directory does not exist: 
hdfs://nameservice1/user/my-user/checkpoints/my-application/77b1dd15-f904-4e80-a5ed-5018224b4df0/rdd-6833'

The application uses Spark 2.0.2 and it is deployed on Cloudera YARN 
(2.5.0-cdh5.2.0).

Because of this error we are unable to use the checkpointed RDDs and Spark 
state. Can this issue be taken up as a priority?
Please let me know if you require any additional information.
[~tdas][~srowen]

thanks a lot,
Vishal

> Spark Streaming deletes checkpointed RDD then tries to load it after restart
> 
>
> Key: SPARK-17885
> URL: https://issues.apache.org/jira/browse/SPARK-17885
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.5.1
>Reporter: Cosmin Ciobanu
>
> The issue is that the Spark driver checkpoints an RDD, deletes it, the job 
> restarts, and the new driver tries to load the deleted checkpoint RDD.
> The application is run in YARN, which attempts to restart the application a 
> number of times (100 in our case), all of which fail due to missing the 
> deleted RDD. 
> Here is a Splunk log which shows the inconsistency in checkpoint behaviour:
> *2016-10-09 02:48:43,533* [streaming-job-executor-0] INFO  
> org.apache.spark.rdd.ReliableRDDCheckpointData - Done checkpointing RDD 73847 
> to 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*, 
> new parent is RDD 73872
> host = ip-10-1-1-13.ec2.internal
> *2016-10-09 02:53:14,696* [JobGenerator] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Deleted checkpoint 
> file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*' 
> for time 147598131 ms
> host = ip-10-1-1-13.ec2.internal
> *Job restarts here, notice driver host change from ip-10-1-1-13.ec2.internal 
> to ip-10-1-1-25.ec2.internal.*
> *2016-10-09 02:53:30,175* [Driver] INFO  
> org.apache.spark.streaming.dstream.DStreamCheckpointData - Restoring 
> checkpointed RDD for time 147598131 ms from file 
> 'hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*'
> host = ip-10-1-1-25.ec2.internal
> *2016-10-09 02:53:30,491* [Driver] ERROR 
> org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: 
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> java.lang.IllegalArgumentException: requirement failed: Checkpoint directory 
> does not exist: 
> hdfs://proc-job/checkpoint/cadf8dcf-ebc2-4366-a2e1-0939976c6ce1/*rdd-73847*
> host = ip-10-1-1-25.ec2.internal
> Spark streaming is configured with a microbatch interval of 30 seconds, 
> checkpoint interval of 120 seconds, and cleaner.ttl of 28800 (8 hours), but 
> as far as I can tell, this TTL only affects metadata cleanup interval. RDDs 
> seem to be deleted every 4-5 minutes after being checkpointed.
> Running on top of Spark 1.5.1.
> There are at least two possible issues here:
> - In case of a driver restart the new driver tries to load checkpointed RDDs 
> which the previous driver had just deleted;
> - Spark loads stale checkpointed data - the logs show that the deleted RDD 
> was initially checkpointed 4 minutes and 31 seconds before deletion, and 4 
> minutes and 47 seconds before the new driver tries to load it. Given the fact 
> the checkpointing interval is 120 seconds, it makes no sense to load data 
> older than that.
> P.S. Looking at the source code with the event loop that handles checkpoint 
> updates and cleanup, nothing seems to have changed in more recent versions of 
> Spark, so the bug is likely present in 2.0.1 as well.
> P.P.S. The issue is difficult to reproduce - it only occurs once in every 10 
> or so restarts, and only in clusters with high-load.






[jira] [Comment Edited] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-08 Thread John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079405#comment-16079405
 ] 

John edited comment on SPARK-21346 at 7/9/17 1:27 AM:
--

Sorry, I wasn't aware of that. Do you mind elaborating more on what kind of 
resources are fetched using HTTPS? I'd be happy to make a PR but I'd just like 
a little more information. 


was (Author: jljlee118):
Sorry, I wasn't aware of that. Do you mind elaborating more on what kind of 
resources are fetched using HTTPS? I'd be happy to make a PR buy I'd just like 
a little more information. 

> Spark does not use SSL for HTTP File Server and Broadcast Server
> 
>
> Key: SPARK-21346
> URL: https://issues.apache.org/jira/browse/SPARK-21346
> Project: Spark
>  Issue Type: Question
>  Components: Documentation, Spark Core
>Affects Versions: 2.1.1
>Reporter: John
>Priority: Minor
>  Labels: documentation
>
> SecurityManager states that SSL is used to secure HTTP communication for the 
> broadcast and file server. However, the SSLOptions from the SecurityManager 
> only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
> According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
> and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
> the file server nor broadcast use HTTP anymore. It seems that the 
> documentation is inaccurate and that Spark actually uses SASL on the RPC 
> endpoints to secure the file server and broadcast communications.






[jira] [Commented] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-08 Thread John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079405#comment-16079405
 ] 

John commented on SPARK-21346:
--

Sorry, I wasn't aware of that. Do you mind elaborating more on what kind of 
resources are fetched using HTTPS? I'd be happy to make a PR buy I'd just like 
a little more information. 

> Spark does not use SSL for HTTP File Server and Broadcast Server
> 
>
> Key: SPARK-21346
> URL: https://issues.apache.org/jira/browse/SPARK-21346
> Project: Spark
>  Issue Type: Question
>  Components: Documentation, Spark Core
>Affects Versions: 2.1.1
>Reporter: John
>Priority: Minor
>  Labels: documentation
>
> SecurityManager states that SSL is used to secure HTTP communication for the 
> broadcast and file server. However, the SSLOptions from the SecurityManager 
> only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
> According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
> and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
> the file server nor broadcast use HTTP anymore. It seems that the 
> documentation is inaccurate and that Spark actually uses SASL on the RPC 
> endpoints to secure the file server and broadcast communications.






[jira] [Created] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-07 Thread John (JIRA)
John created SPARK-21346:


 Summary: Spark does not use SSL for HTTP File Server and Broadcast 
Server
 Key: SPARK-21346
 URL: https://issues.apache.org/jira/browse/SPARK-21346
 Project: Spark
  Issue Type: Question
  Components: Documentation, Spark Core
Affects Versions: 2.1.1
Reporter: John
Priority: Minor


SecurityManager states that SSL is used to secure HTTP communication for the 
broadcast and file server. However, the SSLOptions from the SecurityManager 
only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
the file server nor broadcast use HTTP anymore. It seems that the documentation 
is inaccurate and that Spark actually uses SASL on the RPC endpoints to secure 
the file server and broadcast communications.
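For reference, a sketch of how the two mechanisms discussed above are typically 
switched on: the spark.ssl.* options that the web UIs and history server pick up, and 
the SASL-based RPC authentication/encryption that covers file server and broadcast 
traffic. The property names are standard Spark settings; the keystore path and 
password are placeholders:
{code:python}
from pyspark import SparkConf

conf = (SparkConf()
        # SSL options read by SecurityManager; in practice these cover the web UIs
        # and history server rather than the file server or broadcast.
        .set("spark.ssl.enabled", "true")
        .set("spark.ssl.keyStore", "/path/to/keystore.jks")   # placeholder path
        .set("spark.ssl.keyStorePassword", "changeit")        # placeholder secret
        # RPC-level protection: file server and broadcast traffic go over RPC,
        # which is secured with SASL-based authentication and encryption.
        .set("spark.authenticate", "true")
        .set("spark.authenticate.enableSaslEncryption", "true"))
{code}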






[jira] [Commented] (SPARK-12763) Spark gets stuck executing SSB query

2016-05-19 Thread Rogers Jeffrey Leo John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292263#comment-15292263
 ] 

Rogers Jeffrey Leo John commented on SPARK-12763:
-

I believe it is related to the join order: date, customer, supplier, part.
date, customer and supplier are all dimension tables, and joining them in that 
order would result in a cross product. I believe rewriting the query to use 
"lineorder, date, customer, supplier, part" in the FROM clause should get it to 
work (see the sketch below).
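A sketch of the suggested rewrite, with the fact table listed first in the FROM 
clause; the rest of the query is taken unchanged from the issue description, and 
sqlContext and the temp tables are the ones registered in the script there:
{code:python}
sql41 = sqlContext.sql("""
    select D_YEAR, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) as profit
    from lineorder, date, customer, supplier, part
    where LO_CUSTKEY = C_CUSTKEY
      and LO_SUPPKEY = S_SUPPKEY
      and LO_PARTKEY = P_PARTKEY
      and LO_ORDERDATE = D_DATEKEY
      and C_REGION = 'AMERICA'
      and S_REGION = 'AMERICA'
      and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2')
    group by D_YEAR, C_NATION
    order by D_YEAR, C_NATION
""")
sql41.show()
{code}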

> Spark gets stuck executing SSB query
> 
>
> Key: SPARK-12763
> URL: https://issues.apache.org/jira/browse/SPARK-12763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Standalone cluster
>Reporter: Vadim Tkachenko
> Attachments: Spark shell - Details for Stage 5 (Attempt 0).pdf
>
>
> I am trying to emulate SSB load. Data generated with 
> https://github.com/Percona-Lab/ssb-dbgen
> generated size is with 1000 scale factor and converted to parquet format.
> Now there is a following script
> val pLineOrder = 
> sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/lineorder").cache()
> val pDate = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/date").cache()
> val pPart = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/part").cache()
> val pSupplier = 
> sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/supplier").cache()
> val pCustomer = 
> sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/customer").cache()
> pLineOrder.registerTempTable("lineorder")
> pDate.registerTempTable("date")
> pPart.registerTempTable("part")
> pSupplier.registerTempTable("supplier")
> pCustomer.registerTempTable("customer")
> query 
> val sql41 = sqlContext.sql("select D_YEAR, C_NATION, sum(LO_REVENUE - 
> LO_SUPPLYCOST) as profit from date, customer, supplier, part, lineorder 
> where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and 
> LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 
> 'AMERICA' and S_REGION = 'AMERICA' and (P_MFGR = 'MFGR#1' or P_MFGR = 
> 'MFGR#2') group by D_YEAR, C_NATION order by D_YEAR, C_NATION")
> and 
> sql41.show()
> gets stuck: at some point there is no progress and the server is fully idle, but 
> the job stays at the same stage.






[jira] [Created] (SPARK-6388) Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40

2015-03-17 Thread John (JIRA)
John created SPARK-6388:
---

 Summary: Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40
 Key: SPARK-6388
 URL: https://issues.apache.org/jira/browse/SPARK-6388
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Submit, YARN
Affects Versions: 1.3.0
 Environment: 1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc 
version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 UTC 
2015
2. Oracle Java 8 update 40  for Linux X64
3. Scala 2.10.5
Reporter: John


I built Apache Spark 1.3 manually.
---
JAVA_HOME=PATH_TO_JAVA8
mvn clean package -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
---

Something goes wrong; akka keeps telling me: 
---
15/03/17 21:28:10 WARN remote.ReliableDeliverySupervisor: Association with 
remote system [akka.tcp://sparkYarnAM@Server2:42161] has failed, address is now 
gated for [5000] ms. Reason is: [Disassociated].
---

I built another version of Spark 1.3 + Hadoop 2.6 under Java 7.
Everything works fine.

Logs
---
15/03/17 21:27:06 INFO spark.SparkContext: Running Spark version 1.3.0
15/03/17 21:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/03/17 21:27:08 INFO spark.SecurityManager: Changing view Servers to: hduser
15/03/17 21:27:08 INFO spark.SecurityManager: Changing modify Servers to: hduser
15/03/17 21:27:08 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui Servers disabled; users with view permissions: Set(hduser); users 
with modify permissions: Set(hduser)
15/03/17 21:27:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/17 21:27:08 INFO Remoting: Starting remoting
15/03/17 21:27:09 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkDriver@Server3:37951]
15/03/17 21:27:09 INFO util.Utils: Successfully started service 'sparkDriver' 
on port 37951.
15/03/17 21:27:09 INFO spark.SparkEnv: Registering MapOutputTracker
15/03/17 21:27:09 INFO spark.SparkEnv: Registering BlockManagerMaster
15/03/17 21:27:09 INFO storage.DiskBlockManager: Created local directory at 
/tmp/spark-0db692bb-cd02-40c8-a8f0-3813c6da18e2/blockmgr-a1d0ad23-ab76-4177-80a0-a6f982a64d80
15/03/17 21:27:09 INFO storage.MemoryStore: MemoryStore started with capacity 
265.1 MB
15/03/17 21:27:09 INFO spark.HttpFileServer: HTTP File server directory is 
/tmp/spark-502ef3f8-b8cd-45cf-b1df-97df297cdb35/httpd-6303e24d-4b2b-4614-bb1d-74e8d331189b
15/03/17 21:27:09 INFO spark.HttpServer: Starting HTTP Server
15/03/17 21:27:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/17 21:27:10 INFO server.AbstractConnector: Started 
SocketConnector@0.0.0.0:48000
15/03/17 21:27:10 INFO util.Utils: Successfully started service 'HTTP file 
server' on port 48000.
15/03/17 21:27:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/03/17 21:27:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/17 21:27:10 INFO server.AbstractConnector: Started 
SelectChannelConnector@0.0.0.0:4040
15/03/17 21:27:10 INFO util.Utils: Successfully started service 'SparkUI' on 
port 4040.
15/03/17 21:27:10 INFO ui.SparkUI: Started SparkUI at http://Server3:4040
15/03/17 21:27:10 INFO spark.SparkContext: Added JAR 
file:/home/hduser/spark-java2.jar at 
http://192.168.11.42:48000/jars/spark-java2.jar with timestamp 1426598830307
15/03/17 21:27:10 INFO client.RMProxy: Connecting to ResourceManager at 
Server3/192.168.11.42:8050
15/03/17 21:27:11 INFO yarn.Client: Requesting a new application from cluster 
with 3 NodeManagers
15/03/17 21:27:11 INFO yarn.Client: Verifying our application has not requested 
more than the maximum memory capability of the cluster (8192 MB per container)
15/03/17 21:27:11 INFO yarn.Client: Will allocate AM container, with 896 MB 
memory including 384 MB overhead
15/03/17 21:27:11 INFO yarn.Client: Setting up container launch context for our 
AM
15/03/17 21:27:11 INFO yarn.Client: Preparing resources for our AM container
15/03/17 21:27:12 INFO yarn.Client: Uploading resource 
file:/home/hduser/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar
 - 
hdfs://Server3:9000/user/hduser/.sparkStaging/application_1426595477608_0002/spark-assembly-1.3.0-hadoop2.6.0.jar
15/03/17 21:27:21 INFO yarn.Client: Setting up the launch environment for our 
AM container
15/03/17 21:27:21 INFO spark.SecurityManager: Changing view Servers to: hduser
15/03/17 21:27:21 INFO spark.SecurityManager: Changing modify Servers to: hduser
15/03/17 21:27:21 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui Servers disabled; users with view permissions: Set(hduser); users 
with modify permissions: Set(hduser)
15/03/17 21:27:21 INFO yarn.Client: Submitting application 2 to ResourceManager
15/03/17 

[jira] [Closed] (SPARK-6388) Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40

2015-03-17 Thread John (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John closed SPARK-6388.
---
Resolution: Not a Problem

 Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40
 --

 Key: SPARK-6388
 URL: https://issues.apache.org/jira/browse/SPARK-6388
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Submit, YARN
Affects Versions: 1.3.0
 Environment: 1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc 
 version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 
 UTC 2015
 2. Oracle Java 8 update 40  for Linux X64
 3. Scala 2.10.5
 4. Hadoop 2.6 (pre-build version)
Reporter: John
   Original Estimate: 24h
  Remaining Estimate: 24h

 I build Apache Spark 1.3 munally.
 ---
 JAVA_HOME=PATH_TO_JAVA8
 mvn clean package -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
 ---
 Something goes wrong, akka always tell me 
 ---
 15/03/17 21:28:10 WARN remote.ReliableDeliverySupervisor: Association with 
 remote system [akka.tcp://sparkYarnAM@Server2:42161] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 ---
 I build another version of Spark 1.3 + Hadoop 2.6 under Java 7.
 Everything goes well.
 Logs
 ---
 15/03/17 21:27:06 INFO spark.SparkContext: Running Spark version 1.3.0
 15/03/17 21:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing view Servers to: hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing modify Servers to: 
 hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui Servers disabled; users with view permissions: Set(hduser); 
 users with modify permissions: Set(hduser)
 15/03/17 21:27:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
 15/03/17 21:27:08 INFO Remoting: Starting remoting
 15/03/17 21:27:09 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkDriver@Server3:37951]
 15/03/17 21:27:09 INFO util.Utils: Successfully started service 'sparkDriver' 
 on port 37951.
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering MapOutputTracker
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering BlockManagerMaster
 15/03/17 21:27:09 INFO storage.DiskBlockManager: Created local directory at 
 /tmp/spark-0db692bb-cd02-40c8-a8f0-3813c6da18e2/blockmgr-a1d0ad23-ab76-4177-80a0-a6f982a64d80
 15/03/17 21:27:09 INFO storage.MemoryStore: MemoryStore started with capacity 
 265.1 MB
 15/03/17 21:27:09 INFO spark.HttpFileServer: HTTP File server directory is 
 /tmp/spark-502ef3f8-b8cd-45cf-b1df-97df297cdb35/httpd-6303e24d-4b2b-4614-bb1d-74e8d331189b
 15/03/17 21:27:09 INFO spark.HttpServer: Starting HTTP Server
 15/03/17 21:27:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SocketConnector@0.0.0.0:48000
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'HTTP file 
 server' on port 48000.
 15/03/17 21:27:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
 15/03/17 21:27:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:4040
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 4040.
 15/03/17 21:27:10 INFO ui.SparkUI: Started SparkUI at http://Server3:4040
 15/03/17 21:27:10 INFO spark.SparkContext: Added JAR 
 file:/home/hduser/spark-java2.jar at 
 http://192.168.11.42:48000/jars/spark-java2.jar with timestamp 1426598830307
 15/03/17 21:27:10 INFO client.RMProxy: Connecting to ResourceManager at 
 Server3/192.168.11.42:8050
 15/03/17 21:27:11 INFO yarn.Client: Requesting a new application from cluster 
 with 3 NodeManagers
 15/03/17 21:27:11 INFO yarn.Client: Verifying our application has not 
 requested more than the maximum memory capability of the cluster (8192 MB per 
 container)
 15/03/17 21:27:11 INFO yarn.Client: Will allocate AM container, with 896 MB 
 memory including 384 MB overhead
 15/03/17 21:27:11 INFO yarn.Client: Setting up container launch context for 
 our AM
 15/03/17 21:27:11 INFO yarn.Client: Preparing resources for our AM container
 15/03/17 21:27:12 INFO yarn.Client: Uploading resource 
 file:/home/hduser/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar
  - 
 hdfs://Server3:9000/user/hduser/.sparkStaging/application_1426595477608_0002/spark-assembly-1.3.0-hadoop2.6.0.jar
 15/03/17 21:27:21 INFO yarn.Client: Setting up the launch environment for our 
 AM container
 15/03/17 21:27:21 INFO spark.SecurityManager: Changing view Servers to: hduser
 15/03/17 21:27:21 INFO 

[jira] [Commented] (SPARK-6388) Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40

2015-03-17 Thread John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365148#comment-14365148
 ] 

John commented on SPARK-6388:
-

OK, it looks like the problem is on my side.
I'll try it later.
Sorry for opening an issue.
Let me close it.

 Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40
 --

 Key: SPARK-6388
 URL: https://issues.apache.org/jira/browse/SPARK-6388
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Submit, YARN
Affects Versions: 1.3.0
 Environment: 1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc 
 version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 
 UTC 2015
 2. Oracle Java 8 update 40  for Linux X64
 3. Scala 2.10.5
 4. Hadoop 2.6 (pre-build version)
Reporter: John
   Original Estimate: 24h
  Remaining Estimate: 24h

 I build Apache Spark 1.3 munally.
 ---
 JAVA_HOME=PATH_TO_JAVA8
 mvn clean package -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
 ---
 Something goes wrong, akka always tell me 
 ---
 15/03/17 21:28:10 WARN remote.ReliableDeliverySupervisor: Association with 
 remote system [akka.tcp://sparkYarnAM@Server2:42161] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 ---
 I build another version of Spark 1.3 + Hadoop 2.6 under Java 7.
 Everything goes well.
 Logs
 ---
 15/03/17 21:27:06 INFO spark.SparkContext: Running Spark version 1.3.0
 15/03/17 21:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing view Servers to: hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing modify Servers to: 
 hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui Servers disabled; users with view permissions: Set(hduser); 
 users with modify permissions: Set(hduser)
 15/03/17 21:27:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
 15/03/17 21:27:08 INFO Remoting: Starting remoting
 15/03/17 21:27:09 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkDriver@Server3:37951]
 15/03/17 21:27:09 INFO util.Utils: Successfully started service 'sparkDriver' 
 on port 37951.
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering MapOutputTracker
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering BlockManagerMaster
 15/03/17 21:27:09 INFO storage.DiskBlockManager: Created local directory at 
 /tmp/spark-0db692bb-cd02-40c8-a8f0-3813c6da18e2/blockmgr-a1d0ad23-ab76-4177-80a0-a6f982a64d80
 15/03/17 21:27:09 INFO storage.MemoryStore: MemoryStore started with capacity 
 265.1 MB
 15/03/17 21:27:09 INFO spark.HttpFileServer: HTTP File server directory is 
 /tmp/spark-502ef3f8-b8cd-45cf-b1df-97df297cdb35/httpd-6303e24d-4b2b-4614-bb1d-74e8d331189b
 15/03/17 21:27:09 INFO spark.HttpServer: Starting HTTP Server
 15/03/17 21:27:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SocketConnector@0.0.0.0:48000
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'HTTP file 
 server' on port 48000.
 15/03/17 21:27:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
 15/03/17 21:27:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:4040
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 4040.
 15/03/17 21:27:10 INFO ui.SparkUI: Started SparkUI at http://Server3:4040
 15/03/17 21:27:10 INFO spark.SparkContext: Added JAR 
 file:/home/hduser/spark-java2.jar at 
 http://192.168.11.42:48000/jars/spark-java2.jar with timestamp 1426598830307
 15/03/17 21:27:10 INFO client.RMProxy: Connecting to ResourceManager at 
 Server3/192.168.11.42:8050
 15/03/17 21:27:11 INFO yarn.Client: Requesting a new application from cluster 
 with 3 NodeManagers
 15/03/17 21:27:11 INFO yarn.Client: Verifying our application has not 
 requested more than the maximum memory capability of the cluster (8192 MB per 
 container)
 15/03/17 21:27:11 INFO yarn.Client: Will allocate AM container, with 896 MB 
 memory including 384 MB overhead
 15/03/17 21:27:11 INFO yarn.Client: Setting up container launch context for 
 our AM
 15/03/17 21:27:11 INFO yarn.Client: Preparing resources for our AM container
 15/03/17 21:27:12 INFO yarn.Client: Uploading resource 
 file:/home/hduser/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar
  - 
 hdfs://Server3:9000/user/hduser/.sparkStaging/application_1426595477608_0002/spark-assembly-1.3.0-hadoop2.6.0.jar
 15/03/17 21:27:21 INFO yarn.Client: Setting up the launch environment for 

[jira] [Updated] (SPARK-6388) Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40

2015-03-17 Thread John (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John updated SPARK-6388:

Environment: 
1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc version 4.9.1 (Ubuntu 
4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 UTC 2015
2. Oracle Java 8 update 40  for Linux X64
3. Scala 2.10.5
4. Hadoop 2.6 (pre-build version)

  was:
1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc version 4.9.1 (Ubuntu 
4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 UTC 2015
2. Oracle Java 8 update 40  for Linux X64
3. Scala 2.10.5


 Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40
 --

 Key: SPARK-6388
 URL: https://issues.apache.org/jira/browse/SPARK-6388
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Submit, YARN
Affects Versions: 1.3.0
 Environment: 1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc 
 version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 
 UTC 2015
 2. Oracle Java 8 update 40  for Linux X64
 3. Scala 2.10.5
 4. Hadoop 2.6 (pre-build version)
Reporter: John
   Original Estimate: 24h
  Remaining Estimate: 24h

 I build Apache Spark 1.3 munally.
 ---
 JAVA_HOME=PATH_TO_JAVA8
 mvn clean package -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
 ---
 Something goes wrong, akka always tell me 
 ---
 15/03/17 21:28:10 WARN remote.ReliableDeliverySupervisor: Association with 
 remote system [akka.tcp://sparkYarnAM@Server2:42161] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 ---
 I build another version of Spark 1.3 + Hadoop 2.6 under Java 7.
 Everything goes well.
 Logs
 ---
 15/03/17 21:27:06 INFO spark.SparkContext: Running Spark version 1.3.0
 15/03/17 21:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing view Servers to: hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing modify Servers to: 
 hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui Servers disabled; users with view permissions: Set(hduser); 
 users with modify permissions: Set(hduser)
 15/03/17 21:27:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
 15/03/17 21:27:08 INFO Remoting: Starting remoting
 15/03/17 21:27:09 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkDriver@Server3:37951]
 15/03/17 21:27:09 INFO util.Utils: Successfully started service 'sparkDriver' 
 on port 37951.
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering MapOutputTracker
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering BlockManagerMaster
 15/03/17 21:27:09 INFO storage.DiskBlockManager: Created local directory at 
 /tmp/spark-0db692bb-cd02-40c8-a8f0-3813c6da18e2/blockmgr-a1d0ad23-ab76-4177-80a0-a6f982a64d80
 15/03/17 21:27:09 INFO storage.MemoryStore: MemoryStore started with capacity 
 265.1 MB
 15/03/17 21:27:09 INFO spark.HttpFileServer: HTTP File server directory is 
 /tmp/spark-502ef3f8-b8cd-45cf-b1df-97df297cdb35/httpd-6303e24d-4b2b-4614-bb1d-74e8d331189b
 15/03/17 21:27:09 INFO spark.HttpServer: Starting HTTP Server
 15/03/17 21:27:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SocketConnector@0.0.0.0:48000
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'HTTP file 
 server' on port 48000.
 15/03/17 21:27:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
 15/03/17 21:27:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:4040
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 4040.
 15/03/17 21:27:10 INFO ui.SparkUI: Started SparkUI at http://Server3:4040
 15/03/17 21:27:10 INFO spark.SparkContext: Added JAR 
 file:/home/hduser/spark-java2.jar at 
 http://192.168.11.42:48000/jars/spark-java2.jar with timestamp 1426598830307
 15/03/17 21:27:10 INFO client.RMProxy: Connecting to ResourceManager at 
 Server3/192.168.11.42:8050
 15/03/17 21:27:11 INFO yarn.Client: Requesting a new application from cluster 
 with 3 NodeManagers
 15/03/17 21:27:11 INFO yarn.Client: Verifying our application has not 
 requested more than the maximum memory capability of the cluster (8192 MB per 
 container)
 15/03/17 21:27:11 INFO yarn.Client: Will allocate AM container, with 896 MB 
 memory including 384 MB overhead
 15/03/17 21:27:11 INFO yarn.Client: Setting up container launch context for 
 our AM
 15/03/17 21:27:11 INFO yarn.Client: Preparing resources for our AM container
 15/03/17 21:27:12 INFO 

[jira] [Commented] (SPARK-6388) Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40

2015-03-17 Thread John (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365211#comment-14365211
 ] 

John commented on SPARK-6388:
-

Thanks, I will try it later

 Spark 1.3 + Hadoop 2.6 Can't work on Java 8_40
 --

 Key: SPARK-6388
 URL: https://issues.apache.org/jira/browse/SPARK-6388
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Submit, YARN
Affects Versions: 1.3.0
 Environment: 1. Linux version 3.16.0-30-generic (buildd@komainu) (gcc 
 version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #40-Ubuntu SMP Mon Jan 12 22:06:37 
 UTC 2015
 2. Oracle Java 8 update 40 for Linux X64
 3. Scala 2.10.5
 4. Hadoop 2.6 (pre-built version)
Reporter: John
   Original Estimate: 24h
  Remaining Estimate: 24h

 I built Apache Spark 1.3 manually.
 ---
 JAVA_HOME=PATH_TO_JAVA8
 mvn clean package -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
 ---
 Something goes wrong; Akka always tells me:
 ---
 15/03/17 21:28:10 WARN remote.ReliableDeliverySupervisor: Association with 
 remote system [akka.tcp://sparkYarnAM@Server2:42161] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 ---
 I built another version of Spark 1.3 + Hadoop 2.6 under Java 7.
 Everything works fine.
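 Since the Java 7 build works and the Java 8 build does not, another hedged
 check (not from the report) is confirming which JDK the Hadoop/YARN daemons on
 the cluster are configured to use; the paths below assume a standard Hadoop
 2.6 layout under $HADOOP_HOME.
 ---
 # JDK configured for the Hadoop and YARN daemons on each NodeManager host
 grep -H JAVA_HOME "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "$HADOOP_HOME/etc/hadoop/yarn-env.sh"
 
 # JDK on the PATH of the user that runs YARN
 java -version
 ---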
 Logs
 ---
 15/03/17 21:27:06 INFO spark.SparkContext: Running Spark version 1.3.0
 15/03/17 21:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing view acls to: hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: Changing modify acls to: 
 hduser
 15/03/17 21:27:08 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui acls disabled; users with view permissions: Set(hduser); 
 users with modify permissions: Set(hduser)
 15/03/17 21:27:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
 15/03/17 21:27:08 INFO Remoting: Starting remoting
 15/03/17 21:27:09 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkDriver@Server3:37951]
 15/03/17 21:27:09 INFO util.Utils: Successfully started service 'sparkDriver' 
 on port 37951.
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering MapOutputTracker
 15/03/17 21:27:09 INFO spark.SparkEnv: Registering BlockManagerMaster
 15/03/17 21:27:09 INFO storage.DiskBlockManager: Created local directory at 
 /tmp/spark-0db692bb-cd02-40c8-a8f0-3813c6da18e2/blockmgr-a1d0ad23-ab76-4177-80a0-a6f982a64d80
 15/03/17 21:27:09 INFO storage.MemoryStore: MemoryStore started with capacity 
 265.1 MB
 15/03/17 21:27:09 INFO spark.HttpFileServer: HTTP File server directory is 
 /tmp/spark-502ef3f8-b8cd-45cf-b1df-97df297cdb35/httpd-6303e24d-4b2b-4614-bb1d-74e8d331189b
 15/03/17 21:27:09 INFO spark.HttpServer: Starting HTTP Server
 15/03/17 21:27:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SocketConnector@0.0.0.0:48000
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'HTTP file 
 server' on port 48000.
 15/03/17 21:27:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
 15/03/17 21:27:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/03/17 21:27:10 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:4040
 15/03/17 21:27:10 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 4040.
 15/03/17 21:27:10 INFO ui.SparkUI: Started SparkUI at http://Server3:4040
 15/03/17 21:27:10 INFO spark.SparkContext: Added JAR 
 file:/home/hduser/spark-java2.jar at 
 http://192.168.11.42:48000/jars/spark-java2.jar with timestamp 1426598830307
 15/03/17 21:27:10 INFO client.RMProxy: Connecting to ResourceManager at 
 Server3/192.168.11.42:8050
 15/03/17 21:27:11 INFO yarn.Client: Requesting a new application from cluster 
 with 3 NodeManagers
 15/03/17 21:27:11 INFO yarn.Client: Verifying our application has not 
 requested more than the maximum memory capability of the cluster (8192 MB per 
 container)
 15/03/17 21:27:11 INFO yarn.Client: Will allocate AM container, with 896 MB 
 memory including 384 MB overhead
 15/03/17 21:27:11 INFO yarn.Client: Setting up container launch context for 
 our AM
 15/03/17 21:27:11 INFO yarn.Client: Preparing resources for our AM container
 15/03/17 21:27:12 INFO yarn.Client: Uploading resource 
 file:/home/hduser/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar
  -> 
 hdfs://Server3:9000/user/hduser/.sparkStaging/application_1426595477608_0002/spark-assembly-1.3.0-hadoop2.6.0.jar
 15/03/17 21:27:21 INFO yarn.Client: Setting up the launch environment for our 
 AM container
 15/03/17 21:27:21 INFO spark.SecurityManager: