Spark checkpoint problem

2015-11-25 Thread wyphao.2007
Hi, 


I am testing checkpoint to understand how it works. My code is as follows:


scala> val data = sc.parallelize(List("a", "b", "c"))
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:15


scala> sc.setCheckpointDir("/tmp/checkpoint")
15/11/25 18:09:07 WARN spark.SparkContext: Checkpoint directory must be 
non-local if Spark is running on a cluster: /tmp/checkpoint1


scala> data.checkpoint


scala> val temp = data.map(item => (item, 1))
temp: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:17


scala> temp.checkpoint


scala> temp.count


But I found that only the temp RDD is checkpointed in the /tmp/checkpoint 
directory; the data RDD is not checkpointed! I found the doCheckpoint function 
in the org.apache.spark.rdd.RDD class:


  private[spark] def doCheckpoint(): Unit = {
    RDDOperationScope.withScope(sc, "checkpoint", allowNesting = false,
        ignoreParent = true) {
      if (!doCheckpointCalled) {
        doCheckpointCalled = true
        if (checkpointData.isDefined) {
          checkpointData.get.checkpoint()
        } else {
          dependencies.foreach(_.rdd.doCheckpoint())
        }
      }
    }
  }


From the code above, only the last RDD (in my case, temp) will be 
checkpointed. My question: is this deliberately designed, or is it a bug?


Thank you.
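
A minimal sketch (assuming Spark 1.5.x semantics, not an authoritative answer): because doCheckpoint() stops recursing at the first RDD that is itself marked for checkpointing, running a separate action on each marked RDD gets both of them written under the checkpoint directory:

sc.setCheckpointDir("/tmp/checkpoint")

val data = sc.parallelize(List("a", "b", "c"))
data.checkpoint()
data.count()    // this job ends at data, so data's checkpoint files are written

val temp = data.map(item => (item, 1))
temp.checkpoint()
temp.count()    // this job ends at temp, so temp's checkpoint files are written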







Re:RE: Spark checkpoint problem

2015-11-25 Thread wyphao.2007
Spark 1.5.2.


On 2015-11-26 13:19:39, "张志强(旺轩)" <zzq98...@alibaba-inc.com> wrote:


What’s your spark version?

From: wyphao.2007 [mailto:wyphao.2...@163.com]
Sent: 2015-11-26 10:04
To: user
Cc: dev@spark.apache.org
Subject: Spark checkpoint problem


Re:Re:Driver memory leak?

2015-04-29 Thread wyphao.2007
No, I am not collecting the result to the driver; I simply send the result to Kafka.


BTW, the image addresses are:
https://cloud.githubusercontent.com/assets/5170878/7389463/ac03bf34-eea0-11e4-9e6b-1d2fba170c1c.png
and 
https://cloud.githubusercontent.com/assets/5170878/7389480/c629d236-eea0-11e4-983a-dc5aa97c2554.png



At 2015-04-29 18:48:33,zhangxiongfei zhangxiongfei0...@163.com wrote:



The amount of memory that the driver consumes depends on your program logic. Did 
you try to collect the result of the Spark job?




At 2015-04-29 18:42:04, wyphao.2007 wyphao.2...@163.com wrote:

Hi, dear developers, I am using Spark Streaming to read data from Kafka. The 
program has already run for about 120 hours, but today it failed because of 
the driver's OOM, as follows:


Container [pid=49133,containerID=container_1429773909253_0050_02_01] is 
running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB physical 
memory used; 3.2 GB of 50 GB virtual memory used. Killing container.


I set --driver-memory to 2g. In my mind, the driver is responsible for job 
scheduling and job monitoring (please correct me if I'm wrong), so why is it 
using so much memory?


So I used jmap to monitor another program (already running for about 48 hours): 
sudo /home/q/java7/jdk1.7.0_45/bin/jmap -histo:live 31256. The result is as follows:
the java.util.HashMap$Entry and java.lang.Long objects are using about 600 MB of memory!


And I also used jmap to monitor another program (already running for about 1 hour); 
the result is as follows:
the java.util.HashMap$Entry and java.lang.Long objects don't use that much 
memory yet. But I found that, as time goes by, the java.util.HashMap$Entry and 
java.lang.Long objects occupy more and more memory.
Is it a driver memory leak, or is there some other reason?
Thanks
Best Regards
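
A minimal sketch of the pattern described above (writing results out from the executors instead of collecting them into the driver); the socket input and the println are only stand-ins for the real Kafka source and producer:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(2))        // sc: the existing SparkContext
val lines = ssc.socketTextStream("localhost", 9999)   // stand-in input source

// Output is written from the executors, partition by partition;
// nothing is collect()-ed back into the driver's heap.
lines.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // a real job would open a Kafka producer here; println is only a placeholder
    records.foreach(record => println("send to kafka: " + record))
  }
}

ssc.start()
ssc.awaitTermination()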

















Re:Re: java.lang.StackOverflowError when recovery from checkpoint in Streaming

2015-04-28 Thread wyphao.2007
Hi Akhil Das, thank you for your reply.
It is very similar to my problem; I will focus on it.
Thanks
Best Regards

At 2015-04-28 18:08:32,Akhil Das ak...@sigmoidanalytics.com wrote:
There's a similar issue reported over here
https://issues.apache.org/jira/browse/SPARK-6847

Thanks
Best Regards


java.lang.StackOverflowError when recovery from checkpoint in Streaming

2015-04-27 Thread wyphao.2007
 Hi everyone, I am using

val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder](ssc, kafkaParams, topicsSet)

to read data from Kafka (1k/second) and store the data in windows; the code snippet is as follows:

val windowedStreamChannel =
  streamChannel.combineByKey[TreeSet[Obj]](TreeSet[Obj](_), _ += _, _ ++= _,
      new HashPartitioner(numPartition))
    .reduceByKeyAndWindow((x: TreeSet[Obj], y: TreeSet[Obj]) => x ++= y,
      (x: TreeSet[Obj], y: TreeSet[Obj]) => x --= y, Minutes(60),
      Seconds(2), numPartition,
      (item: (String, TreeSet[Obj])) => item._2.size != 0)

After the application had run for an hour, I killed it and restarted it from the 
checkpoint directory, but I encountered an exception:

2015-04-27 17:52:40,955 INFO  [Driver] - Slicing from 1430126222000 ms to 
1430126222000 ms (aligned to 1430126222000 ms and 1430126222000 ms)
2015-04-27 17:52:40,958 ERROR [Driver] - User class threw exception: null
java.lang.StackOverflowError
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.exists(File.java:813)
at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
at 
org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
at 
org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
at 
org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
at 
org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
at 
org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
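
For context, the recovery step described above normally follows the StreamingContext.getOrCreate pattern; the sketch below (hypothetical checkpoint path, stand-in computation) only makes that step concrete and is not a fix for the StackOverflowError itself:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"   // hypothetical path

// Builds a fresh context and DStream graph; only runs when no checkpoint exists yet.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("windowed-job")
  val ssc = new StreamingContext(conf, Seconds(2))
  ssc.checkpoint(checkpointDir)
  // Stand-in for the real Kafka input and windowed computation:
  ssc.socketTextStream("localhost", 9999).countByWindow(Minutes(60), Seconds(2)).print()
  ssc
}

// On restart, the DStream graph and pending batches are rebuilt from the checkpoint.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()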

Question about recovery from checkpoint exception[SPARK-6892]

2015-04-19 Thread wyphao.2007
Hi, 
   When I recover from a checkpoint in yarn-cluster mode using Spark Streaming, 
I found that it reuses the previous application id (in my case 
application_1428664056212_0016) to write the Spark eventLog. But my application 
id is now application_1428664056212_0017, so writing the eventLog fails; the 
stacktrace is as follows:
15/04/14 10:14:01 WARN util.ShutdownHookManager: ShutdownHook '$anon$3' failed, 
java.io.IOException: Target log file already exists 
(hdfs://mycluster/spark-logs/eventLog/application_1428664056212_0016)
java.io.IOException: Target log file already exists 
(hdfs://mycluster/spark-logs/eventLog/application_1428664056212_0016)
at 
org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:201)
at 
org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1388)
at 
org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1388)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1388)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:107)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Can someone help me? The issue is SPARK-6892.
thanks
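
The stacktrace itself comes from EventLoggingListener refusing to overwrite an existing log file. As an illustration only (not a fix for the application-id reuse tracked in SPARK-6892, and only appropriate if replacing the old application's log is acceptable), these are the eventLog settings involved:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("recovering-streaming-job")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs://mycluster/spark-logs/eventLog")
  // lets the listener replace an existing target file on shutdown instead of
  // throwing "Target log file already exists"
  .set("spark.eventLog.overwrite", "true")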





Re:Re: Question about recovery from checkpoint exception[SPARK-6892]

2015-04-19 Thread wyphao.2007
Hi Sean Owen, Thank you for your attention.


I know spark.hadoop.validateOutputSpecs. 


I restarted the job; the application id is application_1428664056212_0017 and it 
recovers from the checkpoint, but it writes the eventLog into the 
application_1428664056212_0016 dir. I think it should write to 
application_1428664056212_0017, not application_1428664056212_0016.



At 2015-04-20 11:46:12,Sean Owen so...@cloudera.com wrote:
This is why spark.hadoop.validateOutputSpecs exists, really:
https://spark.apache.org/docs/latest/configuration.html




How to get removed RDD from windows?

2015-03-30 Thread wyphao.2007
I want to get the removed RDDs from a window, as shown below. The old RDDs will 
be removed from the current window:
//      _____________________________
//     |  previous window   _________|___________________
//     |___________________|       current window        |  --------------> Time
//                         |_____________________________|
//
// |________ _________|          |________ _________|
//          |                             |
//          V                             V
//       old RDDs                     new RDDs
//
I found that the slice function in the DStream class can return the RDDs between 
fromTime and toTime. But when I use the function as follows:


val now = System.currentTimeMillis()
result.slice(new Time(now - 30 * 1000), new Time(now - 30 * 1000 +
  result.slideDuration.milliseconds)).foreach(item => println(xxx + item))
ssc.start()


30 (seconds) is the window's duration. Then I got a "zeroTime has not been 
initialized" exception.


Can anyone help me? Thanks!
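
A sketch only, not a verified fix: slice needs the DStream's zeroTime, which is only set once the StreamingContext has started, so it has to be called from code that runs per batch (for example inside foreachRDD) rather than before ssc.start(). The 30-second figure mirrors the snippet above, and the exact range for the "just removed" RDDs is an assumption:

import org.apache.spark.streaming.Seconds

// result and ssc are the ones from the snippet above
result.foreachRDD { (rdd, batchTime) =>
  val from = batchTime - Seconds(30)        // assumed range of just-removed RDDs
  val to   = from + result.slideDuration
  result.slice(from, to).foreach(old => println("old RDD: " + old.id))
}
ssc.start()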












Use mvn to build Spark 1.2.0 failed

2014-12-21 Thread wyphao.2007
Hi all, today I downloaded the Spark source from the 
http://spark.apache.org/downloads.html page, and I used


 ./make-distribution.sh --tgz -Phadoop-2.2 -Pyarn -DskipTests 
-Dhadoop.version=2.2.0 -Phive


to build the release, but I encountered the following error:


[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ 
spark-parent ---
[INFO] Source directory: /home/q/spark/spark-1.2.0/src/main/scala added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-parent 
---
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM .. FAILURE [1.015s]
[INFO] Spark Project Networking .. SKIPPED
[INFO] Spark Project Shuffle Streaming Service ... SKIPPED
[INFO] Spark Project Core  SKIPPED
[INFO] Spark Project Bagel ... SKIPPED
[INFO] Spark Project GraphX .. SKIPPED
[INFO] Spark Project Streaming ... SKIPPED
[INFO] Spark Project Catalyst  SKIPPED
[INFO] Spark Project SQL . SKIPPED
[INFO] Spark Project ML Library .. SKIPPED
[INFO] Spark Project Tools ... SKIPPED
[INFO] Spark Project Hive  SKIPPED
[INFO] Spark Project REPL  SKIPPED
[INFO] Spark Project YARN Parent POM . SKIPPED
[INFO] Spark Project YARN Stable API . SKIPPED
[INFO] Spark Project Assembly  SKIPPED
[INFO] Spark Project External Twitter  SKIPPED
[INFO] Spark Project External Flume Sink . SKIPPED
[INFO] Spark Project External Flume .. SKIPPED
[INFO] Spark Project External MQTT ... SKIPPED
[INFO] Spark Project External ZeroMQ . SKIPPED
[INFO] Spark Project External Kafka .. SKIPPED
[INFO] Spark Project Examples  SKIPPED
[INFO] Spark Project YARN Shuffle Service  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1.644s
[INFO] Finished at: Mon Dec 22 10:56:35 CST 2014
[INFO] Final Memory: 21M/481M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project spark-parent: Error finding remote resources manifests: 
/home/q/spark/spark-1.2.0/target/maven-shared-archive-resources/META-INF/NOTICE 
(No such file or directory) - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException


But the NOTICE file is present in the downloaded Spark release:


[wyp@spark  /home/q/spark/spark-1.2.0]$ ll
total 248
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 assembly
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 bagel
drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 bin
drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 conf
-rw-rw-r-- 1 1000 1000   663 Dec 10 18:02 CONTRIBUTING.md
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 core
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 data
drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 dev
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 docker
drwxrwxr-x 7 1000 1000  4096 Dec 10 18:02 docs
drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 ec2
drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 examples
drwxrwxr-x 8 1000 1000  4096 Dec 10 18:02 external
drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 extras
drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 graphx
-rw-rw-r-- 1 1000 1000 45242 Dec 10 18:02 LICENSE
-rwxrwxr-x 1 1000 1000  7941 Dec 10 18:02 make-distribution.sh
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 mllib
drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 network
-rw-rw-r-- 1 1000 1000 22559 Dec 10 18:02 NOTICE
-rw-rw-r-- 1 1000 1000 49002 Dec 10 18:02 pom.xml
drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 project
drwxrwxr-x 6 1000 1000  4096 Dec 10 18:02 python
-rw-rw-r-- 1 1000 1000  3645 Dec 10 18:02 README.md
drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 repl
drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 sbin
drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 sbt
-rw-rw-r-- 1 1000 1000  7804 Dec 10 18:02 scalastyle-config.xml
drwxrwxr-x 6 1000 1000  4096 Dec 10 18:02 sql
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 streaming
drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 tools
-rw-rw-r-- 1 1000 1000   838 Dec 10 18:02 tox.ini
drwxrwxr-x 5 

Re:Re: Announcing Spark 1.2!

2014-12-19 Thread wyphao.2007


On the http://spark.apache.org/downloads.html page, we can't download the 
newest Spark release.






At 2014-12-19 17:55:29,Sean Owen so...@cloudera.com wrote:
Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get
updated. I assume it's going to be 1.2.0-rc2 plus a few commits
related to the release process.

On Fri, Dec 19, 2014 at 9:50 AM, Shixiong Zhu zsxw...@gmail.com wrote:
 Congrats!

 A little question about this release: Which commit is this release based on?
 v1.2.0 and v1.2.0-rc2 are pointed to different commits in
 https://github.com/apache/spark/releases

 Best Regards,

 Shixiong Zhu

 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com:

 I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is
 the third release on the API-compatible 1.X line. It is Spark's
 largest release ever, with contributions from 172 developers and more
 than 1,000 commits!

 This release brings operational and performance improvements in Spark
 core including a new network transport subsystem designed for very
 large shuffles. Spark SQL introduces an API for external data sources
 along with Hive 13 support, dynamic partitioning, and the
 fixed-precision decimal type. MLlib adds a new pipeline-oriented
 package (spark.ml) for composing multiple algorithms. Spark Streaming
 adds a Python API and a write ahead log for fault tolerance. Finally,
 GraphX has graduated from alpha and introduces a stable API along with
 performance improvements.

 Visit the release notes [1] to read about the new features, or
 download [2] the release today.

 For errata in the contributions or release notes, please e-mail me
 *directly* (not on-list).

 Thanks to everyone involved in creating, testing, and documenting this
 release!

 [1] http://spark.apache.org/releases/spark-release-1-2-0.html
 [2] http://spark.apache.org/downloads.html










network.ConnectionManager error

2014-09-17 Thread wyphao.2007
Hi, when I run a Spark job on YARN and the job finishes successfully, I still 
find some error logs in the logfile, as follows (the ERROR lines below):


14/09/17 18:25:03 INFO ui.SparkUI: Stopped Spark web UI at 
http://sparkserver2.cn:63937
14/09/17 18:25:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all 
executors
14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Asking each 
executor to shut down
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:03 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor 
stopped!
14/09/17 18:25:04 INFO network.ConnectionManager: Selector thread was 
interrupted!
14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,14474) not found
14/09/17 18:25:04 WARN network.ConnectionManager: All connections not cleaned up
14/09/17 18:25:04 INFO network.ConnectionManager: ConnectionManager stopped
14/09/17 18:25:04 INFO storage.MemoryStore: MemoryStore cleared
14/09/17 18:25:04 INFO storage.BlockManager: BlockManager stopped
14/09/17 18:25:04 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/09/17 18:25:04 INFO spark.SparkContext: Successfully stopped SparkContext
14/09/17 18:25:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster 
with SUCCEEDED
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
14/09/17 18:25:04 INFO impl.AMRMClientImpl: Waiting for application to be 
successfully unregistered.
14/09/17 18:25:04 INFO Remoting: Remoting shut down
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.


What is the cause of this error? My Spark version is 1.1.0 and my Hadoop version is 
2.2.0.
Thank you.

How to use jdbcRDD in JAVA

2014-09-11 Thread wyphao.2007
Hi, I want to know how to use JdbcRDD in Java, not Scala; I am trying to figure out 
the last parameter in the constructor of JdbcRDD.


thanks
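
For reference, the Scala form of the constructor (a sketch with a hypothetical JDBC URL and table) shows what the last parameter is: mapRow, a function that turns each java.sql.ResultSet row into one record.

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

val rows = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "pass"),
  "SELECT id, name FROM people WHERE id >= ? AND id <= ?",  // the two ?s receive the partition bounds
  1L, 1000L, 3,                                             // lowerBound, upperBound, numPartitions
  (rs: ResultSet) => (rs.getLong("id"), rs.getString("name")))  // the last parameter: mapRow

From Java, that same mapRow argument has to be supplied as a scala.Function1 implementation (commonly by extending scala.runtime.AbstractFunction1 and making it Serializable).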