Spark checkpoint problem
I am testing checkpointing to understand how it works. My code is as follows:

    scala> val data = sc.parallelize(List("a", "b", "c"))
    data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:15

    scala> sc.setCheckpointDir("/tmp/checkpoint")
    15/11/25 18:09:07 WARN spark.SparkContext: Checkpoint directory must be non-local if Spark is running on a cluster: /tmp/checkpoint1

    scala> data.checkpoint

    scala> val temp = data.map(item => (item, 1))
    temp: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:17

    scala> temp.checkpoint

    scala> temp.count

But I found that only the temp RDD is checkpointed in the /tmp/checkpoint directory; the data RDD is not checkpointed! I found the doCheckpoint function in the org.apache.spark.rdd.RDD class:

    private[spark] def doCheckpoint(): Unit = {
      RDDOperationScope.withScope(sc, "checkpoint", allowNesting = false, ignoreParent = true) {
        if (!doCheckpointCalled) {
          doCheckpointCalled = true
          if (checkpointData.isDefined) {
            checkpointData.get.checkpoint()
          } else {
            dependencies.foreach(_.rdd.doCheckpoint())
          }
        }
      }
    }

From the code above, only the last RDD (in my case, temp) will be checkpointed. My question: is this deliberately designed, or is it a bug? Thank you.
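A note on the behavior, based on the doCheckpoint code quoted above (my reading, not an authoritative answer): doCheckpoint is invoked on the final RDD of a job when an action runs, and because temp already has checkpointData defined, the call never recurses into its parent, so data is only written out if it is itself the last RDD of some job. A minimal sketch, under that assumption, that gets both RDDs checkpointed by running an action directly on data as well:

    // Assumption: same spark-shell session as above, after sc.setCheckpointDir(...).
    val data = sc.parallelize(List("a", "b", "c"))
    data.checkpoint()                       // mark the parent RDD for checkpointing

    data.count()                            // run a job whose final RDD is `data`,
                                            // so its checkpoint files get written

    val temp = data.map(item => (item, 1))
    temp.checkpoint()                       // mark the derived RDD as well
    temp.count()                            // this job writes the checkpoint for `temp`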
Re:RE: Spark checkpoint problem
Spark 1.5.2.

On 2015-11-26 13:19:39, "张志强(旺轩)" <zzq98...@alibaba-inc.com> wrote:

What's your Spark version?

From: wyphao.2007 [mailto:wyphao.2...@163.com]
Sent: 2015-11-26 10:04
To: user
Cc: dev@spark.apache.org
Subject: Spark checkpoint problem
Re:Re:Driver memory leak?
No, I am not collecting the result to the driver; I simply send the result to Kafka. BTW, the image addresses are:
https://cloud.githubusercontent.com/assets/5170878/7389463/ac03bf34-eea0-11e4-9e6b-1d2fba170c1c.png
and
https://cloud.githubusercontent.com/assets/5170878/7389480/c629d236-eea0-11e4-983a-dc5aa97c2554.png

At 2015-04-29 18:48:33, zhangxiongfei <zhangxiongfei0...@163.com> wrote:

The amount of memory that the driver consumes depends on your program logic. Did you try to collect the result of the Spark job?

At 2015-04-29 18:42:04, wyphao.2007 <wyphao.2...@163.com> wrote:

Hi, dear developers,

I am using Spark Streaming to read data from Kafka. The program had already run for about 120 hours, but today it failed because the driver hit an OOM, as follows:

    Container [pid=49133,containerID=container_1429773909253_0050_02_01] is running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container.

I set --driver-memory to 2g. In my mind, the driver is responsible for job scheduling and job monitoring (please correct me if I'm wrong), so why is it using so much memory? I used jmap to monitor another program (already running for about 48 hours):

    sudo /home/q/java7/jdk1.7.0_45/bin/jmap -histo:live 31256

The result (see the first image linked above) shows the java.util.HashMap$Entry and java.lang.Long objects using about 600 MB of memory! I also used jmap to monitor another program (running for about 1 hour), with the result in the second image; there the java.util.HashMap$Entry and java.lang.Long objects do not use so much memory. But I found that, as time goes by, the java.util.HashMap$Entry and java.lang.Long objects occupy more and more memory. Is this a memory leak in the driver, or is there another reason?

Thanks
Best Regards
Re:Re: java.lang.StackOverflowError when recovery from checkpoint in Streaming
Hi Akhil Das,

Thank you for your reply. It is very similar to my problem; I will follow that issue.

Thanks
Best Regards

At 2015-04-28 18:08:32, Akhil Das <ak...@sigmoidanalytics.com> wrote:

There's a similar issue reported over here: https://issues.apache.org/jira/browse/SPARK-6847

Thanks
Best Regards
java.lang.StackOverflowError when recovery from checkpoint in Streaming
Hi everyone,

I am using

    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)

to read data from Kafka (about 1k records/second) and store the data in windows. The code snippet is as follows:

    val windowedStreamChannel =
      streamChannel.combineByKey[TreeSet[Obj]](
          TreeSet[Obj](_), _ += _, _ ++= _, new HashPartitioner(numPartition))
        .reduceByKeyAndWindow(
          (x: TreeSet[Obj], y: TreeSet[Obj]) => x ++= y,
          (x: TreeSet[Obj], y: TreeSet[Obj]) => x --= y,
          Minutes(60), Seconds(2), numPartition,
          (item: (String, TreeSet[Obj])) => item._2.size != 0)

After the application had run for an hour, I killed it and restarted it from the checkpoint directory, but I encountered an exception:

    2015-04-27 17:52:40,955 INFO [Driver] - Slicing from 1430126222000 ms to 1430126222000 ms (aligned to 1430126222000 ms and 1430126222000 ms)
    2015-04-27 17:52:40,958 ERROR [Driver] - User class threw exception: null
    java.lang.StackOverflowError
        at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
        at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
        at java.io.File.exists(File.java:813)
        at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
        at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
        at org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
        at org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
        at scala.Option.orElse(Option.scala:257)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
        at org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
        at scala.Option.orElse(Option.scala:257)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
        at org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
        at scala.Option.orElse(Option.scala:257)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
        at org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
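For readers trying to reproduce the recovery path: the snippet above omits how the checkpoint is set up. Below is a minimal sketch of the standard getOrCreate recovery pattern; names such as checkpointDir, createContext, the broker address, and the topic are mine, and this only illustrates the setup, not a fix for the StackOverflowError (which the reply above links to SPARK-6847):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val checkpointDir = "hdfs:///tmp/streaming-checkpoint"   // hypothetical path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("windowed-kafka") // hypothetical app name
      val ssc = new StreamingContext(conf, Seconds(2))
      ssc.checkpoint(checkpointDir)                           // required for window/state operations

      val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // hypothetical broker
      val topicsSet = Set("events")                                  // hypothetical topic
      val streamChannel = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, topicsSet)
      // ... the combineByKey / reduceByKeyAndWindow pipeline from the question goes here ...
      ssc
    }

    // On a clean start this builds a new context; after a kill/restart it
    // rebuilds the context (and the DStream lineage) from the checkpoint.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()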
Question about recovery from checkpoint exception[SPARK-6892]
Hi,

When I recover from a checkpoint in yarn-cluster mode using Spark Streaming, I found that Spark reuses the previous application id (in my case application_1428664056212_0016) and then fails to write the Spark event log. My current application id is application_1428664056212_0017, so writing the event log fails with the following stack trace:

    15/04/14 10:14:01 WARN util.ShutdownHookManager: ShutdownHook '$anon$3' failed, java.io.IOException: Target log file already exists (hdfs://mycluster/spark-logs/eventLog/application_1428664056212_0016)
    java.io.IOException: Target log file already exists (hdfs://mycluster/spark-logs/eventLog/application_1428664056212_0016)
        at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:201)
        at org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1388)
        at org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1388)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1388)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:107)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

Can someone help me? The issue is SPARK-6892.

Thanks
Re:Re: Question about recovery from checkpoint exception[SPARK-6892]
Hi Sean Owen,

Thank you for your attention. I know about spark.hadoop.validateOutputSpecs. When I restart the job, the application id is application_1428664056212_0017 and it recovers from the checkpoint, but it writes the event log into the application_1428664056212_0016 directory. I think it should write to application_1428664056212_0017, not application_1428664056212_0016.

At 2015-04-20 11:46:12, Sean Owen <so...@cloudera.com> wrote:

This is why spark.hadoop.validateOutputSpecs exists, really: https://spark.apache.org/docs/latest/configuration.html
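For readers hitting the same "Target log file already exists" error: one possible mitigation (an assumption on my part, not a confirmed fix for SPARK-6892) is to let the event logging listener overwrite a stale log file via spark.eventLog.overwrite. A minimal configuration sketch:

    import org.apache.spark.SparkConf

    // Sketch only: allow the EventLoggingListener to overwrite an event log
    // left behind by the previous attempt. This works around the IOException
    // but does not address the application-id reuse described in SPARK-6892.
    val conf = new SparkConf()
      .setAppName("streaming-recovery")        // hypothetical app name
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs://mycluster/spark-logs/eventLog")
      .set("spark.eventLog.overwrite", "true") // assumption: clobbering the stale log is acceptable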
How to get removed RDD from windows?
I want to get the removed RDDs from a window, i.e. the old RDDs that slide out of the current window:

    //  _____________________________
    // |  previous window   _________|___________________
    // |___________________|       current window        |  --> Time
    //                     |_____________________________|
    //
    // |________ _________|          |________ _________|
    //          |                             |
    //          V                             V
    //       old RDDs                     new RDDs
    //

I found that the slice function in the DStream class can return the RDDs generated between fromTime and toTime. But when I use the function as follows:

    val now = System.currentTimeMillis()
    result.slice(new Time(now - 30 * 1000),
                 new Time(now - 30 * 1000 + result.slideDuration.milliseconds))
      .foreach(item => println("xxx" + item))
    ssc.start()

(30 seconds is the window's duration.) I got a "zeroTime has not been initialized" exception. Can anyone help me? Thanks!
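A note on the exception: slice needs the DStream's zeroTime to be initialized, which only happens once the streaming context has started, so calling it before ssc.start() fails. A minimal sketch of my own workaround idea (not an official "removed RDDs" API) that defers the slice call until batches are being generated, using the batch time passed to foreachRDD; `result` and the 30-second window come from the question, while windowDuration and batchTime are names I introduce:

    import org.apache.spark.streaming.{Seconds, Time}

    val windowDuration = Seconds(30)

    result.foreachRDD { (rdd, batchTime: Time) =>
      // Inside foreachRDD the context has started, so zeroTime is set and
      // slice() can be called on the driver. This asks for the slide
      // interval that ended exactly one window length before this batch,
      // i.e. the RDDs that have just fallen out of the window.
      val oldRdds = result.slice(
        batchTime - windowDuration,
        batchTime - windowDuration + result.slideDuration)
      oldRdds.foreach(old => println("removed RDD: " + old.id))
    }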
Use mvn to build Spark 1.2.0 failed
Hi all,

Today I downloaded the Spark source from the http://spark.apache.org/downloads.html page, and I used

    ./make-distribution.sh --tgz -Phadoop-2.2 -Pyarn -DskipTests -Dhadoop.version=2.2.0 -Phive

to build the release, but I encountered the following failure:

    [INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-parent ---
    [INFO] Source directory: /home/q/spark/spark-1.2.0/src/main/scala added.
    [INFO]
    [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-parent ---
    [INFO]
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Spark Project Parent POM .......................... FAILURE [1.015s]
    [INFO] Spark Project Networking .......................... SKIPPED
    [INFO] Spark Project Shuffle Streaming Service ........... SKIPPED
    [INFO] Spark Project Core ................................ SKIPPED
    [INFO] Spark Project Bagel ............................... SKIPPED
    [INFO] Spark Project GraphX .............................. SKIPPED
    [INFO] Spark Project Streaming ........................... SKIPPED
    [INFO] Spark Project Catalyst ............................ SKIPPED
    [INFO] Spark Project SQL ................................. SKIPPED
    [INFO] Spark Project ML Library .......................... SKIPPED
    [INFO] Spark Project Tools ............................... SKIPPED
    [INFO] Spark Project Hive ................................ SKIPPED
    [INFO] Spark Project REPL ................................ SKIPPED
    [INFO] Spark Project YARN Parent POM ..................... SKIPPED
    [INFO] Spark Project YARN Stable API ..................... SKIPPED
    [INFO] Spark Project Assembly ............................ SKIPPED
    [INFO] Spark Project External Twitter .................... SKIPPED
    [INFO] Spark Project External Flume Sink ................. SKIPPED
    [INFO] Spark Project External Flume ...................... SKIPPED
    [INFO] Spark Project External MQTT ....................... SKIPPED
    [INFO] Spark Project External ZeroMQ ..................... SKIPPED
    [INFO] Spark Project External Kafka ...................... SKIPPED
    [INFO] Spark Project Examples ............................ SKIPPED
    [INFO] Spark Project YARN Shuffle Service ................ SKIPPED
    [INFO]
    [INFO] BUILD FAILURE
    [INFO]
    [INFO] Total time: 1.644s
    [INFO] Finished at: Mon Dec 22 10:56:35 CST 2014
    [INFO] Final Memory: 21M/481M
    [INFO]
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project spark-parent: Error finding remote resources manifests: /home/q/spark/spark-1.2.0/target/maven-shared-archive-resources/META-INF/NOTICE (No such file or directory) -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

But the NOTICE file is present in the downloaded Spark release:

    [wyp@spark /home/q/spark/spark-1.2.0]$ ll
    total 248
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 assembly
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 bagel
    drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 bin
    drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 conf
    -rw-rw-r-- 1 1000 1000   663 Dec 10 18:02 CONTRIBUTING.md
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 core
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 data
    drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 dev
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 docker
    drwxrwxr-x 7 1000 1000  4096 Dec 10 18:02 docs
    drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 ec2
    drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 examples
    drwxrwxr-x 8 1000 1000  4096 Dec 10 18:02 external
    drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 extras
    drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 graphx
    -rw-rw-r-- 1 1000 1000 45242 Dec 10 18:02 LICENSE
    -rwxrwxr-x 1 1000 1000  7941 Dec 10 18:02 make-distribution.sh
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 mllib
    drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 network
    -rw-rw-r-- 1 1000 1000 22559 Dec 10 18:02 NOTICE
    -rw-rw-r-- 1 1000 1000 49002 Dec 10 18:02 pom.xml
    drwxrwxr-x 4 1000 1000  4096 Dec 10 18:02 project
    drwxrwxr-x 6 1000 1000  4096 Dec 10 18:02 python
    -rw-rw-r-- 1 1000 1000  3645 Dec 10 18:02 README.md
    drwxrwxr-x 5 1000 1000  4096 Dec 10 18:02 repl
    drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 sbin
    drwxrwxr-x 2 1000 1000  4096 Dec 10 18:02 sbt
    -rw-rw-r-- 1 1000 1000  7804 Dec 10 18:02 scalastyle-config.xml
    drwxrwxr-x 6 1000 1000  4096 Dec 10 18:02 sql
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 streaming
    drwxrwxr-x 3 1000 1000  4096 Dec 10 18:02 tools
    -rw-rw-r-- 1 1000 1000   838 Dec 10 18:02 tox.ini
    drwxrwxr-x 5
Re:Re: Announcing Spark 1.2!
On the http://spark.apache.org/downloads.html page, we can't download the newest Spark release yet.

At 2014-12-19 17:55:29, Sean Owen <so...@cloudera.com> wrote:

Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get updated. I assume it's going to be 1.2.0-rc2 plus a few commits related to the release process.

On Fri, Dec 19, 2014 at 9:50 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

Congrats! A little question about this release: which commit is this release based on? v1.2.0 and v1.2.0-rc2 point to different commits in https://github.com/apache/spark/releases

Best Regards,
Shixiong Zhu

2014-12-19 16:52 GMT+08:00 Patrick Wendell <pwend...@gmail.com>:

I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits!

This release brings operational and performance improvements in Spark core, including a new network transport subsystem designed for very large shuffles. Spark SQL introduces an API for external data sources along with Hive 13 support, dynamic partitioning, and the fixed-precision decimal type. MLlib adds a new pipeline-oriented package (spark.ml) for composing multiple algorithms. Spark Streaming adds a Python API and a write-ahead log for fault tolerance. Finally, GraphX has graduated from alpha and introduces a stable API along with performance improvements.

Visit the release notes [1] to read about the new features, or download [2] the release today. For errata in the contributions or release notes, please e-mail me *directly* (not on-list). Thanks to everyone involved in creating, testing, and documenting this release!

[1] http://spark.apache.org/releases/spark-release-1-2-0.html
[2] http://spark.apache.org/downloads.html
network.ConnectionManager error
Hi,

When I run a Spark job on YARN, the job finishes successfully, but I found some ERROR entries in the log file, as follows (note the ERROR lines):

    14/09/17 18:25:03 INFO ui.SparkUI: Stopped Spark web UI at http://sparkserver2.cn:63937
    14/09/17 18:25:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler
    14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
    14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
    14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkserver2.cn,9072)
    14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkserver2.cn,9072)
    14/09/17 18:25:03 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
    14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkserver2.cn,14474)
    14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkserver2.cn,14474)
    14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkserver2.cn,14474)
    14/09/17 18:25:04 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
    14/09/17 18:25:04 INFO network.ConnectionManager: Selector thread was interrupted!
    14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkserver2.cn,9072)
    14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkserver2.cn,14474)
    14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkserver2.cn,9072)
    14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
    14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkserver2.cn,14474)
    14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(sparkserver2.cn,14474) not found
    14/09/17 18:25:04 WARN network.ConnectionManager: All connections not cleaned up
    14/09/17 18:25:04 INFO network.ConnectionManager: ConnectionManager stopped
    14/09/17 18:25:04 INFO storage.MemoryStore: MemoryStore cleared
    14/09/17 18:25:04 INFO storage.BlockManager: BlockManager stopped
    14/09/17 18:25:04 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
    14/09/17 18:25:04 INFO spark.SparkContext: Successfully stopped SparkContext
    14/09/17 18:25:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
    14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    14/09/17 18:25:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
    14/09/17 18:25:04 INFO Remoting: Remoting shut down
    14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

What is the cause of these errors? My Spark version is 1.1.0 and my Hadoop version is 2.2.0.

Thank you.
How to use jdbcRDD in JAVA
Hi, I want to know how to use JdbcRDD from Java (not Scala). I am trying to figure out the last parameter in the constructor of JdbcRDD. Thanks.
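For context, the Scala constructor (which is what has to be mirrored from Java) looks roughly like the sketch below; the last parameter is mapRow, a function that turns each java.sql.ResultSet row into a record. This is a minimal Scala sketch: `sc` is assumed to be an existing SparkContext (as in the shell examples above), the JDBC URL, credentials, and the `people(id, name)` table are made up for illustration, and the SQL must contain two '?' placeholders that Spark fills with the partition bounds.

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    // Hypothetical table `people(id BIGINT, name VARCHAR)` in a test database.
    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "password"),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?", // the two '?' receive the bounds
      1L,    // lowerBound
      1000L, // upperBound
      3,     // numPartitions
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))     // mapRow: the "last parameter"

From Java the same parameter is supplied as a function object; if I remember correctly, newer Spark releases also expose a Java-friendly JdbcRDD.create(...) helper that takes an org.apache.spark.api.java.function.Function<ResultSet, T> in that position, but please check the API docs of your Spark version.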