unsubscribe

2024-05-03 Thread Bing



 Replied Message 
From: Wood Super
Date: 05/01/2024 07:49
To: user
Subject: unsubscribe

unsubscribe


Re: Re: spark job paused (active stages finished)

2017-11-09 Thread bing...@iflytek.com
Thank you for your reply.

But sometimes the job succeeds when I rerun it,
and it processes the same data with the same code.

 
From: Margusja
Date: 2017-11-09 14:25
To: bing...@iflytek.com
CC: user
Subject: Re: spark job paused (active stages finished)
You have to deal with failed jobs, for example with a try/catch in your code, as in the sketch below.
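
A minimal sketch of that idea (the retry count and the `rdd.count()` action are illustrative, not from this thread):

// Retry a Spark action a few times before giving up.
def runWithRetry[T](maxAttempts: Int)(action: => T): T = {
  try {
    action
  } catch {
    case _: Exception if maxAttempts > 1 =>
      // The action failed; try again with one fewer attempt left.
      runWithRetry(maxAttempts - 1)(action)
  }
}

// e.g. val total = runWithRetry(3) { rdd.count() }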

Br Margus Roo


On 9 Nov 2017, at 05:37, bing...@iflytek.com wrote:

Dear All,
I have a simple Spark job, as below. All tasks in stage 2 (some failed and
were retried) have already finished, but the next stage never runs.


   
Driver thread dump: attachment (thread.dump)
Driver last log:

The driver does not receive the status reports for the 16 retried tasks.
Thanks for any ideas.





Does spark restart the executors if its nodemanager crashes?

2016-01-12 Thread Bing Jiang
hi, guys.
We have set up dynamic resource allocation on Spark on YARN. We currently use
Spark 1.5.
One executor tries to fetch data from another NodeManager's shuffle
service, and that NodeManager crashes, which makes the executor stall in
that state until the crashed NodeManager has been launched again.
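
For reference, this is the kind of setup we mean, using the standard
configuration keys for dynamic allocation with the external shuffle service
(the executor bounds below are illustrative):

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")   // scale executors up/down
  .set("spark.shuffle.service.enabled", "true")     // NodeManager-hosted shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "50")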

I just want to know whether Spark will resubmit the completed tasks if
later tasks cannot find their shuffle output.

Thanks for any explanation.

-- 
Bing Jiang


Package Release Announcement: Spark SQL on HBase Astro

2015-07-22 Thread Bing Xiao (Bing)
We are happy to announce the availability of the Spark SQL on HBase 1.0.0 
release.  http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
The main features in this package, dubbed Astro, include:

* Systematic and powerful handling of data pruning and intelligent 
scan, based on partial evaluation technique

* HBase pushdown capabilities like custom filters and coprocessor to 
support ultra low latency processing

* SQL, Data Frame support

* More SQL capabilities made possible (Secondary index, bloom filter, 
Primary Key, Bulk load, Update)

* Joins with data from other sources

* Python/Java/Scala support

* Support for the latest Spark 1.4.0 release


The tests by the Huawei team and community contributors covered these areas: bulk
load; projection pruning; partition pruning; partial evaluation; code
generation; coprocessor; custom filtering; DML; complex filtering on keys and
non-keys; join/union with non-HBase data; Data Frame; and multi-column-family
tests. We will post the test results, including performance tests, in the middle
of August.
You are very welcome to try out or deploy the package and to help improve the
integration tests with various combinations of settings, extensive Data Frame
tests, complex join/union tests, and extensive performance tests. Please use the
Issues and Pull Requests links on the package homepage to report bugs,
improvements, or feature requests.
Special thanks to project owner and technical leader Yan Zhou, the Huawei global
team, community contributors, and Databricks. Databricks has provided great
assistance from design through release.
Astro, the Spark SQL on HBase package, will be useful for ultra-low-latency
queries and analytics over large-scale data sets in vertical enterprises. We will
continue to work with the community to develop new features and improve the code
base. Your comments and suggestions are greatly appreciated.

Yan Zhou / Bing Xiao
Huawei Big Data team



fail to run LBFGS on 5G KDD data in spark 1.0.1?

2014-08-06 Thread Lizhengbing (bing, BIPA)
1. I don't use spark-submit to run my program; I create the SparkContext directly:

val conf = new SparkConf()
  .setMaster("spark://123d101suse11sp3:7077")
  .setAppName("LBFGS")
  .set("spark.executor.memory", "30g")
  .set("spark.akka.frameSize", "20")
val sc = new SparkContext(conf)

2. I use KDD data; the size is about 5 GB.

3. After I execute LBFGS.runLBFGS, at stage 7 the problem occurs:


14/08/06 16:44:45 INFO DAGScheduler: Failed to run aggregate at LBFGS.scala:201
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 7.0:12 failed 4 times, most recent failure: TID 304 on host 
123d103suse11sp3 failed for unknown reason
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)


Reply: fail to run LBFGS on 5G KDD data in spark 1.0.1?

2014-08-06 Thread Lizhengbing (bing, BIPA)
I have tested it in spark-1.1.0-SNAPSHOT.
It is OK now.

From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: August 6, 2014, 23:12
To: Lizhengbing (bing, BIPA)
Cc: user@spark.apache.org
Subject: Re: fail to run LBFGS on 5G KDD data in spark 1.0.1?

Do you mind testing 1.1-SNAPSHOT and allocating more memory to the driver? I 
think the problem is with the feature dimension. KDD data has more than 20M 
features and in v1.0.1, the driver collects the partial gradients one by one, 
sums them up, does the update, and then sends the new weights back to executors 
one by one. In 1.1-SNAPSHOT, we switched to multi-level tree aggregation and 
torrent broadcasting.
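
As a hedged sketch of that aggregation pattern (not the actual MLlib code;
`gradients` is a placeholder RDD[Array[Double]] and `dim` a placeholder
feature dimension):

// Sum partial gradient vectors with a multi-level aggregation tree
// instead of collecting them one by one on the driver.
val summed = gradients.treeAggregate(Array.fill(dim)(0.0))(
  (acc, g) => { for (i <- acc.indices) acc(i) += g(i); acc }, // fold one gradient in
  (a, b) => { for (i <- a.indices) a(i) += b(i); a },         // merge partial sums
  2                                                           // depth of the tree
)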

For the driver memory, you can set it with spark-submit using `--driver-memory 
30g`. You can confirm it by visiting the storage tab in the WebUI.
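
For example (the class and application jar below are placeholders):

spark-submit \
  --master spark://123d101suse11sp3:7077 \
  --driver-memory 30g \
  --class your.app.Main \
  your-app.jar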

-Xiangrui

On Wed, Aug 6, 2014 at 1:58 AM, Lizhengbing (bing, BIPA) 
zhengbing...@huawei.com wrote:
1. I don't use spark-submit to run my program; I create the SparkContext directly:

val conf = new SparkConf()
  .setMaster("spark://123d101suse11sp3:7077")
  .setAppName("LBFGS")
  .set("spark.executor.memory", "30g")
  .set("spark.akka.frameSize", "20")
val sc = new SparkContext(conf)

2. I use KDD data; the size is about 5 GB.

3. After I execute LBFGS.runLBFGS, at stage 7 the problem occurs:


14/08/06 16:44:45 INFO DAGScheduler: Failed to run aggregate at LBFGS.scala:201
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 7.0:12 failed 4 times, most recent failure: TID 304 on host 
123d103suse11sp3 failed for unknown reason
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)



How can I integrate a Spark cluster into my own program without using spark-submit?

2014-07-26 Thread Lizhengbing (bing, BIPA)
I want to use the Spark cluster through a Scala function, so I can integrate Spark
into my program directly.
For example:
  When I call the count function in my own program, the program deploys the
work to the cluster, so I can get the result directly:
def count() = {
  val master = "spark://mache123:7077"
  val appName = "control_test"
  val sc = new SparkContext(master, appName)
  val rdd = sc.textFile("hdfs://123d101suse11sp3:9000/netflix/netflix.test")
  val count = rdd.count()
  println("rdd.count = " + count)
  count
}


Reply: Spark RDD Disk Persistence

2014-07-08 Thread Lizhengbing (bing, BIPA)
You might let your data be stored in Tachyon.

From: Jahagirdar, Madhu [mailto:madhu.jahagir...@philips.com]
Sent: July 8, 2014, 10:16
To: user@spark.apache.org
Subject: Spark RDD Disk Persistence

Should I use disk-based persistence for RDDs? And if the machine goes down
during program execution, would the data be intact and not lost the next time
I rerun the program?
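
For reference, a minimal sketch of the API in question (a hedged illustration,
not code from this thread; `rdd`, `sc`, and the HDFS path are placeholders):

import org.apache.spark.storage.StorageLevel

// DISK_ONLY keeps blocks on executor-local disks, managed by Spark,
// for the lifetime of the running application.
rdd.persist(StorageLevel.DISK_ONLY)

// Writing to external storage such as HDFS is the durable alternative
// when the data must survive a rerun of the program.
rdd.saveAsObjectFile("hdfs://namenode:9000/backup/my-rdd")
// later: val restored = sc.objectFile[String]("hdfs://namenode:9000/backup/my-rdd")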

Regards,
Madhu Jahagirdar

