Thanks Akhil, thanks for your idea. I had tried using Spark to do what the shell 
script does, but it was not as fast as I expected and not very easy.


--------------------------------

 

Thanks & Best regards!
San.Luo

----- Original Message -----
From: Akhil Das <ak...@sigmoidanalytics.com>
To: 罗辉 <luohui20...@sina.com>
Cc: user <user@spark.apache.org>
Subject: Re: Re: Re: How to decrease the time of storing block in memory
Date: June 9, 2015, 18:05

Hi 罗辉
I think you are interpreting the logs incorrectly.
Your program actually starts running from this point (everything before it is just 
startup and connection work):
15/06/08 16:14:22 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
15/06/08 16:14:23 INFO storage.MemoryStore: ensureFreeSpace(1561) called with curMem=0, maxMem=370503843
15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1561.0 B, free 353.3 MB)
15/06/08 16:14:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/08 16:14:23 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 967 ms
15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.1 KB, free 353.3 MB)
=> At this point it has already stored the broadcast piece in memory, and your 
Task 0 starts doing its work.
15/06/08 16:14:42 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 
0). 693 bytes result sent to driver
=> It took ~19s to finish your Task 0, and Task 1 starts from this point:

15/06/08 16:14:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
15/06/08 16:14:42 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/06/08 16:14:56 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 693 bytes result sent to driver
15/06/08 16:14:56 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown

Now, to speed things up you need more parallelism (at least 2-3 tasks per core), 
and right now your sort.sh is likely running on a single core. Instead of 
triggering an external command, you could try to do the operation within Spark 
itself; that way you can always control the parallelism. Hope it helps.
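
For example, a minimal sketch of what doing the sort inside Spark could look like, 
with the parallelism controlled explicitly. This is only an illustration of the 
idea, not your actual sort.sh logic; the input path and the assumption that the 
sort key is the first tab-separated column are mine:

import org.apache.spark._

object SortInSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SortInSpark")
    val sc = new SparkContext(conf)

    // Aim for 2-3 tasks per core so every core has work to do.
    val numPartitions = sc.defaultParallelism * 3

    // Hypothetical input file; substitute your chr<i>.txt path.
    val lines = sc.textFile("/opt/data/shellcompare/db/chr1.txt", numPartitions)

    // Assume the sort key is the first tab-separated column, as sort.sh might use.
    val sorted = lines
      .map(line => (line.split("\t")(0), line))
      .sortByKey()
      .values

    sorted.saveAsTextFile("/opt/data/shellcompare/sortedByKey")
    sc.stop()
  }
}

Because sortByKey shuffles across all partitions, every core participates, instead 
of a single task invoking the external script.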

Thanks
Best Regards

On Tue, Jun 9, 2015 at 3:00 PM,  <luohui20...@sina.com> wrote:
Hi Akhil,
Not exactly. The task took 54s to finish, starting at 16:14:02 and ending at 16:14:56.
Within those 54s, it needed 19s to store the value in memory, from 16:14:23 to 
16:14:42. I think this is the most time-consuming part of this task, and it seems 
unreasonable. You may check the log attached in the previous mail.
Here is my code:

import org.apache.spark._

object GeneCompare3 {
  def main(args: Array[String]) {
    // i: piece number, j: user number
    val i = args(0).toInt
    val j = args(1)
    val conf = new SparkConf()
      .setAppName("CompareGenePiece " + i + " of User " + j)
      .setMaster("spark://slave3:7077")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)
    println("start to compare gene")
    val runmodifyshell2 = List("run", "sort.sh")
    val runmodifyshellRDD2 = sc.makeRDD(runmodifyshell2)
    val pipeModify2 = runmodifyshellRDD2.pipe(
      "sh /opt/sh/bin/sort.sh /opt/data/shellcompare/db/chr" + i + ".txt " +
        "/opt/data/shellcompare/data/user" + j + "/pgs/sample/samplechr" + i + ".txt " +
        "/opt/data/shellcompare/data/user" + j + "/pgs/intermediateResult/result" + i + ".txt 600")
    pipeModify2.collect()
    sc.stop()
  }
}

--------------------------------

 

Thanks & Best regards!
San.Luo

----- Original Message -----
From: Akhil Das <ak...@sigmoidanalytics.com>
To: 罗辉 <luohui20...@sina.com>
Cc: user <user@spark.apache.org>
Subject: Re: Re: How to decrease the time of storing block in memory
Date: June 9, 2015, 16:51

Is it that task that is taking 19s? It won't simply be taking 19s to store 2 KB of 
data into memory; there could be other operations happening too (the 
transformations that you are doing). It would be good if you could paste the code 
snippet you are running, so we can understand it better.

Thanks
Best Regards

On Tue, Jun 9, 2015 at 2:09 PM,  <luohui20...@sina.com> wrote:
Only 1 minor GC, 0.07s. 


--------------------------------

 

Thanks & Best regards!
San.Luo

----- Original Message -----
From: Akhil Das <ak...@sigmoidanalytics.com>
To: 罗辉 <luohui20...@sina.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to decrease the time of storing block in memory
Date: June 9, 2015, 15:02

Maybe you should check your driver UI and see if there is any GC time involved, etc.

Thanks
Best Regards
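
P.S. The task table in the application web UI has a GC Time column for each task. 
If you also want GC details in the executor logs, here is a minimal sketch (my own 
illustration; the flags are standard JVM options passed via 
spark.executor.extraJavaOptions):

import org.apache.spark._

object GcLoggingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("GcLoggingExample")
      // Standard JVM GC-logging flags, written to each executor's logs.
      .set("spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
    val sc = new SparkContext(conf)
    // ... run the job as usual ...
    sc.stop()
  }
}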

On Mon, Jun 8, 2015 at 5:45 PM,  <luohui20...@sina.com> wrote:
Hi there,
I am trying to decrease my app's running time on the worker node. I checked the 
log and found the most time-consuming part is below:
15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.1 KB, free 353.3 MB)
15/06/08 16:14:42 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 
0). 693 bytes result sent to driver

I don't know why it needs 19s to store 2.1 KB of data in memory. Is there any 
tuning method?

The attachment is the full log; here it is:
15/06/08 16:14:02 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/06/08 16:14:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/06/08 16:14:10 INFO spark.SecurityManager: Changing view acls to: root
15/06/08 16:14:10 INFO spark.SecurityManager: Changing modify acls to: root
15/06/08 16:14:10 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); users with 
modify permissions: Set(root)
15/06/08 16:14:14 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/08 16:14:14 INFO Remoting: Starting remoting
15/06/08 16:14:15 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://driverPropsFetcher@slave5:54684]
15/06/08 16:14:15 INFO util.Utils: Successfully started service 
'driverPropsFetcher' on port 54684.
15/06/08 16:14:16 INFO spark.SecurityManager: Changing view acls to: root
15/06/08 16:14:16 INFO spark.SecurityManager: Changing modify acls to: root
15/06/08 16:14:16 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); users with 
modify permissions: Set(root)
15/06/08 16:14:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
15/06/08 16:14:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
15/06/08 16:14:16 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/08 16:14:16 INFO Remoting: Starting remoting
15/06/08 16:14:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.
15/06/08 16:14:17 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkExecutor@slave5:49169]
15/06/08 16:14:17 INFO util.Utils: Successfully started service 'sparkExecutor' 
on port 49169.
15/06/08 16:14:17 INFO util.AkkaUtils: Connecting to MapOutputTracker: 
akka.tcp://sparkDriver@slave5:58630/user/MapOutputTracker
15/06/08 16:14:17 INFO util.AkkaUtils: Connecting to BlockManagerMaster: 
akka.tcp://sparkDriver@slave5:58630/user/BlockManagerMaster
15/06/08 16:14:17 INFO storage.DiskBlockManager: Created local directory at 
/tmp/spark-548b4618-4aba-4b63-9467-381fbfea8d5b/spark-83737dd6-46b0-47ee-82a5-5afee46bdbf5/spark-0fe9d8ba-2910-44a2-bf4f-80d179f5d58b/blockmgr-b4884e7a-2527-447a-9fc5-1823d923c2f1
15/06/08 16:14:17 INFO storage.MemoryStore: MemoryStore started with capacity 
353.3 MB
15/06/08 16:14:17 INFO util.AkkaUtils: Connecting to OutputCommitCoordinator: 
akka.tcp://sparkDriver@slave5:58630/user/OutputCommitCoordinator
15/06/08 16:14:17 INFO executor.CoarseGrainedExecutorBackend: Connecting to 
driver: akka.tcp://sparkDriver@slave5:58630/user/CoarseGrainedScheduler
15/06/08 16:14:18 INFO worker.WorkerWatcher: Connecting to worker 
akka.tcp://sparkWorker@slave5:48926/user/Worker
15/06/08 16:14:18 INFO executor.CoarseGrainedExecutorBackend: Successfully 
registered with driver
15/06/08 16:14:18 INFO executor.Executor: Starting executor ID 0 on host slave5
15/06/08 16:14:18 INFO worker.WorkerWatcher: Successfully connected to 
akka.tcp://sparkWorker@slave5:48926/user/Worker
15/06/08 16:14:21 WARN internal.ThreadLocalRandom: Failed to generate a seed 
from SecureRandom within 3 seconds. Not enough entrophy?
15/06/08 16:14:21 INFO netty.NettyBlockTransferService: Server created on 53449
15/06/08 16:14:21 INFO storage.BlockManagerMaster: Trying to register 
BlockManager
15/06/08 16:14:21 INFO storage.BlockManagerMaster: Registered BlockManager
15/06/08 16:14:21 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: 
akka.tcp://sparkDriver@slave5:58630/user/HeartbeatReceiver
15/06/08 16:14:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
15/06/08 16:14:21 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/06/08 16:14:21 INFO executor.Executor: Fetching 
http://192.168.100.11:50648/jars/ShellCompare.jar with timestamp 1433751228699
15/06/08 16:14:21 INFO util.Utils: Fetching 
http://192.168.100.11:50648/jars/ShellCompare.jar to 
/tmp/spark-548b4618-4aba-4b63-9467-381fbfea8d5b/spark-83737dd6-46b0-47ee-82a5-5afee46bdbf5/spark-99aba156-5803-41b8-9df1-3e36305f43c3/fetchFileTemp8512004298922421624.tmp
15/06/08 16:14:21 INFO util.Utils: Copying 
/tmp/spark-548b4618-4aba-4b63-9467-381fbfea8d5b/spark-83737dd6-46b0-47ee-82a5-5afee46bdbf5/spark-99aba156-5803-41b8-9df1-3e36305f43c3/-8058314591433751228699_cache
 to /usr/lib/spark/work/app-20150608161350-0001/0/./ShellCompare.jar
15/06/08 16:14:22 INFO executor.Executor: Adding 
file:/usr/lib/spark/work/app-20150608161350-0001/0/./ShellCompare.jar to class 
loader
15/06/08 16:14:22 INFO broadcast.TorrentBroadcast: Started reading broadcast 
variable 0
15/06/08 16:14:23 INFO storage.MemoryStore: ensureFreeSpace(1561) called with 
curMem=0, maxMem=370503843
15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as 
bytes in memory (estimated size 1561.0 B, free 353.3 MB)
15/06/08 16:14:23 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_0_piece0
15/06/08 16:14:23 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 
took 967 ms
15/06/08 16:14:23 INFO storage.MemoryStore: ensureFreeSpace(2168) called with 
curMem=1561, maxMem=370503843
15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0 stored as values 
in memory (estimated size 2.1 KB, free 353.3 MB)
15/06/08 16:14:42 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 
0). 693 bytes result sent to driver
15/06/08 16:14:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 
1
15/06/08 16:14:42 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/06/08 16:14:56 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 
1). 693 bytes result sent to driver
15/06/08 16:14:56 INFO executor.CoarseGrainedExecutorBackend: Driver commanded 
a shutdown
15/06/08 16:14:56 INFO storage.MemoryStore: MemoryStore cleared
15/06/08 16:14:56 INFO storage.BlockManager: BlockManager stopped
15/06/08 16:14:56 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
15/06/08 16:14:56 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.


--------------------------------

 

Thanks & Best regards!
San.Luo











