Hi community, I have set up a 3-node Spark cluster in standalone mode; each machine has 16G of memory and 4 cores.



When I run the following in the Spark shell:

  val file = sc.textFile("/user/hive/warehouse/b/test.txt")
  file.filter(line => line.contains("2013-")).count()

it takes 2.7s.
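(In case it matters for the comparison: a variant that explicitly caches the RDD in Spark's own memory before counting would look roughly like the sketch below. cache(), filter() and count() are standard RDD calls, but I have not re-timed this version, so it is only an illustration.)

  // run in spark-shell, where sc (the SparkContext) is predefined
  val file = sc.textFile("/user/hive/warehouse/b/test.txt")
  file.cache()                                          // ask Spark to keep the RDD partitions in memory
  file.count()                                          // first action materializes the cache
  file.filter(line => line.contains("2013-")).count()   // this pass can read from the cached RDD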



But when I run "select count(*) from b;" in Shark, it takes 15.81s.



So why does Shark take more time than Spark?

Other info:

1. I have set export SPARK_MEM=10g in shark-env.sh.
2. test.txt is 4.21G, exists in the directory /user/hive/warehouse/b/ on each machine, and has been loaded into memory.
3. There are 38,532,979 lines in test.txt (a cross-check sketch follows this list).
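For completeness, the 38,532,979 figure could be cross-checked against the filtered count in the same shell session; this is just a verification sketch, not something I have re-run:

  // verification sketch for spark-shell (sc predefined); same path as above
  val file = sc.textFile("/user/hive/warehouse/b/test.txt")
  val total = file.count()                                         // expect 38532979
  val matching = file.filter(line => line.contains("2013-")).count()
  println("total lines = " + total + ", lines containing 2013- = " + matching)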
