Hi, all

   We use spark sql to insert data from a text table into a partitioning table 
and found that if we give more cores to executors the insert performance whold 
be worse.



executors num                total-executor-cores          average time for 
insert task

3                                   3                                     1.7 
min

3                                   6                                     3 min

3                                   9                                     4.8 
min





any one has idea?



following is the detail info of out test.





cluster:
4 nodes, each node 300+ g memory and 24 cores.
hadoop: 1 namenode + 3 datanodes.
spark: standalone mode, 1 master + 3 workers.



spark version:

apache master branch,  the current commit is

33mcommit a777eb04bf981312b640326607158f78dd4163cd

Author: Patrick Wendell <patr...@databricks.com<mailto:patr...@databricks.com>>
Date:   Wed Jun 10 21:13:47 2015 -0700

    [HOTFIX] Adding more contributor name bindings


sql:

1 create a external table like

CREATE EXTERNAL TABLE tableA (
   f1 string,
   f2 string,
   f3 bigint,
   f4 smallint,
   f5 smallint,
   f6 string,
   f7 smallint,
   f8 string,
   f9 string,
   f10 smallint,
   f11 string,
   f12 bigint,
   f13 bigint,
   f14 bigint,
   f15 bigint,
   f16 string,
   f17 string,
   f18 smallint,
   f19 string,
   f20 string,
   f21 string,
   f22 string,
   f23 string,
   f24 smallint,
   f25 smallint,
   f26 bigint,
   f27 smallint,
   f28 bigint,
   f29 string,
   f30 bigint,
   f31 bigint,
   f32 bigint,
   f33 smallint,
   f34 smallint,
   f35 smallint,
   f36 smallint,
   f37 smallint,
   f38 smallint,
   f39 string,
   f40 smallint,
   f41 string,
   f42 smallint,
   f43 string,
   f44 smallint,
   f45 smallint,
   f46 smallint,
   f47 string,
   f48 smallint,
   f49 smallint,
   f50 string,
   f51 string,
   f52 smallint,
   f53 int,
   f54 bigint,
   f55 bigint)
row format delimited fields terminated by '|'
STORED AS textfile
LOCATION '/data';



2 create a patition table



CREATE EXTERNAL TABLE tableB (
   f1 string,
   f2 string,
   f3 bigint,
   f4 smallint,
   f5 smallint,
   f6 string,
   f7 smallint,
   f8 string,
   f9 string,
   f10 smallint,
   f11 string,
   f12 bigint,
   f13 bigint,
   f14 bigint,
   f15 bigint,
   f16 string,
   f17 string,
   f18 smallint,
   f19 string,
   f20 string,
   f21 string,
   f22 string,
   f23 string,
   f24 smallint,
   f25 smallint,
   f26 bigint,
   f27 smallint,
   f28 bigint,
   f29 string,
   f30 bigint,
   f31 bigint,
   f32 bigint,
   f33 smallint,
   f34 smallint,
   f35 smallint,
   f36 smallint,
   f37 smallint,
   f38 smallint,
   f39 string,
   f40 smallint,
   f41 string,
   f42 smallint,
   f43 string,
   f44 smallint,
   f45 smallint,
   f46 smallint,
   f47 string,
   f48 smallint,
   f49 smallint,
   f50 string,
   f51 string,
   f52 smallint,
   f53 int,
   f54 bigint,
   f55 bigint) partitioned by (hour int, id string);



3 do insert



insert into table tableB partition (hour,last_msisdn) select *, 
hour(from_unixtime(f3,'yyyy-MM-dd HH:mm:ss')),substr(f9,-1)  from tableA;


Reply via email to