I found the difference between the two clusters: the HDFS replication factor is 3 on the normal cluster and 1 on the other one, and short-circuit reads are enabled.
Thanks.

2017-08-03 15:02 GMT+08:00 孙清孟 <sqm2...@gmail.com>:

> Hi Jeszy:
> Thanks for your reply.
>
> On another cluster with two instances, I ran the same SQL, and the file
> size is smaller:
>
> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
> |  partitions=1
> |  mem-estimate=1.00GB mem-reservation=0B
> |
> 00:SCAN HDFS [default.cdr_partition, RANDOM]
>    partitions=1/1 files=1 size=762.93MB
>
> And the single file is split:
> Averaged Fragment F00
> <http://192.168.33.22:7180/cmf/impala/queryDetails?queryId=cb433d9e02457f39%3A247dc1f100000000&serviceName=impala#>
>
>    - split sizes: min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB,
>      stddev: 2.54 MB
>
> Is there some configuration wrong in my cluster?
>
> 2017-08-03 13:20 GMT+08:00 Jeszy <jes...@gmail.com>:
>
>> Putting some more files in the source table will allow you to use more
>> hosts.
>>
>> On 3 August 2017 at 05:08, Taras Bobrovytsky <taras...@apache.org> wrote:
>> > Yes, it looks like all the work is being done on a single node because
>> > hosts=1.
>> >
>> > On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <sqm2...@gmail.com> wrote:
>> >
>> >> This is my Impala cluster:
>> >>
>> >> Role Type              State                                Host             Commission State  Role Group
>> >> Impala Catalog Server  Started with Outdated Configuration  cdha0.embed.com  Commissioned      Impala Catalog Server Default Group
>> >> Impala Daemon          Started                              cdha2.embed.com  Commissioned      Impala Daemon Default Group
>> >> Impala Daemon          Started                              cdha1.embed.com  Commissioned      Impala Daemon Default Group
>> >> Impala Daemon          Started with Outdated Configuration  cdha3.embed.com  Commissioned      Impala Daemon Default Group
>> >> Impala Daemon          Started                              cdha4.embed.com  Commissioned      Impala Daemon Default Group
>> >> Impala StateStore      Started                              cdha0.embed.com  Commissioned      Impala StateStore Default Group
>> >>
>> >> When I run this SQL:
>> >>
>> >> insert into table cdr_partition_true partition(ym = '2014-11') select
>> >>   call_1,
>> >>   call_2,
>> >>   type_1,
>> >>   own_1,
>> >>   own_2,
>> >>   hdfs_id,
>> >>   a_imsi,
>> >>   p_imsi,
>> >>   a_imei,
>> >>   p_imei,
>> >>   CAST(unix_timestamp(start_time) AS INT),
>> >>   CAST(unix_timestamp(end_time) AS INT),
>> >>   time,
>> >>   a_LAC,
>> >>   a_CI,
>> >>   p_LAC,
>> >>   p_CI
>> >> from cdr_partition_cwang
>> >>
>> >> the EXPLAIN output says only one host:
>> >>
>> >> ----------------
>> >> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
>> >> WARNING: The following tables are missing relevant table and/or column
>> >> statistics.
>> >> default.cdr_partition_cwang
>> >>
>> >> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false,
>> >> PARTITION-KEYS=('2014-11')]
>> >> |  partitions=1
>> >> |  hosts=1 per-host-mem=1.00GB
>> >> |
>> >> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
>> >>    partitions=1/1 files=1 size=2.00GB
>> >>    table stats: unavailable
>> >>    column stats: unavailable
>> >>    hosts=1 per-host-mem=1.80GB
>> >>    tuple-ids=0 row-size=128B cardinality=unavailable
>> >> ----------------
>> >>
>> >> And the instance count is 1 -> Averaged Fragment F00.num instances: 1
>> >>
>> >> Does this mean my work was performed on only one Impala node?
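
To tie the numbers in this thread together, here is a minimal Python sketch. It is not Impala's actual scheduler. It assumes a 128 MB HDFS block size (a common default; the thread does not state it), hypothetical host names `host-a` and `host-b`, and a simplified rule that each block goes to the least-loaded host holding a replica. It also assumes that, with replication 1, every block of the 2.00 GB file ended up on the same DataNode, which the thread does not confirm.

```python
from collections import defaultdict


def split_into_blocks(file_mb, block_mb=128.0):
    """Return the sizes (in MB) of the HDFS blocks that make up one file.

    block_mb=128.0 is an assumed block size, not a value from the thread.
    """
    blocks = []
    remaining = file_mb
    while remaining > 1e-9:
        size = min(block_mb, remaining)
        blocks.append(size)
        remaining -= size
    return blocks


def assign_blocks(blocks, replica_hosts):
    """Give each block to the least-loaded host that holds a replica of it.

    replica_hosts[i] lists the hosts that store block i. This is a
    simplified stand-in for locality-aware scheduling, for illustration only.
    """
    load = defaultdict(float)
    for size, hosts in zip(blocks, replica_hosts):
        target = min(hosts, key=lambda h: load[h])
        load[target] += size
    return dict(load)


# 762.93 MB file, every block readable locally on two (hypothetical) hosts.
small = split_into_blocks(762.93)
two_hosts = assign_blocks(small, [["host-a", "host-b"]] * len(small))

# 2.00 GB file, replication 1, all blocks assumed to sit on one host.
large = split_into_blocks(2048.0)
one_host = assign_blocks(large, [["host-a"]] * len(large))

print({h: round(mb, 2) for h, mb in two_hosts.items()})
print({h: round(mb, 2) for h, mb in one_host.items()})
```

Under these assumptions the first case splits into about 384.00 MB and 378.93 MB, which matches the split sizes reported for the two-instance cluster, and the second case puts the whole 2.00 GB on a single host, consistent with `hosts=1`. If that is what is happening, raising the replication factor of the source file or loading the table as several files (as Jeszy suggested) should let more hosts take part.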