Also check block size.
On 3 August 2017 at 14:36, 孙清孟 <sqm2...@gmail.com> wrote:
> I found the difference between the two clusters: the HDFS replication factor
> is 3 on the normal cluster and 1 on the other one, and short-circuit reads
> are enabled!
>
> Thanks.
>
> 2017-08-03 15:02 GMT+08:00 孙清孟 <sqm2...@gmail.com>:
>
>> Hi Jeszy:
>> Thanks for your reply.
>>
>> On another cluster with two instances, I ran the same SQL, and the file
>> there is smaller:
>>
>> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
>> WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
>> |  partitions=1
>> |  mem-estimate=1.00GB mem-reservation=0B
>> |
>> 00:SCAN HDFS [default.cdr_partition, RANDOM]
>>    partitions=1/1 files=1 size=762.93MB
>>
>> And the single file is split:
>> Averaged Fragment F00
>>   - split sizes: min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB, stddev: 2.54 MB
>>
>> Is something misconfigured in my cluster?
>>
>> 2017-08-03 13:20 GMT+08:00 Jeszy <jes...@gmail.com>:
>>
>>> Putting some more files in the source table will allow you to use more
>>> hosts.
>>>
>>> On 3 August 2017 at 05:08, Taras Bobrovytsky <taras...@apache.org> wrote:
>>>> Yes, it looks like all the work is being done on a single node because
>>>> hosts=1.
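The split sizes reported for the two-instance cluster add up to the 762.93 MB file (384.00 + 378.93), which is what you would expect if the file is stored as two HDFS blocks and each block becomes one scan range. The Python sketch below assumes a 384 MB block size (inferred from the maximum split, not confirmed from the cluster) and ignores other limits on scan range length, such as the MAX_SCAN_RANGE_LENGTH query option; the function names are hypothetical.

```python
import statistics


def split_into_scan_ranges(file_mb, block_mb):
    """Cut a file into block-sized pieces.

    Simplified model (assumption): one scan range per HDFS block, nothing else
    shortens a range.
    """
    ranges = []
    offset = 0.0
    while offset < file_mb:
        # The last block holds whatever is left of the file.
        length = min(block_mb, file_mb - offset)
        ranges.append(round(length, 2))
        offset += block_mb
    return ranges


def summarize(ranges):
    """Compute min/max/avg/stddev in the style of the profile's 'split sizes' line."""
    return {
        "min": min(ranges),
        "max": max(ranges),
        "avg": statistics.mean(ranges),
        "stddev": statistics.pstdev(ranges),  # population standard deviation
    }


# Assumed inputs: 762.93 MB file (from the plan), 384 MB block size (inferred).
ranges = split_into_scan_ranges(762.93, 384.0)
print(ranges)  # [384.0, 378.93]
print(summarize(ranges))
```

Under this model, a file that fits inside a single block produces exactly one scan range, so at most one host can scan it no matter how many daemons the cluster has.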
>>>>
>>>> On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <sqm2...@gmail.com> wrote:
>>>>
>>>>> This is my Impala cluster:
>>>>>
>>>>> Role Type             | State                               | Host            | Commission State | Role Group
>>>>> ----------------------+-------------------------------------+-----------------+------------------+-------------------------------------
>>>>> Impala Catalog Server | Started with Outdated Configuration | cdha0.embed.com | Commissioned     | Impala Catalog Server Default Group
>>>>> Impala Daemon         | Started                             | cdha2.embed.com | Commissioned     | Impala Daemon Default Group
>>>>> Impala Daemon         | Started                             | cdha1.embed.com | Commissioned     | Impala Daemon Default Group
>>>>> Impala Daemon         | Started with Outdated Configuration | cdha3.embed.com | Commissioned     | Impala Daemon Default Group
>>>>> Impala Daemon         | Started                             | cdha4.embed.com | Commissioned     | Impala Daemon Default Group
>>>>> Impala StateStore     | Started                             | cdha0.embed.com | Commissioned     | Impala StateStore Default Group
>>>>>
>>>>> When I run this SQL:
>>>>>
>>>>> insert into table cdr_partition_true partition(ym = '2014-11') select
>>>>>   call_1,
>>>>>   call_2,
>>>>>   type_1,
>>>>>   own_1,
>>>>>   own_2,
>>>>>   hdfs_id,
>>>>>   a_imsi,
>>>>>   p_imsi,
>>>>>   a_imei,
>>>>>   p_imei,
>>>>>   CAST(unix_timestamp(start_time) AS INT),
>>>>>   CAST(unix_timestamp(end_time) AS INT),
>>>>>   time,
>>>>>   a_LAC,
>>>>>   a_CI,
>>>>>   p_LAC,
>>>>>   p_CI
>>>>> from cdr_partition_cwang
>>>>>
>>>>> the EXPLAIN output shows only one host:
>>>>>
>>>>> ----------------
>>>>> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
>>>>> WARNING: The following tables are missing relevant table and/or column
>>>>> statistics.
>>>>> default.cdr_partition_cwang
>>>>>
>>>>> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false, PARTITION-KEYS=('2014-11')]
>>>>> |  partitions=1
>>>>> |  hosts=1 per-host-mem=1.00GB
>>>>> |
>>>>> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
>>>>>    partitions=1/1 files=1 size=2.00GB
>>>>>    table stats: unavailable
>>>>>    column stats: unavailable
>>>>>    hosts=1 per-host-mem=1.80GB
>>>>>    tuple-ids=0 row-size=128B cardinality=unavailable
>>>>> ----------------
>>>>>
>>>>> The number of instances is also 1 (Averaged Fragment F00, num instances: 1).
>>>>>
>>>>> Does this mean my query was executed on only one Impala node?
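Taken together, the replies point at data layout rather than a broken setting: the plan scans one 2.00 GB file, and this cluster uses replication 1. HDFS's default placement puts the first replica on the writing node when that node is a DataNode, so with replication 1 a file written from one DataNode can end up entirely on that node. The Python sketch below is a rough illustration under a simplified scheduler model (an assumption, not Impala's actual algorithm): each block is one scan range and goes to the least-loaded host holding a replica. The block layouts and short host names are hypothetical, not read from the cluster.

```python
from collections import Counter


def assign_scan_ranges(block_replicas):
    """Give each block (scan range) to the least-loaded host that holds a replica.

    Simplified model only: the real Impala scheduler also considers remote
    reads and other options, so this is an illustration, not its algorithm.
    """
    load = Counter()
    assignment = []
    for replicas in block_replicas:
        # Ties are broken by host name so the result is deterministic.
        host = min(replicas, key=lambda h: (load[h], h))
        load[host] += 1
        assignment.append(host)
    return assignment


def hosts_used(block_replicas):
    """Number of distinct hosts that receive at least one scan range."""
    return len(set(assign_scan_ranges(block_replicas)))


hosts = ["cdha1", "cdha2", "cdha3", "cdha4"]  # hypothetical short names

# Case A (assumed): the 2.00 GB file is a single block -> one scan range.
single_block = [["cdha1"]]

# Case B (assumed): 8 blocks, replication 1, all written from one DataNode.
rep1_local = [["cdha1"]] * 8

# Case C (assumed): 8 blocks, replication 3, replicas spread round-robin.
rep3_spread = [[hosts[(i + k) % 4] for k in range(3)] for i in range(8)]

print(hosts_used(single_block))  # 1
print(hosts_used(rep1_local))    # 1
print(hosts_used(rep3_spread))   # 4
```

If the block size turns out to be at least 2 GB, or all blocks sit on one DataNode, the suggestions in this thread may help: more source files, a smaller block size, or a higher replication factor should give the scheduler more scan ranges or more candidate hosts.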