Re: How many threads impala start for handling partitioned join?

2017-10-25 Thread Jeszy
Hello JJ,

No, currently Impala uses one thread to execute the join (regardless
of the number of partitions that fit into memory).

HTH

On 25 October 2017 at 05:44, 俊杰陈  wrote:
> Hi
>
> When Impala does a partitioned join on a node, it splits the build input
> into partitions until each partition can fit into memory, then consumes the
> probe input, does the join, and outputs rows.
>
> My question is: will Impala schedule multiple tasks to do the join if
> multiple partitions fit into memory, or does it iterate over the partitions?
> And for one partition, does it use multiple threads to do the join? Thanks
> in advance.
>
>
> JJ
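As a rough illustration of the single-threaded behavior described above, here is a minimal Python sketch of a partitioned hash join. This is not Impala's actual C++ implementation; the fanout of 4 and the dict-per-row representation are assumptions chosen for brevity.

```python
# Minimal sketch of a single-threaded partitioned hash join.
# NOTE: illustrative only -- Impala's real implementation is in C++,
# uses a different fanout, and can spill/repartition recursively.

NUM_PARTITIONS = 4  # assumed fanout, chosen for brevity

def hash_partition(rows, key):
    """Split rows into NUM_PARTITIONS buckets by hashing the join key."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[hash(row[key]) % NUM_PARTITIONS].append(row)
    return parts

def partitioned_hash_join(build_rows, probe_rows, key):
    """One thread iterates over the partitions in turn; it never joins
    two partitions concurrently, even if several would fit in memory."""
    build_parts = hash_partition(build_rows, key)
    probe_parts = hash_partition(probe_rows, key)
    results = []
    for build_part, probe_part in zip(build_parts, probe_parts):
        # Build an in-memory hash table for this partition only ...
        table = {}
        for row in build_part:
            table.setdefault(row[key], []).append(row)
        # ... then stream the matching probe partition through it.
        for row in probe_part:
            for match in table.get(row[key], []):
                results.append({**match, **row})
    return results

# Tiny demo: only id=1 matches across the two sides.
build = [{"id": 1, "b": "build"}, {"id": 2, "b": "other"}]
probe = [{"id": 1, "p": "probe"}, {"id": 3, "p": "miss"}]
joined = partitioned_hash_join(build, probe, "id")
```

In terms of the sketch, the answer above corresponds to the outer `for` loop being strictly sequential, with each partition also probed by that same single thread.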


Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Jeszy
+1

On 18 October 2017 at 03:40, Tim Armstrong  wrote:
> +1
>
> On 17 Oct. 2017 8:38 pm, "Alexander Behm"  wrote:
>
>> +1
>>
>> On Tue, Oct 17, 2017 at 8:18 PM, Taras Bobrovytsky 
>> wrote:
>>
>> > +1
>> >
>> > On Tue, Oct 17, 2017 at 7:56 PM, Michael Ho  wrote:
>> >
>> > > +1
>> > >
>> > > On Tue, Oct 17, 2017 at 7:25 PM, Thomas Tauber-Marshall <
>> > > tmarsh...@cloudera.com> wrote:
>> > >
>> > > > +1
>> > > >
>> > > > On Tue, Oct 17, 2017 at 9:12 PM Bharath Vissapragada <
>> > > > bhara...@cloudera.com>
>> > > > wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > On Tue, Oct 17, 2017 at 7:10 PM, Mostafa Mokhtar <
>> > > mmokh...@cloudera.com>
>> > > > > wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > Thanks
>> > > > > > Mostafa
>> > > > > >
>> > > > > > > On Oct 17, 2017, at 7:09 PM, Brock Noland 
>> > wrote:
>> > > > > > >
>> > > > > > > +1
>> > > > > > >
>> > > > > > >> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker > >
>> > > > wrote:
>> > > > > > >> +1
>> > > > > > >>
>> > > > > > >>> On Oct 17, 2017 19:07, "Jim Apple" 
>> > wrote:
>> > > > > > >>>
>> > > > > > >>> Following our discussion
>> > > > > > >>> https://lists.apache.org/thread.html/
>> > > > 2f5db4788aff9b0557354b9106c032
>> > > > > > >>> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
>> > > > > > >>> , I propose that we graduate to a TLP. According to
>> > > > > > >>> https://incubator.apache.org/guides/graduation.html#
>> > > > > > >>> community_graduation_vote
>> > > > > > >>> this is not required, and https://impala.apache.org/
>> > bylaws.html
>> > > > does
>> > > > > > not
>> > > > > > >>> say whose votes are "binding" in a graduation vote, so all
>> > > > community
>> > > > > > >>> members are welcome to vote.
>> > > > > > >>>
>> > > > > > >>> This will remain open 72 hours. I will be notifying
>> > > > general@incubator
>> > > > > > it
>> > > > > > >>> is
>> > > > > > >>> occurring.
>> > > > > > >>>
>> > > > > > >>> This is my +1.
>> > > > > > >>>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks,
>> > > Michael
>> > >
>> >
>>


Re: show create table return empty after change column name in hive

2017-10-12 Thread Jeszy
Hey Yu,

I tried to reproduce on a CDH5.13 cluster, but your exact commands
work as expected for me. Are you using Impala 2.10 on a CDH5.13
cluster, or something else? Can you share your catalog and Hive
metastore logs?

Thanks.

On 12 October 2017 at 19:39, yu feng <olaptes...@gmail.com> wrote:
> I tried to use 'invalidate metadata' for the whole catalog, but the modified
> table is still empty. I suspect the only way is to restart catalogd.
>
> BTW, I tested with the newest version (2.10.0).
>
> 2017-10-13 0:17 GMT+08:00 Jeszy <jes...@gmail.com>:
>
>> This does sound like a bug. What version are you using? Do you see any
>> errors in the catalog logs?
>> I think a global invalidate metadata should work, and it's a bit less
>> intrusive than a catalog restart. In general, it is a good idea to do
>> all metadata operations from Impala if you are using Impala at all; it
>> helps a lot in making metadata operations seamless.
>>
>> On 12 October 2017 at 02:53, yu feng <olaptes...@gmail.com> wrote:
>> > In our scenario, users always do metadata modifications in Hive and run
>> > queries in Impala.
>> >
>> > 2017-10-12 16:31 GMT+08:00 sky <x_h...@163.com>:
>> >
>> >> Why is the second step performed in hive, not impala?
>> >>
>> >> At 2017-10-12 15:12:38, "yu feng" <olaptes...@gmail.com> wrote:
>> >> >I open impala-shell and hive-cli.
>> >> >1、execute 'show create table impala_test.sales_fact_1997' in
>> impala-shell
>> >> ,
>> >> >return :
>> >> >
>> >> >+--
>> >> -+
>> >> >| result
>> >> > |
>> >> >+--
>> >> -+
>> >> >| CREATE TABLE impala_test.sales_fact_1997 (
>> >> > |
>> >> >|   product_id INT,
>> >> >|
>> >> >|   time_id INT,
>> >> > |
>> >> >|   customer_id INT,
>> >> > |
>> >> >|   promotion_id INT,
>> >> >|
>> >> >|   store_id INT,
>> >> >|
>> >> >|   store_sales DOUBLE,
>> >> >|
>> >> >|   store_cost DOUBLE,
>> >> > |
>> >> >|   unit_sales DOUBLE
>> >> >|
>> >> >| )
>> >> >|
>> >> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >> >|
>> >> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED
>> BY
>> >> >'\n'   |
>> >> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >> >'serialization.format'='\u0001') |
>> >> >| STORED AS PARQUET
>> >> >|
>> >> >| LOCATION
>> >> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.
>> db/sales_fact_1997'
>> >> > |
>> >> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >> >+--
>> >> -+
>> >> >
>> >> >2、execute 'alter table impala_test.sales_fact_1997 change column
>> >> product_id
>> >> >pproduct_id int;'  in hive -cli, return OK.
>> >> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
>> >> >4、execute 'show create table impala_test.sales_fact_1997' again in
>> >

Re: Re: Load Data Parquet Table

2017-10-12 Thread Jeszy
You can load already existing parquet files to the destination table
from another location in HDFS.

On 12 October 2017 at 18:44, sky <x_h...@163.com> wrote:
> The Impala documentation says that Parquet tables support the LOAD DATA
> operation; how is it supported?
>
> At 2017-10-13 00:30:12, "Jeszy" <jes...@gmail.com> wrote:
>>See the docs on LOAD DATA:
>>http://impala.apache.org/docs/build/html/topics/impala_load_data.html
>>
>>"In the interest of speed, only limited error checking is done. If the
>>loaded files have the wrong file format, different columns than the
>>destination table, or other kind of mismatch, Impala does not raise
>>any error for the LOAD DATA statement. Querying the table afterward
>>could produce a runtime error or unexpected results. Currently, the
>>only checking the LOAD DATA statement does is to avoid mixing together
>>uncompressed and LZO-compressed text files in the same table."
>>
>>To reload CSV data as parquet using Impala, you'd have to create a
>>table for the CSV data, then do an 'insert into [parquet table] select
>>[...] from [csv_table]'.
>>
>>HTH
>>
>>On 12 October 2017 at 07:58, sky <x_h...@163.com> wrote:
>>> Hi all,
>>> How does a Parquet table perform LOAD DATA operations? How can a CSV
>>> file be imported into a Parquet table?
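To make the "limited error checking" point quoted above concrete, here is a hedged Python sketch of what LOAD DATA effectively does: move already-existing files into the table's directory without inspecting their contents. The file names are made up, and a local directory stands in for the HDFS table location purely for illustration.

```python
# Sketch: LOAD DATA as a file move with no format validation.
# Real LOAD DATA moves files within HDFS; a local directory stands in
# for the table location here purely for illustration.
import shutil
import tempfile
from pathlib import Path

def load_data(src_dir: Path, table_dir: Path):
    """Move every file from src_dir into table_dir. Nothing checks
    whether the files match the table's file format or schema -- a
    mismatch only surfaces when the table is queried later."""
    table_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(src_dir.iterdir()):
        if f.is_file():
            dest = table_dir / f.name
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved

# Demo with temporary directories standing in for HDFS paths:
src = Path(tempfile.mkdtemp())
table = Path(tempfile.mkdtemp()) / "sales_fact"
(src / "part-00000.parquet").write_text("not really parquet")  # no check!
moved = load_data(src, table)
```

Converting CSV to Parquet, by contrast, requires the INSERT ... SELECT route quoted above, because LOAD DATA never rewrites file contents.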


Re: Load Data Parquet Table

2017-10-12 Thread Jeszy
See the docs on LOAD DATA:
http://impala.apache.org/docs/build/html/topics/impala_load_data.html

"In the interest of speed, only limited error checking is done. If the
loaded files have the wrong file format, different columns than the
destination table, or other kind of mismatch, Impala does not raise
any error for the LOAD DATA statement. Querying the table afterward
could produce a runtime error or unexpected results. Currently, the
only checking the LOAD DATA statement does is to avoid mixing together
uncompressed and LZO-compressed text files in the same table."

To reload CSV data as parquet using Impala, you'd have to create a
table for the CSV data, then do an 'insert into [parquet table] select
[...] from [csv_table]'.

HTH

On 12 October 2017 at 07:58, sky  wrote:
> Hi all,
> How does a Parquet table perform LOAD DATA operations? How can a CSV
> file be imported into a Parquet table?


Re: show create table return empty after change column name in hive

2017-10-12 Thread Jeszy
This does sound like a bug. What version are you using? Do you see any
errors in the catalog logs?
I think a global invalidate metadata should work, and it's a bit less
intrusive than a catalog restart. In general, it is a good idea to do
all metadata operations from Impala if you are using Impala at all; it
helps a lot in making metadata operations seamless.

On 12 October 2017 at 02:53, yu feng  wrote:
> In our scenario, users always do metadata modifications in Hive and run
> queries in Impala.
>
> 2017-10-12 16:31 GMT+08:00 sky :
>
>> Why is the second step performed in hive, not impala?
>>
>> At 2017-10-12 15:12:38, "yu feng"  wrote:
>> >I open impala-shell and hive-cli.
>> >1、execute 'show create table impala_test.sales_fact_1997' in impala-shell
>> ,
>> >return :
>> >
>> >+--
>> -+
>> >| result
>> > |
>> >+--
>> -+
>> >| CREATE TABLE impala_test.sales_fact_1997 (
>> > |
>> >|   product_id INT,
>> >|
>> >|   time_id INT,
>> > |
>> >|   customer_id INT,
>> > |
>> >|   promotion_id INT,
>> >|
>> >|   store_id INT,
>> >|
>> >|   store_sales DOUBLE,
>> >|
>> >|   store_cost DOUBLE,
>> > |
>> >|   unit_sales DOUBLE
>> >|
>> >| )
>> >|
>> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >|
>> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>> >'\n'   |
>> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >'serialization.format'='\u0001') |
>> >| STORED AS PARQUET
>> >|
>> >| LOCATION
>> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
>> > |
>> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >+--
>> -+
>> >
>> >2、execute 'alter table impala_test.sales_fact_1997 change column
>> product_id
>> >pproduct_id int;'  in hive -cli, return OK.
>> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
>> >4、execute 'show create table impala_test.sales_fact_1997' again in
>> >impala-shell, return :
>> >
>> >+--
>> -+
>> >| result
>> > |
>> >+--
>> -+
>> >| CREATE TABLE impala_test.sales_fact_1997
>> > |
>> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >|
>> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>> >'\n'   |
>> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >'serialization.format'='\u0001') |
>> >| STORED AS PARQUET
>> >|
>> >| LOCATION
>> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
>> > |
>> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >+--
>> -+
>> >
>> >All columns disappear. The column change shows up correctly if I restart
>> >catalogd, so I think it is a bug caused by the Hive metastore client. Is
>> >there any good way to overcome the problem other than restarting catalogd?
>> >
>> > I think we can check the columns after getTable from HiveMetastoreClient;
>> >if they are empty, try to recreate the HiveMetastoreClient (Hive does not
>> >support 0-column tables). Is modifying the code like this a good way to
>> >overcome the problem?
>>


Re: ColumnLineageGraph.java Compile Error in Frontend

2017-10-03 Thread Jeszy
Hello Quanlong,

This is https://issues.apache.org/jira/browse/IMPALA-6009; there's
already a fix (but see the follow-up discussion on the JIRA).

HTH

On 4 October 2017 at 01:53, 黄权隆  wrote:
> Hi all,
>
> I encountered a compile error when I tried to recompile Impala yesterday.
> The error is in the frontend:
>
> [ERROR] COMPILATION ERROR :
>
> [INFO] -
>
> [ERROR]
> /mnt/volume1/impala-orc/incubator-impala/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:[593,11]
> no suitable method found for putString(java.lang.String)
>
> method
> com.google.common.hash.Hasher.putString(java.lang.CharSequence,java.nio.charset.Charset)
> is not applicable
>
>   (actual and formal argument lists differ in length)
>
> method
> com.google.common.hash.PrimitiveSink.putString(java.lang.CharSequence,java.nio.charset.Charset)
> is not applicable
>
>   (actual and formal argument lists differ in length)
>
>
> I also found this in the Jenkins builds. It seems that
> com.google.common.hash.Hasher exists in both guava-*.jar and
> hive-exec-*.jar. Were there any changes in
> hive-exec-1.1.0-cdh5.14.0-SNAPSHOT.jar recently? What can I do to recover
> from this?
>
>
> Thanks,
>
> Quanlong


Re: Re: Impala Driver Source Code

2017-09-19 Thread Jeszy
Can you clarify which driver you are referring to?
Kudu doesn't have JDBC/ODBC drivers at all:
"Is there a JDBC driver available?
Kudu is not a SQL engine. The availability of JDBC and ODBC drivers
will be dictated by the SQL engine used in combination with Kudu."

https://kudu.apache.org/faq.html

FWIW, you can use Hive's open source driver to connect to Impala, but
since it's not as widely used as Cloudera's drivers, you may run into a
few issues.

On 19 September 2017 at 10:55, sky <x_h...@163.com> wrote:
> Thank you Jeszy.
>  But doesn't Impala provide an open-source driver like Kudu does? Or
> where are open-source drivers available?
>
> At 2017-09-18 17:57:21, "Jeszy" <jes...@gmail.com> wrote:
>>If you are referring to Cloudera's JDBC and ODBC connectors,
>>unfortunately those are proprietary.
>>
>>On 18 September 2017 at 11:55, sky <x_h...@163.com> wrote:
>>> Hi all,
>>> Could you give me a link to the Impala driver source code?


Re: Impala Driver Source Code

2017-09-18 Thread Jeszy
If you are referring to Cloudera's JDBC and ODBC connectors,
unfortunately those are proprietary.

On 18 September 2017 at 11:55, sky  wrote:
> Hi all,
> Could you give me a link to the Impala driver source code?


Re: impala is not parallelized

2017-08-03 Thread Jeszy
Also check block size.

On 3 August 2017 at 14:36, 孙清孟 <sqm2...@gmail.com> wrote:
> I found the difference between the two clusters: the HDFS replication
> factor in the normal cluster is 3, while in the other it is 1, and
> short-circuit reads are enabled!
>
> Thanks.
>
> 2017-08-03 15:02 GMT+08:00 孙清孟 <sqm2...@gmail.com>:
>
>> Hi Jeszy:
>>   Thanks for your reply.
>>
>>  On another cluster with two instances, I ran the same SQL, and the file
>> size is smaller:
>>
>> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
>> WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
>> |  partitions=1
>> |  mem-estimate=1.00GB mem-reservation=0B
>> |
>> 00:SCAN HDFS [default.cdr_partition, RANDOM]
>>partitions=1/1 files=1 size=762.93MB
>>
>> And the single file is split:
>>  Averaged Fragment F00
>>
>>    - split sizes: *min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB,
>>    stddev: 2.54 MB*
>>
>>
>> Is there some configuration wrong in my cluster?
>>
>> 2017-08-03 13:20 GMT+08:00 Jeszy <jes...@gmail.com>:
>>
>>> Putting some more files in the source table will allow you to use more
>>> hosts.
>>>
>>> On 3 August 2017 at 05:08, Taras Bobrovytsky <taras...@apache.org> wrote:
>>> > Yes, it looks like all the work is being done on a single node because
>>> > hosts=1.
>>> >
>>> > On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <sqm2...@gmail.com> wrote:
>>> >
>>> >> This is my impala cluster:
>>> >>
>>> >>
>>> >> Role Type | State | Host | Commission State | Role Group
>>> >> Impala Catalog Server | Started with Outdated Configuration | cdha0.embed.com | Commissioned | Impala Catalog Server Default Group
>>> >> Impala Daemon | Started | cdha2.embed.com | Commissioned | Impala Daemon Default Group
>>> >> Impala Daemon | Started | cdha1.embed.com | Commissioned | Impala Daemon Default Group
>>> >> Impala Daemon | Started with Outdated Configuration | cdha3.embed.com | Commissioned | Impala Daemon Default Group
>>> >> Impala Daemon | Started | cdha4.embed.com | Commissioned | Impala Daemon Default Group
>>> >> Impala StateStore | Started | cdha0.embed.com | Commissioned | Impala StateStore Default Group
>>> >>
>>> >>
>>> >> When I run a SQL:
>>> >>
>>> >> insert into table cdr_partition_true partition(ym = '2014-11') select
>>> >> call_1,
>>> >> call_2,
>>> >> type_1,
>>> >> own_1,
>>> >> own_2,
>>&g

Re: impala is not parallelized

2017-08-02 Thread Jeszy
Putting some more files in the source table will allow you to use more hosts.

On 3 August 2017 at 05:08, Taras Bobrovytsky  wrote:
> Yes, it looks like all the work is being done on a single node because
> hosts=1.
>
> On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟  wrote:
>
>> This is my impala cluster:
>>
>>
>> Role Type | State | Host | Commission State | Role Group
>> Impala Catalog Server | Started with Outdated Configuration | cdha0.embed.com | Commissioned | Impala Catalog Server Default Group
>> Impala Daemon | Started | cdha2.embed.com | Commissioned | Impala Daemon Default Group
>> Impala Daemon | Started | cdha1.embed.com | Commissioned | Impala Daemon Default Group
>> Impala Daemon | Started with Outdated Configuration | cdha3.embed.com | Commissioned | Impala Daemon Default Group
>> Impala Daemon | Started | cdha4.embed.com | Commissioned | Impala Daemon Default Group
>> Impala StateStore | Started | cdha0.embed.com | Commissioned | Impala StateStore Default Group
>>
>>
>> When I run a SQL:
>>
>> insert into table cdr_partition_true partition(ym = '2014-11') select
>> call_1,
>> call_2,
>> type_1,
>> own_1,
>> own_2,
>> hdfs_id,
>> a_imsi,
>> p_imsi,
>> a_imei,
>> p_imei,
>> CAST(unix_timestamp(start_time) AS INT),
>> CAST(unix_timestamp(end_time) AS INT),
>> time,
>> a_LAC,
>> a_CI,
>> p_LAC,
>> p_CIfrom cdr_partition_cwang
>>
>>
>>
>> The EXPLAIN, it says only one host:
>>
>> 
>> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
>> WARNING: The following tables are missing relevant table and/or column
>> statistics.
>> default.cdr_partition_cwang
>>
>> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false,
>> PARTITION-KEYS=('2014-11')]
>> |  partitions=1
>> |  hosts=1 per-host-mem=1.00GB
>> |
>> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
>>partitions=1/1 files=1 size=2.00GB
>>table stats: unavailable
>>column stats: unavailable
>>hosts=1 per-host-mem=1.80GB
>>tuple-ids=0 row-size=128B cardinality=unavailable
>> 
>>
>> And instance is 1  -> Average Fragment F00.num instances: 1
>>
>> Does this mean my work was performed on only one Impala node?
>>
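A rough sketch of why a single file can pin the whole scan to one host: Impala parallelizes a scan across scan ranges (typically one per HDFS block), so one file stored as one block yields one range and hence one host. The block sizes used below are assumptions taken from the two threads above, not Impala or HDFS defaults.

```python
# Sketch: scan parallelism is bounded by the number of scan ranges.

def scan_ranges(file_sizes_mb, block_size_mb):
    """Split each file into block-sized scan ranges."""
    ranges = []
    for size in file_sizes_mb:
        offset = 0
        while offset < size:
            ranges.append(min(block_size_mb, size - offset))
            offset += block_size_mb
    return ranges

def max_scan_hosts(file_sizes_mb, num_hosts, block_size_mb):
    """At most one host per scan range, capped by the cluster size."""
    return min(len(scan_ranges(file_sizes_mb, block_size_mb)), num_hosts)

# One 2GB file stored as a single (unsplit) block: only one host can
# scan it, even on a five-daemon cluster like the one above.
single = max_scan_hosts([2048], num_hosts=5, block_size_mb=2048)
# The 762.93MB file from the other cluster, with ~384MB blocks, yields
# two ranges and uses both of that cluster's two daemons.
both = max_scan_hosts([762.93], num_hosts=2, block_size_mb=384)
```

This is why the advice above is to add more files (or check the block size): more scan ranges give the scheduler more units of work to spread across hosts.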


Re: Slow/unusable apache JIRA?

2017-07-24 Thread Jeszy
Hey,

No problems usually, but now it's down for me as well. According to
status.apache.org, the service seems to be struggling.

On 24 July 2017 at 18:15, Matthew Jacobs  wrote:
> Hey,
>
> I've been noticing a lot of slowness/timeouts on the Apache JIRA. Has
> anyone else noticed this? Sometimes it's just annoying, but today I've
> found a lot of pages are just timing out.
>
> Just got this error when attempting to load
> https://issues.apache.org/jira/browse/IMPALA-5275
>
>
> Communications Breakdown
>
> The call to the JIRA server did not complete within the timeout period. We
> are unsure of the result of this operation.
>
> Close this dialog and press refresh in your browser


Re: Re: Impala make install

2017-06-21 Thread Jeszy
With CM you can just add Impala as a new service to your cluster. Use
the dropdown next to the cluster name. A binary version of impala is
shipped as part of the CDH 5.11 parcels that you have installed.

On 21 June 2017 at 11:58, 孙清孟  wrote:
> Hi Tim,
>   I've built Impala according to the describe here:
> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>   But how can I install Impala on an already-running cdh-5.11.0-release
> cluster that is managed by Cloudera Manager?
>   Build Debian packages and use `apt-get`?
>
> 2017-06-21 11:16 GMT+08:00 Henry Robinson :
>
>> I don't think there's any plan for this work. The CMake documentation would
>> be where I'd start looking for ideas:
>>
>> https://cmake.org/cmake/help/v3.2/command/install.html
>>
>> Best,
>> Henry
>>
>> On 20 June 2017 at 18:31, sky  wrote:
>>
>> > Hi Tim,
>> >    Is there a plan for this work? Could you provide an example of
>> > copying the artifacts manually? Thanks.
>> >
>> > At 2017-06-21 01:41:33, "Tim Armstrong"  wrote:
>> > >Hi Sky,
>> > >  We have not implemented an install target yet - for deployment we rely
>> > on
>> > >copying out the artifacts manually. I believe CMake has some support for
>> > >implementing install targets but nobody has picked up that work yet.
>> > >
>> > >- Tim
>> > >
>> > >On Mon, Jun 19, 2017 at 8:45 PM, sky  wrote:
>> > >
>> > >> Hi all,
>> > >> I am using cdh5.11.1-release. The compilation command is provided in
>> > >> the documentation (./buildall.sh -notests -so), but there is no command
>> > >> similar to 'make install'. In the compiled output, the directory
>> > >> structure is too large and contains many unneeded files. Could you
>> > >> provide an "install" command to extract the compiled files to other
>> > >> directories for easy management?
>> >
>>


Re: big issue on retrieving 400MB data

2017-04-28 Thread Jeszy
Hey,

It looks like all the time is spent waiting for the client to fetch the results:
 - ClientFetchWaitTimer: 17m31s

Try doing:
impala-shell -B -q ''

HTH

2017-04-28 14:51 GMT+02:00 吴朱华 :
> Maybe I just paste some main thing on mail , and congratulation on IPO
> thing.
>
> Unregister query: 17m42s (17m42s)
>
> Fetched 317246 row(s) in 1062.84s
> Query Runtime Profile:
> Query (id=8149e2439f43b15a:f08e570d7fbf1085):
>   Summary:
> Session ID: 35436d1112b79287:9045c79c795858a5
> Session Type: BEESWAX
> Start Time: 2017-04-28 11:50:00.292615000
> End Time: 2017-04-28 12:07:43.133484000
> Query Type: QUERY
> Query State: FINISHED
> Query Status: OK
> Impala Version: impalad version 2.5.0-cdh5-INTERNAL RELEASE (build
> 43880282edc04c03c162bbea6fc85b5388e7fdde)
> User: impala
> Connected User: impala
> Delegated User:
> Network Address: :::10.44.10.186:36325
> Default Db: sjzy
> Sql Statement: select
> MRECID,UNITID,PCQDM,PCQMC,PCXQDM,PCXQMC,DM,H001,H002,H003,H021,H022,H023,H024,H025,H026A,H026B,H026C,H026D,H026E,H026F,H026G,H027,H028,H029,H030,H031,H032,H033,H034,H035,H036,H037A,H037B,H037C,H038,H039,H040,H041,H042,H043A,H043B,H043C,H043D,H043E,H043F,H043G,H043H,H043I,H043J,H043K,H043L,H044A,H044B,H044C,H044D,H044E,H044F,H044G,H044H,H044I,H050,H051,H052,H053,H054,H055,H056,H061,H062,H063,H064,H065,H066,H070,H071,H072,H073,H074,H075,H080,H100,H111,H112,H113,H120,H200,H201,H202,H203,H204,H205,H206,H207,H208,H209,H210,H211,H300,H320,H321,H322,H323,H324,H400,H401,H402,H403,H404,H405,H406,H500,H600,H601,H602,H603,H604,H605,H606,H607,H608,H609,H610,H611,H612,H613,H614,H615,H616,H621A,H621B,H621C,H621D,H621E,H621F,H622A,H622B,H622C,H801,H802,H803,H804,H901,H902,H903
> FROM NP_2017_NP601 WHERE DS_AREACODE LIKE '445281%'
> Coordinator: node1.sky.org:22000
> Query Options (non default):
> Plan:
> 
> Estimated Per-Host Requirements: Memory=4.50GB VCores=1
>
> 01:EXCHANGE [UNPARTITIONED]
> |  hosts=4 per-host-mem=unavailable
> |  tuple-ids=0 row-size=1.67KB cardinality=1155911
> |
> 00:SCAN HDFS [sjzy.np_2017_np601, RANDOM]
>partitions=1/1 files=20 size=1.06GB
>predicates: DS_AREACODE LIKE '445281%'
>table stats: 11559109 rows total
>column stats: all
>hosts=4 per-host-mem=4.50GB
>tuple-ids=0 row-size=1.67KB cardinality=1155911
> 
> Estimated Per-Host Mem: 4831838208
> Estimated Per-Host VCores: 1
> Request Pool: default-pool
> ExecSummary:
> Operator   #Hosts  Avg Time  Max Time#Rows  Est. #Rows   Peak Mem
>  Est. Peak Mem  Detail
> -
> 01:EXCHANGE 1  32.314ms  32.314ms  317.25K   1.16M  0
>  -1.00 B  UNPARTITIONED
> 00:SCAN HDFS   20   1s137ms   1s348ms  317.25K   1.16M  163.85 MB
>  4.50 GB  sjzy.np_2017_np601
> Planner Timeline: 53.683ms
>- Analysis finished: 24.565ms (24.565ms)
>- Equivalence classes computed: 26.389ms (1.823ms)
>- Single node plan created: 33.607ms (7.218ms)
>- Runtime filters computed: 33.684ms (76.568us)
>- Distributed plan created: 39.125ms (5.441ms)
>- Planning finished: 53.683ms (14.558ms)
> Query Timeline: 17m42s
>- Start execution: 43.792us (43.792us)
>- Planning finished: 60.640ms (60.596ms)
>- Ready to start 20 remote fragments: 65.111ms (4.471ms)
>- All 20 remote fragments started: 74.572ms (9.461ms)
>- Rows available: 744.300ms (669.728ms)
>- First row fetched: 790.128ms (45.828ms)
>- Unregister query: 17m42s (17m42s)
>   ImpalaServer:
>  - ClientFetchWaitTimer: 17m31s
>  - RowMaterializationTimer: 10s024ms
>
> 2017-04-28 19:44 GMT+08:00 Jim Apple :
>
>> dev@ does not appear to accept attachments. You can upload it somewhere
>> and
>> post a link, though.
>>
>> On Thu, Apr 27, 2017 at 11:35 PM, 吴朱华  wrote:
>>
>> > Oops, I just resent it; you know the Chinese network ^_^
>> >
>> > 2017-04-28 14:20 GMT+08:00 Mostafa Mokhtar :
>> >
>> >> Btw the profile wasn't attached.
>> >> Please resend.
>> >>
>> >> On Thu, Apr 27, 2017 at 11:11 PM, 吴朱华  wrote:
>> >>
>> >>> Profile is in the attachment, thanks
>> >>>
>> >>>
>> >>> 2017-04-28 13:10 GMT+08:00 Dimitris Tsirogiannis <
>> >>> dtsirogian...@cloudera.com>:
>> >>>
>>  Maybe you also want to post some information about the schema (how
>> wide
>>  your table is, does it use nested types, etc) as well as the profile
>> of
>>  the
>>  slow query.
>> 
>>  Dimitris
>> 
>>  On Thu, Apr 27, 2017 at 9:30 PM, 吴朱华  wrote:
>> 
>>  > Hi guys:
>>  > we are facing a big issue when doing select * from a big table.
>>  > The performance is 17 minutes for retrieving 400MB of data. Even slow
>> 
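The profile numbers quoted above make the diagnosis easy to verify with a little arithmetic: rows were available after roughly 0.74s, and essentially the entire 17m42s is the coordinator waiting for the client to fetch results, not query execution.

```python
# Time breakdown from the profile quoted above.

def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

total_s      = to_seconds(17, 42)  # Query Timeline: 17m42s
fetch_wait_s = to_seconds(17, 31)  # ClientFetchWaitTimer: 17m31s
rows_ready_s = 0.744               # "Rows available: 744.300ms"

client_share = fetch_wait_s / total_s  # fraction spent waiting on client
```

That is why the suggestion is to change how the client fetches (e.g. impala-shell with -B to skip pretty-printed output) rather than to tune the query itself.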

Re: Is there any way to retrieve table metadata using select rather than show?

2017-04-06 Thread Jeszy
Hey,

That's not possible from within Impala. If you go directly to the
HMS's backing DB, you can query that.
What information are you looking for?

Thanks.

On Thu, Apr 6, 2017 at 3:02 PM, 吴朱华  wrote:
> Hi guys:
>
> Currently, we are using "show databases", "show tables", or "describe
> table" to retrieve table metadata, but we would like to use something like
> "select * from metadata" to retrieve it, just like an RDBMS does ^_^
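For SELECT-style access, the Hive Metastore's backing RDBMS can be queried directly (read-only, with care). The sketch below emulates a tiny slice of the HMS relational schema (the DBS and TBLS tables) in sqlite purely for illustration; a real deployment would point a database client at the metastore's MySQL/PostgreSQL instance, and the real tables have more columns than shown here.

```python
# Illustrative only: emulate a slice of the Hive Metastore schema
# (DBS and TBLS tables) in sqlite and query it with plain SELECTs.
# A real setup would connect to the metastore's backing RDBMS instead.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                       TBL_NAME TEXT, TBL_TYPE TEXT);
    INSERT INTO DBS  VALUES (1, 'default');
    INSERT INTO TBLS VALUES (10, 1, 'np_2017_np601', 'MANAGED_TABLE');
""")

# "select * from metadata"-style access: list every table per database.
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE
    FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
    ORDER BY d.NAME, t.TBL_NAME
""").fetchall()
```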