Re: show create table return empty after change column name in hive

2017-10-12 Thread Jeszy
Hey Yu,

I tried to reproduce on a CDH5.13 cluster, but your exact commands
work as expected for me. Are you using Impala 2.10 on a CDH5.13
cluster, or something else? Can you share your catalog and Hive
metastore logs?

Thanks.

On 12 October 2017 at 19:39, yu feng  wrote:
> I try to use ' invalidate metadata' for the whole catalog, But the modified
> table is still empty.  I am doubt the only way is restart catalogd.
>
> BTW, I test with the newest version(2.10.0)
>
> 2017-10-13 0:17 GMT+08:00 Jeszy :
>
>> This does sound like a bug. What version are you using? Do you see any
>> errors in the catalog logs?
>> I think a global invalidate metadata should work, and it's a bit less
>> intrusive than a catalog restart. In general, it is a good idea to do
>> all metadata operations from Impala if you are using Impala at all, it
>> helps a lot in making metadata operations seamless.
>>
>> On 12 October 2017 at 02:53, yu feng  wrote:
>> > In our scene, users always do metadata modifications in hive, and do some
>> > query in impala.
>> >
>> > 2017-10-12 16:31 GMT+08:00 sky :
>> >
>> >> Why is the second step performed in hive, not impala?
>> >> At 2017-10-12 15:12:38, "yu feng"  wrote:
>> >> >I open impala-shell and hive-cli.
>> >> >1、execute 'show create table impala_test.sales_fact_1997' in
>> impala-shell
>> >> ,
>> >> >return :
>> >> >
>> >> >+--
>> >> -+
>> >> >| result
>> >> > |
>> >> >+--
>> >> -+
>> >> >| CREATE TABLE impala_test.sales_fact_1997 (
>> >> > |
>> >> >|   product_id INT,
>> >> >|
>> >> >|   time_id INT,
>> >> > |
>> >> >|   customer_id INT,
>> >> > |
>> >> >|   promotion_id INT,
>> >> >|
>> >> >|   store_id INT,
>> >> >|
>> >> >|   store_sales DOUBLE,
>> >> >|
>> >> >|   store_cost DOUBLE,
>> >> > |
>> >> >|   unit_sales DOUBLE
>> >> >|
>> >> >| )
>> >> >|
>> >> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >> >|
>> >> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED
>> BY
>> >> >'\n'   |
>> >> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >> >'serialization.format'='\u0001') |
>> >> >| STORED AS PARQUET
>> >> >|
>> >> >| LOCATION
>> >> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.
>> db/sales_fact_1997'
>> >> > |
>> >> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >> >+--
>> >> -+
>> >> >
>> >> >2、execute 'alter table impala_test.sales_fact_1997 change column
>> >> product_id
>> >> >pproduct_id int;'  in hive -cli, return OK.
>> >> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
>> >> >4、execute 'show create table impala_test.sales_fact_1997' again in
>> >> >impala-shell, return :
>> >> >
>> >> >+--
>> >> -+
>> >> >| result
>> >> > |
>> >> >+--
>> >> -+
>> >> >| CREATE TABLE impala_test.sales_fact_1997
>> >> > |
>> >> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >> >|
>> >> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED
>> BY
>> >> >'\n'   |
>> >> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >> >'serialization.format'='\u0001') |
>> >> >| STORED AS PARQUET
>> >> >|
>> >> >| LOCATION
>> >> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.
>> 

Re: Re: Load Data Parquet Table

2017-10-12 Thread Jeszy
You can load already-existing Parquet files into the destination table from
another location in HDFS.
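
For example (the path and table names below are placeholders, not from your
setup), a sketch would be:

-- Move already-written Parquet files from an HDFS staging directory into
-- the table's directory; LOAD DATA moves the files, it does not convert
-- or validate them.
LOAD DATA INPATH '/user/staging/parquet_files'
INTO TABLE my_db.my_parquet_table;

-- For a partitioned table, a single partition can be targeted:
-- LOAD DATA INPATH '/user/staging/parquet_files'
--   INTO TABLE my_db.my_parquet_table PARTITION (year=2017, month=10);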

On 12 October 2017 at 18:44, sky  wrote:
> From the impala document, parquet supports load data operation, and how does 
> it support ?
> At 2017-10-13 00:30:12, "Jeszy"  wrote:
>>See the docs on LOAD DATA:
>>http://impala.apache.org/docs/build/html/topics/impala_load_data.html
>>
>>"In the interest of speed, only limited error checking is done. If the
>>loaded files have the wrong file format, different columns than the
>>destination table, or other kind of mismatch, Impala does not raise
>>any error for the LOAD DATA statement. Querying the table afterward
>>could produce a runtime error or unexpected results. Currently, the
>>only checking the LOAD DATA statement does is to avoid mixing together
>>uncompressed and LZO-compressed text files in the same table."
>>
>>To reload CSV data as parquet using Impala, you'd have to create a
>>table for the CSV data, then do an 'insert into [parquet table] select
>>[...] from [csv_table]'.
>>
>>HTH
>>
>>On 12 October 2017 at 07:58, sky  wrote:
>>> Hi all,
>>> How does the parquet table perform load data operations? How does a CSV 
>>> file import into the parquet table?


Re: show create table return empty after change column name in hive

2017-10-12 Thread yu feng
I tried to use 'invalidate metadata' for the whole catalog, but the modified
table is still empty. I suspect the only way is to restart catalogd.

BTW, I tested with the newest version (2.10.0).

2017-10-13 0:17 GMT+08:00 Jeszy :

> This does sound like a bug. What version are you using? Do you see any
> errors in the catalog logs?
> I think a global invalidate metadata should work, and it's a bit less
> intrusive than a catalog restart. In general, it is a good idea to do
> all metadata operations from Impala if you are using Impala at all, it
> helps a lot in making metadata operations seamless.
>
> On 12 October 2017 at 02:53, yu feng  wrote:
> > In our scene, users always do metadata modifications in hive, and do some
> > query in impala.
> >
> > 2017-10-12 16:31 GMT+08:00 sky :
> >
> >> Why is the second step performed in hive, not impala?
> >> At 2017-10-12 15:12:38, "yu feng"  wrote:
> >> >I open impala-shell and hive-cli.
> >> >1、execute 'show create table impala_test.sales_fact_1997' in
> impala-shell
> >> ,
> >> >return :
> >> >
> >> >+--
> >> -+
> >> >| result
> >> > |
> >> >+--
> >> -+
> >> >| CREATE TABLE impala_test.sales_fact_1997 (
> >> > |
> >> >|   product_id INT,
> >> >|
> >> >|   time_id INT,
> >> > |
> >> >|   customer_id INT,
> >> > |
> >> >|   promotion_id INT,
> >> >|
> >> >|   store_id INT,
> >> >|
> >> >|   store_sales DOUBLE,
> >> >|
> >> >|   store_cost DOUBLE,
> >> > |
> >> >|   unit_sales DOUBLE
> >> >|
> >> >| )
> >> >|
> >> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
> >> >|
> >> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED
> BY
> >> >'\n'   |
> >> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
> >> >'serialization.format'='\u0001') |
> >> >| STORED AS PARQUET
> >> >|
> >> >| LOCATION
> >> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.
> db/sales_fact_1997'
> >> > |
> >> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
> >> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
> >> >+--
> >> -+
> >> >
> >> >2、execute 'alter table impala_test.sales_fact_1997 change column
> >> product_id
> >> >pproduct_id int;'  in hive -cli, return OK.
> >> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
> >> >4、execute 'show create table impala_test.sales_fact_1997' again in
> >> >impala-shell, return :
> >> >
> >> >+--
> >> -+
> >> >| result
> >> > |
> >> >+--
> >> -+
> >> >| CREATE TABLE impala_test.sales_fact_1997
> >> > |
> >> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
> >> >|
> >> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED
> BY
> >> >'\n'   |
> >> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
> >> >'serialization.format'='\u0001') |
> >> >| STORED AS PARQUET
> >> >|
> >> >| LOCATION
> >> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.
> db/sales_fact_1997'
> >> > |
> >> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
> >> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
> >> >+--
> >> -+
> >> >
> >> >all columns disappear, the column change will correct if I 

Re:Re: Load Data Parquet Table

2017-10-12 Thread sky
According to the Impala documentation, Parquet supports the LOAD DATA
operation. How is it supported?

At 2017-10-13 00:30:12, "Jeszy"  wrote:
>See the docs on LOAD DATA:
>http://impala.apache.org/docs/build/html/topics/impala_load_data.html
>
>"In the interest of speed, only limited error checking is done. If the
>loaded files have the wrong file format, different columns than the
>destination table, or other kind of mismatch, Impala does not raise
>any error for the LOAD DATA statement. Querying the table afterward
>could produce a runtime error or unexpected results. Currently, the
>only checking the LOAD DATA statement does is to avoid mixing together
>uncompressed and LZO-compressed text files in the same table."
>
>To reload CSV data as parquet using Impala, you'd have to create a
>table for the CSV data, then do an 'insert into [parquet table] select
>[...] from [csv_table]'.
>
>HTH
>
>On 12 October 2017 at 07:58, sky  wrote:
>> Hi all,
>> How does the parquet table perform load data operations? How does a CSV 
>> file import into the parquet table?


Re: Time for graduation?

2017-10-12 Thread Brock Noland
Hi all,

I've been thinking about this as well and I feel Impala is ready.

(more inline)

On Thu, Oct 12, 2017 at 6:06 PM, Todd Lipcon  wrote:

> On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple  wrote:
>
> > Also, mentors are traditionally included in a graduating podling's PMC,
> > right?
>
> That's often been done but I don't think there's any hard requirement.
> Perhaps we could ask each mentor whether they would like to continue to be
> involved?
>

For my part, I don't feel I contribute much to the PMC, but Impala is a
project I use every day and thus have a strong interest in the project being
successful. I would not be hurt in the *least* if I was not included on the
PMC. However, I'd be more than happy to serve.

Cheers,
Brock


Re: Time for graduation?

2017-10-12 Thread Jim Apple
All of that SGTM


Re: Time for graduation?

2017-10-12 Thread Todd Lipcon
On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple  wrote:

> I think it would be a good time to graduate. I'm very proud of the progress
> the community has made in terms of acting in an Apache way.
>
> Some logistics:
>
> I would be happy to serve as an initial chair.
>
> I'll draft a resolution, with a blank space for chair. This doesn't mean we
> have to agree now is the time to graduate, but we'll have it available for
> discussion and revision whenever we are ready.
>
> If we decide to graduate now, maybe we could email everyone who is on the
> PPMC, ccing private@, to see if they are still interested in being on the
> PMC, and taking no response to mean "yes" until we hear otherwise, in case
> someone is on vacation away from email, or in the hospital, or something.
>

That seems pretty reasonable to me, given the default is "yes". Those that
respond "no" could just be given "PMC emeritus" status. There isn't any
official emeritus policy at Apache but it's a nice way to thank these
people for past involvement and typically such members could be easily
re-instated upon their request (see
http://www.apache.org/dev/pmc.html#emeritus)


>
> Also, mentors are traditionally included in a graduating podling's PMC,
> right?
>

That's often been done but I don't think there's any hard requirement.
Perhaps we could ask each mentor whether they would like to continue to be
involved?

-Todd


>
> On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon  wrote:
>
> > Hey Impala community,
> >
> > It's been a while that all of the Impala infrastructure has been moved
> > over, and the community appears to be functioning healthily, generating
> new
> > releases on a regular cadence as well as adding new committers and PPMC
> > members. All of the branding stuff seems great, and the user mailing list
> > has a healthy amount of traffic and a good track record of answering
> > questions when they come up.
> >
> > As a mentor I think it's probably time to discuss graduation. The project
> > is already functioning in the same way as your typical Apache TLP and it
> > seems like it's time to become one.
> >
> > Any thoughts? If everyone is on board, the next step would be:
> >
> > 1. Pick the initial PMC chair for the TLP. According to the published
> > Impala Bylaws it seems that this is meant to rotate annually, so no need
> to
> > stress too much about it.
> >
> > A couple obvious choices here would be Marcel (as the original founder of
> > the project) or perhaps Jim (who has done yeoman's work on a lot of the
> > incubation process, podling reports, etc). Others could certainly
> volunteer
> > or be nominated as well.
> >
> > 2. Draft a Resolution for the PPMC and IPMC to vote upon.
> > -- the resolution would include the above-decided chair as well as the
> list
> > of initial PMC, etc.
> > -- the Initial PMC could be just the current list of PPMC, or you could
> > consider adding others at this point as well.
> >
> >
> > I can help with the above process but figured I'd solicit opinions first
> on
> > whether the community feels it's ready to graduate.
> >
> > Thanks
> > Todd
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Time for graduation?

2017-10-12 Thread Jim Apple
I think it would be a good time to graduate. I'm very proud of the progress
the community has made in terms of acting in an Apache way.

Some logistics:

I would be happy to serve as an initial chair.

I'll draft a resolution, with a blank space for chair. This doesn't mean we
have to agree now is the time to graduate, but we'll have it available for
discussion and revision whenever we are ready.

If we decide to graduate now, maybe we could email everyone who is on the
PPMC, ccing private@, to see if they are still interested in being on the
PMC, and taking no response to mean "yes" until we hear otherwise, in case
someone is on vacation away from email, or in the hospital, or something.

Also, mentors are traditionally included in a graduating podling's PMC,
right?

On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon  wrote:

> Hey Impala community,
>
> It's been a while that all of the Impala infrastructure has been moved
> over, and the community appears to be functioning healthily, generating new
> releases on a regular cadence as well as adding new committers and PPMC
> members. All of the branding stuff seems great, and the user mailing list
> has a healthy amount of traffic and a good track record of answering
> questions when they come up.
>
> As a mentor I think it's probably time to discuss graduation. The project
> is already functioning in the same way as your typical Apache TLP and it
> seems like it's time to become one.
>
> Any thoughts? If everyone is on board, the next step would be:
>
> 1. Pick the initial PMC chair for the TLP. According to the published
> Impala Bylaws it seems that this is meant to rotate annually, so no need to
> stress too much about it.
>
> A couple obvious choices here would be Marcel (as the original founder of
> the project) or perhaps Jim (who has done yeoman's work on a lot of the
> incubation process, podling reports, etc). Others could certainly volunteer
> or be nominated as well.
>
> 2. Draft a Resolution for the PPMC and IPMC to vote upon.
> -- the resolution would include the above-decided chair as well as the list
> of initial PMC, etc.
> -- the Initial PMC could be just the current list of PPMC, or you could
> consider adding others at this point as well.
>
>
> I can help with the above process but figured I'd solicit opinions first on
> whether the community feels it's ready to graduate.
>
> Thanks
> Todd
>


Time for graduation?

2017-10-12 Thread Todd Lipcon
Hey Impala community,

It's been a while that all of the Impala infrastructure has been moved
over, and the community appears to be functioning healthily, generating new
releases on a regular cadence as well as adding new committers and PPMC
members. All of the branding stuff seems great, and the user mailing list
has a healthy amount of traffic and a good track record of answering
questions when they come up.

As a mentor I think it's probably time to discuss graduation. The project
is already functioning in the same way as your typical Apache TLP and it
seems like it's time to become one.

Any thoughts? If everyone is on board, the next step would be:

1. Pick the initial PMC chair for the TLP. According to the published
Impala Bylaws it seems that this is meant to rotate annually, so no need to
stress too much about it.

A couple obvious choices here would be Marcel (as the original founder of
the project) or perhaps Jim (who has done yeoman's work on a lot of the
incubation process, podling reports, etc). Others could certainly volunteer
or be nominated as well.

2. Draft a Resolution for the PPMC and IPMC to vote upon.
-- the resolution would include the above-decided chair as well as the list
of initial PMC, etc.
-- the Initial PMC could be just the current list of PPMC, or you could
consider adding others at this point as well.


I can help with the above process but figured I'd solicit opinions first on
whether the community feels it's ready to graduate.

Thanks
Todd


Re: Load Data Parquet Table

2017-10-12 Thread Jeszy
See the docs on LOAD DATA:
http://impala.apache.org/docs/build/html/topics/impala_load_data.html

"In the interest of speed, only limited error checking is done. If the
loaded files have the wrong file format, different columns than the
destination table, or other kind of mismatch, Impala does not raise
any error for the LOAD DATA statement. Querying the table afterward
could produce a runtime error or unexpected results. Currently, the
only checking the LOAD DATA statement does is to avoid mixing together
uncompressed and LZO-compressed text files in the same table."

To reload CSV data as parquet using Impala, you'd have to create a
table for the CSV data, then do an 'insert into [parquet table] select
[...] from [csv_table]'.
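
A rough sketch of that, with made-up table names, columns, and delimiter
(adjust to your data):

-- Text/CSV staging table that points at the raw CSV files.
CREATE EXTERNAL TABLE my_db.csv_staging (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/staging/csv';

-- Parquet destination table with the same schema.
CREATE TABLE my_db.sales_parquet LIKE my_db.csv_staging STORED AS PARQUET;

-- Rewrite the CSV rows as Parquet.
INSERT INTO my_db.sales_parquet SELECT * FROM my_db.csv_staging;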

HTH

On 12 October 2017 at 07:58, sky  wrote:
> Hi all,
> How does the parquet table perform load data operations? How does a CSV 
> file import into the parquet table?


Re: show create table return empty after change column name in hive

2017-10-12 Thread Jeszy
This does sound like a bug. What version are you using? Do you see any
errors in the catalog logs?
I think a global invalidate metadata should work, and it's a bit less
intrusive than a catalog restart. In general, it is a good idea to do
all metadata operations from Impala if you are using Impala at all, it
helps a lot in making metadata operations seamless.
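
For reference, roughly what I mean, using the table from your example:

-- Heavier than the per-table form, but refreshes everything:
INVALIDATE METADATA;

-- Or do the rename from impala-shell in the first place, so no manual
-- invalidate is needed afterwards:
ALTER TABLE impala_test.sales_fact_1997 CHANGE product_id pproduct_id INT;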

On 12 October 2017 at 02:53, yu feng  wrote:
> In our scene, users always do metadata modifications in hive, and do some
> query in impala.
>
> 2017-10-12 16:31 GMT+08:00 sky :
>
>> Why is the second step performed in hive, not impala?
>> At 2017-10-12 15:12:38, "yu feng"  wrote:
>> >I open impala-shell and hive-cli.
>> >1、execute 'show create table impala_test.sales_fact_1997' in impala-shell
>> ,
>> >return :
>> >
>> >+--
>> -+
>> >| result
>> > |
>> >+--
>> -+
>> >| CREATE TABLE impala_test.sales_fact_1997 (
>> > |
>> >|   product_id INT,
>> >|
>> >|   time_id INT,
>> > |
>> >|   customer_id INT,
>> > |
>> >|   promotion_id INT,
>> >|
>> >|   store_id INT,
>> >|
>> >|   store_sales DOUBLE,
>> >|
>> >|   store_cost DOUBLE,
>> > |
>> >|   unit_sales DOUBLE
>> >|
>> >| )
>> >|
>> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >|
>> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>> >'\n'   |
>> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >'serialization.format'='\u0001') |
>> >| STORED AS PARQUET
>> >|
>> >| LOCATION
>> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
>> > |
>> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >+--
>> -+
>> >
>> >2、execute 'alter table impala_test.sales_fact_1997 change column
>> product_id
>> >pproduct_id int;'  in hive -cli, return OK.
>> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
>> >4、execute 'show create table impala_test.sales_fact_1997' again in
>> >impala-shell, return :
>> >
>> >+--
>> -+
>> >| result
>> > |
>> >+--
>> -+
>> >| CREATE TABLE impala_test.sales_fact_1997
>> > |
>> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>> >|
>> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>> >'\n'   |
>> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>> >'serialization.format'='\u0001') |
>> >| STORED AS PARQUET
>> >|
>> >| LOCATION
>> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
>> > |
>> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>> >+--
>> -+
>> >
>> >all columns disappear, the column change will correct if I restart
>> >catalogd, I think it is a BUG caused by hive metastore client, It is any
>> >good idea overcome the problem except restart catalogd.
>> >
>> > I think we can check columns after getTable from HiveMetastoreClient, if
>> >it is empty, try to recreate the HiveMetastoreClient(hive do not support
>> >0-column table). is it a good way to overcome the problem if modify code
>> >like this?
>>


Re: Jenkins down briefly to take quiesced snapshot

2017-10-12 Thread Michael Brown
It's back up.

On Thu, Oct 12, 2017 at 7:14 AM, Michael Brown  wrote:

> Sorry for late notice, should only take a few minutes. No jobs are
> currently running.
>
> This is in preparation for an upgrade in the next few days.
>


Jenkins down briefly to take quiesced snapshot

2017-10-12 Thread Michael Brown
Sorry for late notice, should only take a few minutes. No jobs are
currently running.

This is in preparation for an upgrade in the next few days.


Re: show create table return empty after change column name in hive

2017-10-12 Thread yu feng
In our scenario, users always do metadata modifications in Hive and then run
queries in Impala.

2017-10-12 16:31 GMT+08:00 sky :

> Why is the second step performed in hive, not impala?
> At 2017-10-12 15:12:38, "yu feng"  wrote:
> >I open impala-shell and hive-cli.
> >1、execute 'show create table impala_test.sales_fact_1997' in impala-shell
> ,
> >return :
> >
> >+--
> -+
> >| result
> > |
> >+--
> -+
> >| CREATE TABLE impala_test.sales_fact_1997 (
> > |
> >|   product_id INT,
> >|
> >|   time_id INT,
> > |
> >|   customer_id INT,
> > |
> >|   promotion_id INT,
> >|
> >|   store_id INT,
> >|
> >|   store_sales DOUBLE,
> >|
> >|   store_cost DOUBLE,
> > |
> >|   unit_sales DOUBLE
> >|
> >| )
> >|
> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
> >|
> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
> >'\n'   |
> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
> >'serialization.format'='\u0001') |
> >| STORED AS PARQUET
> >|
> >| LOCATION
> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
> > |
> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
> >+--
> -+
> >
> >2、execute 'alter table impala_test.sales_fact_1997 change column
> product_id
> >pproduct_id int;'  in hive -cli, return OK.
> >3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
> >4、execute 'show create table impala_test.sales_fact_1997' again in
> >impala-shell, return :
> >
> >+--
> -+
> >| result
> > |
> >+--
> -+
> >| CREATE TABLE impala_test.sales_fact_1997
> > |
> >|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
> >|
> >| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
> >'\n'   |
> >| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
> >'serialization.format'='\u0001') |
> >| STORED AS PARQUET
> >|
> >| LOCATION
> >'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
> > |
> >| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
> >'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
> >+--
> -+
> >
> >all columns disappear, the column change will correct if I restart
> >catalogd, I think it is a BUG caused by hive metastore client, It is any
> >good idea overcome the problem except restart catalogd.
> >
> > I think we can check columns after getTable from HiveMetastoreClient, if
> >it is empty, try to recreate the HiveMetastoreClient(hive do not support
> >0-column table). is it a good way to overcome the problem if modify code
> >like this?
>


Re:show create table return empty after change column name in hive

2017-10-12 Thread sky
Why is the second step performed in hive, not impala?

At 2017-10-12 15:12:38, "yu feng"  wrote:
>I open impala-shell and hive-cli.
>1、execute 'show create table impala_test.sales_fact_1997' in impala-shell ,
>return :
>
>+---+
>| result
> |
>+---+
>| CREATE TABLE impala_test.sales_fact_1997 (
> |
>|   product_id INT,
>|
>|   time_id INT,
> |
>|   customer_id INT,
> |
>|   promotion_id INT,
>|
>|   store_id INT,
>|
>|   store_sales DOUBLE,
>|
>|   store_cost DOUBLE,
> |
>|   unit_sales DOUBLE
>|
>| )
>|
>|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>|
>| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>'\n'   |
>| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>'serialization.format'='\u0001') |
>| STORED AS PARQUET
>|
>| LOCATION
>'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
> |
>| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>+---+
>
>2、execute 'alter table impala_test.sales_fact_1997 change column product_id
>pproduct_id int;'  in hive -cli, return OK.
>3、execute 'invalidate metadata impala_test.sales_fact_1997 '.
>4、execute 'show create table impala_test.sales_fact_1997' again in
>impala-shell, return :
>
>+---+
>| result
> |
>+---+
>| CREATE TABLE impala_test.sales_fact_1997
> |
>|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
>|
>| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY
>'\n'   |
>| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n',
>'serialization.format'='\u0001') |
>| STORED AS PARQUET
>|
>| LOCATION
>'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
> |
>| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3',
>'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
>+---+
>
>all columns disappear, the column change will correct if I restart
>catalogd, I think it is a BUG caused by hive metastore client, It is any
>good idea overcome the problem except restart catalogd.
>
> I think we can check columns after getTable from HiveMetastoreClient, if
>it is empty, try to recreate the HiveMetastoreClient(hive do not support
>0-column table). is it a good way to overcome the problem if modify code
>like this?


show create table return empty after change column name in hive

2017-10-12 Thread yu feng
I opened impala-shell and hive-cli.
1. Execute 'show create table impala_test.sales_fact_1997' in impala-shell; it
returns:

+---------------------------------------------------------------------------+
| result |
+---------------------------------------------------------------------------+
| CREATE TABLE impala_test.sales_fact_1997 ( |
|   product_id INT, |
|   time_id INT, |
|   customer_id INT, |
|   promotion_id INT, |
|   store_id INT, |
|   store_sales DOUBLE, |
|   store_cost DOUBLE, |
|   unit_sales DOUBLE |
| ) |
|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40' |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY '\n' |
| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n', 'serialization.format'='\u0001') |
| STORED AS PARQUET |
| LOCATION 'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997' |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3', 'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
+---------------------------------------------------------------------------+

2. Execute 'alter table impala_test.sales_fact_1997 change column product_id
pproduct_id int;' in hive-cli; it returns OK.
3. Execute 'invalidate metadata impala_test.sales_fact_1997'.
4. Execute 'show create table impala_test.sales_fact_1997' again in
impala-shell; it returns:

+---------------------------------------------------------------------------+
| result |
+---------------------------------------------------------------------------+
| CREATE TABLE impala_test.sales_fact_1997 |
|  COMMENT 'Imported by sqoop on 2017/06/09 20:25:40' |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY '\n' |
| WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n', 'serialization.format'='\u0001') |
| STORED AS PARQUET |
| LOCATION 'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997' |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3', 'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937') |
+---------------------------------------------------------------------------+

All columns disappear. The column change shows up correctly if I restart
catalogd, so I think this is a bug caused by the Hive metastore client. Is
there any good way to overcome the problem other than restarting catalogd?

I think we could check the columns after getTable from HiveMetastoreClient;
if the list is empty, try to recreate the HiveMetastoreClient (Hive does not
support 0-column tables). Would modifying the code like this be a good way to
overcome the problem?


Re: Re: Alter Table Drop Column

2017-10-12 Thread Alexander Behm
For Parquet you might be able to:
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
(Please look in the documentation for details)
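
Roughly, in impala-shell (the table name is just an example):

-- Resolve Parquet columns by name instead of by ordinal position, so a
-- dropped or renamed column in the table schema does not shift the data:
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
SELECT * FROM my_db.my_parquet_table LIMIT 10;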

For TEXT, the columns in the table schema are matched to columns in the CSV
by position. There is no other way to do it, because TEXT files typically do
not contain a schema. Your CSV might contain a header with the column names,
but Impala cannot use that for resolution.
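
To illustrate the positional matching (a made-up table, not your schema):

-- Data file lines look like: 1,alice,9.5
CREATE TABLE demo_text (id INT, cname STRING, score DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Dropping the middle column only changes the table schema:
ALTER TABLE demo_text DROP COLUMN cname;

-- The remaining schema columns (id, score) are now matched to the first
-- two fields of each line, so 'alice' lands in score (and fails to convert
-- to DOUBLE, becoming NULL) while 9.5 is ignored. That is the kind of data
-- confusion described above; the files themselves would have to be
-- rewritten (e.g. via INSERT ... SELECT into a new table) to match the
-- new column list.
SELECT * FROM demo_text;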


On Wed, Oct 11, 2017 at 11:52 PM, sky  wrote:

> Textfile and parquet,the two format both cause data confusion.
> At 2017-10-12 14:41:02, "Alexander Behm"  wrote:
> >What's the file format?
> >
> >On Wed, Oct 11, 2017 at 11:30 PM, sky  wrote:
> >
> >> Hi all,
> >>   After using the 'alter table ... drop columns ...' to delete the
> middle
> >> column, then the select query  will appear data confusion, how to solve
> it ?
>


Re:Re: Alter Table Drop Column

2017-10-12 Thread sky
Both formats, text file and Parquet, cause the data confusion.

At 2017-10-12 14:41:02, "Alexander Behm"  wrote:
>What's the file format?
>
>On Wed, Oct 11, 2017 at 11:30 PM, sky  wrote:
>
>> Hi all,
>>   After using the 'alter table ... drop columns ...' to delete the middle
>> column, then the select query  will appear data confusion, how to solve it ?


Re: Alter Table Drop Column

2017-10-12 Thread Alexander Behm
What's the file format?

On Wed, Oct 11, 2017 at 11:30 PM, sky  wrote:

> Hi all,
>   After using the 'alter table ... drop columns ...' to delete the middle
> column, then the select query  will appear data confusion, how to solve it ?


Alter Table Drop Column

2017-10-12 Thread sky
Hi all,
  After using 'alter table ... drop columns ...' to delete a middle column,
the subsequent select queries return confused (misaligned) data. How can this
be solved?