Re: show create table return empty after change column name in hive
Hey Yu, I tried to reproduce this on a CDH5.13 cluster, but your exact commands work as expected for me. Are you using Impala 2.10 on a CDH5.13 cluster, or something else? Can you share your catalog and Hive metastore logs? Thanks.

On 12 October 2017 at 19:39, yu feng wrote:
> I tried to use 'invalidate metadata' for the whole catalog, but the modified
> table is still empty. I suspect the only way is to restart catalogd.
>
> BTW, I tested with the newest version (2.10.0).
Re: Re: Load Data Parquet Table
You can load already existing Parquet files into the destination table from another location in HDFS.

On 12 October 2017 at 18:44, sky wrote:
> From the Impala documentation, Parquet supports the LOAD DATA operation; how is
> it supported?
Re: show create table return empty after change column name in hive
I tried to use 'invalidate metadata' for the whole catalog, but the modified table is still empty. I suspect the only way is to restart catalogd.

BTW, I tested with the newest version (2.10.0).

2017-10-13 0:17 GMT+08:00 Jeszy:
> This does sound like a bug. What version are you using? Do you see any
> errors in the catalog logs?
> I think a global invalidate metadata should work, and it's a bit less
> intrusive than a catalog restart. In general, it is a good idea to do
> all metadata operations from Impala if you are using Impala at all; it
> helps a lot in making metadata operations seamless.
Re: Re: Load Data Parquet Table
From the Impala documentation, Parquet supports the LOAD DATA operation; how is it supported?

At 2017-10-13 00:30:12, "Jeszy" wrote:
>See the docs on LOAD DATA:
>http://impala.apache.org/docs/build/html/topics/impala_load_data.html
Re: Time for graduation?
Hi all,

I've been thinking about this as well, and I feel Impala is ready. (more inline)

On Thu, Oct 12, 2017 at 6:06 PM, Todd Lipcon wrote:
> On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple wrote:
>
> > Also, mentors are traditionally included in a graduating podling's PMC,
> > right?
>
> That's often been done but I don't think there's any hard requirement.
> Perhaps we could ask each mentor whether they would like to continue to be
> involved?

For my part, I don't feel I contribute much to the PMC, but Impala is a project I use every day, and thus I have a strong interest in the project being successful. I would not be hurt in the *least* if I were not included on the PMC. However, I'd be more than happy to serve.

Cheers,
Brock
Re: Time for graduation?
All of that SGTM
Re: Time for graduation?
On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple wrote:
> I think it would be a good time to graduate. I'm very proud of the progress
> the community has made in terms of acting in an Apache way.
>
> Some logistics:
>
> I would be happy to serve as an initial chair.
>
> I'll draft a resolution, with a blank space for chair. This doesn't mean we
> have to agree now is the time to graduate, but we'll have it available for
> discussion and revision whenever we are ready.
>
> If we decide to graduate now, maybe we could email everyone who is on the
> PPMC, ccing private@, to see if they are still interested in being on the
> PMC, and taking no response to mean "yes" until we hear otherwise, in case
> someone is on vacation away from email, or in the hospital, or something.

That seems pretty reasonable to me, given the default is "yes". Those that respond "no" could just be given "PMC emeritus" status. There isn't any official emeritus policy at Apache, but it's a nice way to thank these people for past involvement, and typically such members could be easily re-instated upon their request (see http://www.apache.org/dev/pmc.html#emeritus).

> Also, mentors are traditionally included in a graduating podling's PMC,
> right?

That's often been done, but I don't think there's any hard requirement. Perhaps we could ask each mentor whether they would like to continue to be involved?

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Time for graduation?
I think it would be a good time to graduate. I'm very proud of the progress the community has made in terms of acting in an Apache way.

Some logistics:

I would be happy to serve as an initial chair.

I'll draft a resolution, with a blank space for chair. This doesn't mean we have to agree now is the time to graduate, but we'll have it available for discussion and revision whenever we are ready.

If we decide to graduate now, maybe we could email everyone who is on the PPMC, ccing private@, to see if they are still interested in being on the PMC, and taking no response to mean "yes" until we hear otherwise, in case someone is on vacation away from email, or in the hospital, or something.

Also, mentors are traditionally included in a graduating podling's PMC, right?

On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon wrote:
> Hey Impala community,
>
> It's been a while since all of the Impala infrastructure was moved over,
> and the community appears to be functioning healthily [...]
Time for graduation?
Hey Impala community,

It's been a while since all of the Impala infrastructure was moved over, and the community appears to be functioning healthily, generating new releases on a regular cadence as well as adding new committers and PPMC members. All of the branding stuff seems great, and the user mailing list has a healthy amount of traffic and a good track record of answering questions when they come up.

As a mentor, I think it's probably time to discuss graduation. The project is already functioning in the same way as your typical Apache TLP, and it seems like it's time to become one.

Any thoughts? If everyone is on board, the next steps would be:

1. Pick the initial PMC chair for the TLP. According to the published Impala Bylaws, it seems that this is meant to rotate annually, so no need to stress too much about it.

A couple of obvious choices here would be Marcel (as the original founder of the project) or perhaps Jim (who has done yeoman's work on a lot of the incubation process, podling reports, etc). Others could certainly volunteer or be nominated as well.

2. Draft a Resolution for the PPMC and IPMC to vote upon.
-- The resolution would include the above-decided chair as well as the list of initial PMC, etc.
-- The initial PMC could be just the current list of PPMC, or you could consider adding others at this point as well.

I can help with the above process, but figured I'd solicit opinions first on whether the community feels it's ready to graduate.

Thanks
Todd
Re: Load Data Parquet Table
See the docs on LOAD DATA:
http://impala.apache.org/docs/build/html/topics/impala_load_data.html

"In the interest of speed, only limited error checking is done. If the loaded files have the wrong file format, different columns than the destination table, or other kind of mismatch, Impala does not raise any error for the LOAD DATA statement. Querying the table afterward could produce a runtime error or unexpected results. Currently, the only checking the LOAD DATA statement does is to avoid mixing together uncompressed and LZO-compressed text files in the same table."

To reload CSV data as Parquet using Impala, you'd have to create a table for the CSV data, then do an 'insert into [parquet table] select [...] from [csv_table]'.

HTH

On 12 October 2017 at 07:58, sky wrote:
> Hi all,
> How does the parquet table perform load data operations? How does a CSV
> file import into the parquet table?
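As a sketch, that CSV-to-Parquet path could look like the following in impala-shell (the table and column names here are made up for illustration, not taken from the thread):

```sql
-- Staging table whose schema matches the CSV files:
CREATE TABLE csv_staging (id INT, name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA only moves the files under the table's HDFS directory;
-- it does no format conversion and almost no validation:
LOAD DATA INPATH '/user/me/incoming/data.csv' INTO TABLE csv_staging;

-- Rewriting through a query is what actually produces Parquet:
CREATE TABLE sales_parquet STORED AS PARQUET
AS SELECT * FROM csv_staging;
```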
Re: show create table return empty after change column name in hive
This does sound like a bug. What version are you using? Do you see any errors in the catalog logs?

I think a global invalidate metadata should work, and it's a bit less intrusive than a catalog restart. In general, it is a good idea to do all metadata operations from Impala if you are using Impala at all; it helps a lot in making metadata operations seamless.

On 12 October 2017 at 02:53, yu feng wrote:
> In our scenario, users always do metadata modifications in Hive, and run
> queries in Impala.
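Following that advice, the rename from step 2 of the reproduction could be issued from impala-shell instead of hive-cli, so that catalogd applies the change itself (a sketch using the table name from the thread):

```sql
-- Rename the column from Impala rather than Hive; since catalogd performs
-- the change, no INVALIDATE METADATA should be needed afterwards:
ALTER TABLE impala_test.sales_fact_1997 CHANGE product_id pproduct_id INT;

-- Confirm the full column list is still reported:
SHOW CREATE TABLE impala_test.sales_fact_1997;
```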
Re: Jenkins down briefly to take quiesced snapshot
It's back up.

On Thu, Oct 12, 2017 at 7:14 AM, Michael Brown wrote:
> Sorry for the late notice; this should only take a few minutes. No jobs are
> currently running.
>
> This is in preparation for an upgrade in the next few days.
Jenkins down briefly to take quiesced snapshot
Sorry for the late notice; this should only take a few minutes. No jobs are currently running.

This is in preparation for an upgrade in the next few days.
Re: show create table return empty after change column name in hive
In our scenario, users always do metadata modifications in Hive and run queries in Impala.

2017-10-12 16:31 GMT+08:00 sky:
> Why is the second step performed in Hive, not Impala?
Re: show create table return empty after change column name in hive
Why is the second step performed in Hive, not Impala?

At 2017-10-12 15:12:38, "yu feng" wrote:
>I open impala-shell and hive-cli. [...]
show create table return empty after change column name in hive
I open impala-shell and hive-cli.

1. Execute 'show create table impala_test.sales_fact_1997' in impala-shell; it returns:

   CREATE TABLE impala_test.sales_fact_1997 (
     product_id INT,
     time_id INT,
     customer_id INT,
     promotion_id INT,
     store_id INT,
     store_sales DOUBLE,
     store_cost DOUBLE,
     unit_sales DOUBLE
   )
   COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY '\n'
   WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n', 'serialization.format'='\u0001')
   STORED AS PARQUET
   LOCATION 'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
   TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3', 'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937')

2. Execute 'alter table impala_test.sales_fact_1997 change column product_id pproduct_id int;' in hive-cli; it returns OK.

3. Execute 'invalidate metadata impala_test.sales_fact_1997'.

4. Execute 'show create table impala_test.sales_fact_1997' again in impala-shell; it returns:

   CREATE TABLE impala_test.sales_fact_1997
   COMMENT 'Imported by sqoop on 2017/06/09 20:25:40'
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LINES TERMINATED BY '\n'
   WITH SERDEPROPERTIES ('field.delim'='\u0001', 'line.delim'='\n', 'serialization.format'='\u0001')
   STORED AS PARQUET
   LOCATION 'hdfs://hz-cluster1/user/nrpt/hive-server/impala_test.db/sales_fact_1997'
   TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='true', 'numFiles'='3', 'numRows'='10', 'rawDataSize'='80', 'totalSize'='1619937')

All columns disappear. The column change shows up correctly if I restart catalogd, so I think this is a bug caused by the Hive metastore client. Is there any good way to overcome the problem other than restarting catalogd?

I think we can check the columns after getTable from HiveMetastoreClient; if the column list is empty, try to recreate the HiveMetastoreClient (Hive does not support 0-column tables). Is it a good way to overcome the problem if we modify the code like this?
Re: Re: Alter Table Drop Column
For Parquet you might be able to:

SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;

(Please look in the documentation for details.)

For TEXT, the columns in the table schema are matched to columns in the CSV by position. There is no other way to do it, because text files typically do not contain a schema. Your CSV might contain a header with the column names, but Impala cannot use that for resolution.

On Wed, Oct 11, 2017 at 11:52 PM, sky wrote:
> Textfile and Parquet: both formats cause data confusion.
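A minimal sketch of that Parquet workaround in impala-shell (the table and column names here are hypothetical, not taken from the thread):

```sql
-- Drop a column from the middle of a Parquet table's schema:
ALTER TABLE sales DROP COLUMN discount;

-- By default Impala resolves Parquet columns by position, so existing files
-- can now feed the wrong column into each slot. Resolving by name instead
-- should avoid the confusion:
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
SELECT * FROM sales;
```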
Re:Re: Alter Table Drop Column
Textfile and Parquet: both formats cause data confusion.

At 2017-10-12 14:41:02, "Alexander Behm" wrote:
>What's the file format?
Re: Alter Table Drop Column
What's the file format?

On Wed, Oct 11, 2017 at 11:30 PM, sky wrote:
> Hi all,
> After using 'alter table ... drop column ...' to delete a middle
> column, select queries return confused data. How can this be solved?
Alter Table Drop Column
Hi all,

After using 'alter table ... drop column ...' to delete a column from the middle of the table, subsequent select queries return confused (misaligned) data. How can this be solved?