Manpreet Singh created HIVE-19991:
-------------------------------------
Summary: msck repair table command not able to retrieve achieved
data.
Key: HIVE-19991
URL: https://issues.apache.org/jira/browse/HIVE-19991
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Manpreet Singh
Observed an issue when customer used msck repair tablename command on a
archeived data (data copied via hadoop fs -cp from another location) by the
command did not loaded the partitions in the table and showed zero results.
Please find below complete steps for both archived and unarchived.
Copied table's partition data to another location & define a similar table
structure for new location an ran "msck repair table" command on the
destination and then ran a select * from <destination> table it produced zero
results, same thing is working for non achieved data.
1. Created two table with similar structure and different loctation.
create table sau_test1 ( a int, b string) partitioned by (dt string) stored as
parquet location '/user/hive/warehouse/sau_test1';
create table sau_arch ( a int, b string) partitioned by (dt string) stored as
parquet location'/user/hive/warehouse/sau_arch';
2. Inserted data in source table
insert into sau_test1 partition(dt='dt1') select 1,'A1';
select * from sau_test1;
+---------------+-------------++-----------------+
|sau_test1.a|sau_test1.b|sau_test1.dt|
+---------------+-------------++-----------------+
|1|A1|dt1|
+---------------+-------------++-----------------+
3. Copied the content of directory source directory to destination directory.
hadoop fs -cp /user/hive/warehouse/sau_test1/* /user/hive/warehouse/sau_arch/
4. Running msck repair table <destination table> and checking results. — for
unachieved data
msck repair table sau_arch ;
select * from sau_arch;
select * from sau_arch;
+--------------+------------++----------------+
|sau_arch.a|sau_arch.b|sau_arch.dt|
+--------------+------------++----------------+
|1|A1|dt1|
+--------------+------------++----------------+
5. Customer wants the same functionality for archived data and hence tried
below steps.
a) Dropped table partition in destination table "alter table sau_arch drop
partition(dt='dt1');"
b) set hive.archive.enabled=true;
alter table sau_test1 archive partition ( dt='dt1');
c) copied the hdfs files from source table to destination tables.
hdfs dfs -ls /user/hive/warehouse/sau_test1/dt=dt1/
drwxr-xr-x - hive supergroup 0 2018-06-08 13:26
/user/hive/warehouse/sau_test1/dt=dt1/data.har
-rw-r--r-- 3 hive supergroup 0 2018-06-08 13:26
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_SUCCESS
-rw-r--r-- 3 hive supergroup 305 2018-06-08 13:26
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_index
-rw-r--r-- 3 hive supergroup 23 2018-06-08 13:26
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_masterindex
-rw-r--r-- 3 hive supergroup 286 2018-06-08 13:26
/user/hive/warehouse/sau_test1/dt=dt1/data.har/part-0
$ hdfs dfs -ls /user/hive/warehouse/sau_arch/dt=dt1/
drwxr-xr-x - ngdb supergroup 0 2018-06-08 13:27
/user/hive/warehouse/sau_arch/dt=dt1/data.har
-rw-r--r-- 3 ngdb supergroup 0 2018-06-08 13:27
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_SUCCESS
-rw-r--r-- 3 ngdb supergroup 305 2018-06-08 13:27
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_index
-rw-r--r-- 3 ngdb supergroup 23 2018-06-08 13:27
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_masterindex
-rw-r--r-- 3 ngdb supergroup 286 2018-06-08 13:27
/user/hive/warehouse/sau_arch/dt=dt1/data.har/part-0
d) msck repair table sau_arch;
e)select * from sau_arch . - No results shown
+--------------+------------++----------------+
|sau_arch.a|sau_arch.b|sau_arch.dt|
+--------------+------------++----------------+
+--------------+------------++----------------+
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)