[jira] [Updated] (HIVE-16666) Setting hive.exec.stagingdir to a relative directory or a subdirectory of the destination data directory causes Hive to delete the intermediate query results

yangfang (JIRA) Mon, 15 May 2017 05:21:26 -0700

     [ https://issues.apache.org/jira/browse/HIVE-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


yangfang updated HIVE-16666:
----------------------------
    Description: 
Set hive.exec.stagingdir to a relative path (./*), for example set hive.exec.stagingdir=./opq8.
Then execute a query like this:
insert overwrite table test2 select * from test3;
The query fails with the following error:
hive> set hive.exec.stagingdir=./opq8;
hive> insert overwrite table test2 select * from test3;
Query ID = mr_20170515134831_28ee392d-0d5a-4e47-b80c-dfcd31691b02
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1494818119523_0008, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0008/
Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job  -kill job_1494818119523_0008
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-05-15 13:48:51,487 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1494818119523_0008
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
Loading data to table default.test2
Moved: 'hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1' to trash at: hdfs://nameservice/user/mr/.Trash/Current
Failed with exception Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
hive>

The value hive.exec.stagingdir=./opq8 is a path relative to the destination
write directory /hive/test2, so Hive creates the temporary directory
/hive/test2/opq8_hive* for intermediate query results. Later, in the move
stage, Hive deletes or trashes every subdirectory of /hive/test2 whose name
does not begin with "_" or "." in order to move data into this directory.
The staging directory matches that rule, so it is trashed before its contents
can be moved, and the move fails. You can see the processing logic in
org.apache.hadoop.hive.ql.metadata.trashFilesUnderDir.
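
To make the selection rule concrete, here is a minimal sketch in plain Java of
the naming check described above. It is not Hive's code: the class and method
names (TrashFilterSketch, isVisible, selectForTrash) and the sample child
names are hypothetical, and the only rule it models is the one stated in this
report, namely that entries whose names begin with "_" or "." are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Hive implementation.
final class TrashFilterSketch {

    /** A name is "visible" when it does not start with "_" or ".". */
    static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Returns the child names that a cleanup limited to visible entries would pick. */
    static List<String> selectForTrash(List<String> childNames) {
        List<String> selected = new ArrayList<>();
        for (String name : childNames) {
            if (isVisible(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative children of the destination directory /hive/test2.
        List<String> children = Arrays.asList(
                "000000_0",            // an old data file
                "opq8_hive_example",   // staging directory without a leading "."
                ".opq8_hive_example",  // staging directory with a leading "."
                "_tmp.example");       // entry starting with "_"
        System.out.println(selectForTrash(children));
    }
}
```

In this sketch the staging directory without the leading "." is selected
together with the old data file, which is the situation shown in the failing
log above.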

My fix: if stagingdir is a subdirectory of the destination write directory,
I add a "." in front of the stagingdir name. The temporary directory then
becomes /hive/test2/.opq8_hive*, and because .opq8_hive* starts with ".",
Hive does not delete it. With the patch applied, the same query succeeds:
hive> set hive.exec.stagingdir=./opq8;
hive>  insert overwrite table test2 select * from test3;
Query ID = mr_20170515143940_ae48a65e-42be-4f50-b974-b713ca902867
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1494818119523_0012, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0012/
Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job  -kill job_1494818119523_0012
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-05-15 14:40:04,547 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1494818119523_0012
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice/hive/test2/.opqt8_hive_2017-05-15_14-39-40_751_1221840798987515724-1/-ext-10000
Loading data to table default.test2
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 26.751 seconds
hive> 
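
The idea behind the modification can be sketched as follows. This is not the
attached patch: the class and method names (StagingDirSketch,
hideUnderDestination) are hypothetical, paths are handled as plain strings
rather than Hadoop Path objects, scheme-qualified URIs are not handled, and
leaving names that already start with "." or "_" untouched is an assumption
of this sketch. The "_hive_*" suffix that Hive appends to the staging
directory name is also left out.

```java
// Hypothetical illustration only; not the HIVE-16666 patch.
final class StagingDirSketch {

    /**
     * Resolves stagingDir against destDir. If the result lies under destDir
     * and its first component is not already hidden, a "." is prepended to it.
     */
    static String hideUnderDestination(String destDir, String stagingDir) {
        // Normalize a trailing slash on the destination directory.
        String dest = destDir.endsWith("/")
                ? destDir.substring(0, destDir.length() - 1)
                : destDir;

        String relative;
        if (stagingDir.startsWith("./")) {
            relative = stagingDir.substring(2);                 // "./opq8" -> "opq8"
        } else if (stagingDir.startsWith(dest + "/")) {
            relative = stagingDir.substring(dest.length() + 1); // absolute, under dest
        } else if (stagingDir.startsWith("/")) {
            return stagingDir;                                  // absolute, outside dest
        } else {
            relative = stagingDir;                              // bare relative name
        }

        if (relative.isEmpty()) {
            return dest;
        }
        // Assumption: names that are already hidden are left as they are.
        if (relative.startsWith(".") || relative.startsWith("_")) {
            return dest + "/" + relative;
        }
        return dest + "/." + relative;
    }

    public static void main(String[] args) {
        System.out.println(hideUnderDestination("/hive/test2", "./opq8"));
        System.out.println(hideUnderDestination("/hive/test2", "/tmp/hive/staging"));
    }
}
```

With the destination /hive/test2 and the setting ./opq8, the sketch yields a
staging path whose first component under the destination starts with ".", so
the naming rule described earlier would skip it. A staging path outside the
destination is returned unchanged.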


> Setting hive.exec.stagingdir to a relative directory or a subdirectory of
> the destination data directory causes Hive to delete the intermediate query
> results
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16666
>                 URL: https://issues.apache.org/jira/browse/HIVE-16666
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.1
>            Reporter: yangfang
>            Assignee: yangfang
>            Priority: Critical
>         Attachments: HIVE-16666.1.patch
>
>


