[ https://issues.apache.org/jira/browse/HIVE-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017592#comment-16017592 ]
Aihua Xu commented on HIVE-16666:
---------------------------------

I agree with Peter. We should probably validate the configuration and throw an exception before query compilation starts, rather than changing the property value internally. What do you think, [~yangfang]?

> Set hive.exec.stagingdir a relative directory or a sub directory of
> distination data directory will cause Hive to delete the intermediate query
> results
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16666
> URL: https://issues.apache.org/jira/browse/HIVE-16666
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 3.0.0
> Reporter: yangfang
> Assignee: yangfang
> Priority: Critical
> Attachments: HIVE-16666.1.patch
>
>
> Set hive.exec.stagingdir=./*, for example set hive.exec.stagingdir=./opq8.
> Then execute a query like this:
> insert overwrite table test2 select * from test3;
> You will get an error like this:
> hive> set hive.exec.stagingdir=./opq8;
> hive> insert overwrite table test2 select * from test3;
> Query ID = mr_20170515134831_28ee392d-0d5a-4e47-b80c-dfcd31691b02
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1494818119523_0008, Tracking URL =
> http://zdh77:8088/proxy/application_1494818119523_0008/
> Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job -kill
> job_1494818119523_0008
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers:
> 0
> 2017-05-15 13:48:51,487 Stage-1 map = 0%, reduce = 0%
> Ended Job = job_1494818119523_0008
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to directory
> hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
> Loading data to table default.test2
> Moved:
> 'hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1'
> to trash at: hdfs://nameservice/user/mr/.Trash/Current
> Failed with exception Unable to move source
> hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
> to destination hdfs://nameservice/hive/test2
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source
> hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
> to destination hdfs://nameservice/hive/test2
> MapReduce Jobs Launched:
> Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> hive>
> hive.exec.stagingdir=./opq8 is a path relative to the destination write
> directory /hive/test2, so Hive creates a temporary directory
> /hive/test2/opq8_hive* for intermediate query results. Later, in the move
> stage, Hive deletes or trashes every sub directory under /hive/test2
> whose name does not begin with "_" or "." in order to move data into this
> directory. That removes the staging directory itself, so the move fails.
> You can see the processing logic in
> org.apache.hadoop.hive.ql.metadata.trashFilesUnderDir.
> My modification is: if stagingdir is a sub directory of the
> destination write directory, I add a "." in front of stagingdir. The
> temporary directory is then /hive/test2/.opq8_hive*, and because the sub
> directory .opq8_hive* starts with ".", Hive will not delete it.
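
The rule described above can be illustrated with a small standalone sketch. It is not Hive's code and not the attached patch: the class `StagingDirSketch` and all of its methods are hypothetical, and it works on plain directory names rather than HDFS paths. It only assumes the behaviour stated in the description, namely that children whose names start with "_" or "." are left alone. The last method shows how the same check could back a validation that throws instead of renaming, as suggested in the comment.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StagingDirSketch {

    // Assumption taken from the issue description: names beginning with
    // "_" or "." are treated as hidden and are not trashed.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns the children of the destination directory that the cleanup
    // step would remove, i.e. every non-hidden name.
    static List<String> wouldBeTrashed(List<String> childNames) {
        return childNames.stream()
                .filter(n -> !isHidden(n))
                .collect(Collectors.toList());
    }

    // Approach from the description: prefix "." so the staging directory
    // survives the cleanup. Already hidden names are returned unchanged.
    static String protect(String stagingName) {
        return isHidden(stagingName) ? stagingName : "." + stagingName;
    }

    // Alternative from the comment: reject the setting up front instead of
    // changing it silently. The exception type is an illustrative choice.
    static void validateStagingName(String stagingName) {
        if (!isHidden(stagingName)) {
            throw new IllegalArgumentException(
                "staging directory '" + stagingName
                + "' under the destination must start with '.' or '_'");
        }
    }

    public static void main(String[] args) {
        List<String> children =
            List.of("opq8_hive_x", ".opq8_hive_x", "_tmp", "000000_0");
        System.out.println(wouldBeTrashed(children)); // [opq8_hive_x, 000000_0]
        System.out.println(protect("opq8_hive_x"));   // .opq8_hive_x
    }
}
```

Renaming keeps the query working but means the effective value differs from what the user set; validating keeps the setting honest at the cost of failing the query early.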
> hive> set hive.exec.stagingdir=./opq8;
> hive> insert overwrite table test2 select * from test3;
> Query ID = mr_20170515143940_ae48a65e-42be-4f50-b974-b713ca902867
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1494818119523_0012, Tracking URL =
> http://zdh77:8088/proxy/application_1494818119523_0012/
> Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job -kill
> job_1494818119523_0012
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers:
> 0
> 2017-05-15 14:40:04,547 Stage-1 map = 0%, reduce = 0%
> Ended Job = job_1494818119523_0012
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to directory
> hdfs://nameservice/hive/test2/.opqt8_hive_2017-05-15_14-39-40_751_1221840798987515724-1/-ext-10000
> Loading data to table default.test2
> MapReduce Jobs Launched:
> Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 26.751 seconds
> hive>

--
This message was sent by Atlassian JIRA (v6.3.15#6346)