[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916329#action_12916329
 ] 

He Yongqiang commented on HIVE-1665:
--

If "2) failed and rolling back 1) also failed", then the data is in the
scratch dir and the table's metadata is still there, so nothing is lost.
But "2) failed and rolling back 1) also failed" should rarely happen. The main
concern here is dealing with HDFS being down, and with housekeeping operations.

For 'mark-then-delete', I think the main problem is that there is no
administration daemon process or helper script for it.

> drop operations may cause file leak
> ---
>
> Key: HIVE-1665
> URL: https://issues.apache.org/jira/browse/HIVE-1665
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1665.1.patch
>
>
> Right now when doing a drop, Hive first drops the metadata and then drops the 
> actual files. If the file system is down at that time, the files will never be 
> deleted. 
> Had an offline discussion about this:
> to fix this, add a new conf "scratch dir" to the Hive conf. 
> When doing a drop operation:
> 1) move the data to the scratch directory
> 2) drop the metadata
> 3) if 2) failed, roll back 1) and report an error (3.1);
> if 2) succeeded, drop the data from the scratch directory (3.2)
> 4) if 3.2 fails, we are OK because we assume the scratch dir will be emptied 
> manually.
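The four steps above can be sketched with a local filesystem standing in for HDFS. This is a minimal illustration only; `drop_table` and its arguments are hypothetical names, not Hive's actual API:

```python
import os
import shutil
import uuid


def drop_table(data_dir, scratch_dir, drop_metadata):
    """Sketch of the proposed drop protocol. drop_metadata is a
    callable that removes the metastore entry and may raise."""
    os.makedirs(scratch_dir, exist_ok=True)
    staged = os.path.join(scratch_dir, uuid.uuid4().hex)
    shutil.move(data_dir, staged)          # 1) move data to scratch dir
    try:
        drop_metadata()                    # 2) drop metadata
    except Exception:
        shutil.move(staged, data_dir)      # 3.1) roll back 1), report error
        raise
    try:
        shutil.rmtree(staged)              # 3.2) drop data from scratch dir
    except OSError:
        pass  # 4) fine: the scratch dir is assumed to be emptied manually
```

Note that a crash between 1) and 2) still leaves the data stranded in the scratch dir, which is the case the follow-up comments discuss.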

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916249#action_12916249
 ] 

Ning Zhang commented on HIVE-1665:
--

What about the case where 2) failed and rolling back 1) also failed? This could
happen if the CLI gets killed at any time between 1) and 2). 

Another option is to use the traditional 'mark-then-delete' trick: mark the 
partition as deleted in the metastore first, and then clean up the data. In 
case of any failure, redoing the drop partition will resume the data deletion 
process. It is also easier from the administrator's point of view, since you 
can periodically scan the metastore for deleted partitions (those left 
uncommitted) and re-drop them. 
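A minimal sketch of that ordering, with a dict standing in for the metastore and a set of paths standing in for the filesystem (all names here are hypothetical, for illustration only):

```python
def drop_partition(metastore, fs_paths, part):
    """Mark-then-delete: commit the deletion intent first, then clean
    up. A crash at any point leaves a resumable DELETED marker."""
    metastore[part]["state"] = "DELETED"            # mark in metastore first
    fs_paths.discard(metastore[part]["location"])   # then delete the data
    del metastore[part]                             # finally drop the entry


def redrop_deleted_partitions(metastore, fs_paths):
    """Periodic admin pass: finish any drop that was interrupted,
    i.e. partitions still marked DELETED in the metastore."""
    for part in [p for p, m in metastore.items() if m["state"] == "DELETED"]:
        fs_paths.discard(metastore[part]["location"])
        del metastore[part]
```

The key property is that the marker is written before any data is touched, so the periodic pass can always tell which deletions are unfinished.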




[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913880#action_12913880
 ] 

Namit Jain commented on HIVE-1665:
--

By default, the scratch dir can be based on the date, etc., so that it can be 
easily cleaned up.
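For example (a hypothetical helper, not actual Hive code), the scratch path could embed the current date, so that a periodic job can simply remove subdirectories older than some retention window:

```python
import datetime
import os


def dated_scratch_dir(base_scratch_dir):
    """Return a per-day scratch subdirectory, e.g. <base>/2010-09-22,
    making stale entries easy to identify and purge in bulk."""
    today = datetime.date.today().strftime("%Y-%m-%d")
    return os.path.join(base_scratch_dir, today)
```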
