[ https://issues.apache.org/jira/browse/FALCON-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Kumar Gautam updated FALCON-321:
---------------------------------------

    Description: 
In FeedEvictor.java we have:
<code:java>
private void deleteParentIfEmpty(FileSystem fs, Path parent, Path feedBasePath)
        throws IOException {
    if (feedBasePath.equals(parent)) {
        LOG.info("Not deleting feed base path:" + parent);
    } else {
        if (fs.getContentSummary(parent).getFileCount() == 0) {
            LOG.info("Parent path: " + parent + " is empty, deleting path");
            if (fs.delete(parent, true)) {
                LOG.info("Deleted empty dir: " + parent);
            } else {
                throw new IOException("Unable to delete parent path:" + parent);
            }
            deleteParentIfEmpty(fs, parent.getParent(), feedBasePath);
        }
    }
}
</code>

The check fs.getContentSummary(parent).getFileCount() == 0 only counts files.
If the parent has no files but still contains directories, it is treated as
empty and deleted recursively, which is incorrect.
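
To make the difference concrete, here is a minimal sketch that reproduces the two kinds of emptiness check on a local file system with java.nio.file rather than the Hadoop FileSystem API. The class name EmptyParentCheck and its helper names are hypothetical and not part of Falcon. In FeedEvictor itself, an analogous change might be to test fs.listStatus(parent).length == 0 and to call fs.delete(parent, false); that is only a suggestion, not a committed fix.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Falcon.
public class EmptyParentCheck {

    // Mirrors the reported check: counts regular files anywhere under dir,
    // so a directory holding only subdirectories yields 0.
    static long recursiveFileCount(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(Files::isRegularFile).count();
        }
    }

    // Stricter check: true only if dir has no direct children at all.
    static boolean hasNoChildren(Path dir) throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            return !children.iterator().hasNext();
        }
    }

    // Walks upward from parent, removing directories that have no children,
    // and stops at basePath or at the first directory that is not empty.
    static void deleteParentIfEmpty(Path parent, Path basePath) throws IOException {
        if (parent == null || parent.equals(basePath)) {
            return;
        }
        if (hasNoChildren(parent)) {
            Files.delete(parent); // non-recursive delete
            deleteParentIfEmpty(parent.getParent(), basePath);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("testFolders");
        Path day = Files.createDirectories(base.resolve("2014/01/21"));
        Path evicted = Files.createDirectory(day.resolve("00"));
        Path kept = Files.createDirectory(day.resolve("01"));

        Files.delete(evicted); // simulate eviction of instance 00

        // The file-count check reports "empty" although 01 still exists.
        System.out.println("file count is zero: " + (recursiveFileCount(day) == 0));

        deleteParentIfEmpty(day, base);
        System.out.println("sibling 01 survives: " + Files.exists(kept));
    }
}
```

With the child-based check, the day directory is left alone while the sibling instance directory 01 still exists, and parents are removed only once nothing remains under them.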

Here is the log from falcon-regression's RetentionTest.testRetention
(parameters: hours, 24, true, daily):
<quote>
2014-02-24 15:09:45,034 INFO [main] org.apache.falcon.retention.FeedEvictor: Applying retention on DATA=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/retention/testFolders/${YEAR}/${MONTH}/${DAY}/${HOUR}#META=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/projects/ivory/clicksMetaData#STATS=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/projects/ivory/clicksStats#TMP=/tmp type: instance, Limit: hours(24), timezone: UTC, frequency: hours, storageFILESYSTEM
2014-02-24 15:09:45,051 INFO [main] org.apache.falcon.retention.FeedEvictor: Normalized path : /retention/testFolders/${YEAR}/${MONTH}/${DAY}/${HOUR}
2014-02-24 15:09:45,123 INFO [main] org.apache.falcon.retention.FeedEvictor: Searching for /retention/testFolders/*/*/*/*
2014-02-24 15:09:45,486 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted instance :/retention/testFolders/2014/01/21/00
2014-02-24 15:09:45,500 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014/01/21 is empty, deleting path
2014-02-24 15:09:45,509 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014/01/21
2014-02-24 15:09:45,511 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014/01 is empty, deleting path
2014-02-24 15:09:45,517 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014/01
2014-02-24 15:09:45,518 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014 is empty, deleting path
2014-02-24 15:09:45,525 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014
2014-02-24 15:09:45,526 INFO [main] org.apache.falcon.retention.FeedEvictor: Not deleting feed base path:/retention/testFolders
</quote>


> Feed evictor deleting more stuff than it should
> -----------------------------------------------
>
>                 Key: FALCON-321
>                 URL: https://issues.apache.org/jira/browse/FALCON-321
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Raghav Kumar Gautam



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
