Hi,

since Eric requested a parallel review, let's go for it. Overall, I am
of the opinion that the patch could be pushed as it is, with a few
things fixed in a followup patch. I am pointing out some possible
stylistic cleanups below, and some refactoring opportunities. I'll be
happier if you pick up on those (be it as a followup or through
amendment), although I am not going to be terribly unhappy if the patch
stays as it is.

Yours,
   Petr.

Resolve issue 1599: automatically expire unused caches
------------------------------------------------------

[export/import stuff]

> hunk ./src/Darcs/Global.hs 154
>  resetCRCWarnings :: IO ()
>  resetCRCWarnings = writeIORef _crcWarningList []
>  
> +{- NOINLINE _badSourcesList -}
> +_badSourcesList :: IORef [String]
> +_badSourcesList = unsafePerformIO $ newIORef []
> +
> +addBadSource :: String -> IO ()
> +addBadSource cache = modifyIORef _badSourcesList (cache:)
> +
> +getBadSourcesList :: IO [String]
> +getBadSourcesList = readIORef _badSourcesList
> +
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> +                 return (`elem` badSources)
> +
> +{- NOINLINE _reachableSourcesList -}
> +_reachableSourcesList :: IORef [String]
> +_reachableSourcesList = unsafePerformIO $ newIORef []
> +
> +addReachableSource :: String -> IO ()
> +addReachableSource src = modifyIORef _reachableSourcesList (src:)
> +
> +getReachableSources :: IO [String]
> +getReachableSources = readIORef _reachableSourcesList
> +
> +isReachableSource :: IO (String -> Bool)
> +isReachableSource =  do reachableSources <- getReachableSources
> +                        return (`elem` reachableSources)
> +
>  darcsdir :: String
>  darcsdir = "_darcs"
Some global bookkeeping. Doesn't make me very happy, but for now it
happens to be the best tradeoff we can get.

[import/export wibbles for Darcs.Repository]

[import/export bits of Darcs.Repository.Cache]

> hunk ./src/Darcs/Repository/Cache.hs 63
>  
>  data WritableOrNot = Writable | NotWritable deriving ( Show )
>  data CacheType = Repo | Directory deriving ( Eq, Show )
> -data CacheLoc = Cache !CacheType !WritableOrNot !String
> +data CacheLoc = Cache { cacheType:: !CacheType, cacheWritable:: 
> !WritableOrNot, cacheSource:: !String }
>  newtype Cache = Ca [CacheLoc] -- abstract type for hiding cache
>  
>  instance Eq CacheLoc where
> hunk ./src/Darcs/Repository/Cache.hs 209
>      do debugMessage $ "I'm doing copyFileUsingCache on "++(hashedDir 
> subdir)++"/"++f
>         Just stickItHere <- cacheLoc cache
>         createDirectoryIfMissing False (reverse $ dropWhile (/='/') $ reverse 
> stickItHere)
> -       sfuc cache stickItHere
> +       cacheFiltered <- filterBadSources cache
> +       sfuc cacheFiltered stickItHere
Filter the cache. I'd probably write

filterBadSources cache >>= (flip sfuc) stickItHere

(unless you need the cacheFiltered identifier somewhere later, but I
don't see that)

This could go in as an extra patch, if you feel like changing it.

>      `catchall` return ()
>      where cacheLoc [] = return Nothing
>            cacheLoc (c:cs) | not $ writable c = cacheLoc cs
> hunk ./src/Darcs/Repository/Cache.hs 221
>                                 case othercache of Just x -> return $ Just x
>                                                    Nothing -> return $ Just 
> (fn c)
>            sfuc [] _ = return ()
> -          sfuc (c:cs) out | not $ writable c =
> +          sfuc (c:cs) out
> +            | not (writable c) =
>                if oos == OnlySpeculate
> hunk ./src/Darcs/Repository/Cache.hs 224
> -                 then speculateFileOrUrl (fn c) out
> -                 else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable
> -                          | otherwise = sfuc cs out
> +               then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> 
> checkCacheReachability (show e) c)
> +               else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable 
> `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> +            | otherwise = sfuc cs out
OK. Some more checking of source validity.

>            fn c = hashedFilePath c subdir f
>  
>  copyFilesUsingCache :: OrOnlySpeculate -> Cache -> HashedDir -> [String] -> 
> IO ()
> hunk ./src/Darcs/Repository/Cache.hs 236
>  
>  data FromWhere = LocalOnly | Anywhere deriving ( Eq )
>  
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- If the caches is not reachable it is blacklisted and not longer tried for
> +-- the rest of the session. If it is reachable it is whitelisted and future 
> errors with such
> +-- cache get ignore.
> +-- To determine reachability:
> +--  * For a local cache, if the given source doesn't exist anymore, it is 
> blacklisted.
> +--  * For remote sources if the error is timeout, it is blacklisted, if not,
> +--    it checks if _darcs/hashed_inventory  exist, if it does, the entry is 
> whitelisted, if
> +--    it doesn't, it is blacklisted.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile source = do
> +     reachable <- isReachableSource
> +     unless (reachable source) $ do
> +       exist <- doesDirectoryExist source
> +       if exist
> +        then
> +         addReachableSource source
> +        else
> +         addBadSource source
> + | isUrl source = do
> +     reachable <- isReachableSource
> +     unless (reachable source) $ do
> +            let string = case dropWhile (/='(') e of
> +                          (_:xs) -> fst (break (==')') xs)
> +                          _      -> e
> +            let cerror = case reads string ::[(HTTP.ConnectionError,String)] 
> of
> +                           [(ce,_)] -> Just ce
> +                           _        -> Nothing
> +            if isJust cerror
> +             then addBadSource source
> +             else checkFileReachability
> +
> + | isSsh source = do
> +   reachable <- isReachableSource
> +   unless (reachable source) checkFileReachability
> +
> + | otherwise = fail $ "unknown transport protocol for: " ++ source
> + where source = cacheSource cache
> +       checkFileReachability = do
> +         reachable <- checkHashedInventoryReachability cache
> +         if reachable
> +          then
> +           addReachableSource source
> +          else
> +           addBadSource source
OK. I am not exactly excited about the abuse of String here, although
IIRC you have discussed that with Eric already. I won't dither on this
too much.

> +-- | Returns a list of reachables cache entries,
> +--   taking out the blacklisted entries
> +filterBadSources :: [CacheLoc] -> IO [CacheLoc]
> +filterBadSources cache = do
> +   badSource <- isBadSource
> +   return $ filter (not . badSource . cacheSource) cache
OK.


> +-- | Checks if the  _darcs/hashed_inventory exist and is reachable
> +checkHashedInventoryReachability :: CacheLoc -> IO Bool
> +checkHashedInventoryReachability cache =
> +     withTemp $ \tempout -> do
> +       let f = cacheSource cache ++ "/" ++darcsdir ++ "/" ++ 
> "hashed_inventory"
> +       copyFileOrUrl DefaultRemoteDarcs f tempout Cachable
> +       return True
> +     `catchNonSignal` (\_ -> return False)
You could use System.FilePath.Posix ( (</>) ) above (note the _Posix_
there). Other than that, OK.

>  fetchFileUsingCachePrivate :: FromWhere -> Cache -> HashedDir -> String -> 
> IO (String, B.ByteString)
>  fetchFileUsingCachePrivate fromWhere (Ca cache) subdir f =
>      do when (fromWhere == Anywhere) $ copyFileUsingCache ActuallyCopy (Ca 
> cache) subdir f
> hunk ./src/Darcs/Repository/Cache.hs 303
> -       ffuc cache
> -    `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir 
> subdir)++
> +       cacheFiltered <- filterBadSources cache
> +       ffuc cacheFiltered
> +    `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir 
> subdir ++
>                            " from sources:\n\n"++show (Ca cache))
OK. Filter the cache. Why not (filterBadSources cache >>= ffuc)?

>      where ffuc (c:cs)
>             | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> hunk ./src/Darcs/Repository/Cache.hs 320
>                                      fail $ "Hash failure in " ++ fn c
>                              return (fn c, x')
>                      else return (fn c, x) -- FIXME: create links in caches
> -              `catchall` ffuc cs
> +              `catchNonSignal` (\e -> do
> +                                      checkCacheReachability (show e) c
> +                                      cacheFiltered <- filterBadSources cs
> +                                      ffuc cacheFiltered)
Another place to check and filter the cache.

Makes me wonder if you wouldn't like to write something like

    checkCaches (show e) c cs >>= ffuc

instead...

>  
>             | writable c =
>                do x1 <- gzFetchFilePS (fn c) Cachable
> hunk ./src/Darcs/Repository/Cache.hs 337
>                        else return x1
>                   mapM_ (tryLinking (fn c)) cs
>                   return (fn c, x)
> -              `catchall` do (fname,x) <- ffuc cs
> -                            do createCache c subdir
> -                               createLink fname (fn c)
> -                               return (fn c, x)
> -                             `catchall`
> -                             do gzWriteFilePS (fn c) x `catchall` return ()
> -                                return (fname,x)
> +              `catchNonSignal` (\ e ->
> +                                 do
> +                                   checkCacheReachability (show e) c
> +                                   cacheFiltered <- filterBadSources cs
> +                                   (fname,x) <- ffuc cacheFiltered
> +                                   do createCache c subdir
> +                                      createLink fname (fn c)
> +                                      return (fn c, x)
> +                                    `catchall`
> +                                      do gzWriteFilePS (fn c) x `catchall` 
> return ()
> +                                         return (fname,x))
Looks equivalent, plus fixes up the cache list on errors plus filters
the caches some more. May I add that fetchFileUsingCache is in a dire
need of refactor? If you have a bit of time later, I'd welcome that.

>             | otherwise = ffuc cs
>  
>            ffuc [] = debugFail $ "No sources from which to fetch file 
> `"++f++"'\n"++ show (Ca cache)

> hunk ./src/Darcs/Repository/Cache.hs 409
> +-- | Prints an error message with a list of bad caches.
> +reportBadSources :: IO ()
> +reportBadSources = do
> +  sources <- getBadSourcesList
> +  let size = length sources
> +  unless (null sources) $ do
> +   hPutStrLn stderr $
> +    concat ["\nI could not reach the following ",
> +            englishNum size  (Noun "repository") ":"]
> +   hPutStrLn stderr $
> +    concat [unlines sources,
> +            "If you're not using ",
> +            englishNum size It ", you should probably delete\nthe 
> corresponding ",
> +            englishNum size (Noun "entry") " from _darcs/prefs/sources."]
(I know Eric made you use concat here... I'd go with unlines
personally. You can do this in an extra patch if you want to, or just
forget it otherwise.)

[export]
> hunk ./src/HTTP.hs 22
>  import qualified Data.ByteString.Char8 as BC
>  #endif
>  
> +data ConnectionError = CouldNotResolveHost     |
> +                       CouldNotConnectToServer |
> +                       OperationTimeout
> +               deriving (Eq, Read, Show)
Enumerate some common connection errors for use with above.

> +
>  fetchUrl :: String -> IO String
>  postUrl
>      :: String     -- ^ url
> hunk ./src/HTTP.hs 35
>      -> IO ()  -- ^ result
>  
>  requestUrl :: String -> FilePath -> a -> IO String
> -waitNextUrl :: IO (String, String)
> +waitNextUrl :: IO (String, String, Maybe ConnectionError)
This Maybe ConnectionError feels a bit backwards, but OK.

>  #ifdef HAVE_HTTP
>  
> hunk ./src/HTTP.hs 94
>  waitNextUrl = do
>    (u, f) <- readIORef requestedUrl
>    if null u
> -     then return ("", "No URL requested")
> +     then return ("", "No URL requested", Nothing)
>       else do writeIORef requestedUrl ("", "")
>               e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return 
> "") `catch` h
> hunk ./src/HTTP.hs 97
> -             return (u, e)
> +             let ce = case e of
> +                       "timeout" -> Just OperationTimeout
> +                       _         -> Nothing
> +             return (u, e, ce)
Extract a possible connection error. OK.

>      where h = return . ioeGetErrorString
>  
>  getProxy :: IO String

[imports again]
> hunk ./src/URL.hs 225
>    let l = pipeLength st
>    when (l > 0) $ do
>                  dbg "URL.waitNextUrl start"
> -                (u, e) <- liftIO $ waitNextUrl'
> +                (u, e, ce) <- liftIO $ waitNextUrl'
>                  let p = inProgress st
>                      new_st = st { inProgress = Map.delete u p
>                                  , pipeLength = l - 1 }
> hunk ./src/URL.hs 240
>                           else case Map.lookup u p of
>                                  Just (f, _, _) -> do
>                                    removeFileMayNotExist 
> (f++"-new_"++randomJunk st)
> -                                  downloadComplete u e
> +                                  case ce of
> +                                    Just httpError -> downloadComplete u 
> (show httpError)
> +                                    Nothing        -> downloadComplete u e
Pass errors from waitNextUrl down the pipe. OK.

>                                    debugMessage $ "URL.waitNextUrl failed: "++
>                                                 u++" "++f++" "++e
>                                  Nothing -> bug $ "Another possible bug in 
> URL.waitNextUrl: "++u++" "++e
> hunk ./src/URL.hs 265
>                   Just var -> do
>                          e <- readMVar var
>                          modifyMVar_ urlNotifications (return . (Map.delete 
> u))
> -                        unless (null e) (debugFail $ "Failed to download URL 
> "++u++": "++e)
> +                        unless (null e) $ do
> +                          debugMessage $ "Failed to download URL "++u++": 
> "++e
> +                          fail e
Is this any different?

>                   Nothing  -> return () -- file was already downloaded
>  
>  dbg :: String -> StateT a IO ()
> hunk ./src/URL.hs 293
>  
>  setDebugHTTP :: IO ()
>  requestUrl :: String -> FilePath -> Cachable -> IO String
> -waitNextUrl' :: IO (String, String)
> +waitNextUrl' :: IO (String, String, Maybe ConnectionError)
>  pipeliningEnabled :: IO Bool
>  
>  #ifdef HAVE_CURL
> hunk ./src/URL.hs 306
>        err <- curl_request_url ustr fstr (cachableToInt cache) >>= peekCString
>        return err
>  
> -waitNextUrl' = do
> -  e <- curl_wait_next_url >>= peekCString
> -  u <- curl_last_url >>= peekCString
> -  return (u, e)
> +waitNextUrl' =
> +  bracket malloc free $ \ errorPointer -> do
> +    e <- curl_wait_next_url errorPointer >>= peekCString
> +    ce <- if not (null e)
> +          then do
> +           errorNum <- peek errorPointer
> +           case errorNum of
> +             6  -> return $ Just CouldNotResolveHost
> +             7  -> return $ Just CouldNotConnectToServer
> +             29 -> return $ Just OperationTimeout
> +             _  -> return Nothing
> +          else
> +           return Nothing
> +    u <- curl_last_url >>= peekCString
> +    return (u, e, ce)
Oi, ugly. Well, that's the destiny of FFI code... Just translates cURL
error codes to something more malleable (ConnectionError). (I'd probably
use alloca myself, but bracket malloc free is probably equivalent.)

>  pipeliningEnabled = do
>    r <- curl_pipelining_enabled
> hunk ./src/URL.hs 330
>    curl_request_url :: CString -> CString -> CInt -> IO CString
>  
>  foreign import ccall "hscurl.h curl_wait_next_url"
> -  curl_wait_next_url :: IO CString
> +  curl_wait_next_url :: Ptr CInt -> IO CString
>  
>  foreign import ccall "hscurl.h curl_last_url"
>    curl_last_url :: IO CString
FFI bits. OK.

[imports for darcs.hs]
> hunk ./src/darcs.hs 53
>  main = withAtexit $ withSignalsHandled $
>    flip catch execExceptionHandler $
>    handle (\(AssertionFailed e) -> bug e) $ do
> +  atexit reportBadSources
>    argv <- getArgs
>    case argv of
>      -- User called "darcs" without arguments.
[OK] Register bad sources reporting to run at exit time.

> hunk ./src/hscurl.c 253
>    return error_strings[RESULT_OK];
>  }
>  
> -const char *curl_wait_next_url()
> +const char *curl_wait_next_url(int* errorCode)
>  {
> hunk ./src/hscurl.c 255
> +  *errorCode = -1;
> +
>    if (last_url != NULL)
>      {
>        free(last_url);
> hunk ./src/hscurl.c 280
>        CURLcode result = msg->data.result;
>        struct UrlData *url_data;
>        int error = curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char 
> **)&url_data);
> -      if (error != CURLE_OK)
> +      if (error != CURLE_OK){
Iew, that ){ is hurting my eyes. A bit of whitespace couldn't hurt! :)

> +        *errorCode = error;
>          return curl_easy_strerror(error);
> hunk ./src/hscurl.c 283
> +      }
>  
>        last_url = url_data->url;
>        fclose(url_data->file);
> hunk ./src/hscurl.c 295
>          return curl_multi_strerror(error);
>        curl_easy_cleanup(easy);
>  
> -      if (result != CURLE_OK)
> +      if (result != CURLE_OK){
> +        *errorCode = result;
>          return curl_easy_strerror(result);
> hunk ./src/hscurl.c 298
> +      }
>      }
>    else
>      return error_strings[RESULT_UNKNOWN_MESSAGE];
> hunk ./src/hscurl.h 5
>                               const char *filename,
>                               int cache_time);
>  
> -const char *curl_wait_next_url();
> +const char *curl_wait_next_url(int *errorCode);
>  
>  const char *curl_last_url();
OK.

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Reply via email to