Hi,

Alexey Levan <[email protected]> writes:
> Amendment to cache-related packs code, along with a couple more fixes.
> Tarball contents now gets hadlinked to/from cache; filenames starting with
> "tmp-" in basic.tar.gz are reserved for future use, tmp-{pristine,inventories}
> renamed to tmp-filelist-{pristine,inventories}; irrelevant inventories are no
> longer packed to basic tarball (that should resolve issue1889, currently there
> is 90% of waste in basic.tar.gz on darcs.net).

Some high-level remarks first: I would like the "tmp-" prefix to be
changed to something meaningful. Probably "meta-".

> Tarballs are still have to be downloaded completely before their content can
> be read; there seems to be no way current darcs code can be used to download
> files in stream-like fashion, and getting this functionality would require
> substantional refactoring/rewrite of Darcs.URL.

Ok, I agree this is not completely easy with the way URL (not to be
confused with Darcs.URL!) works. My idea was that we would just use the
HTTP library directly in fetchFileLazyPS when the thing to fetch was an
HTTP url and fall back to copyAndReadFile otherwise (which would still
involve a copy).

The code would look something like (just drafted in this mail, may not
compile nor work):

fetchFileLazyPS url c
    | isUrl url && "http" `isPrefixOf` url = do -- maybe need to case-convert 
url here
        rsp <- fetch repo_url
        unless (rspCode rsp == (2, 0, 0)) $ fallback -- just in case
        return $ rspBody rsp
    | otherwise = fallback
  where fallback = copyAndReadFile BL.readFile url c
        fetch url =
          do (_, rsp) <- browse $ do setCheckForProxy True
                                     setOutHandler (const $ return ())
                                     request (mkRequest GET url)
             return rsp

I guess functionality like this would be rather useful for the remaining
code in these patches.

Overall, I am going to accept this bundle, however I am still requesting
a few fixes to be done on top. One is the "tmp-" -> "meta-" (or
something more suitable) rename I mentioned above. Another would be to
make the tarball unpacking more tolerant/flexible.

Something like

unpackBasic cache tar@(Tar.Next x xs) lists = do
   withTarFile x $ \p c -> case p of
       _ | "meta-" `isPrefixOf` takeFileName p -> processMeta cache tar lists
         | otherwise -> unpackFile p c xs
   unpackBasic cache xs lists

processMeta cache (Tar.Next x xs) lists = do
   lists' <- withTarFile x process_one_meta
   if (have_both lists) then ... forkIO ...
                        else unpackBasic cache xs lists'
  where process_one_meta p c = ... -- update "lists"

i.e. you process files as they come and unpack them as they come and
when you run into the right "meta-" files you use them (if you know how)
or discard them (if you don't).

That way, the only requirement for the basic tarball is that it contains
_darcs/hashed_inventory (anywhere) and _darcs/pristine.hashed and that
any files that should not end up under _darcs are prefixed with meta-
(these have special meaning for the unpacker). Hopefully, these
precautions will make our life in the future a lot easier, if we need to
change how the basic tarball is built (say we decide that we want to add
a meta- file at the end of the basic tarball to carry the filecache or
something akin to that -- a darcs that doesn't know about filecache
should just discard that meta- file, and it should certainly not crash
upon doing a "get"). (Sorry I forgot the new name for filecache... was
that patchindex?)

I think we want to have both these before releasing a stable darcs that
can use packs for download.

Some more comments are interspersed in the patches below.

I am going to compile & test the patch soon-ish and then push. Please
record any further changes as new patches and send them as a new bundle.

Yours,
   Petr.

Hardlink files while getting a packed repository
------------------------------------------------
> hunk ./src/Darcs/Repository.hs 49
>      ) where
>  
>  import System.Exit ( ExitCode(..), exitWith )
> -import Data.List ( isSuffixOf )
> +import Data.List ( isPrefixOf )
>  import Data.Maybe( catMaybes )
>  
>  import Darcs.Repository.State( readRecorded, readUnrecorded, readWorking, 
> unrecordedChanges
> hunk ./src/Darcs/Repository.hs 79
>      )
>  import Darcs.Repository.Merge( tentativelyMergePatches, 
> considerMergeToWorking )
>  import Darcs.Repository.Cache ( unionRemoteCaches, fetchFileUsingCache,
> -                                speculateFileUsingCache, HashedDir(..), 
> Cache(..), CacheLoc(..), WritableOrNot(..))
> +                                speculateFileUsingCache, HashedDir(..), 
> Cache(..), CacheLoc(..), WritableOrNot(..), CacheType(Directory) )
>  import Darcs.Patch.Set ( PatchSet(..), SealedPatchSet, newset2RL, newset2FL, 
> progressPatchSet )
>  #ifdef GADT_WITNESSES
>  import Darcs.Patch.Set ( Origin )
> hunk ./src/Darcs/Repository.hs 87
>  import URL ( maxPipelineLength )
>  
>  import Control.Applicative ( (<$>) )
> -import Control.Monad ( unless, when )
> +import Control.Monad ( unless, when , (>=>) )
>  import System.Directory ( createDirectory, renameDirectory,
> hunk ./src/Darcs/Repository.hs 89
> -                          createDirectoryIfMissing, renameFile )
> +                          createDirectoryIfMissing, renameFile,
> +                          doesFileExist, removeFile, getDirectoryContents )
>  import System.IO.Error ( isAlreadyExistsError )
> hunk ./src/Darcs/Repository.hs 92
> +import System.Posix.Files ( createLink )
>  
>  import qualified Darcs.Repository.DarcsRepo as DarcsRepo
>  import qualified Darcs.Repository.HashedRepo as HashedRepo
> hunk ./src/Darcs/Repository.hs 132
>  import Storage.Hashed( writePlainTree )
>  import ByteStringUtils( gzReadFilePS )
>  
> -import System.FilePath( (</>) )
> +import System.FilePath( (</>), takeFileName, splitPath, joinPath
> +                      , takeDirectory )
>  import qualified Codec.Archive.Tar as Tar
>  import Codec.Compression.GZip ( compress, decompress )
>  import qualified Data.ByteString.Char8 as BS
> hunk ./src/Darcs/Repository.hs 290
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
>    createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
>    copySources toRepo fromDir
> +  Repo _ _ _ (DarcsRepository _ toCache3) <-
> +    identifyRepositoryFor toRepo "."
> +  let
> +    cs = case toCache3 of
> +      Ca cs' -> catMaybes . flip map cs' $ \x -> case x of
> +        Cache Directory Writable x' -> Just x'
> +        _ -> Nothing
> +    ca = if not (null cs) then Just (head cs) else Nothing
>    -- unpack inventory & pristine cache
> hunk ./src/Darcs/Repository.hs 299
> -  writeCompressed . Tar.read $ decompress b
> +  procBasicTar ca . Tar.read $ decompress b
>    createPristineDirectoryTree toRepo "."
>    -- pull new patches
>    us <- readRepo toRepo
> hunk ./src/Darcs/Repository.hs 315
>    -- get old patches
>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
> -    writeCompressed . Tar.read . decompress =<< fetchFileLazyPS 
> (fromPacksDir ++
> +    procPatches ca . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir 
> ++
>        "patches.tar.gz") Uncachable
>   where
> hunk ./src/Darcs/Repository.hs 318
> -  writeCompressed Tar.Done = return ()
> -  writeCompressed (Tar.Next x xs) = case Tar.entryContent x of
> -    Tar.NormalFile x' _ -> do
> -      let p = Tar.entryPath x
> -      withTemp $ \p' -> do
> -        BL.writeFile p' $ if "hashed_inventory" `isSuffixOf` p
> -          then x'
> -          else compress x'
> -        renameFile p' p
> -      writeCompressed xs
> +  procBasicTar ca = procHashedInv >=> procTmp >=> procFiles ca
> +  procPatches = procFiles
> +  procHashedInv Tar.Done = fail
> +    "Unexpected end of file; hashed_inventory expected"
> +  procHashedInv (Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "hashed_inventory" == takeFileName p
> +      then do
> +        writeFile' Nothing p c
> +        return xs
> +      else fail $ "Unexpected file: " ++ takeFileName p ++
> +        "\nhashed_inventory expected"
> +  procHashedInv (Tar.Fail e) = fail e
> +  procTmp Tar.Done = return Tar.Done
> +  procTmp xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "tmp-" `isPrefixOf` p
> +      then do
> +        BL.writeFile p c
> +        procTmp xs
> +      else do
> +        mapM removeFile . filter ("tmp-" `isPrefixOf`) =<<
> +          getDirectoryContents "."
> +        return xxs
> +  procTmp (Tar.Fail e) = fail e
> +  procFiles _ Tar.Done = return ()
> +  procFiles ca (Tar.Next x xs) = withTarFile x $ \p c -> do
> +    writeFile' ca p $ compress c
> +    procFiles ca xs
> +  procFiles _ (Tar.Fail e) = fail e
> +  withTarFile x f = case Tar.entryContent x of
> +    Tar.NormalFile x' _ -> f (Tar.entryPath x) x'
>      _ -> fail "Unexpected non-file tar entry"
> hunk ./src/Darcs/Repository.hs 349
> -  writeCompressed (Tar.Fail e) = fail e
> +  writeFile' Nothing x y = withTemp $ \x' -> do
> +    BL.writeFile x' y
> +    renameFile x' x
> +  writeFile' (Just ca) x y = do
> +    let x' = joinPath . tail $ splitPath x -- drop darcsdir
> +    ex <- doesFileExist $ ca </> x'
> +    if ex
> +      then createLink' (ca </> x') x
> +      else withTemp $ \x'' -> do
> +        BL.writeFile x'' y
> +        createLink' x'' $ ca </> x'
> +        renameFile x'' x
> +  createLink' x y = do
> +    createDirectoryIfMissing True $ takeDirectory y
> +    createLink x y `catchall` return ()
>    putInfo = when (not $ Quiet `elem` opts) . putStrLn

The code above is basically correct, with two caveats:

- multiple write-able caches are not supported (probably rare, but
  supported by other parts of darcs)... I guess this is not a
  showstopper (and we may end up removing this feature anyway)
- the basic tarball only handles the (reserved) tmp-* files between
  hashed_inventory and the rest of the tarball... I'd more appreciate if
  they were ignored anywhere encountered

Use cache while getting a packed repository
-------------------------------------------

[snip]

> hunk ./src/Darcs/Commands/Optimize.lhs 372
>        gzs <- filter ((== ".gz") . takeExtension) `fmap` getDirectoryContents 
> "."
>        mapM_ removeFile gzs
>  
> -doOptimizeHTTP :: IO ()
> -doOptimizeHTTP = do
> +doOptimizeHTTP :: RepoPatch p => Repository p C(r u t) -> IO ()
> +doOptimizeHTTP repo = do
>    rf <- either fail return =<< identifyRepoFormat "."
>    unless (formatHas HashedInventory rf) . fail $
>      "Unsupported repository format:\n" ++
> hunk ./src/Darcs/Commands/Optimize.lhs 379
>      "  only hashed repositories can be optimized for HTTP"
>    createDirectoryIfMissing False packsDir
> -  ps <- dirContents' "patches" $ \x -> all (x /=) ["unrevert", "pending",
> -    "pending.tentative"]
> +  ps <- mapRL hashedPatchFileName . newset2RL <$> readRepo repo
>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>      mapM fileEntry' ps
>    renameFile (patchesTar <.> "part") patchesTar
> hunk ./src/Darcs/Commands/Optimize.lhs 383
> -  let i = darcsdir </> "hashed_inventory"
> -  is <- dirContents "inventories"
> -  pr <- dirContents "pristine.hashed"
> -  BL.writeFile (basicTar <.> "part") . compress . write =<<
> -    mapM fileEntry' (i : (is ++ pr))
> +  is <- sortByMTime =<< dirContents "inventories"
> +  writeFile (darcsdir </> "tmp-filelist-inventories") . unlines $
> +    map takeFileName is
> +  pr <- sortByMTime =<< dirContents "pristine.hashed"
> +  writeFile (darcsdir </> "tmp-filelist-pristine") . unlines $
> +    map takeFileName pr
This writeFile (darcsdir </> ("tmp-" ++ name) . unlines $ map
takeFileName list) could probably live in a helper function (writeList?)

> +  BL.writeFile (basicTar <.> "part") . compress . write =<< mapM fileEntry' (
> +    [ darcsdir </> "hashed_inventory"
> +    , darcsdir </> "tmp-filelist-pristine"
> +    , darcsdir </> "tmp-filelist-inventories"
> +    ] ++ reverse pr ++ reverse is)
>    renameFile (basicTar <.> "part") basicTar
> hunk ./src/Darcs/Commands/Optimize.lhs 395
> +  removeFile $ darcsdir </> "tmp-filelist-inventories"
> +  removeFile $ darcsdir </> "tmp-filelist-pristine"
Should this go into a finally block to clean up in case of failure as
well?

>   where
>    packsDir = darcsdir </> "packs"
>    basicTar = packsDir </> "basic.tar.gz"
> hunk ./src/Darcs/Commands/Optimize.lhs 408
>    dirContents d = dirContents' d $ const True
>    dirContents' d f = map ((darcsdir </> d) </>) . filter (\x ->
>      head x /= '.' && f x) <$> getDirectoryContents (darcsdir </> d)
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> darcsdir </> "patches" </> h
> +  sortByMTime xs = map snd . sort <$> mapM (\x -> (\t -> (t, x)) <$>
> +    getModificationTime x) xs
>  \end{code}

[snip]

> hunk ./src/Darcs/Repository.hs 291
>    let toRepo :: Repository p C(r u r) -- In empty repo, t(entative) = 
> r(ecorded)
>        toRepo = Repo toDir opts toFormat $ DarcsRepository toPristine toCache2
>        fromPacksDir = fromDir ++ "/" ++ darcsdir ++ "/packs/"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "inventories"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
> -  createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
> +  createDirectoryIfMissing False $ darcsdir </> "inventories"
>    copySources toRepo fromDir
>    Repo _ _ _ (DarcsRepository _ toCache3) <-
>      identifyRepositoryFor toRepo "."
> hunk ./src/Darcs/Repository.hs 295
> -  let
> -    cs = case toCache3 of
> -      Ca cs' -> catMaybes . flip map cs' $ \x -> case x of
> -        Cache Directory Writable x' -> Just x'
> -        _ -> Nothing
> -    ca = if not (null cs) then Just (head cs) else Nothing
>    -- unpack inventory & pristine cache
> hunk ./src/Darcs/Repository.hs 296
> -  procBasicTar ca . Tar.read $ decompress b
> +  cleanDir "pristine.hashed"
> +  procBasic toCache3 . Tar.read $ decompress b
>    createPristineDirectoryTree toRepo "."
>    -- pull new patches
>    us <- readRepo toRepo
> hunk ./src/Darcs/Repository.hs 311
>      applyToWorking toRepo opts pw
>      return ()
>    -- get old patches
> +  cleanDir "patches"
>    unless (any (`elem` opts) [Partial, Lazy, Ephemeral]) $ do
>      putInfo "Copying patches, to get lazy repository hit ctrl-C..."
> hunk ./src/Darcs/Repository.hs 314
> -    procPatches ca . Tar.read . decompress =<< fetchFileLazyPS (fromPacksDir 
> ++
> -      "patches.tar.gz") Uncachable
> +    mv <- newEmptyMVar
> +    _ <- forkIO . flip finally (putMVar mv ()) .
> +      fetchFiles toCache3 HashedPatchesDir . mapFL hashedPatchFileName $
> +      newset2FL us
> +    procPatches toCache3 . Tar.read . decompress =<<
> +      fetchFileLazyPS (fromPacksDir ++ "patches.tar.gz") Uncachable
> +    takeMVar mv
>   where
> hunk ./src/Darcs/Repository.hs 322
> -  procBasicTar ca = procHashedInv >=> procTmp >=> procFiles ca
> -  procPatches = procFiles
> -  procHashedInv Tar.Done = fail
> +  procBasic = procHashedInv
> +  procPatches ca = procFiles $ cacheDir ca
> +  procHashedInv _ Tar.Done = fail
>      "Unexpected end of file; hashed_inventory expected"
> hunk ./src/Darcs/Repository.hs 326
> -  procHashedInv (Tar.Next x xs) = withTarFile x $ \p c ->
> +  procHashedInv ca (Tar.Next x xs) = withTarFile x $ \p c ->
>      if "hashed_inventory" == takeFileName p
>        then do
>          writeFile' Nothing p c
> hunk ./src/Darcs/Repository.hs 330
> -        return xs
> +        procTmp ca xs
>        else fail $ "Unexpected file: " ++ takeFileName p ++
>          "\nhashed_inventory expected"
> hunk ./src/Darcs/Repository.hs 333
> -  procHashedInv (Tar.Fail e) = fail e
> -  procTmp Tar.Done = return Tar.Done
> -  procTmp xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> -    if "tmp-" `isPrefixOf` p
> +  procHashedInv _ (Tar.Fail e) = fail e
> +  procTmp _ Tar.Done = return ()
> +  procTmp ca xxs@(Tar.Next x xs) = withTarFile x $ \p c ->
> +    if "tmp-" `isPrefixOf` takeFileName p
>        then do
>          BL.writeFile p c
> hunk ./src/Darcs/Repository.hs 339
> -        procTmp xs
> +        procTmp ca xs
>        else do
> hunk ./src/Darcs/Repository.hs 341
> -        mapM removeFile . filter ("tmp-" `isPrefixOf`) =<<
> -          getDirectoryContents "."
> -        return xxs
> -  procTmp (Tar.Fail e) = fail e
> +        ex <- and <$> mapM doesFileExist
> +          [ darcsdir </> "tmp-filelist-pristine"
> +          , darcsdir </> "tmp-filelist-inventories"
> +          ]
> +        if ex
> +          then do
> +            mv <- newEmptyMVar
> +            _ <- forkIO . flip finally (putMVar mv ()) $ do
> +              fetchFiles ca HashedInventoriesDir . lines =<<
> +                readFile (darcsdir </> "tmp-filelist-inventories")
> +              fetchFiles ca HashedPristineDir . lines =<<
> +                readFile (darcsdir </> "tmp-filelist-pristine")
> +            procFiles (cacheDir ca) xxs
> +            takeMVar mv
> +          else procFiles (cacheDir ca) xxs
> +        mapM_ removeFile . (map (darcsdir </>)) .
> +          filter (("tmp-" `isPrefixOf`) . takeFileName) =<<
> +          getDirectoryContents darcsdir
> +  procTmp _ (Tar.Fail e) = fail e
>    procFiles _ Tar.Done = return ()
>    procFiles ca (Tar.Next x xs) = withTarFile x $ \p c -> do
> hunk ./src/Darcs/Repository.hs 362
> -    writeFile' ca p $ compress c
> -    procFiles ca xs
> +    ex <- doesFileExist p
> +    if ex
> +      then debugMessage $ "Tar thread: STOP " ++ p
> +      else do
> +        writeFile' ca p $ compress c
> +        debugMessage $ "Tar thread: GET " ++ p
> +        procFiles ca xs
>    procFiles _ (Tar.Fail e) = fail e
>    withTarFile x f = case Tar.entryContent x of
>      Tar.NormalFile x' _ -> f (Tar.entryPath x) x'
> hunk ./src/Darcs/Repository.hs 389
>      createDirectoryIfMissing True $ takeDirectory y
>      createLink x y `catchall` return ()
>    putInfo = when (not $ Quiet `elem` opts) . putStrLn
> -
> +  fetchFiles _ _ [] = return ()
> +  fetchFiles c d (f:fs) = do
> +    ex <- doesFileExist $ darcsdir </> hashedDir d </> f
> +    if ex
> +      then debugMessage $ "Cache thread: STOP " ++
> +        (darcsdir </> hashedDir d </> f)
> +      else do
> +        debugMessage $ "Cache thread: GET " ++
> +          (darcsdir </> hashedDir d </> f)
> +        fetchFileUsingCache c d f
> +        fetchFiles c d fs
> +  hashedPatchFileName x = case extractHash x of
> +    Left _ -> fail "unexpected unhashed patch"
> +    Right h -> h
> +  cacheDir (Ca cs) = let
> +    cs' = catMaybes . flip map cs $ \x -> case x of
> +      Cache Directory Writable x' -> Just x'
> +      _ -> Nothing
> +   in
> +    if not (null cs') then Just (head cs') else Nothing
> +  cleanDir d = mapM_ (\x -> removeFile $ darcsdir </> d </> x) .
> +    filter (\x -> head x /= '.') =<< getDirectoryContents (darcsdir </> d)
> + 
>  -- | writePatchSet is like patchSetToRepository, except that it doesn't
>  -- touch the working directory or pristine cache.
>  writePatchSet :: RepoPatch p => PatchSet p C(Origin x) -> [DarcsFlag] -> IO 
> (Repository p C(r u t))

Ok, I can accept this as well. The "tmp-" to "meta-" rename can be done
as a patch on top these 3 in this bundle.

Minimize the number of packed inventories
-----------------------------------------

Looks reasonable. I guess the below code may error out if the current
repository is lazy (due to those (Ca []) bits) but this is not much of a
problem. We can polish the error this would give later.

> hunk ./src/Darcs/Commands/Optimize.lhs 383
>    BL.writeFile (patchesTar <.> "part") . compress . write =<<
>      mapM fileEntry' ps
>    renameFile (patchesTar <.> "part") patchesTar
> -  is <- sortByMTime =<< dirContents "inventories"
> +  is <- map ((darcsdir </> "inventories") </>) <$> HashedRepo.listInventories
>    writeFile (darcsdir </> "tmp-filelist-inventories") . unlines $
>      map takeFileName is
>    pr <- sortByMTime =<< dirContents "pristine.hashed"
> hunk ./src/Darcs/Repository/HashedRepo.hs 29
>                                       addToTentativeInventory, 
> removeFromTentativeInventory,
>                                       readRepo, readTentativeRepo, 
> writeAndReadPatch,
>                                       writeTentativeInventory, copyRepo,
> -                                     readHashedPristineRoot, pris2inv, 
> copySources
> +                                     readHashedPristineRoot, pris2inv, 
> copySources,
> +                                     listInventories
>                                     ) where
>  
>  import System.Directory ( createDirectoryIfMissing )
> hunk ./src/Darcs/Repository/HashedRepo.hs 39
>  import System.IO ( stderr, hPutStrLn )
>  import Data.List ( delete, filter )
>  import Control.Monad ( unless )
> +import Control.Applicative ( (<$>) )
>  
>  import Workaround ( renameFile )
>  import Darcs.Flags ( DarcsFlag, Compression, RemoteDarcs )
> hunk ./src/Darcs/Repository/HashedRepo.hs 404
>                    _ -> return ([],i)
>      return $ reverse (readPatchIds str) : rest
>  
> +listInventories :: IO [String]
> +listInventories = do
> +  x <- fst <$> readInventoryPrivate (Ca []) darcsdir "hashed_inventory"
> +  case x of
> +    Nothing -> return []
> +    Just x' -> f x'
> + where
> +  f i = do
> +    x <- fst <$> readInventoryPrivate (Ca []) (darcsdir </> "inventories") i
> +    (i :) <$> case x of
> +      Nothing -> return []
> +      Just x' -> f x'
> +
>  readPatchIds :: B.ByteString -> [(PatchInfo, String)]
>  readPatchIds inv | B.null inv = []
>  readPatchIds inv = case readPatchInfo inv of
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Reply via email to