Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. Re: Noobie attempt to process log output into dependency graph (John Lusk) ---------------------------------------------------------------------- Message: 1 Date: Wed, 14 Dec 2016 18:28:07 -0500 From: John Lusk <johnlu...@gmail.com> To: The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell <beginners@haskell.org> Subject: Re: [Haskell-beginners] Noobie attempt to process log output into dependency graph Message-ID: <cajqkmby+qd+jpkuicnoh6d6vrtjqao7j0tvakk1fj-ehgzt...@mail.gmail.com> Content-Type: text/plain; charset="utf-8" (Or you could find it here: https://github.com/JohnL4/DependencyGraph) On Wed, Dec 14, 2016 at 6:26 PM, John Lusk <johnlu...@gmail.com> wrote: > Hi, all, > > Here's my question: > > I thought, for grins, I'd try to turn some log output into a dependency > graph (using GraphViz's dot(1)). I'm having difficulty forcing my > stateful paradigm into a functional one, so I need some help. > > If I was to do this with an imperative (stateful) language, I'd build a > set of edges (or a map to a frequency count, really, since I'll use freq > > 1 to add some output text noting the repeated occurrences), and then > dump out the set elements to a text file that would look something like > this fragment: > > a -> q > q -> d > d -> e [color=red] > d -> f [color=red > > My big problem now is that if I process a subtree that looks like: > > a > b > c > d > b > d > e > > my current plan is to proces the first b-c-d subtree and then process the > b-d-e subtree, *BUT* I need to pass the updated edge set to the second > processing call, which is pretty stateful. > > Do I need to just bite the bullet and find some succinct way to do that, > or is my entire approach just wrong, stuck in my stateful mindset? > > My (awful) code looks like this: > > -- Emit to stdout a series of dot(1) edges specifying dependencies.-- "A -> > B" means "A depends on B".---- Build with 'ghc dependency-graph.hs'-- -- > Input is a text file containing lines as follows:-- (some indentation) > (some extraneous text) (file-A) in (some directory)-- (some extra > indentation) (some extraneous text) (file-B) in (some directory)-- (some > indentation matching the first line above) (some extraneous text) (file-C) in > (some directory)---- This means that file-A depends on file-B, but neither > file-A nor file-B depend on file-C.---- Sample:-- > Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : > Processing SXA.Compass.Config.ViewModel.dll in C:\Program Files > (x86)\Allscripts Sunrise\Clinical Manager Client\7.2.5575.0\-- > Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : > Adding C:\Program Files (x86)\Allscripts Sunrise\Clinical Manager > Client\7.2.5575.0\SXA.Compass.Config.ViewModel.dll (IsPresent=true) to > assemblyList at beginning of GetAssemblyListEx()-- > Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : > Processing SXA.Compass.Config.Utils.dll in C:\Program Files > (x86)\Allscripts Sunrise\Clinical Manager Client\7.2.5575.0\---- (Need to > skip the line containing "Adding", and only process the ones containing > "Processing".)-- -- Algorithm:-- Read first line, parse, remember > indentation-- Repeat for other lines, but if indentation increases, > store pair A -> B in hashset.-- At end, dump out hashset. > -- import Debug.Trace-- import System.Environment-- import > System.Console.GetOpt-- import Data.Maybe (fromMaybe)-- import > Data.List.Splitimport Prelude -- hiding (readFile) -- Because we want the > System.IO.Strict version-- import System.IO (hPutStr, hPutStrLn, stderr)-- > import System.IO.Strict-- import Control.Monad-- import System.Directory-- > import System.FilePathimport Text.Regex.TDFA-- import > Text.Regex.TDFA.String-- import Text.Printf > -- import qualified Data.Map.Lazy as Mapimport qualified Data.Map.Strict as > Map > ---------------------------------------------------------------- Test Datal1 > = " Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() > Information: 0 : Processing SXA.Compass.Config.ViewModel.dll\tin C:\\Program > Files (x86)\\Allscripts Sunrise\\Clinical Manager Client\\7.2.5575.0\\"l2 = " > Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 > : Adding C:\\Program Files (x86)\\Allscripts Sunrise\\Clinical Manager > Client\\7.2.5575.0\\SXA.Compass.Config.ViewModel.dll\t(IsPresent=true)\tto > assemblyList at beginning of GetAssemblyListEx()"l3 = " > Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : > Processing SXA.Compass.Config.Utils.dll\tin C:\\Program Files > (x86)\\Allscripts Sunrise\\Clinical Manager > Client\\7.2.5575.0\\"---------------------------------------------------------------- > Test Data Ends-- See http://stackoverflow.com/q/32149354/370611-- toRegex = > makeRegexOpts defaultCompOpt{multiline=False} defaultExecOpt > -- Escape parens?-- initialFillerRegex :: String-- initialFillerRegex = > "Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList\\(\\) Information: > 0 : Processing" > -- Regex matching (marking) a line to be processed-- valuableLineRegex :: > String-- valuableLineRegex = "\\bProcessing\\b" > -- |Regex matching line to be parsedparseLineRegex :: StringparseLineRegex = > "^(.* Information: 0 : Processing )([^ ]*)[ \t]+in (.*)" -- 3subexpressions > main :: IO()main = do > logContents <- getContents > putStrLn $ unlines $ fst $ edges (parseIndent $ lines logContents) Map.empty > ------------------------------------------------------------------ |Parses > out the leading indentation of the given String into a string of spaces and > the rest of the lineparseIndent :: String -> (String,String)parseIndent s = > ((fourth $ (s =~ "^( *)(.*)" :: (String,String,String,[String]))) !! 0, > (fourth $ (s =~ "^( *)(.*)" :: > (String,String,String,[String]))) !! 1) > ------------------------------------------------------------------ |Returns a > list of strings describing edges in the form "a -> b /* comment */"edges :: > [(String,String)] -- ^ Input tuples: (indent, restOfString) > -> Map.Map String Int -- ^ Map of edges in form "a -> b" with a count of > the number of times that edge occurs > -> [String] -- ^ Output list of edge descriptions in form "a -> b > optionalExtraText" > edges [] edgeSet = > (edgeDump $ Map.assocs edgeSet, 0) > edges (lastLine:[]) edgeSet = > (edgeDump $ Map.assocs edgeSet, 1) > edges (fstLogLine:sndLogLine:[]) edgeSet = > let fstFields = (snd fstLogLine) =~ parseLineRegex :: > (String,String,String,[String]) > sndFields = (snd sndLogLine) =~ parseLineRegex :: > (String,String,String,[String]) > in > if length (fourth fstFields) == 0 > then error ("Unmatched: " ++ (first fstFields)) -- First line must always > match > else if length (fourth sndFields) == 0 -- "Adding", not "Processing" > then edges (fstLogLine:[]) edgeSet -- Skip useless line > else if indentLength fstLogLine >= indentLength sndLogLine > then edges (sndLogLine:[]) edgeSet -- Can't be an edge from first to > second line; drop first line and keep going. > else edges (sndLogLine:[]) > (Map.insertWith (+) ((fullName fstFields) ++ (fullName sndFields)) 1) > edges (fstLogLine:sndLogLine:thdLogLine:logLines) edgeSet = > let fstFields = (snd fstLogLine) =~ parseLineRegex :: > (String,String,String,[String]) > sndFields = (snd sndLogLine) =~ parseLineRegex :: > (String,String,String,[String]) > thdFields = (snd thdLogLine) =~ parseLineRegex :: > (String,String,String,[String]) > in > if length (fourth fstFields) == 0 > then error ("Unmatched: " ++ (first fstFields)) -- First line must always > match > > else if length (fourth sndFields) == 0 -- "Adding", not "Processing" > then edges (fstLogLine:thdLogLine:logLines) edgeSet -- Skip useless line > > else if indentLength fstLogLine >= indentLength sndLogLine > then [] -- Stop processing at outdent > > else > -- Looking one of: > -- 1 > -- 2 -- process 1 -> 2, then process 2.. as subtree > -- 3 -- Need to process as subtree rooted at 2, then drop > subtree (zero or more lines at same level as 3) > -- or > -- 1 > -- 2 -- processs, then drop this line (process 2.. as empty > subtree?) > -- 3 > -- or > -- 1 > -- 2 -- process, then drop this line (drop entire subtree > rooted at 1) (same as above, drop empty subtree? (2)) > -- 3 > -- or > -- 1 > -- 2 -- same as above? Drop empty subtree rooted at 2 > -- 3 > edges (sndLogLine:thdLogLine:logLines) (Map.insertWith (+) ((fullName > fstFields) ++ (fullName sndFields)) 1) -- now what? I need to pass the > UPDATED edgeSet on to the next call, after the subtree rooted at 2 is dropped. > > > > then edges (sndLogLine:logLines) edgeSet -- Can't be an edge from first > to second line; drop first line and keep going. > else edges (sndLogLine:(takeWhile (increasingIndent $ length $ fst > fstLogLine) logLines)) > (Map.insertWith (+) ((fullName fstFields) ++ (fullName sndFields)) 1) > else ((fst $ edges (sndLogLine:logLines) edgeSet) > ++ (fst $ edges (fstLogLine:(drop > (snd $ edges (sndLogLine:logLines) > edgeSet) -- # of lines processed > logLines)) edgeSet), > (snd $ edges (sndLogLine:logLines) edgeSet) > + (snd $ edges (fstLogLine:(drop > (snd $ edges (sndLogLine:logLines) > edgeSet) -- # of lines processed > logLines)) edgeSet) > ) > ----------------------------------------------------------------fullname :: > (String,String,String,[String]) -> Stringfullname > (_,_,_,[_,fileName,directoryName]) = directoryName ++ fileName > ------------------------------------------------------------------ |Edges > from the first line to all following linesedgesFrom :: String -- > ^ First line > -> [String] -- ^ Following lines > -> Map.Map String Int -- ^ Set of edges built so far > -> [String]edgesFrom a b c = [] > ------------------------------------------------------------------ |Return > length of indent or errorindentLength :: (String,String,String,[String]) -- ^ > Regex match context > -> Int -- ^ Length of > indentindentLength (prefix,_,_,[]) = error $ "Not matched: " ++ > prefixindentLength (_,_,_,subexprs) = > length $ subexprs !! 0 > ------------------------------------------------------------------ |Returns a > list of edges, possibly with comments indicating occurrence counts > > 1edgeDump :: [(String,Int)] -- ^ List of (edge,count) tuples > -> [String] -- ^ List of edges, possibly > w/commentsedgeDump [] = []edgeDump ((edge,count):rest) > | count <= 1 = edge:(edgeDump rest) > | otherwise = (edge ++ " /* " ++ (show count) ++ " occurrences > */"):(edgeDump rest) > ----------------------------------------------------------------first :: > (a,b,c,d) -> afirst (x,_,_,_) = x > fourth :: (a,b,c,d) -> dfourth (_,_,_,x) = x > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.haskell.org/pipermail/beginners/attachments/20161214/d71c1195/attachment.html> ------------------------------ Subject: Digest Footer _______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners ------------------------------ End of Beginners Digest, Vol 102, Issue 6 *****************************************