Hi Richard, perhaps something like this would work:
```Haskell
import Text.ParserCombinators.ReadP (readP_to_S, gather)
import qualified Text.Read.Lex as L

-- Lex one token and use gather to record the exact source text consumed.
-- Only string-literal lexemes match the pattern; anything else yields no result.
example :: ReadS (Int, String)
example input =
  do ((xs, L.String t), rest) <- readP_to_S (gather L.lex) input
     pure ((length xs, t), rest)
```

-Iavor

On Tue, Apr 27, 2021 at 12:05 PM Richard Eisenberg <r...@richarde.dev> wrote:

> Hi devs,
>
> tl;dr: Is there any (efficient) way to get the String consumed by a
> `reads`?
>
> I'm stuck in thinking about a fix for #19746. Happily, the problem is
> simple enough that I could assign it in the first few weeks of a Haskell
> course... and yet I can't find a good solution! So I pose it here for
> inspiration.
>
> The high-level problem: Assign correct source spans to options within an
> OPTIONS_GHC pragma.
>
> Current approach: The payload of an OPTIONS_GHC pragma gets turned into a
> String and then processed by GHC.Utils.Misc.toArgs :: String -> Either
> String [String]. The result of toArgs is either an error string (the Left
> result) or a list of lexed options (the Right result).
>
> A little-known fact is that Haskell strings can be put in an OPTIONS_GHC
> pragma. So I can write both {-# OPTIONS_GHC -funbox-strict-fields #-} and
> {-# OPTIONS_GHC "-funbox-strict-fields" #-}. Even stranger, I can write {-#
> OPTIONS_GHC ["-funbox-strict-fields"] #-}, where GHC will understand a list
> of strings. While I don't really understand the motivation for this last
> feature (I posted #19750 about this), the middle option, with the quotes,
> seems like it might be useful.
>
> Desired approach: change toArgs to have this type: RealSrcLoc -> String ->
> Either String [Located String], where the input RealSrcLoc is the location
> of the first character of the input String. Then, as toArgs processes the
> input, it advances the RealSrcLoc (with advanceSrcLoc), allowing us to
> create correct SrcSpans for each String.
>
> Annoying fact: Not all characters advance the source location by one
> character. Tabs and newlines don't. Perhaps some other characters don't,
> too.
>
> Central stumbling block: toArgs uses `reads` to parse strings. This makes
> great sense, because `reads` already knows how to convert Haskell String
> syntax into a proper String. The problem is that we have no idea what
> characters were consumed by `reads`. And, short of looking at the length of
> the remainder string in `reads` and comparing it to the length of the input
> string, there seems to be no way to recreate this lost information. Note
> that comparing lengths is slow, because we're dealing with Strings here.
> Once we know what was consumed by `reads`, then we can just repeatedly call
> advanceSrcLoc, and away we go.
>
> Ideas to get unblocked:
> 1. Just do the slow (quadratic in the number of options) thing, looking at
> the lengths of strings often.
> 2. Reimplement reading of strings to return both the result and the
> characters consumed
> 3. Incorporate the parsing of OPTIONS_GHC right into the lexer
>
> It boggles me that there isn't a better solution here. Do you see one?
>
> Thanks,
> Richard
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
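
P.S. To connect the `gather` idea to the source-location problem in the quoted message, here is a rough sketch of how the consumed text could be folded over to advance a position. `Pos`, `advance`, and `lexStringAt` are hypothetical names for illustration only; `advance` is a simplified stand-in for GHC's advanceSrcLoc, and the 8-column tab stop is an assumption, so please check it against the real function before relying on it.

```Haskell
import Data.List (foldl')
import Text.ParserCombinators.ReadP (gather, readP_to_S)
import qualified Text.Read.Lex as L

-- Hypothetical stand-in for RealSrcLoc: (line, column), both 1-based.
type Pos = (Int, Int)

-- Hypothetical stand-in for advanceSrcLoc.
-- Assumption: tabs jump to the next multiple-of-8 column boundary.
advance :: Pos -> Char -> Pos
advance (l, _) '\n' = (l + 1, 1)
advance (l, c) '\t' = (l, ((c - 1) `div` 8 + 1) * 8 + 1)
advance (l, c) _    = (l, c + 1)

-- Lex one Haskell string literal starting at the given position.
-- Returns the decoded string, the position just past the literal,
-- and the unconsumed input; Nothing if the next lexeme is not a string.
lexStringAt :: Pos -> String -> Maybe (String, Pos, String)
lexStringAt start input =
  case [ (t, consumed, rest)
       | ((consumed, L.String t), rest) <- readP_to_S (gather L.lex) input ] of
    (t, consumed, rest) : _ -> Just (t, foldl' advance start consumed, rest)
    []                      -> Nothing
```

The point of folding over the consumed text rather than the decoded string is that escapes make the two differ in length: the six source characters of `"a\tb"` decode to a three-character String, but the position still has to move past all six. Note that `L.lex` skips leading whitespace, and `gather` includes that whitespace in the consumed text, so the fold accounts for it as well.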