I'm not really certain of your previous example, but what you're describing here is data loss. AFAIK, this is inherent to any kind of streaming approach. Have a look at the chapter in the Yesod book on conduit, the description of data loss is still accurate.
On Mon, Apr 16, 2012 at 3:46 AM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote: > Sorry for the spam. > > A similar matter is this following program, where something downstream > reaches EOF right after a conduit outputs a HaveOutput. Because the type of > the early-closed function is just 'r' or 'm r', there is no way for the > conduit to return any partial output. This means that any extra values in > the chunk the conduit read are lost. Is there some way around this? > > -- takeConduit as in previous email > -- partial2 outputs ([[1,2,3,4,5]],[]) instead of ([[1,2,3,4,5]],[6,7,8]) > > monadSink :: Monad m => CI.Sink [a1] m ([[a1]], [[a1]]) > monadSink = do > output <- takeConduit 5 C.=$ CL.take 1 > output' <- CL.consume > return (output, output') > > partial2 :: (Num t, Monad m, Enum t) => m ([[t]], [[t]]) > partial2 = CL.sourceList [[1..8]] C.$$ monadSink > > Thanks, > Myles > > On Sun, Apr 15, 2012 at 4:53 PM, Myles C. Maxfield > <myles.maxfi...@gmail.com> wrote: >>>> 2. If you use connect-and-resume ($$+), the leftovers are returned as >>>> part of the `Source`, and provided downstream. >> >> I'm trying to figure out how to use this, but I'm getting a little bit >> confused. In particular, here is a conduit that produces an output for >> every 'i' inputs. I'm returning partial data when the input stream >> hits an EOF (And I verified that the partial data is correct with >> Debug.Trace), yet the output of 'partial' is ([[1,2,3,4,5]],[]) >> instead of ([[1,2,3,4,5]],[6,7,8]). Can you help me understand what's >> going on? >> >> Thanks, >> Myles >> >> import qualified Data.Conduit as C >> import qualified Data.Conduit.List as CL >> >> -- functionally the same as concatenating all the inputs, then >> repeatedly running splitAt on the concatenation. >> takeConduit :: (Num a, Monad m) => a -> C.Pipe [a1] [a1] m () >> takeConduit i = takeConduitHelper i [] [] >> where takeConduitHelper x lout lin >> | x == 0 = C.HaveOutput (takeConduitHelper i [] lin) (return >> ()) $ reverse lout >> | null lin = C.NeedInput (takeConduitHelper x lout) (C.Done >> (Just $ reverse lout) ()) >> | otherwise = takeConduitHelper (x - 1) (head lin : lout) $ tail >> lin >> >> partial :: (Num t, Monad m, Enum t) => m ([[t]], [[t]]) >> partial = do >> (source, output) <- CL.sourceList [[1..8]] C.$$+ (takeConduit 5 C.=$ >> CL.consume) >> output' <- source C.$$ CL.consume >> return (output, output') >> >> On Sun, Apr 15, 2012 at 2:12 PM, Myles C. Maxfield >> <myles.maxfi...@gmail.com> wrote: >>> Thanks for responding to this. Some responses are inline. >>> >>> On Sat, Apr 14, 2012 at 8:30 PM, Michael Snoyman <mich...@snoyman.com> >>> wrote: >>>> On Thu, Apr 12, 2012 at 9:25 AM, Myles C. Maxfield >>>> <myles.maxfi...@gmail.com> wrote: >>>>> Hello, >>>>> I am interested in the argument to Done, namely, leftover data. More >>>>> specifically, when implementing a conduit/sink, what should the >>>>> conduit specify for the (Maybe i) argument to Done in the following >>>>> scenarios (Please note that these scenarios only make sense if the >>>>> type of 'i' is something in Monoid): >>>>> >>>>> 1) The conduit outputted the last thing that it felt like outputting, >>>>> and exited willfully. There seem to be two options here - a) the >>>>> conduit/sink should greedily gather up all the remaining input in the >>>>> stream and mconcat them, or b) Return the part of the last thing that >>>>> never got represented in any part of anything outputted. Option b >>>>> seems to make the most sense here. >>>> >>>> Yes, option (b) is definitely what's intended. >>>> >>>>> 2) Something upstream produced Done, so the second argument to >>>>> NeedInput gets run. This is guaranteed to be run at the boundary of an >>>>> item, so should it always return Nothing? Instead, should it remember >>>>> all the input it has consumed for the current (yet-to-be-outputted) >>>>> element, so it can let Data.Conduit know that, even though the conduit >>>>> appeared to consume the past few items, it actually didn't (because it >>>>> needs more input items to make an output)? Remembering this sequence >>>>> could potentially have disastrous memory usage. On the other hand, It >>>>> could also greedily gather everything remaining in the stream. >>>> >>>> No, nothing so complicated is intended. Most likely you'll never >>>> return any leftovers from the second field of NeedInput. One other >>>> minor point: it's also possible that the second field will be used if >>>> the *downstream* pipe returns Done. >>> >>> Just to help me understand, what is a case when you want to specify >>> something in this field? I can't think of a case when a Conduit would >>> specify anything in this case. >>> >>>> >>>>> 3) The conduit/sink encountered an error mid-item. In general, is >>>>> there a commonly-accepted way to deal with this? If a conduit fails in >>>>> the middle of an item, it might not be clear where it should pick up >>>>> processing, so the conduit probably shouldn't even attempt to >>>>> continue. It would probably be good to return some notion of where it >>>>> was in the input when it failed. It could return (Done (???) (Left >>>>> errcode)) but this requires that everything downstream in the pipeline >>>>> be aware of Errcode, which is not ideal.I could use MonadError along >>>>> with PipeM, but this approach completely abandons the part of the >>>>> stream that has been processed successfully. I'd like to avoid using >>>>> Exceptions if at all possible. >>>> >>>> Why avoid Exceptions? It's the right fit for the job. You can still >>>> keep your conduit pure by setting up an `ExceptionT Identity` stack, >>>> which is exactly how you can use the Data.Conduit.Text functions from >>>> pure code. Really, what you need to be asking is "is there any logical >>>> way to recover from an exception here?" >>> >>> I suppose this is a little off-topic, but do you prefer ExceptionT or >>> ErrorT? Any exception/error that I'd be throwing is just a container >>> around a String, so both of them will work fine for my purposes. >>> >>>> >>>>> It doesn't seem that a user application even has any way to access >>>>> leftover data anyway, so perhaps this discussion will only be relevant >>>>> in a future version of Conduit. At any rate, any feedback you could >>>>> give me on this issue would be greatly appreciated. >>>> >>>> Leftover data is definitely used: >>>> >>>> 1. If you compose together two `Sink` with monadic bind, the leftovers >>>> from the first will be passed to the second. >>> >>> You can do that???? That's so cool!I never realized that Pipes are >>> members of Monad. >>> >>>> 2. If you use connect-and-resume ($$+), the leftovers are returned as >>>> part of the `Source`, and provided downstream. >>> >>> This too is really neat :] I didn't realize how this worked. >>> >>>> >>>> Michael > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe