The inner blob is expecting an io.Reader. But perhaps I can change that to pass a Decoder, based on what you are saying. For some reason I hadn't grokked that this is how Decoder works. Just to reiterate what I think you are saying (and in case anyone stumbles across this thread later), assume a file with this type of structure (call the outer blobs A, B, and C for reference):
{ [ {...}, {...} ] } { [ {...}, {...} ] } [ {...}, {...} ]

- The first call to NewDecoder() will position the reader at the first `{` in A.
- Something like exponent-io/jsonpath's SeekTo() could be used to advance to A's `[`.
- The second call to NewDecoder(), with the embedded reader, will set the position at A's first inner {...}.
- Each subsequent call to Decode() will process one inner {...} of A at a time until More() is false, at which point the position is at A's `]`.
- The third call to NewDecoder() will move the pointer to the first `{` in B. *Question: Is this in fact correct? If not, how do I get the reader to this point of the stream?*
- The fourth call to NewDecoder() will allow me to stream-read to B's `[` (in this case using exponent-io/jsonpath's SeekTo(), or some other mechanism).
- Each subsequent call to Decode() will process one inner {...} of B at a time until More() is false, at which point the position is at B's `]`.
- The fifth call to NewDecoder() will move the pointer to the first `[` in C.
- Each subsequent call to Decode() will process one inner {...} of C at a time until More() is false.

I realize this may not be what is actually going on internally inside these packages, but at a high level is that conceptually close to what is going on?

If so, I gotta say this is one of the things I *LOVE* about Go. I cannot count the number of times I had some complicated problem which Go made a whole lot easier. Or, put another way: I was over-complicating the problem and not recognizing the underlying code defect that should change. In fact, refactoring this code, even though it's used in about 100 places, would be trivial. I could probably just use perl -pi -e to fix the code.

And, if I may be a bit indulgent here, the quality of the answers that comes out of the Golang community is just amazing. I love reading this mailing list even though I've only posted to it a few times.
- Greg

On Sunday, March 28, 2021 at 1:26:17 AM UTC-7 Brian Candler wrote:
>
> This works, but the downside is that each {...} of bytes has to be
> pulled into memory. And the function that is called is already designed
> to receive an io.Reader and parse the VERY large inner blob in an efficient
> manner.
>
> Is the inner blob decoder actually using a json.Decoder, as shown in your
> example func secondDecoder()? In that case, the simplest and most
> efficient answer is to create a persistent json.Decoder which wraps the
> underlying io.Reader directly, and just keep calling w2.Decode(&v) on each
> call. It will happily consume the stream, one object at a time.
>
> If that's not possible for some reason, then it sounds like you want to
> break the outer stream at outer object boundaries, i.e. { ... }, without
> fully parsing it. You can do that with json.RawMessage:
> https://play.golang.org/p/BitE6l27160
>
> However, you've still read each object as a stream of bytes into memory,
> and you've still done some of the work of parsing the JSON to find the
> start and end of each object. You can turn it back into an io.Reader by
> creating a bytes.NewBuffer around it, if that's what the inner parser
> requires. However, if each object is large, and you really need to avoid
> reading it into memory at all, then you'd need some sort of rewindable
> stream.
>
> Another approach is to stop the source generating pretty-printed JSON, and
> make it generate in JSON-Lines <https://jsonlines.org/> format instead.
> It sounds like you're unable to change the source, but you might be able to
> un-prettyprint the JSON by using an external tool (perhaps jq can do
> this). Then I am thinking you could make a custom io.Reader which returns
> data up to a newline, then sends EOF and sends you a fresh io.Reader for
> the next line.
>
> But this is all very complicated, when keeping the inner Decoder around
> from object to object is a simple solution to the problem that you
> described. Is there some other constraint which prevents you from doing
> this?
>
> On Saturday, 27 March 2021 at 19:42:40 UTC greg.sa...@gmail.com wrote:
>
>> Good afternoon,
>>
>> For a case where there's a file containing a sequence of hashes (it could
>> be arrays too, as the underlying object type seems irrelevant) as per
>> RFC-7464, I cannot figure out how to handle this in a memory-efficient way
>> that doesn't involve pulling each blob into memory.
>>
>> I've tried to express this on Go playground here:
>> https://play.golang.org/p/Aqx0gnc39rn
>> Note that I'm using exponent-io/jsonpath as the JSON decoder, but
>> certainly that could be swapped for something else.
>>
>> In essence, here is an example of the input bytes:
>>
>> {
>>   "elements" : [
>>     {
>>       "Space" : "YCbCr",
>>       "Point" : {
>>         "Cb" : 0,
>>         "Y" : 255,
>>         "Cr" : -10
>>       }
>>     },
>>     {
>>       "Point" : {
>>         "B" : 255,
>>         "R" : 98,
>>         "G" : 218
>>       },
>>       "Space" : "RGB"
>>     }
>>   ]
>> }
>> {
>>   "elements" : [
>>     {
>>       "Space" : "YCbCr",
>>       "Point" : {
>>         "Cb" : 3000,
>>         "Y" : 355,
>>         "Cr" : -310
>>       }
>>     },
>>     {
>>       "Space" : "RGB",
>>       "Point" : {
>>         "B" : 355,
>>         "G" : 318,
>>         "R" : 108
>>       }
>>     }
>>   ]
>> }
>> {
>>   "elements" : [
>>     {
>>       "Space" : "YCbCr",
>>       "Point" : {
>>         "Cr" : -410,
>>         "Cb" : 400,
>>         "Y" : 455
>>       }
>>     },
>>     {
>>       "Space" : "RGB",
>>       "Point" : {
>>         "B" : 455,
>>         "R" : 118,
>>         "G" : 418
>>       }
>>     }
>>   ]
>> }
>>
>> I can iterate through that with this code:
>>
>> w := json.NewDecoder(bytes.NewReader(j))
>> for w.More() {
>>     var v interface{}
>>     w.Decode(&v)
>>     fmt.Printf("%+v\n", v)
>> }
>>
>> This works, but the downside is that each {...} of bytes has to be pulled
>> into memory. And the function that is called is already designed to
>> receive an io.Reader and parse the VERY large inner blob in an efficient
>> manner.
>>
>> So in principle, this is kinda what I want to do, but maybe I'm looking
>> at it all wrong:
>>
>> w := json.NewDecoder(bytes.NewReader(j))
>> for w.More() {
>>     reader2 := ???? // Some io.Reader that represents each of the 3 json-seq blocks
>>     secondDecoder(reader2)
>> }
>>
>> func secondDecoder(reader io.Reader) {
>>     w2 := json.NewDecoder(reader)
>>     var v interface{}
>>     w2.Decode(&v)
>>     fmt.Printf("%+v\n", v)
>> }
>>
>> Any ideas on how to solve this problem?
>>
>> I should note that it is not possible for the input to change in this
>> case, as the system that consumes it is not the same one that has been
>> generating it for the past 5 years.
>>
>> Thanks!
>>
>> - Greg

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/58472f92-aa24-43a1-b22a-adc8f872e8ccn%40googlegroups.com.