On 2024-09-30 at 21:34:07 +0200,
Regarding "Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API,"
Left Right via Python-list <python-list@python.org> wrote:

> > What am I missing?  Handwavingly, start with the first digit, and as
> > long as the next character is a digit, multiply the accumulated result
> > by 10 (or the appropriate base) and add the next value.  Oh, and handle
> > scientific notation as a special case, and perhaps fail spectacularly
> > instead of recovering gracefully in certain edge cases.  And in the
> > pathological case of a single number with 60 billion digits, run out of
> > memory (and complain loudly to the person who claimed that the file
> > contained a "dataset").  But why do I need to start with the least
> > significant digit?
> 
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.  What about two digits? -- Same thing.  You cannot
> leave the parser code until you know the magnitude (otherwise the
> information is useless to the external code).

If I recognize the first digit, then I *can* hand that over to an
external function to accumulate the digits that follow.
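
As a rough sketch of what I mean (the names DigitAccumulator and
stream_digits are made up for illustration, and this ignores signs,
fractions, and scientific notation), the parser can pass each digit to
an external consumer the moment it is recognized, most significant
digit first:

```python
class DigitAccumulator:
    """Hypothetical external consumer; receives digits most significant first."""

    def __init__(self, base=10):
        self.base = base
        self.value = 0

    def feed(self, digit_char):
        # Multiply the accumulated result by the base, then add the new digit.
        self.value = self.value * self.base + int(digit_char, self.base)

    def finish(self):
        return self.value


def stream_digits(chars, consumer):
    """Hand each ASCII decimal digit to the consumer as soon as it is seen."""
    for ch in chars:
        if ch not in "0123456789":
            break  # first non-digit ends the number
        consumer.feed(ch)
    return consumer.finish()


acc = DigitAccumulator()
print(stream_digits(iter("12345,"), acc))
```

The parser itself never holds more than one character; whatever state
exists lives in the consumer.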

> So, even if you have enough memory and don't care about special cases
> like scientific notation: yes, you will be able to parse it, but it
> won't be a streaming parser.

Under that constraint, I'm not sure I can parse anything.  How can I
parse a string (and hand it over to an external function) until I've
found the closing quote?

How much state can a parser maintain (before it invokes an external
function) and still be considered streaming?  I fear that we may be
getting hung up on terminology rather than solving the problem at hand.
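
To make the question concrete, here is a minimal sketch (stream_ints is
a hypothetical name, and it only handles unsigned decimal integers, not
real JSON) of a parser whose only state between chunks is one partially
accumulated number.  Whether that counts as "streaming" is exactly the
terminology question:

```python
def stream_ints(chunks):
    """Yield integers found in an iterable of text chunks.

    The only state carried across chunk boundaries is the partially
    accumulated number, so a number split between chunks still parses.
    """
    value = None  # None means "not currently inside a number"
    for chunk in chunks:
        for ch in chunk:
            if ch in "0123456789":
                value = (value or 0) * 10 + int(ch)
            elif value is not None:
                yield value  # a non-digit ends the current number
                value = None
    if value is not None:
        yield value  # number that ran to the end of the input


print(list(stream_ints(["[12, 3", "45, 6", "7]"])))
```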
-- 
https://mail.python.org/mailman/listinfo/python-list
