> What am I missing?  Handwavingly, start with the first digit, and as
> long as the next character is a digit, multipliy the accumulated result
> by 10 (or the appropriate base) and add the next value.  Oh, and handle
> scientific notation as a special case, and perhaps fail spectacularly
> instead of recovering gracefully in certain edge cases.  And in the
> pathological case of a single number with 60 billion digits, run out of
> memory (and complain loudly to the person who claimed that the file
> contained a "dataset").  But why do I need to start with the least
> significant digit?

You probably forgot that it has to be _streaming_. Suppose you parse
the first digit: can you hand this information over to an external
function to process the parsed data? -- No! because you don't know the
magnitude yet.  What about two digits? -- Same thing.  You cannot
leave the parser code until you know the magnitude (otherwise the
information is useless to the external code).

So, even if you have enough memory and don't care about special cases
like scientific notation: yes, you will be able to parse it, but it
won't be a streaming parser.

On Mon, Sep 30, 2024 at 9:30 PM Left Right <olegsivo...@gmail.com> wrote:
>
> > Streaming won't work because the file is gzipped.  You have to receive
> > the whole thing before you can unzip it. Once unzipped it will be even
> > larger, and all in memory.
>
> GZip is specifically designed to be streamed.  So, that's not a
> problem (in principle), but you would need to have a streaming GZip
> parser, quick search in PyPI revealed this package:
> https://pypi.org/project/gzip-stream/ .
>
> On Mon, Sep 30, 2024 at 6:20 PM Thomas Passin via Python-list
> <python-list@python.org> wrote:
> >
> > On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> > >
> > >
> > >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list 
> > >> <python-list@python.org> wrote:
> > >>
> > >>
> > >> import polars as pl
> > >> pl.read_json("file.json")
> > >>
> > >>
> > >
> > > This is not going to work unless the computer has a lot more the 60GiB of 
> > > RAM.
> > >
> > > As later suggested a streaming parser is required.
> >
> > Streaming won't work because the file is gzipped.  You have to receive
> > the whole thing before you can unzip it. Once unzipped it will be even
> > larger, and all in memory.
> > --
> > https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to