This discussion has become less useful.

We can all agree that in computer science, actual infinities are avoided and,
frankly, need not be taken seriously in any practical program.

You can store all kinds of infinite expansions quite compactly, whether a
transcendental number or a simple repeating fraction, and derive them to as
many decimal places as you like. Want 1/7 to a thousand decimal places? No
problem. You can be given the digits 1 and 7 and carry out the division to as
many digits as you wish in a deterministic manner. I can think of quite a few
generators that could easily supply the next digit, or just keep handing out
the next element of 142857 from a circular loop.
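
For instance, a minimal sketch of such a generator (my own illustration, not
anything from the thread) that does the long division digit by digit:

    from itertools import islice

    def seventh_digits():
        """Yield successive decimal digits of 1/7 by long division."""
        remainder = 1
        while True:
            remainder *= 10
            digit, remainder = divmod(remainder, 7)
            yield digit

    # Take exactly as many digits as you want, and no more:
    print("".join(str(d) for d in islice(seventh_digits(), 12)))  # 142857142857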

Sines, cosines, pi, e and so on can be calculated to arbitrary precision by
evaluating something like an infinite Taylor series, taking as many terms as
needed to reach the precision of whatever data type is holding the number.
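
As a concrete sketch (mine, with the usual caveats), here is e = sum(1/n!)
summed with the decimal module until the terms drop below the requested
precision:

    from decimal import Decimal, getcontext

    def e_to(places):
        """Approximate e = sum(1/n!) to roughly `places` decimal places."""
        getcontext().prec = places + 5        # a few guard digits
        eps = Decimal(10) ** -(places + 2)    # stop once terms are negligible
        total = term = Decimal(1)
        n = 1
        while term > eps:
            term /= n
            total += term
            n += 1
        return +total    # unary plus rounds to the context precision

    print(e_to(50))      # 2.71828182845904523536...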

Similar ideas allow generators to give you as many primes as you want, and
no more.
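
For example, a sketch along these lines, with islice taking primes off the
top:

    from itertools import islice

    def primes():
        """Yield primes indefinitely, by trial division against earlier primes."""
        found = []
        candidate = 2
        while True:
            if all(candidate % p for p in found):
                found.append(candidate)
                yield candidate
            candidate += 1

    print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]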

So, if you can store arbitrary Python code as part of your JSON, you can
send quite a bit of data in a rather compressed form.
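
A toy illustration of that idea (mine; executing code received over the wire
is a serious security risk, so treat it strictly as a thought experiment):

    import json

    # A few dozen bytes of "recipe" standing in for a million values:
    payload = json.dumps({"recipe": "gen = (n * n for n in range(10**6))"})

    received = json.loads(payload)
    ns = {}
    exec(received["recipe"], ns)  # never do this with untrusted input
    gen = ns["gen"]
    print(next(gen), next(gen), next(gen))  # 0 1 4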

The real problem is how the JSON is laid out. If you take umpteen data
structures and wrap them all in something like one big list, it may be a tad
hard to stream, as you may not be able to examine the contents until the list
finishes, gigabytes later. But if, instead, you send lots of smaller parts,
such as each row of something like a data.frame individually, the other side
can recombine them incrementally into the larger structure and apply logic as
the data streams in, such as keeping only some columns and discarding the
rest, or applying filters that keep only the rows you care about. And, of
course, all rows could also be appended to one or more .csv files, so if you
need multiple passes over the data, it can then be processed locally in
various modes, including "streamed".
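
A minimal sketch of that row-at-a-time approach, assuming newline-delimited
JSON and made-up file and column names:

    import csv
    import json

    KEEP = ("company", "date", "amount_owed", "amount_paid")  # hypothetical

    with open("incoming.ndjson") as src, \
         open("kept.csv", "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=KEEP)
        writer.writeheader()
        for line in src:              # one row per line, never the whole file
            if not line.strip():
                continue
            row = json.loads(line)
            # Example filter: keep only rows still owing money.
            if row.get("amount_paid", 0) < row.get("amount_owed", 0):
                writer.writerow({k: row.get(k) for k in KEEP})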

I think that for some purposes it makes sense to stream nothing but results.
I mean, consider any database that allows a remote login and SQL commands, so
that only the results are streamed back. If I only want info on records about
company X between July 1 and September 15 of a particular year, and only
where the amount paid remains zero or is less than the amount owed, ...
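
A rough sketch of that kind of server-side filtering, using sqlite3 as a
stand-in for a remote database (the table and column names are invented):

    import sqlite3  # any DB-API driver for a remote database looks much the same

    conn = sqlite3.connect("payments.db")  # hypothetical database
    cur = conn.execute(
        """SELECT * FROM invoices
           WHERE company = ?
             AND invoice_date BETWEEN ? AND ?
             AND (amount_paid = 0 OR amount_paid < amount_owed)""",
        ("Company X", "2023-07-01", "2023-09-15"),
    )
    # Only the matching rows ever cross the wire, in modest batches:
    for batch in iter(lambda: cur.fetchmany(1000), []):
        for row in batch:
            print(row)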


-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On
Behalf Of Greg Ewing via Python-list
Sent: Tuesday, October 1, 2024 5:48 PM
To: python-list@python.org
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data
(60 GB) from Kenna API

On 1/10/24 8:34 am, Left Right wrote:
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.

By that definition of "streaming", no parser can ever be streaming,
because there will be some constructs that must be read in their
entirety before a suitably-structured piece of output can be
emitted.

The context of this discussion about integers is the claim that
they *could* be parsed incrementally if they were written little
endian instead of big endian, but the same argument applies either
way.

-- 
Greg
-- 
https://mail.python.org/mailman/listinfo/python-list
