This discussion has become less useful. We can all agree that in computer science, actual infinities are avoided, and frankly need not be taken seriously in any serious program.
You can store all kinds of "infinities" quite compactly as a rule for producing digits rather than the digits themselves. Want 1/7 to a thousand decimal places? No problem. You can be given a digit 1 and a digit 7 and do the division deterministically to as many digits as you wish. I can think of quite a few generators that could easily supply the next digit, or just keep handing out the next element of 142857 from a circular loop, since 1/7 is rational and its decimal expansion simply repeats. Transcendental numbers like pi and e, and things like sines and cosines, can often be calculated to arbitrary precision by evaluating something like an infinite Taylor series for as many terms as the precision you want requires. Similar ideas allow generators to give you as many primes as you want, and no more. So, if you can store arbitrary Python code as part of your JSON, you can send quite a bit of somewhat compressed data. (A few rough sketches of these ideas are appended after the quoted message below.)

The real problem is how the JSON is set up. If you take umpteen data structures and wrap them all in something like a single list, it may be a tad hard to stream, as you may not be able to examine the contents until the list finishes, gigabytes later. But if, instead, you send lots of smaller parts, such as each row of something like a data.frame individually, the other side can recombine them incrementally into a larger structure and apply logic as the data streams in, such as keeping only some columns and discarding the rest, or applying filters that keep only the rows you care about. And, of course, all rows could be appended to one or more .CSV files as well, so if you need multiple passes over the data, it can then be processed locally in various modes, including "streamed".

I think that for some purposes it makes sense to stream nothing but results. Consider any database that allows a remote login and SQL commands and streams only results back. If I only want info on records about company X between July 1 and September 15 of a particular year, and only if the amount paid remains zero or is less than the amount owed, ...

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On Behalf Of Greg Ewing via Python-list
Sent: Tuesday, October 1, 2024 5:48 PM
To: python-list@python.org
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

On 1/10/24 8:34 am, Left Right wrote:
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.

By that definition of "streaming", no parser can ever be streaming, because there will be some constructs that must be read in their entirety before a suitably-structured piece of output can be emitted.

The context of this discussion about integers is the claim that they *could* be parsed incrementally if they were written little endian instead of big endian, but the same argument applies either way.

--
Greg
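Here are the sketches mentioned above. First, the 1/7 digit generator: a minimal sketch that yields one decimal digit at a time by ordinary long division, so no infinite expansion is ever stored anywhere.

from itertools import islice

def decimal_digits(numerator, denominator):
    # Yield the digits after the decimal point of numerator/denominator,
    # one at a time, by doing long division step by step.
    remainder = numerator % denominator
    while True:
        remainder *= 10
        yield remainder // denominator
        remainder %= denominator

# Take only as many digits as you want; the generator never ends.
print(list(islice(decimal_digits(1, 7), 12)))
# -> [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]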
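Second, arbitrary precision for e by summing its Taylor series with the standard-library decimal module. The number of guard digits and the stopping threshold are choices I made up for the sketch.

from decimal import Decimal, getcontext

def e_to(digits):
    # Sum e = 1 + 1/1! + 1/2! + ... with a few guard digits, stopping
    # once the next term is too small to change the result.
    getcontext().prec = digits + 5
    total = term = Decimal(1)
    n = 1
    threshold = Decimal(10) ** -(digits + 2)
    while term > threshold:
        term /= n
        total += term
        n += 1
    getcontext().prec = digits
    return +total  # unary plus rounds to the current precision

print(e_to(50))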
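Third, "as many primes as you want, and no more", here via trial division by the primes found so far (slow but short):

from itertools import count, islice

def primes():
    # An endless prime generator; the caller decides when to stop.
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

print(list(islice(primes(), 10)))
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]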
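Fourth, the row-at-a-time idea, using JSON Lines (one JSON object per line). The field names, the filter, and the file name are all invented for illustration; the point is that each row is filtered, projected to the columns you care about, and appended to a .CSV as it arrives, without ever holding the whole dataset in memory.

import csv
import json

def stream_rows(lines, keep_cols, csv_path):
    # Parse one JSON object per line, keep only the rows and columns
    # we care about, and append each kept row to a CSV file so the
    # data can be reprocessed locally later in other modes.
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=keep_cols)
        if f.tell() == 0:          # empty file: write the header once
            writer.writeheader()
        for line in lines:
            row = json.loads(line)
            if row["paid"] == 0 or row["paid"] < row["owed"]:
                slim = {k: row[k] for k in keep_cols}
                writer.writerow(slim)
                yield slim

sample = [
    '{"company": "X", "paid": 0, "owed": 100}',
    '{"company": "Y", "paid": 50, "owed": 50}',
]
for kept in stream_rows(sample, ["company", "owed"], "kept.csv"):
    print(kept)   # only the X row survives the filter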
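Finally, streaming nothing but results. Here the standard-library sqlite3 module (in memory) merely stands in for whatever remote database you would really log into, and the table, columns, and dates are invented for the example; iterating the cursor hands back matching rows one at a time rather than materializing the whole table.

import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for a remote database
conn.execute(
    "CREATE TABLE invoices (company TEXT, invoice_date TEXT, paid REAL, owed REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [("X", "2024-08-01", 0, 100.0),
     ("X", "2024-10-01", 0, 50.0),    # outside the date window
     ("Y", "2024-08-15", 10.0, 10.0)],
)

query = """
    SELECT * FROM invoices
    WHERE company = ?
      AND invoice_date BETWEEN '2024-07-01' AND '2024-09-15'
      AND (paid = 0 OR paid < owed)
"""
# Only the matching records ever cross the boundary.
for row in conn.execute(query, ("X",)):
    print(row)   # -> ('X', '2024-08-01', 0.0, 100.0)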