On Sat, Oct 2, 2021, 10:58 AM Guido van Rossum <gu...@python.org> wrote:

> Are you actually observing that people are doing this with regular lists?
> Don't people working with Big Data usually use Pandas, which is built on
> NumPy arrays and custom data structures?
>

Basically, Guido is right. Big data lives in NumPy or Dask, and semi-big
data in Pandas, etc.

A partial exception to this is data that flows as JSON, and can therefore
have hierarchical structure. For example, a list of 100M dictionaries. That
doesn't really fit the NumPy or Pandas tabular model.

But when I work with that, I don't think I'd ever want a new collection of
"all but a few" items copied to a new data structure. Generally I'd expect
to process it sequentially, with maybe some condition to `continue` on
certain items. This might take the form of an exclusion-index set (process
everything but items {7, 1000, 9999}), as in the sketch below.
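
Just to illustrate what I mean (a minimal sketch; `records` and `process`
here are made-up stand-ins for a real dataset and real per-item work):

    records = [{"id": i} for i in range(10_000)]  # stand-in for a big JSON list
    exclude = {7, 1000, 9999}                     # exclusion-index set

    def process(record):
        pass  # placeholder for the real work on each item

    for i, record in enumerate(records):
        if i in exclude:
            continue  # skip the excluded items; no new list is ever built
        process(record)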

I suppose that if Python lists were linked lists, I might instead delete
items from the middle. But they're not, and the savings Steven suggests
aren't order-of-magnitude or big-O, so I still wouldn't want to remove a few
items from big lists even if it were 2x cheaper (in time, memory, etc.).
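
To make the "not big-O" point concrete: deleting one item from the middle
of a CPython list is itself O(n), since every trailing element gets shifted
down a slot, so in-place removal can't beat building a new list
asymptotically. A rough timing sketch (numbers will vary by machine, of
course):

    import timeit

    # Each middle deletion shifts all trailing elements, so the time for
    # the same 1000 deletions grows roughly linearly with list size.
    for n in (10_000, 100_000, 1_000_000):
        t = timeit.timeit(
            "del lst[len(lst) // 2]",
            setup=f"lst = list(range({n}))",
            number=1000,
        )
        print(f"n={n:>9,}: {t:.4f}s for 1000 middle deletions")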
