On Sat, Oct 2, 2021, 10:58 AM Guido van Rossum <gu...@python.org> wrote:
> Are you actually observing that people are doing this with regular lists?
> Don't people working with Big Data usually use Pandas, which is built on
> NumPy arrays and custom data structures?

Basically, Guido is right. Big data lives in NumPy, or Dask, or semi-big data in Pandas, etc.

A partial exception to this is data that flows as JSON, and can therefore have hierarchical structure. For example, a list of 100M dictionaries. That doesn't really fit the NumPy or Pandas tabular model. But when I work with data like that, I don't think I'd ever want a new collection of "all but a few" items copied to a new data structure. Generally I expect to process it sequentially, with maybe some condition to `continue` on certain items. This might take the form of an exclusion-index set (process everything but items {7, 1000, 9999}).

I suppose that if Python lists were linked lists, I might instead delete items from the middle. But they're not, and the savings Steven suggests isn't order-of-magnitude or big-O, so I still wouldn't want to remove a few items from a big list even if it were 2x cheaper (in time, memory, etc.).
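The exclusion-index-set pattern I mean might be sketched like this (the function name and the doubling step are just illustrative placeholders for real per-item work):

```python
def process_all_but(items, excluded_indices):
    """Process every item sequentially, skipping the excluded indices."""
    results = []
    for i, item in enumerate(items):
        if i in excluded_indices:
            continue  # skip the handful of excluded items; no copy of the list is made
        results.append(item * 2)  # stand-in for the real per-item processing
    return results

data = list(range(10))
print(process_all_but(data, {7, 3}))  # items at indices 3 and 7 are skipped
```

Membership tests against a set are O(1), so skipping a few items this way costs essentially nothing compared to copying or deleting from a 100M-element list.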
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2DBCEXNQEYDILRBB7BRE4STAHBWTSTTC/
Code of Conduct: http://python.org/psf/codeofconduct/