I am not sure what kind of scheme would support native ints of various sizes. Any scheme that puts pointers in the array is going to be worse: the pointers will be 64-bit. You could store offsets to the data, but then you would need to store both the offsets and the contiguous data, nearly doubling your storage. What shape are your arrays? That would determine the minimum size of the offsets.
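
As a rough sketch of the offsets-plus-data trade-off (not a proposal for an actual NumPy feature), the storage cost can be estimated as below. The element count `n`, the random widths, and the choice of 4 bytes for the "in between" values are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # hypothetical number of elements

# Assumed per-element byte widths: 90% need 1 byte, 9% need 4, 1% need 8.
widths = rng.choice([1, 4, 8], size=n, p=[0.90, 0.09, 0.01])

# Contiguous buffer holding every value at its own width.
packed_bytes = int(widths.sum())

# Each element needs an offset into that buffer; the offset type must be
# wide enough to address the last byte, so it grows with the array size.
offset_dtype = np.min_scalar_type(packed_bytes)
offsets_bytes = n * offset_dtype.itemsize

# Baseline: a plain fixed-width uint64 array.
fixed_bytes = n * np.dtype(np.uint64).itemsize

print("packed data:", packed_bytes, "bytes")
print("offsets    :", offsets_bytes, "bytes as", offset_dtype)
print("uint64     :", fixed_bytes, "bytes")
```

With these assumptions the offsets take more space than the packed data they point to, which is why the array size matters for the answer.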

Matti


On 13/3/24 18:15, Dom Grigonis wrote:
By the way, I think I am referring to integer arrays. (Or integer part of floats.)

I don’t think what I am saying sensibly applies to floats as they are.

Although a new float type could base its integer part on such a concept.

—

Where I am coming from is that I started to hit the maximum bounds on integer arrays where most of the values are very small and some become very large, and I am hitting memory limits. I don’t have many zeros, so sparse arrays aren’t an option.

Approximately:
90% of my values could fit into `np.uint8`,
1% require `np.uint64`,
the remaining 9% are in between.

And there is no predictable order to where the large values occur, so splitting is not an option either.
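
One way to check a breakdown like the one above is to count how many values fit each unsigned width. This is only a sketch: `smallest_uint_counts` is a hypothetical helper, not a NumPy function, and it assumes the values are non-negative and fit in `np.uint64`.

```python
import numpy as np

def smallest_uint_counts(values):
    """Count values by the narrowest unsigned dtype that can hold them."""
    a = np.asarray(values, dtype=np.uint64)
    remaining = np.ones(a.shape, dtype=bool)  # values not yet assigned a width
    counts = {}
    for dt in (np.uint8, np.uint16, np.uint32, np.uint64):
        fits = remaining & (a <= np.uint64(np.iinfo(dt).max))
        counts[np.dtype(dt).name] = int(fits.sum())
        remaining &= ~fits
    return counts

counts = smallest_uint_counts([0, 255, 256, 70_000, 2**40])
print(counts)
```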


On 13 Mar 2024, at 17:53, Nathan <nathan.goldb...@gmail.com> wrote:

Yes, an array of references still has a fixed size width in the array buffer. You can think of each entry in the array as a pointer to some other memory on the heap, which can be a dynamic memory allocation.

There's no way in NumPy to support variable-sized array elements in the array buffer, since the fixed-size assumption is key to how NumPy implements strided ufuncs and broadcasting.
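
A small illustration of the fixed-width point, using only standard `ndarray` attributes. The array contents here are arbitrary examples, not data from this thread.

```python
import numpy as np

# Every element occupies exactly `itemsize` bytes, so the address of
# element (i, j) is base + i * strides[0] + j * strides[1].
a = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(a.itemsize, a.strides)

# An object array can hold arbitrarily large Python ints, but the buffer
# itself still stores one fixed-size pointer per element; the ints live
# elsewhere on the heap.
obj = np.array([0, 10**10000], dtype=object)
print(obj.itemsize, obj.strides)
```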

On Wed, Mar 13, 2024 at 9:34 AM Dom Grigonis <dom.grigo...@gmail.com> wrote:

    Thank you for this.

    I am just starting to think about these things, so I appreciate
    your patience.

    But isn’t it still true that all elements of an array are still
    of the same size in memory?

    I am thinking along the lines of per-element dynamic memory
    management, such that if I had the array [0, 1e10000], the first
    element would default to a reasonably small size in memory.

    On 13 Mar 2024, at 16:29, Nathan <nathan.goldb...@gmail.com> wrote:

    It is possible to do this using the new DType system.

    Sebastian wrote a sketch for a DType backed by the GNU
    multiprecision float library:
    https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype

    It adds a significant amount of complexity to store data outside
    the array buffer and introduces the possibility of
    use-after-free and dangling reference errors that are impossible
    if the array does not use embedded references, so that’s the
    main reason it hasn’t been done much.

    On Wed, Mar 13, 2024 at 8:17 AM Dom Grigonis
    <dom.grigo...@gmail.com> wrote:

        Hi all,

        Take Python’s built-in `int` type. It can be as large as
        memory allows.

        np.ndarray on the other hand is optimized for vectorization
        via strides, memory structure and many things that I
        probably don’t know. Well the point is that it is convenient
        and efficient to use for many things in comparison to
        python’s built-in list of integers.

        So I am wondering whether something in between exists. (And
        obviously something more clever than np.array(dtype=object).)

        Probably something similar to `StringDType`, but for
        integers and floats. (It’s just my guess. I don’t know
        anything about `StringDType`, but just guessing it must be
        better than np.array(dtype=object) in combination with
        np.vectorize)

        Regards,
        dgpb

        _______________________________________________
        NumPy-Discussion mailing list -- numpy-discussion@python.org
        To unsubscribe send an email to
        numpy-discussion-le...@python.org
        https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
        Member address: nathan12...@gmail.com
