[Numpy-discussion] Re: Assessment of the difficulty in porting CPU architecture for numpy
Hi Xuanbao,

On Thu, Nov 16, 2023 at 2:59 PM xuanbao via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Hello everyone! I am working on implementing a tool to assess the
> complexity of CPU architecture porting. It primarily focuses on RISC-V
> architecture porting. In fact, the tool may give an average estimate of
> various architecture porting efforts. My focus is on the overall workload
> and difficulty of porting, past and future, even if a project has already
> been ported. As part of my dataset, I have collected the **numpy** project.
> **I would like to gather community opinions to support my assessment. I
> appreciate your help and response!** Based on scanning tools, the porting
> complexity is determined to be moderate leaning towards simple, with a
> moderate amount of code related to the CPU architecture in the project. Is
> this assessment accurate?

"Moderate" sounds reasonable. It depends, though, on whether you only want to get everything working without CPU-specific optimizations, or also want to support SIMD code. If the former, I'd say moderate - you still have to deal with a lot of C code and platform-specific build details. If the latter, the effort is high.

> Do you have any opinions on personnel allocation and time consumption?

That's very difficult to estimate. It depends both on the experience of the developer and on how many platform-specific issues turn up. I'd pick a single person who can deal with C/C++ related issues and go from there. It could be done in a day or two, or it could take weeks to months.

Cheers,
Ralf

> I look forward to your help and response.
[Numpy-discussion] next NumPy Optimization Team meeting
The next NumPy Optimization Team meeting will be held on Monday, November 20th at 4pm UTC.

Join us via Zoom: https://numfocus-org.zoom.us/j/88162122074?pwd=a3h0cTZubzlLcTZzMVhCL1AxUVFMdz09

Everyone is welcome and encouraged to attend. To add the topics you'd like to discuss to the meeting agenda, follow the link: https://hackmd.io/dVdSlQ0TThWkOk0OkmGsmw?both

For the notes from previous meetings, visit: https://github.com/numpy/archive/tree/main/optim_team_meetings

--
Cheers,
Inessa

Inessa Pawson
GitHub: inessapawson
[Numpy-discussion] How to sample unique vectors
Hi all,

I am trying to sample k N-dimensional vectors from a uniform distribution without replacement. It seems like this should be straightforward, but I can't seem to pin it down.

Specifically, I am trying to get random indices into a d0 x d1 x ... x dN-1 array.

I thought about sneaking a structured dtype into `rng.integers`, but of course that doesn't work.

If we had a string sampler, I could sample k unique words (consisting of digits) and convert them to indices.

I could over-sample and filter out the non-unique indices. Or iteratively draw blocks of samples until I've built up my k unique indices.

The most straightforward solution would be to flatten the indices and to sample from those. The integers get large quickly, though. The rng.integers docstring suggests that it can handle object arrays for very large integers:

> When using broadcasting with uint64 dtypes, the maximum value (2**64)
> cannot be represented as a standard integer type. The high array (or low
> if high is None) must have object dtype, e.g., array([2**64]).

But that doesn't work:

In [35]: rng.integers(np.array([2**64], dtype=object))
ValueError: high is out of bounds for int64

Is there an elegant way to handle this problem?

Best regards,
Stéfan
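To put a number on "the integers get large quickly": with the illustrative dimensions used in the example script later in this thread, the flattened index range already overflows a signed 64-bit integer. A minimal check (the dimensions are just an example):

import math

dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)  # illustrative shape from the script below
print(math.prod(dims))            # 2073600000000000000000, i.e. ~2.1e21
print(math.prod(dims) > 2**63)    # True: such a flat index no longer fits in int64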
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023 at 1:54 PM Stefan van der Walt via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Hi all,
>
> I am trying to sample k N-dimensional vectors from a uniform distribution
> without replacement. It seems like this should be straightforward, but I
> can't seem to pin it down.
>
> Specifically, I am trying to get random indices into a d0 x d1 x ... x
> dN-1 array.
>
> [...]
>
> The most straightforward solution would be to flatten the indices and to
> sample from those. The integers get large quickly, though. The
> rng.integers docstring suggests that it can handle object arrays for very
> large integers:
>
> > When using broadcasting with uint64 dtypes, the maximum value (2**64)
> > cannot be represented as a standard integer type. The high array (or
> > low if high is None) must have object dtype, e.g., array([2**64]).
>
> But that doesn't work:
>
> In [35]: rng.integers(np.array([2**64], dtype=object))
> ValueError: high is out of bounds for int64
>
> Is there an elegant way to handle this problem?

The default dtype for the result of `integers()` is the signed `int64`. If you want to sample from the range `[0, 2**64)`, you need to specify `dtype=np.uint64`. The text you are reading is saying that if you want to specify exactly `2**64` as the exclusive upper bound, you won't be able to do it with a `np.uint64` array/scalar because it's one above the bound for that dtype, so you'll have to use a plain Python `int` object or a `dtype=object` array in order to represent `2**64`. It is not saying that you can draw arbitrary-sized integers.

>>> rng.integers(2**64, dtype=np.uint64)
11569248114014186612

If the arrays you are drawing indices for are real in-memory arrays for present-day 64-bit computers, this should be adequate. If it's a notional array that is larger, then you'll need actual arbitrary-sized integer sampling. The builtin `random.randrange()` will do arbitrary-sized integers and is quite reasonable for this task. If you want it to use our BitGenerators underneath for clean PRNG state management, this is quite doable with a simple subclass of `random.Random`: https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

--
Robert Kern
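A short sketch of the two cases described above, assuming the object-dtype bound behaves as the docstring note quoted earlier describes (variable names are illustrative):

import numpy as np

rng = np.random.default_rng()

# Sampling over the full uint64 range: ask for the unsigned dtype explicitly.
x = rng.integers(2**64, dtype=np.uint64)

# When broadcasting bounds, the exclusive upper bound 2**64 itself cannot be
# stored in a uint64 array, so it goes into an object array (or a Python int).
hi = np.array([2**64], dtype=object)
y = rng.integers(hi, dtype=np.uint64)

print(x, y)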
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023 at 12:10 PM Robert Kern wrote:
>
> [...]
>
> If the arrays you are drawing indices for are real in-memory arrays for
> present-day 64-bit computers, this should be adequate. If it's a notional
> array that is larger, then you'll need actual arbitrary-sized integer
> sampling. The builtin `random.randrange()` will do arbitrary-sized
> integers and is quite reasonable for this task. If you want it to use our
> BitGenerators underneath for clean PRNG state management, this is quite
> doable with a simple subclass of `random.Random`:
> https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

Wouldn't it be better to just use random.randint to generate the vectors directly at that point? If the number of possibilities is more than 2**64, the birthday odds of generating the same vector twice are on the order of 1 in 2**32. And you can always do a unique() rejection check if you really want to be careful.

Aaron Meurer
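A rough sketch of this suggestion, with an illustrative shape and sample count (random.randint is inclusive on both ends, hence the d - 1):

import random
import numpy as np

shape = (500, 400, 30, 15, 20, 800, 900, 2000, 800)  # illustrative
k = 5

# Draw each index vector coordinate by coordinate with the stdlib RNG.
vectors = [tuple(random.randint(0, d - 1) for d in shape) for _ in range(k)]

# Optional rejection check: duplicates are astronomically unlikely here,
# but np.unique over the rows confirms all k vectors are distinct.
assert len(np.unique(vectors, axis=0)) == k
print(vectors)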
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023 at 4:15 PM Aaron Meurer wrote:

> On Fri, Nov 17, 2023 at 12:10 PM Robert Kern wrote:
> >
> > [...]
>
> Wouldn't it be better to just use random.randint to generate the vectors
> directly at that point? If the number of possibilities is more than
> 2**64, the birthday odds of generating the same vector twice are on the
> order of 1 in 2**32. And you can always do a unique() rejection check if
> you really want to be careful.

Yes, I jumped to correcting the misreading of the docstring rather than solving the root problem. Almost certainly, the most straightforward strategy is to use `rng.integers(0, array_shape)` repeatedly, storing tuples of the resulting indices in a set and rejecting duplicates. You'd have to do the same thing if you used `rng.integers(0, np.prod(array_shape))` or `random.randint(0, np.prod(array_shape))` in the flattened case. `rng.integers()` doesn't sample without replacement in any case.

`rng.choice()` does. It also will not support unbounded integers; it uses a signed `int64` to hold the population size, so it is bounded by that. That is _likely_ to be a fine upper bound on the size of the array (if it is a real array in a 64-bit address space). So one could use `rng.choice(np.prod(array_shape), replace=False, size=n_samples)` and unflatten these integers to the index tuples, if that suits the array size. It's just doing the same set lookups internally, somewhat more efficiently.

--
Robert Kern
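A minimal sketch of that route, assuming the array shape is small enough for its flat size to fit in int64 (the shape and sample count are illustrative):

import numpy as np

rng = np.random.default_rng()
array_shape = (500, 400, 30)   # illustrative; np.prod(array_shape) must fit in int64
n_samples = 5

# Unique flat indices without replacement, then unflattened into index tuples.
flat = rng.choice(np.prod(array_shape), size=n_samples, replace=False)
index_tuples = list(zip(*np.unravel_index(flat, array_shape)))
print(index_tuples)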
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 11:07, Robert Kern wrote:

> If the arrays you are drawing indices for are real in-memory arrays for
> present-day 64-bit computers, this should be adequate. If it's a notional
> array that is larger, then you'll need actual arbitrary-sized integer
> sampling. The builtin `random.randrange()` will do arbitrary-sized
> integers and is quite reasonable for this task. If you want it to use our
> BitGenerators underneath for clean PRNG state management, this is quite
> doable with a simple subclass of `random.Random`:
> https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

Thanks, Robert. The use case is randomly populating a hypothetical N-dimensional sparse array object. In practice, int64 will probably suffice.

I can see how to generate arbitrarily large integers using the pattern above, but I am still unsure how to sample without replacement. Aaron mentioned that conflicts are extremely unlikely, so it is perhaps fine to assume they won't happen; checking for conflicts is expensive.

Attached is a script that implements this solution.

Stéfan

import random
import functools
import itertools
import operator

import numpy as np


def cumulative_prod(arr):
    return list(itertools.accumulate(arr, func=operator.mul))


def unravel_index(x, dims):
    # Convert flat indices x into N-dimensional indices for shape `dims`.
    dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
    return [
        list((x[j] // dim_prod[i]) % dims[i] for i in range(len(dims)))
        for j in range(len(x))
    ]


# From Robert Kern's comment at
# https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
class PythonRandomInterface(random.Random):
    def __init__(self, rng):
        self._rng = rng

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k < 0:
            raise ValueError('number of bits must be non-negative')
        numbytes = (k + 7) // 8  # bits / 8 and rounded up
        x = int.from_bytes(self._rng.bytes(numbytes), 'big')
        return x >> (numbytes * 8 - k)  # trim excess bits

    def indices(self, shape, size=1):
        D = functools.reduce(operator.mul, shape)
        # randrange(D) draws from [0, D), i.e. valid flat indices for `shape`.
        indices = [self.randrange(D) for _ in range(size)]
        return unravel_index(indices, shape)


rng = np.random.default_rng()
pri = PythonRandomInterface(rng)

dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
k = 5
print(pri.indices(dims, size=k))
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 14:28, Stefan van der Walt wrote:

> Attached is a script that implements this solution.

And the version with set duplicates checking.

Stéfan

import random
import functools
import itertools
import operator

import numpy as np


def cumulative_prod(arr):
    return list(itertools.accumulate(arr, func=operator.mul))


def unravel_index(x, dims):
    # Convert flat indices x into N-dimensional indices for shape `dims`.
    dim_prod = cumulative_prod([1] + list(dims)[:0:-1])[::-1]
    return [
        list((ix // dim_prod[i]) % dims[i] for i in range(len(dims)))
        for ix in x
    ]


# From Robert Kern's comment at
# https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
class PythonRandomInterface(random.Random):
    def __init__(self, rng):
        self._rng = rng

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k < 0:
            raise ValueError('number of bits must be non-negative')
        numbytes = (k + 7) // 8  # bits / 8 and rounded up
        x = int.from_bytes(self._rng.bytes(numbytes), 'big')
        return x >> (numbytes * 8 - k)  # trim excess bits

    def indices(self, shape, size=1):
        D = functools.reduce(operator.mul, shape)
        indices = set()
        # Keep drawing flat indices from [0, D) until `size` unique ones remain.
        while len(indices) < size:
            indices.add(self.randrange(D))
        return unravel_index(indices, shape)


rng = np.random.default_rng()
pri = PythonRandomInterface(rng)

dims = (500, 400, 30, 15, 20, 800, 900, 2000, 800)
k = 5
print(pri.indices(dims, size=k))
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023 at 5:34 PM Stefan van der Walt wrote:

> On Fri, Nov 17, 2023, at 14:28, Stefan van der Walt wrote:
>
> > Attached is a script that implements this solution.
>
> And the version with set duplicates checking.

If you're going to do the set-checking yourself, then you don't need the unbounded integer support. This is somewhat slower, probably due to the greedy tuple conversions, but eliminates all of the complexity of subclassing `Random`:

import numpy as np

def sample_indices(shape, size, rng=None):
    rng = np.random.default_rng(rng)
    ashape = np.array(shape)
    seen = set()
    while len(seen) < size:
        idx = tuple(rng.integers(0, ashape))  # one candidate index vector
        seen.add(idx)
    return list(seen)

Unfortunately, subclassing from `Random` still doesn't get you the ability to sample without replacement for arbitrary-sized populations using `Random`'s own methods. `Random.sample(population, k)` requires `population` to be a `Sequence`, and that restricts you to index-sized integers (`ssize_t`); you can try to fake one, but `len()` will balk if it gets a too-large integer from `__len__`. But `Random.sample` is only doing set-checking anyway, so no loss.

--
Robert Kern
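An illustrative call of the sketch above (the shape and count are made up; output order is arbitrary):

print(sample_indices((500, 400, 30), size=5))  # five unique (i, j, k) index tuples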
[Numpy-discussion] Re: How to sample unique vectors
rng.integers() (or np.random.randint) lets you specify lists for low and high. So you can just use rng.integers((0,)*len(dims), dims).

Although I'm not seeing how to use this to generate a bunch of vectors at once. I would have thought something like size=(10, dims) would let you generate 10 vectors of length dims, but it doesn't seem to work.

Aaron Meurer

On Fri, Nov 17, 2023 at 3:31 PM Stefan van der Walt via NumPy-Discussion wrote:
>
> On Fri, Nov 17, 2023, at 11:07, Robert Kern wrote:
>
> > [...]
>
> Thanks, Robert. The use case is randomly populating a hypothetical
> N-dimensional sparse array object. In practice, int64 will probably
> suffice.
>
> I can see how to generate arbitrarily large integers using the pattern
> above, but I am still unsure how to sample without replacement. Aaron
> mentioned that conflicts are extremely unlikely, so it is perhaps fine to
> assume they won't happen; checking for conflicts is expensive.
>
> Attached is a script that implements this solution.
>
> Stéfan
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023 at 7:11 PM Aaron Meurer wrote:

> rng.integers() (or np.random.randint) lets you specify lists for low and
> high. So you can just use rng.integers((0,)*len(dims), dims).
>
> Although I'm not seeing how to use this to generate a bunch of vectors at
> once. I would have thought something like size=(10, dims) would let you
> generate 10 vectors of length dims, but it doesn't seem to work.

`size=(k, len(dims))`

import numpy as np

def sample_indices(shape, size, rng=None):
    rng = np.random.default_rng(rng)
    ashape = np.array(shape)
    seen = set()
    while len(seen) < size:
        # Optimistically draw only as many candidate rows as are still missing.
        dsize = size - len(seen)
        seen.update(map(tuple, rng.integers(0, ashape, size=(dsize, len(shape)))))
    return list(seen)

That optimistic optimization makes this the fastest solution.

--
Robert Kern
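For reference, here is what the size=(k, len(dims)) broadcast produces on its own, before any de-duplication (the values are illustrative):

import numpy as np

rng = np.random.default_rng()
dims = (500, 400, 30)  # illustrative
k = 4

# The per-axis upper bounds broadcast across the columns; each row is one
# candidate index vector.
print(rng.integers(0, np.array(dims), size=(k, len(dims))))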
[Numpy-discussion] Re: How to sample unique vectors
Is a numba solution an option for you?

> On 17 Nov 2023, at 20:49, Stefan van der Walt via NumPy-Discussion wrote:
>
> Hi all,
>
> I am trying to sample k N-dimensional vectors from a uniform distribution
> without replacement. It seems like this should be straightforward, but I
> can't seem to pin it down.
>
> [...]
>
> Is there an elegant way to handle this problem?
>
> Best regards,
> Stéfan
[Numpy-discussion] Re: How to sample unique vectors
On Fri, Nov 17, 2023, at 16:52, Robert Kern wrote:

> That optimistic optimization makes this the fastest solution.

That'd work great. Thanks Robert, Aaron, and everyone who shared input.

Stéfan