[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi

On 04/07/24 13:29, Matthew Brett wrote:

I agree it is hard to enforce, but it seems to me it would be a
reasonable defensive move to say - for now - that authors will need to
take full responsibility for copyright, and that, as of now,
AI-generated code cannot meet that standard, so we require authors to
turn off AI-generation when writing code for Numpy.


I like this position.

I wish it were common sense for contributors to an open source 
codebase that they need to own the copyright on their contributions, but 
I don't think it can be assumed. Adding something along these lines to the 
project policy also has the potential to educate contributors about 
the pitfalls of using AI to autocomplete their contributions.


Cheers,
Dan


[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi

On 04/07/24 12:49, Matthew Brett wrote:

Hi,

Sorry to top-post!  But - I wanted to bring the discussion back to
licensing.  I have great sympathy for the ecological and code-quality
concerns, but licensing is a separate question, and, it seems to me,
an urgent question.


The licensing issue is complex and it is very likely that it will not 
get a definitive answer until a lawsuit centered around this issue is 
litigated in court. There are several ongoing lawsuits involving similar 
issues, but any resolution is likely to take several years.


By providing other, much more pragmatic and easier-to-gauge reasons to 
reject AI-generated contributions, I was trying to sidestep the 
licensing issue completely.


If there are other reasons why auto-generated contributions should be 
rejected, there is no need to solve the much harder problem of 
licensing: we don't want them regardless of the licensing issue.


Cheers,
Dan



[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi

On 03/07/24 23:40, Matthew Brett wrote:

Hi,

We recently got a set of well-labeled PRs containing (reviewed)
AI-generated code:

https://github.com/numpy/numpy/pull/26827
https://github.com/numpy/numpy/pull/26828
https://github.com/numpy/numpy/pull/26829
https://github.com/numpy/numpy/pull/26830
https://github.com/numpy/numpy/pull/26831

Do we have a policy on AI-generated code?   It seems to me that
AI-code in general must be a license risk, as the AI may well generate
code that was derived from, for example, code with a GPL-license.


There is definitely the issue of copyright to keep in mind, but I see 
two other issues: the quality of the contributions and one moral issue.


IMHO the PRs linked above are not high-quality contributions: for 
example, the added examples are often redundant with each other. In my 
experience this is representative of automatically generated content: 
as there is little to no effort involved in writing it, the content is 
often repetitive and has very low information density. In the case of 
documentation, I find this very detrimental to the overall quality.


Contributions generated with AI have huge ecological and social costs. 
Encouraging AI-generated contributions, especially where there is 
absolutely no need to involve AI to arrive at the solution, as in the 
examples above, makes the project co-responsible for these costs.


Cheers,
Dan



[Numpy-discussion] Re: spam on the mailing lists

2021-10-10 Thread Daniele Nicolodi
On 10/10/2021 18:37, Hameer Abbasi wrote:
> Hello everyone,
> 
> Just my 2 cents: I marked a few of the actual spam e-mails on this 
> and the SciPy-user list as spam on my client, and it seems many
> random e-mails get sent to spam now, from both the NumPy and SciPy lists.

You should complain about this to your email service provider.

> I’d very much prefer Discourse or GitHub discussions, of the two I
> prefer Discussions, but Discourse isn’t too bad either.
I don't think moving away from the mailing list because your email
provider (despite being one of the largest corporations in the world,
with email being one of its core businesses) is unable to
provide you a reliable service is the way forward.

Also, in the case of Github Discussions, I don't think that moving to a
service not controlled by the community is the right thing, especially
as your previous statement already demonstrates that abdicating control
may culminate in the tools we use not operating in the way we would
like them to.

Cheers,
Dan


Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-30 Thread Daniele Nicolodi
On 30/05/2021 00:48, Robert Kern wrote:
> On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi wrote:
> 
> What does k stand for here? As someone that never encountered this
> function before I find both names equally confusing. If I understand
> what the function is supposed to be doing, I think largest() would be
> much more descriptive.
> 
> 
> `k` is the number of elements to return. `largest()` can connote that
> it's only returning the one largest value. It's fairly typical to
> include a dummy variable (`k` or `n`) in the name to indicate that the
> function lets you specify how many you want. See, for example, the
> stdlib `heapq` module's `nlargest()` function.

I thought that a `largest()` function with an integer second argument
could be self-explanatory enough. `nlargest()` would be much more
obvious to a wider audience, I think.

> https://docs.python.org/3/library/heapq.html#heapq.nlargest
> 
> "top-k" comes from the ML community where this function is used to
> evaluate classification models (`k` instead of `n` being largely an
> accident of history, I imagine). In many classification problems, the
> number of classes is very large, and they are very related to each
> other. For example, ImageNet has a lot of different dog breeds broken
> out as separate classes. In order to get a more balanced view of the
> relative performance of the classification models, you often want to
> check whether the correct class is in the top 5 classes (or whatever `k`
> is appropriate) that the model predicted for the example, not just the
> one class that the model says is the most likely. "5 largest" doesn't
> really work in the sentences that one usually writes when talking about
> ML classifiers; they are talking about the 5 classes that are associated
> with the 5 largest values from the predictor, not the values themselves.
> So "top k" is what gets used in ML discussions, and that transfers over
> to the name of the function in ML libraries.
> 
> It is a top-down reflection of the higher level thing that people want
> to compute (in that context) rather than a bottom-up description of how
> the function is manipulating the input, if that makes sense. Either one
> is a valid way to name things. There is a lot to be said for numpy's
> domain-agnostic nature that we should prefer the bottom-up description
> style of naming. However, we are also in the midst of a diversifying
> ecosystem of array libraries, largely driven by the ML domain, and
> adopting some of that terminology when we try to enhance our
> interoperability with those libraries is also a factor to be considered.

I think that such a simple function should be named in the most obvious
way possible, or it will become a function that is used in the
domains where the unusual name makes sense, but ends up being
re-implemented in all other contexts. I am sure that if I had
been looking for a function that returns the N largest items in an array
(whether according to a given key function or otherwise) I
would never have looked at a function named `topk()` or `top_k()`, and I
am pretty sure I would have discarded anything that has `k` or `top` in
its name.

On the other hand, I understand that ML is where all the hype (and a
large fraction of the money) is these days, thus I understand if numpy
wants to appease the crowd.
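
For illustration, an untested sketch of what an `nlargest()` could look
like for the 1-D case, built on `np.partition` (the name and the
descending order are just assumptions for the example, not a proposed API):

import numpy as np

def nlargest(a, n):
    # partition so that the n largest values occupy the last n slots, in O(N)
    part = np.partition(a, -n)
    # sort only those n values and return them in descending order
    return np.sort(part[-n:])[::-1]

print(nlargest(np.array([3, 1, 4, 1, 5, 9, 2, 6]), 3))  # [9 6 5]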

Cheers,
Dan


Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-29 Thread Daniele Nicolodi
On 29/05/2021 18:33, David Menéndez Hurtado wrote:
> On Sat, 29 May 2021, 4:29 pm Ralf Gommers wrote:
> 
> On Fri, May 28, 2021 at 4:58 PM  wrote:
> 
> Hi all,
> 
> Finding topk elements is widely used in several fields, but
> missing in NumPy.
> I implemented this functionality, named numpy.topk, using core numpy
> functions and opened a PR:
> 
> https://github.com/numpy/numpy/pull/19117
> 
> Any discussion is welcome.
> 
> 
> Thanks for the proposal Kang. I think this functionality is indeed a
> fairly obvious gap in what Numpy offers, and would make sense to
> add. A detailed comparison with other libraries would be very
> helpful here. TensorFlow and JAX call this function `top_k`, while
> PyTorch, Dask and MXNet call it `topk`.
> 
> 
> When I saw `topk` I initially parsed it as "to pk", similar to the
> current `tolist`. I think `top_k` is more explicit and clear.

What does k stand for here? As someone that never encountered this
function before I find both names equally confusing. If I understand
what the function is supposed to be doing, I think largest() would be
much more descriptive.

Cheers,
Dan


Re: [Numpy-discussion] NumPy Feature Request: Function to wrap angles to range [ 0, 2*pi] or [ -pi, pi ]

2020-11-24 Thread Daniele Nicolodi
On 24/11/2020 10:25, Thomas wrote:
> Like Nathaniel said, it would not improve much when compared to the
> modulo operator. 
> 
> It could handle the edge cases better, but really the biggest benefit
> would be that it is more convenient.

Which edge cases? Better how?

> And as the "unwrap" function already exists,

The unwrap() function exists because it is not as trivial.

> people would expect that
> and look for a function for the inverse operation (at least I did).

What is your use of a wrap() function? I cannot think of any.

Cheers,
Dan


Re: [Numpy-discussion] NumPy Feature Request: Function to wrap angles to range [ 0, 2*pi] or [ -pi, pi ]

2020-11-24 Thread Daniele Nicolodi
On 24/11/2020 02:49, Nathaniel Smith wrote:
> How would this proposed function compare to using the modulo operator,
> like 'arr % (2*pi)'?

I wrote almost the same word by word reply, before realizing that taking
the modulo loses the sign. The correct operation is slightly more
complex (untested):

import numpy as np

def wrap(alpha):
    # the parentheses are needed: % binds tighter than *
    return (alpha + np.pi) % (2.0 * np.pi) - np.pi
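
For example, this convention maps the endpoints into [-pi, pi):

>>> wrap(np.array([-2 * np.pi, -np.pi, 0.0, 4 * np.pi]))
array([ 0.        , -3.14159265,  0.        ,  0.        ])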

However, I don't think there is much value in adding something so
trivial as a function to numpy: I cannot think of any commonly used
algorithm that requires wrapping the phase, and it is going to be an
infinite source of bikeshedding whether the wrapped range should be
[-pi, pi) or (-pi, pi] or (0, 2*pi] or [0, 2*pi).

Cheers,
Dan


> On Mon, Nov 23, 2020, 16:13 Thomas wrote:
> 
> Hi,
> 
> I have a proposal for a feature and I hope this is the right place
> to post this.
> 
> The idea is to have a function to map any input angle to the range
> of [ 0, 2*pi ] or [ - pi, pi ].
> 
> There already is a function called 'unwrap' that does the opposite,
> so I'd suggest calling this function 'wrap'.
> 
> Example usage:
> # wrap to range [ 0, 2*pi ]
> >>> np.wrap([ -2*pi, -pi, 0, 4*pi ])
> [0, pi, 0, 2*pi]
> 
> There is some ambiguity regarding what the solution should be for
> the extremes. An example would be an input of 4*pi, as both 0 and
> 2*pi would be valid mappings.
> 
> There has been interest for this topic in the community (see
> https://stackoverflow.com/questions/15927755/opposite-of-numpy-unwrap).
> 
> Similar functions exist for Matlab (see
> https://de.mathworks.com/help/map/ref/wrapto2pi.html). For the 0 to 2*pi
> case, they solved the ambiguity so that "positive multiples of 2*pi map
> to 2*pi and negative multiples of 2*pi map to 0."


Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Daniele Nicolodi
On 12/11/2020 17:40, Matti Picus wrote:
> In a one-on-one discussion with Noam in a pre-community call (that,
> ironically, we had time for since we both messed up the meeting
> time-zone change) we reached the conclusion that the request is to
> clarify whether NumPy's datetime64 represents TAI time [0] or POSIX
> time, with a preference for TAI time. The documentation mentions POSIX
> time [1]. As Stefano points out, there is a couple of seconds difference
> between POSIX (or Unix) time and TAI time. In practice numpy simply
> stores an int64 value to represent the datetime64, and relies on others
> to convert it. The leap-second might be getting lost in the conversions.
> So it might make sense to clarify exactly how those conversions deal
> with the leap-seconds and choose which one we mean when we use
> datetime64. Noam please correct me if I am mistaken.

Unix time is a representation of the UTC timescale that counts one-second
intervals starting from a defined epoch. It deals with leap seconds by
either skipping one interval (which has never happened so far) or repeating
an interval, so that two moments in time that on the UTC timescale are
separated by one second (for example 2016-12-31 23:59:59 and 2016-12-31
23:59:60) are represented in the same way, and thus the conversion from
Unix time to UTC is ambiguous during this one second. This has happened 27
times since 1972.

This comes with the nice property that minutes, hours and days always
have the same duration (in Unix time), thus converting between the Unix
time representation and a date and time of day is fairly easy.

The drawbacks are, as seen above, the ambiguity on leap seconds and the
fact that the trivial computation of time intervals does not take
leap seconds into account and thus may be short by a few seconds (any time
interval across 2016-12-31 23:59:59 is off by at least one second if
computed by simply subtracting Unix times).

I don't think these two drawbacks are important for Numpy (or any other
general purpose library). As things stand, it is not even possible, in
Python, with or without Numpy, to create a datetime or datetime64 object
for the time "2016-12-31 23:59:60" (neither accepts the existence of a
minute with 61 seconds), thus the ambiguity issue is not an issue in
practice. The time interval issue may matter for some applications, but
the ones affected are aware of the issue and have means to deal with it
(the most common one being taking a day off on the days leap seconds are
introduced).
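
To make the interval issue concrete, a minimal example (using the real
leap second inserted at the end of 2016):

import numpy as np

t1 = np.datetime64('2016-12-31T23:59:59')
t2 = np.datetime64('2017-01-01T00:00:00')

# datetime64 has no notion of the leap second 2016-12-31T23:59:60, so the
# computed interval is one second, although two SI seconds actually elapsed
print(t2 - t1)  # 1 seconds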

I think documenting that datetime64 is a representation of fixed time
intervals since a conventional epoch, neglecting leap seconds, is easy
to explain and implement and allows for easy interoperability with the
rest of the world.

What advantage would making datetime64 explicitly a representation of
TAI bring?

One disadvantage would be that `np.datetime64(datetime.now())` would be
harder to support, as we would be trying to match a point in time on the
UTC time scale to a point in time on the TAI time scale. This is trivial
for past times (we just need to adjust for the right offset) but it is
impossible to do correctly for dates in the future, because we cannot
predict future leap second insertions. This would, for example, make
timestamp conversions not reproducible across announcements of leap
second insertions.

Cheers,
Dan


Re: [Numpy-discussion] setuptools/distutils merger & numpy.distutils

2020-09-06 Thread Daniele Nicolodi
On 06/09/2020 11:28, Ralf Gommers wrote:
> On Sun, Sep 6, 2020 at 9:50 AM Daniele Nicolodi wrote:
> > This may be a bigger endeavor, but wouldn't it be possible to extend
> > setuptools interfaces in a way that plugging in the fortran support does
> > not require monkey patching or accessing the implementation internals?
> 
> Probably not, since distutils really doesn't have much of a design, or
> separation between API and implementation.

I was under the impression that one of the reasons why distutils is
being deprecated in favor of setuptools is to change this and evolve the
code into a better form, not to just move the code around.

> By the way, how is Cython affected by this? Cython also distributes an
> extension to (the module formerly known as) distutils to transparently
> compile extensions written in Cython. Could the efforts toward
> multi-language support in distutils be coordinated with the Cython
> maintainers?
> 
> Cython.Distutils is a few hundred lines of code, numpy.distutils is
> 20,000 lines of code. I don't think Cython will have many problems adapting.

I'm well aware of that, however I was only referring to Fortran support
here, which may be something valuable to merge upstream. And Cython also
injects support for a different language into distutils/setuptools,
thus maybe a common approach could be envisioned, maybe more robust than
the current one. I haven't looked at the code, though.

Cheers,
Dan


Re: [Numpy-discussion] setuptools/distutils merger & numpy.distutils

2020-09-06 Thread Daniele Nicolodi
On 06/09/2020 07:06, David Cournapeau wrote:
> Assuming the numpy.distutils codebase has not changed much in the last
> 10 years, my sense is that a lot of the features that relied on monkey
> patching can be merged upstream,  fortran support being one notable
> exception.

This may be a bigger endeavor, but wouldn't it be possible to extend
setuptools interfaces in a way that plugging in the fortran support does
not require monkey patching or accessing the implementation internals?

Such an extension could also be used to extend setuptools to handle
extensions written in other languages such as Rust.

By the way, how is Cython affected by this? Cython also distributes an
extension to (the module formerly known as) distutils to transparently
compile extensions written in Cython. Could the efforts toward
multi-language support in distutils be coordinated with the Cython
maintainers?

Cheers,
Dan


Re: [Numpy-discussion] An alternative to vectorize that lets you access the array?

2020-07-15 Thread Daniele Nicolodi
On 12/07/2020 07:00, Ram Rachum wrote:
> The reason I want the second version is that I can then have sounddevice
> start playing `output_array` in a separate thread, while it's being
> calculated. (Yes, I know about the GIL, I believe that sounddevice
> releases it.)

I don't think this is a sound design.

I don't know sounddevice, but in similar situations the standard pattern
is to allocate a buffer (in this case it can be a numpy array) and pass
that to the consumer (sounddevice in your case). The consumer then tells
the producer (your music synth) when it has to produce more data.

At a quick read, it seems that the sounddevice.Stream class allows
applying this pattern:
https://python-sounddevice.readthedocs.io/en/0.3.15/usage.html#callback-streams

This also easily allows your producer function to operate on arrays and
not on single elements. Using numpy functions to operate on arrays is
going to be more efficient than iterating over the elements in Python.
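
An untested sketch of the pattern (a 440 Hz tone; OutputStream and the
callback signature as described in the sounddevice documentation linked
above, the rest is an assumption for the example):

import numpy as np
import sounddevice as sd

SAMPLERATE = 44100
start = 0  # index of the next sample to synthesize

def callback(outdata, frames, time, status):
    # sounddevice asks for `frames` samples: synthesize them on demand
    global start
    t = np.arange(start, start + frames) / SAMPLERATE
    outdata[:, 0] = 0.2 * np.sin(2 * np.pi * 440.0 * t)
    start += frames

# the stream pulls data through the callback from a separate thread
with sd.OutputStream(channels=1, samplerate=SAMPLERATE, callback=callback):
    sd.sleep(1000)  # keep playing for one second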

Cheers,
Dan


Re: [Numpy-discussion] Proposal to add clause to license prohibiting use by oil and gas extraction companies

2020-07-01 Thread Daniele Nicolodi
On 01-07-2020 12:34, John Preston wrote:
> Hello all,
> 
> The following proposal was originally issue #16722 on GitHub but at
> the request of Matti Picus I am moving the discussion to this list.

[snip]

Hello John,

I don't have copyright on any of the Numpy code, however I would like to
express a few problems I see in this proposal.

First, as you write, such a license does not qualify as Free Software as
defined by the OSI or the DFSG. Adopting this license would mean that Numpy
could not be included in many distributions that give their users the
guarantee that the software they receive is Free Software. Debian would
remove Numpy from its archive, for example. Fedora would probably do the
same. Conda would need to do the same, but with Numpy at the base of
the Python scientific stack, this would effectively kill Conda. This
would have immediate repercussions on companies that offer services
based on Numpy and on software that depends on Numpy.

Second, the terms of the license are extremely vague, at least in a legal
framework. In particular, "used for or aid in" is a very poor choice of
words. It could be argued that if I use Numpy in the code that handles
the orders for my pizza shop and I am asked to deliver pizzas to Exxon
employees working late at night, I am "aiding in" "the exploration,
extraction, refinement, processing, or transportation of fossil fuels".
Thus, someone that holds copyright on an (even very small) part of the Numpy
code could sue me and demand a free lifetime supply of pizza for me to
continue to be able to use Numpy. In practice this would make everyone
avoid using Numpy in their software for fear of violating these
clauses.

At the same time, the wording may be too vague to be enforceable in
court. In practice this would mean that most of the "good guys" (as per
the Climate Strike License definition) would avoid using Numpy
because they do not have the resources to fight alleged license
violations in court, while the "bad guys" would continue to use it because
they have a whole legal department to handle something like this.

Third, if a software project were to adopt something like the
Climate Strike License, why shouldn't it adopt licenses whose terms are
thought to advance some other political agenda? While the fact that
reliance on fossil fuels is the cause of climate change is widely (but
not universally) acknowledged, and we may agree that the big
economic interests in the enterprises related to fossil fuels are
holding back alternative solutions, there are many other causes on which
an agreement would be very difficult and would drag the project members
into interminable discussions.

Fourth, are we sure that making fossil fuel companies and companies that
rely on fossil fuels less efficient (by forbidding access to the Python
scientific software stack) would make them less dangerous for the
climate? Absurdly, the Climate Strike License forbids a company that
wants to migrate from a business model based on fossil fuels to
something more sustainable from using software under this license to
evaluate and form its plans.

Free Software (in its copyleft or permissive licensing variants) has
been so successful also because its promoters have not tried to leverage
it for other (noble or otherwise) purposes. There has been talk in the
past of incorporating other clauses into Free Software licenses to
advance other causes (from "cause no harm" kind of things to provisions
to ensure the economic viability of the development) and the
conclusion has always been that it is not a good idea. The reasons
presented here are just some of them. I am sure you can find more
detailed essays by authors much more authoritative than me on this matter.

Cheers,
Dan



Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

2020-02-04 Thread Daniele Nicolodi
On 04-02-2020 08:08, Matti Picus wrote:
> Together with Sayed Adel (cc) and Ralf, I am pleased to put the draft
> version of NEP 38 [0] up for discussion. As per NEP 0, this is the next
> step in the community accepting the approach laid out in the NEP. The
> NEP PR [1] has already garnered a fair amount of discussion about the
> viability of Universal SIMD Intrinsics, so I will try to capture some of
> that here as well.

Hello,

more interesting prior art may be found in VOLK, https://www.libvolk.org.
VOLK is developed mainly to be used in GNU Radio, and this is reflected in
the available kernels and in the supported data types; still, I think the
approach used there may be of interest.

Cheers,
Dan


Re: [Numpy-discussion] argmax() indexes to value

2019-11-01 Thread Daniele Nicolodi
On 01-11-2019 09:51, Allan Haldane wrote:
> my thought was to try `take` or `take_along_axis`:
> 
>     ind = np.argmin(a, axis=1)
>     np.take_along_axis(a, ind[:,None], axis=1)
> 
> But those functions tend to simply fall back to fancy indexing, and are
> pretty slow. On my system plain fancy indexing is fastest:
> 
> %timeit a[np.arange(N),ind]
> 1.58 µs ± 18.1 ns per loop
> %timeit np.take_along_axis(a, ind[:,None], axis=1)
> 6.49 µs ± 57.3 ns per loop
> %timeit np.min(a, axis=1)
> 9.51 µs ± 64.1 ns per loop
> 
> Probably `take_along_axis` was designed with uses like yours in mind,
> but it is not very optimized.

Hi Allan,

after scanning the documentation once more I found `take_along_axis` and
was hoping that it implements some smart trick that does not involve
generating an indexing array, but apparently that is what it does.

Given the current numpy primitives, I don't see a way to optimize it
further and keep it generic. I think direct fancy indexing is faster
in your case because of the overhead of handling the generic case, not
because of algorithmic inefficiency (from the run times you report it
seems that your test array was fairly small).
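
For reference, a small self-contained check (with made-up data) that the
two formulations agree:

import numpy as np

a = np.random.randn(1000, 100)
ind = np.argmin(a, axis=1)

# fancy indexing with an explicit row-index array
v1 = a[np.arange(a.shape[0]), ind]
# take_along_axis builds the equivalent index internally
v2 = np.take_along_axis(a, ind[:, None], axis=1).squeeze(axis=1)

assert np.array_equal(v1, v2)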

Thank you.

Cheers,
Dan


Re: [Numpy-discussion] argmax() indexes to value

2019-11-01 Thread Daniele Nicolodi
On 31-10-2019 01:44, Elliot Hallmark wrote:
> Depends on how big your array is.  Numpy C code is 150x+ faster than
> python overhead. Fancy indexing can be expensive in my experience. 
> Without trying I'd guess arr[:, argmax(arr, axis=1)] does what you want,

It does not.

> but even if it is, try profiling the two and see.  I highly doubt such
> would be even 1% of your run time, but it depends on what your doing. 
> Part of python with numpy is slightly not caring about big O because
> trying to be clever is rarely worth it in my experience.

Why do you think I am asking for advice on how to do the complicated
thing?  If a 2x increase in the run time did not matter, I would not
have bothered, don't you think?

I appreciate the effort spent guiding inexperienced users toward
pragmatic solutions and away from over-complicating their code. However,
it is disappointing to have very precise questions dismissed as "that is
complicated, thus you don't really want to do it".

Best,
Dan

> On Thu, Oct 31, 2019 at 12:35 AM Daniele Nicolodi wrote:
> 
> On 30/10/2019 22:42, Elliot Hallmark wrote:
> > I wouldn't be surprised at all if calling max in addition to argmax
> > wasn't as fast or faster than indexing the array using argmax.
> > Regardless, just use that then profile when you're done with the
> > whole thing and see if there's any gains to be made. Very likely
> not here.
> 
> Hi Elliot,
> 
> how do you arrive at this conclusion? np.argmax() and np.max() are O(N)
> while indexing is O(1), thus I don't see how you can conclude that
> running both np.argmax() and np.max() on the input array is going to
> incur only a small penalty compared to running np.argmax() and then
> indexing.
> 
> Cheers,
> Dan
> 
> 
> >
> > -elliot
> >
> > On Wed, Oct 30, 2019, 10:32 PM Daniele Nicolodi wrote:
> >
> >     On 30/10/2019 19:10, Neal Becker wrote:
> >     > max(axis=1)?
> >
> >     Hi Neal,
> >
> >     I should have been more precise in stating the problem. Getting the
> >     values in the array for which I'm looking at the maxima is only one
> >     step in a more complex piece of code for which I need the indexes
> >     along the second axis of the array. I would like to avoid to have to
> >     iterate the array more than once.
> >
> >     Thank you!
> >
> >     Cheers,
> >     Dan
> >
> >
> >     > On Wed, Oct 30, 2019, 7:33 PM Daniele Nicolodi wrote:
> >     >
> >     >     Hello,
> >     >
> >     >     this is a very basic question, but I cannot find a satisfying
> >     >     answer. Assume a is a 2D array and that I get the index of the
> >     >     maximum value along the second dimension:
> >     >
> >     >     i = a.argmax(axis=1)
> >     >
> >     >     Is there a better way to get the value of the maximum array
> >     >     entries along the second axis other than:
> >     >
> >     >     v = a[np.arange(len(a)), i]
> >     >
> >     >     ??
> >     >
> >     >     Thank you.
> >     >
> >     >     Cheers,
> >     >     Daniele

Re: [Numpy-discussion] argmax() indexes to value

2019-10-30 Thread Daniele Nicolodi
On 30/10/2019 22:42, Elliot Hallmark wrote:
> I wouldn't be surprised at all if calling max in addition to argmax
> wasn't as fast or faster than indexing the array using argmax.
> Regardless, just use that then profile when you're done with the
> whole thing and see if there's any gains to be made. Very likely not here.

Hi Elliot,

how do you arrive at this conclusion? np.argmax() and np.max() are O(N)
while indexing is O(1), thus I don't see how you can conclude that
running both np.argmax() and np.max() on the input array is going to
incur only a small penalty compared to running np.argmax() and then indexing.

Cheers,
Dan


> 
> -elliot
> 
> On Wed, Oct 30, 2019, 10:32 PM Daniele Nicolodi wrote:
> 
> On 30/10/2019 19:10, Neal Becker wrote:
> > max(axis=1)?
> 
> Hi Neal,
> 
> I should have been more precise in stating the problem. Getting the
> values in the array for which I'm looking at the maxima is only one step
> in a more complex piece of code for which I need the indexes along the
> second axis of the array. I would like to avoid to have to iterate the
> array more than once.
> 
> Thank you!
> 
> Cheers,
> Dan
> 
> 
> > On Wed, Oct 30, 2019, 7:33 PM Daniele Nicolodi wrote:
> >
> >     Hello,
> >
> >     this is a very basic question, but I cannot find a satisfying
> answer.
> >     Assume a is a 2D array and that I get the index of the maximum
> value
> >     along the second dimension:
> >
> >     i = a.argmax(axis=1)
> >
> >     Is there a better way to get the value of the maximum array
> entries
> >     along the second axis other than:
> >
> >     v = a[np.arange(len(a)), i]
> >
> >     ??
> >
> >     Thank you.
> >
> >     Cheers,
> >     Daniele



Re: [Numpy-discussion] argmax() indexes to value

2019-10-30 Thread Daniele Nicolodi
On 30/10/2019 19:10, Neal Becker wrote:
> max(axis=1)?

Hi Neal,

I should have been more precise in stating the problem. Getting the
values in the array for which I'm looking at the maxima is only one step
in a more complex piece of code for which I need the indexes along the
second axis of the array. I would like to avoid to have to iterate the
array more than once.

Thank you!

Cheers,
Dan


> On Wed, Oct 30, 2019, 7:33 PM Daniele Nicolodi wrote:
> 
> Hello,
> 
> this is a very basic question, but I cannot find a satisfying answer.
> Assume a is a 2D array and that I get the index of the maximum value
> along the second dimension:
> 
> i = a.argmax(axis=1)
> 
> Is there a better way to get the value of the maximum array entries
> along the second axis other than:
> 
> v = a[np.arange(len(a)), i]
> 
> ??
> 
> Thank you.
> 
> Cheers,
> Daniele



[Numpy-discussion] argmax() indexes to value

2019-10-30 Thread Daniele Nicolodi
Hello,

this is a very basic question, but I cannot find a satisfying answer.
Assume a is a 2D array and that I get the index of the maximum value
along the second dimension:

i = a.argmax(axis=1)

Is there a better way to get the value of the maximum array entries
along the second axis other than:

v = a[np.arange(len(a)), i]

??

Thank you.

Cheers,
Daniele


Re: [Numpy-discussion] [SciPy-User] Why slicing Pandas column and then subtract gives NaN?

2019-02-15 Thread Daniele Nicolodi
On 15-02-2019 14:48, C W wrote:
> Fair enough. Python has been called the #1 language for data science. If
> I'm slicing a[2:5] out of range, why not throw an error. This is
> disappointing!

No one here is trying to convince you to use Python. If you don't like
it, don't use it. Complaining in this venue about how you don't like the
language is not productive and is not going to change Python's (or
Numpy's) design. I suggest you instead invest the time to understand
why things work the way they do.

Cheers,
Dan


Re: [Numpy-discussion] Github down on comcast

2018-06-29 Thread Daniele Nicolodi
On 6/29/18 11:15 AM, Charles R Harris wrote:
> Hi All,
> 
> Just a note for those who may be having a problem reaching Github, it is
> currently down for comcast users.
> See http://downdetector.com/status/github/map/.

Funnily enough, http://downdetector.com seems to not be reachable from
this side of the Internet :-)

Cheers,
Dan



[Numpy-discussion] Is there a better way to write a stacked matrix multiplication

2017-10-26 Thread Daniele Nicolodi
Hello,

is there a better way to write the dot product between a stack of
matrices?  In my case I need to compute

y = A.T @ inv(B) @ A

with A a 3x1 matrix and B a 3x3 matrix, N times, with N in the few
hundred thousands range.  I thus "vectorize" the thing using stack of
matrices, so that A is a Nx3x1 matrix and B is Nx3x3 and I can write:

y = np.matmul(np.transpose(A, (0, 2, 1)), np.matmul(inv(B), A))

which I guess could be also written (in Python 3.6 and later):

y = np.transpose(A, (0, 2, 1)) @ inv(B) @ A

and I obtain a Nx1x1 y matrix which I can collapse to the vector I need
with np.squeeze().

However, the need for the second argument of np.transpose() seems odd to
me, because all other functions handle the matrix stacking transparently.

Am I missing something?  Is there a more natural matrix arrangement that
I could use to obtain the same results?
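
For what it is worth, an equivalent formulation with np.einsum reads the
transpose off the index string instead of calling np.transpose() (a
sketch with made-up data):

import numpy as np
from numpy.linalg import inv

N = 1000
A = np.random.randn(N, 3, 1)
B = np.random.randn(N, 3, 3) + 10 * np.eye(3)  # keep B well conditioned

# y_n = A_n^T @ inv(B_n) @ A_n: the 'nki' subscript reads A transposed
y = np.squeeze(np.einsum('nki,nkl,nlj->nij', A, inv(B), A))

# same result as the matmul formulation
y2 = np.squeeze(np.transpose(A, (0, 2, 1)) @ inv(B) @ A)
assert np.allclose(y, y2)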

Cheers,
Daniele


Re: [Numpy-discussion] Compare NumPy arrays with threshold and return the differences

2017-05-17 Thread Daniele Nicolodi
On 5/17/17 10:50 AM, Nissim Derdiger wrote:
> Hi,
> In my script, I need to compare big NumPy arrays (2D or 3D), and return
> a list of all cells with difference bigger than a defined threshold.
> The compare itself can be easily done with the "allclose" function,
> like this:
> Threshold = 0.1
> if (np.allclose(Arr1, Arr2, Threshold, equal_nan=True)):
>     Print('Same')
> But this compare does not return *_which_* cells are not the same.
>  
> The easiest (yet naive) way to know which cells are not the same is to
> use a simple for loops code like this one:
> def CheckWhichCellsAreNotEqualInArrays(Arr1, Arr2, Threshold):
>     if not Arr1.shape == Arr2.shape:
>         return ['Arrays size not the same']

I think you have been exposed to too much Matlab :-) Why the [] around
the string? The pythonic way to react to unexpected conditions is to
raise an exception:

 raise ValueError('arrays size not the same')

>     Dimensions = Arr1.shape
>     Diff = []
>     for i in range(Dimensions[0]):
>         for j in range(Dimensions[1]):
>             if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold,
>                                equal_nan=True):
>                 Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i,j])
>                             + ',' + str(Arr2[i,j]) + ',' + str(Threshold)
>                             + ',Fail\n')

Here you are also doing something very unusual. Why do you concatenate
all those strings? It would be more efficient to return the indexes of
the array elements matching the conditions and print them out in a
second step.

>     return Diff
> (and same for 3D arrays - with 1 more for loop)
> This way is very slow when the Arrays are big and full of non-equal cells.
>  
> Is there a fast straight forward way in case they are not the same - to
> get a list of the uneven cells? maybe some built-in function in the
> NumPy itself?

import numpy as np

threshold = 0.1
a = np.random.randn(100, 100)
b = np.random.randn(100, 100)

ids = np.nonzero(np.abs(a - b) > threshold)

gives you a tuple of the indexes of the array element pairs satisfying
your condition.  If you want to print them:

matcha = a[ids]
matchb = b[ids]

# one row of indexes per matching element
idt = np.vstack(ids).T

for i, ai, bi in zip(idt, matcha, matchb):
    c = ','.join(str(x) for x in i)
    print('{},{},{},{},Fail'.format(c, ai, bi, threshold))

works for 2D and 3D (or nD) arrays.

However, if you have many elements matching your condition this is going
to be slow and not very useful to look at. Maybe you can think about a
different way to visualize this result.
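
A more compact alternative along the same lines is np.argwhere, which
returns one row of indexes per matching element (reusing a, b and
threshold from above):

for idx in np.argwhere(np.abs(a - b) > threshold):
    i = tuple(idx)
    print('{},{},{},{},Fail'.format(','.join(str(x) for x in i),
                                    a[i], b[i], threshold))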

Cheers,
Dan
