On Thu, Apr 7, 2016 at 11:35 AM, <josef.p...@gmail.com> wrote:
> On Thu, Apr 7, 2016 at 11:13 AM, Todd <toddr...@gmail.com> wrote:
> > On Wed, Apr 6, 2016 at 5:20 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >>
> >> On Wed, Apr 6, 2016 at 10:43 AM, Todd <toddr...@gmail.com> wrote:
> >> >
> >> > My intention was to make linear algebra operations easier in numpy.
> >> > With the @ operator available, it is now very easy to do basic linear
> >> > algebra on arrays without needing the matrix class. But getting an
> >> > array into a state where you can use the @ operator effectively is
> >> > currently pretty verbose and confusing. I was trying to find a way to
> >> > make the @ operator more useful.
> >>
> >> Can you elaborate on what you're doing that you find verbose and
> >> confusing, maybe paste an example? I've never had any trouble like
> >> this doing linear algebra with @ or dot (which have similar semantics
> >> for 1d arrays), which is probably just because I've had different use
> >> cases, but it's much easier to talk about these things with a concrete
> >> example in front of us to put everyone on the same page.
> >>
> >
> > Let's say you want to do a simple matrix multiplication example. You
> > create two example arrays like so:
> >
> >    a = np.arange(20)
> >    b = np.arange(10, 50, 10)
> >
> > Now you want to do
> >
> >    a.T @ b
> >
> > First you need to turn a into a 2D array. I can think of 10 ways to do
> > this off the top of my head, and there may be more:
> >
> > 1a) a[:, None]
> > 1b) a[None]
> > 1c) a[None, :]
> > 2a) a.shape = (1, -1)
> > 2b) a.shape = (-1, 1)
> > 3a) a.reshape(1, -1)
> > 3b) a.reshape(-1, 1)
> > 4a) np.reshape(a, (1, -1))
> > 4b) np.reshape(a, (-1, 1))
> > 5) np.atleast_2d(a)
> >
> > 5 is pretty clear, and will work fine with any number of dimensions, but
> > is also long to type out when trying to do a simple example.
> > The different variants of 1, 2, 3, and 4, however, will only work with
> > 1D arrays (making them less useful for functions), are not immediately
> > obvious to me what the result will be (I always need to try it to make
> > sure the result is what I expect), and are easy to get mixed up in my
> > opinion. They also require people keep a mental list of lots of ways to
> > do what should be a very simple task.
> >
> > Basically, my argument here is the same as the argument from pep465 for
> > the inclusion of the @ operator:
> >
> > https://www.python.org/dev/peps/pep-0465/#transparent-syntax-is-especially-crucial-for-non-expert-programmers
> >
> > "A large proportion of scientific code is written by people who are
> > experts in their domain, but are not experts in programming. And there
> > are many university courses run each year with titles like "Data
> > analysis for social scientists" which assume no programming background,
> > and teach some combination of mathematical techniques, introduction to
> > programming, and the use of programming to implement these mathematical
> > techniques, all within a 10-15 week period. These courses are more and
> > more often being taught in Python rather than special-purpose languages
> > like R or Matlab.
> >
> > For these kinds of users, whose programming knowledge is fragile, the
> > existence of a transparent mapping between formulas and code often means
> > the difference between succeeding and failing to write that code at all."
>
> This doesn't work because of the ambiguity between column and row vector.
>
> In most cases 1d vectors in statistics/econometrics are column
> vectors. Sometime it takes me a long time to figure out whether an
> author uses row or column vector for transpose.
>
> i.e. I often need x.T dot y which works for 1d and 2d to produce
> inner product.
> but the outer product would require most of the time a column vector
> so it's defined as x dot x.T.
>
> I think keeping around explicitly 2d arrays if necessary is less error
> prone and confusing.
>
> But I wouldn't mind a shortcut for atleast_2d (although more often I
> need atleast_2dcol to translate formulas)
>

At least from what I have seen, in all cases in numpy where a 1D array is
treated as a 2D array, it is always treated as a row vector. The examples
I can think of are atleast_2d, hstack, vstack, and dstack. So using this
convention would be in line with how it is used elsewhere in numpy.
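
To make the row-vector convention concrete, here is a rough sketch of the
shapes involved. The arrays are the ones from my earlier example; the other
variable names (row, col, outer and so on) are only for illustration, and
the 2D reshaping at the end is one possible spelling, not a proposal.

```python
import numpy as np

a = np.arange(20)            # 1D, shape (20,)
b = np.arange(10, 50, 10)    # 1D, shape (4,)

# .T does nothing to a 1D array, which is the root of the confusion.
a_t = a.T                    # still shape (20,)

# Functions that promote 1D input put the data along the last axis,
# i.e. they treat it as a row.
row = np.atleast_2d(a)       # shape (1, 20)
v = np.vstack([a, a])        # shape (2, 20): each input becomes a row
d = np.dstack([a, a])        # shape (1, 20, 2): inputs promoted to (1, 20, 1)

# hstack leaves 1D inputs 1D, so it is consistent with the row reading
# but does not actually promote anything.
h = np.hstack([a, a])        # shape (40,)

# The column case josef mentions has to be spelled out by hand.
col = a[:, None]             # shape (20, 1)

# Outer product with explicit 2D operands: (20, 1) @ (1, 4) -> (20, 4).
outer = col @ np.atleast_2d(b)

# Inner product works directly on 1D arrays.
inner = a @ a
```

So the row case already has a short spelling (atleast_2d), while the column
case is the one that needs indexing or reshape.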
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion