Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Nathaniel Smith
On Sat, Jan 9, 2016 at 8:58 PM, Matthew Brett  wrote:
> On Sat, Jan 9, 2016 at 8:49 PM, Nathaniel Smith  wrote:
>> On Jan 9, 2016 10:09, "Matthew Brett"  wrote:
>>>
>>> Hi Sandro,
>>>
>>> On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
>>> >> I wrote a page on using pip with Debian / Ubuntu here :
>>> >> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>>> >
>>> > Speaking with my numpy Debian maintainer hat on, I would really
>>> > appreciate it if you didn't suggest using pip to install packages in
>>> > Debian, or at least not as the only solution.
>>>
>>> I'm very happy to accept alternative suggestions or PRs.
>>>
>>> I know what you mean, but I can't yet see how to write a page that
>>> would be good for explaining the benefits / tradeoffs of using deb
>>> packages vs mainly or only pip packages vs a mix of the two.  Do you
>>> have any thoughts?
>>
>> Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs
>> between Debian-installed packages versus pip --user installed packages are
>> subtle, and both are good options. Personally I'd generally recommend anyone
>> actively developing Python code to skip straight to pip for most things,
>> since you'll eventually end up there anyway, but this is definitely
>> debatable and situation-dependent. On the other hand, "sudo pip"
>> specifically is something I'd never recommend, and it indeed has the
>> potential to totally break your system.
>
> Sure, but I don't think the page is suggesting doing ``sudo pip`` for
> anything other than upgrading pip and virtualenv(wrapper) - and I
> don't think that is likely to break the system.

It could... a quick glance suggests that currently installing
virtualenvwrapper like that will also pull in some random PyPI
snapshot of stevedore, which will shadow the system-packaged version.
And then stevedore is used by tons of different Debian packages,
including large parts of OpenStack...
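
For what it's worth, here is a quick way to check whether a pip install
has shadowed the system copy of a module (a minimal sketch; it assumes
stevedore is importable at all):

    # If this prints a path under /usr/local or ~/.local rather than
    # /usr/lib/python2.7/dist-packages, a pip-installed snapshot is
    # shadowing the Debian-packaged version.
    import stevedore
    print(stevedore.__file__)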

But more to the point, the target audience for your page is hardly
equipped to perform that kind of analysis, never mind in the general
case of using 'sudo pip' for arbitrary Python packages, and your very
first example is one that demonstrates bad habits... So personally I'd
avoid mentioning the possibility of 'sudo pip', or better yet
explicitly warn against it.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Matthew Brett
On Sat, Jan 9, 2016 at 8:49 PM, Nathaniel Smith  wrote:
> On Jan 9, 2016 10:09, "Matthew Brett"  wrote:
>>
>> Hi Sandro,
>>
>> On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
>> >> I wrote a page on using pip with Debian / Ubuntu here :
>> >> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>> >
>> > Speaking with my numpy Debian maintainer hat on, I would really
>> > appreciate it if you didn't suggest using pip to install packages in
>> > Debian, or at least not as the only solution.
>>
>> I'm very happy to accept alternative suggestions or PRs.
>>
>> I know what you mean, but I can't yet see how to write a page that
>> would be good for explaining the benefits / tradeoffs of using deb
>> packages vs mainly or only pip packages vs a mix of the two.  Do you
>> have any thoughts?
>
> Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs
> between Debian-installed packages versus pip --user installed packages are
> subtle, and both are good options. Personally I'd generally recommend anyone
> actively developing Python code to skip straight to pip for most things,
> since you'll eventually end up there anyway, but this is definitely
> debatable and situation-dependent. On the other hand, "sudo pip"
> specifically is something I'd never recommend, and it indeed has the
> potential to totally break your system.

Sure, but I don't think the page is suggesting doing ``sudo pip`` for
anything other than upgrading pip and virtualenv(wrapper) - and I
don't think that is likely to break the system.

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Nathaniel Smith
On Jan 9, 2016 10:09, "Matthew Brett"  wrote:
>
> Hi Sandro,
>
> On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
> >> I wrote a page on using pip with Debian / Ubuntu here :
> >> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
> >
> > Speaking with my numpy Debian maintainer hat on, I would really
> > appreciate it if you didn't suggest using pip to install packages in
> > Debian, or at least not as the only solution.
>
> I'm very happy to accept alternative suggestions or PRs.
>
> I know what you mean, but I can't yet see how to write a page that
> would be good for explaining the benefits / tradeoffs of using deb
> packages vs mainly or only pip packages vs a mix of the two.  Do you
> have any thoughts?

Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs
between Debian-installed packages versus pip --user installed packages are
subtle, and both are good options. Personally I'd generally recommend anyone
actively developing Python code to skip straight to pip for most things,
since you'll eventually end up there anyway, but this is definitely
debatable and situation-dependent. On the other hand, "sudo pip"
specifically is something I'd never recommend, and it indeed has the
potential to totally break your system.
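
To make the "pip --user" suggestion concrete, here is a minimal check of
where user installs actually land (standard library only, nothing assumed
beyond a stock Python):

    # 'pip install --user' puts packages into the per-user site directory,
    # which is searched before the system dist-packages, so user installs
    # take precedence without ever touching system files.
    import site
    print(site.getusersitepackages())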

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Matthew Brett
On Sat, Jan 9, 2016 at 6:57 PM, Sandro Tosi  wrote:
> On Sat, Jan 9, 2016 at 6:08 PM, Matthew Brett  wrote:
>> Hi Sandro,
>>
>> On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
>>>> I wrote a page on using pip with Debian / Ubuntu here :
>>>> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>>>
>>> Speaking with my numpy Debian maintainer hat on, I would really
>>> appreciate it if you didn't suggest using pip to install packages in
>>> Debian, or at least not as the only solution.
>>
>> I'm very happy to accept alternative suggestions or PRs.
>>
>> I know what you mean, but I can't yet see how to write a page that
>> would be good for explaining the benefits / tradeoffs of using deb
>> packages vs mainly or only pip packages vs a mix of the two.  Do you
>> have any thoughts?
>
> You can start by making it extremely clear that this is not the
> Debian-supported way to install Python modules on a Debian system; that if
> a user uses pip to do it, it's very likely other applications or modules
> will fail; and that if they then have any problem with anything
> Python-related, they are on their own, as they "broke" their system on
> purpose. Thanks for considering.

I updated the page with more on reasons to prefer Debian packages over
installing with pip:

https://matthew-brett.github.io/pydagogue/installing_on_debian.html

Is that enough to get the message across?

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Sandro Tosi
On Sat, Jan 9, 2016 at 6:08 PM, Matthew Brett  wrote:
> Hi Sandro,
>
> On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
>>> I wrote a page on using pip with Debian / Ubuntu here :
>>> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>>
>> Speaking with my numpy Debian maintainer hat on, I would really
>> appreciate it if you didn't suggest using pip to install packages in
>> Debian, or at least not as the only solution.
>
> I'm very happy to accept alternative suggestions or PRs.
>
> I know what you mean, but I can't yet see how to write a page that
> would be good for explaining the benefits / tradeoffs of using deb
> packages vs mainly or only pip packages vs a mix of the two.  Do you
> have any thoughts?

You can start by making it extremely clear that this is not the
Debian-supported way to install Python modules on a Debian system; that if
a user uses pip to do it, it's very likely other applications or modules
will fail; and that if they then have any problem with anything
Python-related, they are on their own, as they "broke" their system on
purpose. Thanks for considering.

-- 
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
> Maybe a better approach would be to look at what libraries are used by
> an up-to-date default Anaconda install (on the assumption that this
> is the best-tested configuration)

That's not a bad idea. I also have a couple of other ideas about how to
filter this, based on Debian's popularity-contest data and the package
graph. I will report back when I have more info.

-Robert

On Sat, Jan 9, 2016 at 3:04 PM, Nathaniel Smith  wrote:

> On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon 
> wrote:
> > Hi all,
> >
> > I went ahead and tried to collect a list of all of the libraries that
> > could be considered to constitute the "base" system for linux-64. The
> > strategy I used was to leverage off the work done by the folks at
> > Continuum by searching through their pre-compiled binaries from
> > https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries
> > that were depended on (according to ldd) but were not accounted for by
> > the declared dependencies that each package made known to the conda
> > package manager.
> >
> > The full list of these system libraries, sorted from most commonly
> > depended on to rarest, is below. There are 158 of them.
> [...]
> > So it's not perfect. But it might be a useful starting place.
>
> Unfortunately, yeah, it looks like there are a lot of false positives in
> here :-(. For example, your list contains liblzma and libsqlite, but
> both of these are shipped as dependencies of Python itself. So
> probably someone just forgot to declare the dependency explicitly, but
> got away with it because the libraries were pulled in anyway.
>
> Maybe a better approach would be to look at what libraries are used by
> an up-to-date default Anaconda install (on the assumption that this
> is the best-tested configuration), and then erase from the list all
> libraries that are shipped by this configuration (ignoring declared
> dependencies, since those seem to be unreliable)? It's better to be
> conservative here, since the end goal is to come up with a list of
> external libraries that we're confident have actually been tested for
> compatibility by lots and lots of different users.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Nathaniel Smith
On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon  wrote:
> Hi all,
>
> I went ahead and tried to collect a list of all of the libraries that could
> be considered to constitute the "base" system for linux-64. The strategy I
> used was to leverage off the work done by the folks at Continuum by
> searching through their pre-compiled binaries from
> https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that
> were depended on (according to ldd) but were not accounted for by the
> declared dependencies that each package made known to the conda package
> manager.
>
> The full list of these system libraries, sorted from most commonly
> depended on to rarest, is below. There are 158 of them.
[...]
> So it's not perfect. But it might be a useful starting place.

Unfortunately, yeah, it looks like there are a lot of false positives in
here :-(. For example, your list contains liblzma and libsqlite, but
both of these are shipped as dependencies of Python itself. So
probably someone just forgot to declare the dependency explicitly, but
got away with it because the libraries were pulled in anyway.

Maybe a better approach would be to look at what libraries are used by
an up-to-date default Anaconda install (on the assumption that this
is the best-tested configuration), and then erase from the list all
libraries that are shipped by this configuration (ignoring declared
dependencies, since those seem to be unreliable)? It's better to be
conservative here, since the end goal is to come up with a list of
external libraries that we're confident have actually been tested for
compatibility by lots and lots of different users.
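
In rough outline, that conservative filter could look like the sketch
below (the Anaconda prefix is a placeholder, and "candidates" stands in
for Robert's 158-entry list):

    import os

    def shipped_libs(prefix):
        # Collect the names of all shared libraries bundled in the install.
        shipped = set()
        for root, _, files in os.walk(prefix):
            shipped.update(f for f in files if '.so' in f)
        return shipped

    candidates = {'liblzma.so.5', 'libsqlite3.so.0', 'libc.so.6'}
    external = candidates - shipped_libs('/opt/anaconda')  # placeholder path
    print(sorted(external))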

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Nathaniel Smith
On Jan 9, 2016 04:12, "Julian Taylor"  wrote:
>
> On 09.01.2016 04:38, Nathaniel Smith wrote:
> > On Fri, Jan 8, 2016 at 7:17 PM, Nathan Goldbaum wrote:
> >> Doesn't building on CentOS 5 also mean using a quite old version of
> >> gcc?
> >
> > Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
> > 4.8 by using the Redhat Developer Toolset release (which is gcc +
> > special backport libraries to let it generate RHEL5/CentOS5-compatible
> > binaries). (I might have one or both of those version numbers slightly
> > wrong.)
> >
> >> I've never tested this, but I've seen claims on the anaconda mailing
> >> list of ~25% slowdowns compared to building from source or using system
> >> packages, which was attributed to building using an older gcc that
> >> doesn't optimize as well as newer versions.
> >
> > I'd be very surprised if that were a 25% slowdown in general, as
> > opposed to a 25% slowdown on some particular inner loop that happened
> > to neatly match some new feature in a new gcc (e.g. something where
> > the new autovectorizer kicked in). But yeah, in general this is just
> > an inevitable trade-off when it comes to distributing binaries: you're
> > always going to pay some penalty for achieving broad compatibility as
> > compared to artisanally hand-tuned binaries specialized for your
> > machine's exact OS version, processor, etc. Not much to be done,
> > really. At some point the baseline for compatibility will switch to
> > "compile everything on CentOS 6", and that will be better but it will
> > still be worse than binaries that target CentOS 7, and so on and so
> > forth.
> >
>
> I have over the years put in one gcc-specific optimization after
> another, so yes, using an ancient version will make many parts
> significantly slower. That is not really a problem, though; updating a
> compiler is easy even without Red Hat's devtoolset.
>
> At least as far as numpy is concerned, Linux binaries should not be a
> very big problem. The only dependency where the version matters is glibc,
> which has updated the interfaces we use (in a backward-compatible way)
> many times.
> But if we use an old enough baseline glibc (e.g. CentOS 5 or Ubuntu
> 10.04) we are fine, at reasonable performance cost (basically only a
> slower memcpy).

Are you saying that it's easy to use, say, gcc 5.3's C compiler to produce
binaries that will run on an out-of-the-box CentOS 5 install? I assumed
that there'd be issues with things like new symbol versions in libgcc, not
just glibc, but if not then that would be great...

> Scipy, on the other hand, is a larger problem, as it contains C++ code.
> Linux systems are now transitioning to C++11, which is binary
> incompatible in parts with the old standard. A lot of testing is
> necessary there to check whether we are affected.
> How does Anaconda deal with C++11?

IIUC the situation with the C++ stdlib changes in gcc 5 is that old
binaries will continue to work on new systems. The only thing that breaks
is that if two libraries want to pass objects of the affected types back
and forth (e.g. std::string), then either they both need to be compiled
with the old ABI or they both need to be compiled with the new ABI. (And
when using a new compiler it's still possible to choose the old ABI with a
#define; old compilers of course only support the old ABI.)

See: http://developerblog.redhat.com/2015/02/05/gcc5-and-the-c11-abi/

So the answer is that most Python packages don't care, because even the
ones written in C++ don't generally talk C++ across package boundaries, and
for the ones that do care, the people making the binary packages will
have to coordinate to use the same ABI. And for local builds on modern
systems that link against binary packages built using the old ABI, people
might have to use -D_GLIBCXX_USE_CXX11_ABI=0.
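
As a rough heuristic for telling which ABI an extension module was built
against (a hedged sketch, assuming GNU binutils is available; the path is
a placeholder):

    import subprocess

    def uses_cxx11_abi(sofile):
        # Builds against the new ABI reference std::__cxx11:: symbols;
        # old-ABI builds do not.
        syms = subprocess.check_output(['nm', '-D', sofile])
        return b'__cxx11' in syms

    print(uses_cxx11_abi('/path/to/some_ext.so'))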

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows build/distribute plan & MingwPy funding

2016-01-09 Thread Ralf Gommers
On Mon, Jan 4, 2016 at 7:38 PM, Matthew Brett 
wrote:

> Hi,
>
> On Mon, Jan 4, 2016 at 5:35 PM, Erik Bray wrote:
> > On Sat, Jan 2, 2016 at 3:20 AM, Ralf Gommers wrote:
> >> Hi all,
> >>
> >> You probably know that building Numpy, Scipy and the rest of the Scipy
> >> Stack on Windows is problematic. And that there are plans to adopt the
> >> static MinGW-w64 based toolchain that Carl Kleffner has done a lot of
> >> work on for the last two years to fix that situation.
> >>
> >> The good news is: this has become a lot more concrete just now, with
> >> this proposal for funding:
> >> http://mingwpy.github.io/proposal_december2015.html
> >>
> >> Funding for phases 1 and 2 is already confirmed; the phase 3 part has
> >> been submitted to the PSF. Phase 1 (of $1000) is funded by donations
> >> made to Numpy and Scipy (through NumFOCUS), and phase 2 (of $4000) by
> >> NumFOCUS directly. So a big thank you to everyone who made a donation
> >> to Numpy, Scipy and NumFOCUS!
>

More good news: the PSF has approved phase 3!


>
> >> I hope that that proposal gives a clear idea of the work that's going
> >> to be done over the next months. Note that http://mingwpy.github.io
> >> contains a lot more background info, description of technical issues,
> >> etc.
> >>
> >> Feedback & ideas very welcome of course!
> >>
> >> Cheers,
> >> Ralf
> >
> > Hi Ralf,
> >
> > I've seen you drop hints about this recently, and am interested to
> > follow this work.  I've been hired as part of the OpenDreamKit project
> > to work, in large part, on developing a sensible toolchain for
> > building and distributing Sage on Windows.  I know you've been
> > following the thread on that too.  Although the primary goal there is
> > "whatever works", I'm personally inclined to focus on the mingwpy /
> > mingw-w64 approach, due in large part to my past success with the
> > MinGW 32-bit toolchain.
>

That's good to hear Erik.


> > (I personally have a desire to improve
> > support for building with MSVC as well, but that's a less important
> > goal as far as the funding is concerned.)
>
>
> > So anyways, please keep me in the loop about this, as I will also be
> > putting effort into this over the next year as well.  Has there been
> > any discussion about setting up a mailing list specifically for this
> > project?
>

We'll definitely keep you in the loop. We try to do as much of the
discussion as possible on the mingwpy mailing list and on
https://github.com/mingwpy. If there's significant progress then I guess
that'll be announced on this list as well.

Cheers,
Ralf


> Yes, it exists already, but not well advertised :
> https://groups.google.com/forum/#!forum/mingwpy
>
> It would be great to share work.
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Matthew Brett
Hi Sandro,

On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi  wrote:
>> I wrote a page on using pip with Debian / Ubuntu here :
>> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>
> Speaking with my numpy Debian maintainer hat on, I would really
> appreciate it if you didn't suggest using pip to install packages in
> Debian, or at least not as the only solution.

I'm very happy to accept alternative suggestions or PRs.

I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two.  Do you
have any thoughts?

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2016-01-09 Thread Gyro Funch
On 1/9/2016 4:58 AM, Stefan Otte wrote:
> Hey,
> 
> one of my new year's resolutions is to get my pull requests
> accepted (or closed). So here we go...
> 
> Here is the updated pull request: 
> https://github.com/numpy/numpy/pull/5057 Here is the docstring: 
> https://github.com/sotte/numpy/commit/3d4c5d19a8f15b35df50d945b9c8853b683f7ab6#diff-2270128d50ff15badd1aba4021c50a8cR358
>
>  The new `block` function is very similar to matlab's `[A, B; C,
> D]`.
> 
> Pros: - it's very useful (in my experience) - less friction for
> people coming from matlab - it's conceptually simple - the
> implementation is simple - it's documented - it's tested
> 
> Cons: - the implementation is not super efficient. Temporary
> copies are created. However, bmat also does that.
> 
> Feedback is very welcome!
> 
> 
> Best, Stefan
> 
> 


Without commenting on the implementation, I would find this function
*very* useful and convenient in my own work.

-gyro


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread David Cournapeau
On Sat, Jan 9, 2016 at 12:12 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On 09.01.2016 04:38, Nathaniel Smith wrote:
> > On Fri, Jan 8, 2016 at 7:17 PM, Nathan Goldbaum wrote:
> >> Doesn't building on CentOS 5 also mean using a quite old version of gcc?
> >
> > Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
> > 4.8 by using the Redhat Developer Toolset release (which is gcc +
> > special backport libraries to let it generate RHEL5/CentOS5-compatible
> > binaries). (I might have one or both of those version numbers slightly
> > wrong.)
> >
> >> I've never tested this, but I've seen claims on the anaconda mailing
> >> list of ~25% slowdowns compared to building from source or using system
> >> packages, which was attributed to building using an older gcc that
> >> doesn't optimize as well as newer versions.
> >
> > I'd be very surprised if that were a 25% slowdown in general, as
> > opposed to a 25% slowdown on some particular inner loop that happened
> > to neatly match some new feature in a new gcc (e.g. something where
> > the new autovectorizer kicked in). But yeah, in general this is just
> > an inevitable trade-off when it comes to distributing binaries: you're
> > always going to pay some penalty for achieving broad compatibility as
> > compared to artisanally hand-tuned binaries specialized for your
> > machine's exact OS version, processor, etc. Not much to be done,
> > really. At some point the baseline for compatibility will switch to
> > "compile everything on CentOS 6", and that will be better but it will
> > still be worse than binaries that target CentOS 7, and so on and so
> > forth.
> >
>
> I have over the years put in one gcc-specific optimization after
> another, so yes, using an ancient version will make many parts
> significantly slower. That is not really a problem, though; updating a
> compiler is easy even without Red Hat's devtoolset.
>
> At least as far as numpy is concerned, Linux binaries should not be a
> very big problem. The only dependency where the version matters is glibc,
> which has updated the interfaces we use (in a backward-compatible way)
> many times.
> But if we use an old enough baseline glibc (e.g. CentOS 5 or Ubuntu
> 10.04) we are fine, at reasonable performance cost (basically only a
> slower memcpy).
>
> Scipy, on the other hand, is a larger problem, as it contains C++ code.
> Linux systems are now transitioning to C++11, which is binary
> incompatible in parts with the old standard. A lot of testing is
> necessary there to check whether we are affected.
> How does Anaconda deal with C++11?
>

For Canopy packages, we use the RH devtoolset with gcc 4.8.x, and statically
link the C++ stdlib.

It has worked so far for the few packages requiring C++11 and gcc > 4.4
(llvm/llvmlite/dynd), but that's not a solution I am a fan of myself, as
the implications are not always very clear.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread David Cournapeau
On Sat, Jan 9, 2016 at 12:20 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On 09.01.2016 12:52, Robert McGibbon wrote:
> > Hi all,
> >
> > I went ahead and tried to collect a list of all of the libraries that
> > could be considered to constitute the "base" system for linux-64. The
> > strategy I used was to leverage off the work done by the folks at
> > Continuum by searching through their pre-compiled binaries
> > from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
> > libraries that were depended on (according to ldd) but were not
> > accounted for by the declared dependencies that each package made known
> > to the conda package manager.
> >
>
> do those packages use ld --as-needed for linking?
> there are a lot of libraries in that list that I highly doubt are directly
> used by the packages.
>

It is also a common problem when building packages without using a "clean"
build environment, as it is too easy to pick up dependencies accidentally,
especially for autotools-based packages (unless one uses pbuilder or
similar tools).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Sandro Tosi
> I wrote a page on using pip with Debian / Ubuntu here :
> https://matthew-brett.github.io/pydagogue/installing_on_debian.html

Speaking with my numpy Debian maintainer hat on, I would really
appreciate it if you didn't suggest using pip to install packages in
Debian, or at least not as the only solution.

-- 
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
> do those packages use ld --as-needed for linking?

Is it possible to check this? I mean, there are over 7000 packages that I
checked. I don't know how they were all built.

It's totally possible for many of them to be unused. A reasonably common
thing might be that packages use ctypes or dlopen to dynamically load
shared libraries that are actually just optional (and catch the error and
recover gracefully if the library can't be loaded).
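
The graceful-fallback pattern would look roughly like this (the library
name is taken from the list purely for illustration):

    import ctypes

    try:
        # Optional shared library; everything still works without it.
        _lib = ctypes.CDLL('libcaca.so.0')
    except OSError:
        _lib = None  # fall back to a code path that doesn't need it

    def have_fast_path():
        return _lib is not None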

-Robert

On Sat, Jan 9, 2016 at 4:20 AM, Julian Taylor  wrote:

> On 09.01.2016 12:52, Robert McGibbon wrote:
> > Hi all,
> >
> > I went ahead and tried to collect a list of all of the libraries that
> > could be considered to constitute the "base" system for linux-64. The
> > strategy I used was to leverage off the work done by the folks at
> > Continuum by searching through their pre-compiled binaries
> > from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
> > libraries that were depended on (according to ldd) but were not
> > accounted for by the declared dependencies that each package made known
> > to the conda package manager.
> >
>
> do those packages use ld --as-needed for linking?
> there are a lot of libraries in that list that I highly doubt are directly
> used by the packages.
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Julian Taylor
On 09.01.2016 12:52, Robert McGibbon wrote:
> Hi all,
> 
> I went ahead and tried to collect a list of all of the libraries that
> could be considered to constitute the "base" system for linux-64. The
> strategy I used was to leverage off the work done by the folks at
> Continuum by searching through their pre-compiled binaries
> from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
> libraries that were depended on (according to ldd) but were not
> accounted for by the declared dependencies that each package made known
> to the conda package manager.
> 

do those packages use ld --as-needed for linking?
there are a lot of libraries in that list that I highly doubt are directly
used by the packages.



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Julian Taylor
On 09.01.2016 04:38, Nathaniel Smith wrote:
> On Fri, Jan 8, 2016 at 7:17 PM, Nathan Goldbaum  wrote:
>> Doesn't building on CentOS 5 also mean using a quite old version of gcc?
> 
> Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
> 4.8 by using the Redhat Developer Toolset release (which is gcc +
> special backport libraries to let it generate RHEL5/CentOS5-compatible
> binaries). (I might have one or both of those version numbers slightly
> wrong.)
> 
>> I've never tested this, but I've seen claims on the anaconda mailing list of
>> ~25% slowdowns compared to building from source or using system packages,
>> which was attributed to building using an older gcc that doesn't optimize as
>> well as newer versions.
> 
> I'd be very surprised if that were a 25% slowdown in general, as
> opposed to a 25% slowdown on some particular inner loop that happened
> to neatly match some new feature in a new gcc (e.g. something where
> the new autovectorizer kicked in). But yeah, in general this is just
> an inevitable trade-off when it comes to distributing binaries: you're
> always going to pay some penalty for achieving broad compatibility as
> compared to artisanally hand-tuned binaries specialized for your
> machine's exact OS version, processor, etc. Not much to be done,
> really. At some point the baseline for compatibility will switch to
> "compile everything on CentOS 6", and that will be better but it will
> still be worse than binaries that target CentOS 7, and so on and so
> forth.
> 

I have over the years put in one gcc-specific optimization after
another, so yes, using an ancient version will make many parts
significantly slower. That is not really a problem, though; updating a
compiler is easy even without Red Hat's devtoolset.

At least as far as numpy is concerned, Linux binaries should not be a
very big problem. The only dependency where the version matters is glibc,
which has updated the interfaces we use (in a backward-compatible way)
many times.
But if we use an old enough baseline glibc (e.g. CentOS 5 or Ubuntu
10.04) we are fine, at reasonable performance cost (basically only a
slower memcpy).
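
For reference, the glibc floor a given binary actually requires can be
read off its versioned symbols. A minimal sketch, assuming binutils is
available and with a placeholder path:

    import re
    import subprocess

    def max_glibc_required(sofile):
        # Report the highest GLIBC_x.y(.z) symbol version the binary needs.
        out = subprocess.check_output(['objdump', '-T', sofile]).decode()
        found = re.findall(r'GLIBC_(\d+)\.(\d+)(?:\.(\d+))?', out)
        return max(tuple(int(p) for p in v if p) for v in found)

    print(max_glibc_required('/path/to/multiarray.so'))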

Scipy, on the other hand, is a larger problem, as it contains C++ code.
Linux systems are now transitioning to C++11, which is binary
incompatible in parts with the old standard. A lot of testing is
necessary there to check whether we are affected.
How does Anaconda deal with C++11?



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2016-01-09 Thread Stefan Otte
Hey,

one of my new year's resolutions is to get my pull requests accepted (or
closed). So here we go...

Here is the updated pull request: https://github.com/numpy/numpy/pull/5057
Here is the docstring:
https://github.com/sotte/numpy/commit/3d4c5d19a8f15b35df50d945b9c8853b683f7ab6#diff-2270128d50ff15badd1aba4021c50a8cR358

The new `block` function is very similar to matlab's `[A, B; C, D]`.

Pros:
- it's very useful (in my experience)
- less friction for people coming from matlab
- it's conceptually simple
- the implementation is simple
- it's documented
- it's tested

Cons:
- the implementation is not super efficient. Temporary copies are created.
However, bmat also does that.
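
For a taste of the intended usage (a sketch assuming the PR lands under
the name np.block; see the docstring link above for the real details):

    import numpy as np

    A = np.ones((2, 2))
    B = np.zeros((2, 2))
    M = np.block([[A, B], [B, A]])  # like matlab's [A, B; B, A]
    print(M.shape)                  # (4, 4)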

Feedback is very welcome!


Best,
 Stefan


On Sun, May 10, 2015 at 12:33 PM, Stefan Otte  wrote:

> Hey,
>
> Just a quick update. I updated the pull request and renamed `stack` to
> `block`. Have a look: https://github.com/numpy/numpy/pull/5057
>
> I'm sticking with the simple initial implementation because it's simple and
> does what you think it does.
>
>
> Cheers,
>  Stefan
>
>
>
> On Fri, Oct 31, 2014 at 2:13 PM Stefan Otte  wrote:
>
>> To make the last point more concrete, the implementation could look
>> something like this (note that I didn't test it and that it still
>> takes some work):
>>
>>
>> import sys
>>
>> import numpy as N
>> from numpy import concatenate
>> from numpy.matrixlib.defmatrix import matrix, _from_string
>>
>>
>> def bmat(obj, ldict=None, gdict=None):
>>     return matrix(stack(obj, ldict, gdict))
>>
>>
>> def stack(obj, ldict=None, gdict=None):
>>     # the old bmat code minus the matrix calls
>>     if isinstance(obj, str):
>>         if gdict is None:
>>             # get previous frame
>>             frame = sys._getframe().f_back
>>             glob_dict = frame.f_globals
>>             loc_dict = frame.f_locals
>>         else:
>>             glob_dict = gdict
>>             loc_dict = ldict
>>         return _from_string(obj, glob_dict, loc_dict)
>>
>>     if isinstance(obj, (tuple, list)):
>>         # [[A,B],[C,D]]
>>         arr_rows = []
>>         for row in obj:
>>             if isinstance(row, N.ndarray):  # not 2-d
>>                 return concatenate(obj, axis=-1)
>>             else:
>>                 arr_rows.append(concatenate(row, axis=-1))
>>         return concatenate(arr_rows, axis=0)
>>
>>     if isinstance(obj, N.ndarray):
>>         return obj
>>
>>
>> I basically turned the old `bmat` into `stack` and removed the matrix
>> calls.
>>
>>
>> Best,
>>  Stefan
>>
>>
>>
>> On Wed, Oct 29, 2014 at 3:59 PM, Stefan Otte 
>> wrote:
>> > Hey,
>> >
>> > there are several ways to proceed.
>> >
>> > - My proposed solution covers the 80% case quite well (at least I use
>> > it all the time). I'd convert the doctests into unittests and we're
>> > done.
>> >
>> > - We could slightly change the interface to leave out the surrounding
>> > square brackets, i.e. turning `stack([[a, b], [c, d]])` into
>> > `stack([a, b], [c, d])`
>> >
>> > - We could extend it even further, allowing a "filler value" for
>> > non-set values and a "shape" argument. This could be done later as well.
>> >
>> > - `bmat` is not really matrix specific. We could refactor `bmat` a bit
>> > to use the same logic in `stack`. Except for the `matrix` calls, `bmat`
>> > and `_from_string` are pretty agnostic to the input.
>> >
>> > I'm in favor of the first or last approach. The first: because it
>> > already works and is quite simple. The last: because the logic and
>> > tests of both `bmat` and `stack` would be the same and the feature to
>> > specify a string representation of the block matrix is nice.
>> >
>> >
>> > Best,
>> >  Stefan
>> >
>> >
>> >
>> > On Tue, Oct 28, 2014 at 7:46 PM, Nathaniel Smith  wrote:
>> >> On 28 Oct 2014 18:34, "Stefan Otte"  wrote:
>> >>>
>> >>> Hey,
>> >>>
>> >>> In the last weeks I tested `np.asarray(np.bmat())` as a `stack`
>> >>> function and it works quite well. So the question persists: if `bmat`
>> >>> already offers something like `stack`, should we even bother
>> >>> implementing `stack`? More code leads to more
>> >>> bugs and maintenance work. (However, the current implementation is
>> >>> only 5 lines, and using `bmat` would reduce that even more.)
>> >>
>> >> In the long run we're trying to reduce usage of np.matrix and ideally
>> >> deprecate it entirely. So yes, providing ndarray equivalents of matrix
>> >> functionality (like bmat) is valuable.
>> >>
>> >> -n
>> >>
>> >>
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
Hi all,

I went ahead and tried to collect a list of all of the libraries that could
be considered to constitute the "base" system for linux-64. The strategy I
used was to leverage off the work done by the folks at Continuum by
searching through their pre-compiled binaries from
https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that
were depended on (according to ldd) but were not accounted for by the
declared dependencies that each package made known to the conda package
manager.

The full list of these system libraries, sorted from most commonly
depended on to rarest, is below. There are 158 of them.
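
In outline, the per-package check worked roughly like this sketch (the
path and the declared-dependency set are placeholders for the real conda
metadata); the full list follows after it:

    import re
    import subprocess

    def ldd_deps(sofile):
        # Sonames the dynamic linker resolves for one shared library.
        out = subprocess.check_output(['ldd', sofile]).decode()
        return set(re.findall(r'^\s*(\S+\.so[\w.]*)', out, re.M))

    declared = {'libz.so.1', 'libpython2.7.so.1.0'}  # from package metadata
    undeclared = ldd_deps('/pkgs/foo/lib/foo.so') - declared
    print(sorted(undeclared))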

['linux-vdso.so.1', 'libc.so.6', 'libpthread.so.0', 'libm.so.6',
'libdl.so.2', 'libutil.so.1', 'libgcc_s.so.1', 'libstdc++.so.6',
'libexpat.so.1', 'librt.so.1', 'libpng12.so.0', 'libcrypt.so.1',
'libffi.so.6', 'libresolv.so.2', 'libkeyutils.so.1', 'libcom_err.so.2',
'libp11-kit.so.0', 'libkrb5.so.26', 'libheimntlm.so.0', 'libtasn1.so.6',
'libheimbase.so.1', 'libgssapi.so.3', 'libroken.so.18', 'libhcrypto.so.4',
'libhogweed.so.4', 'libnettle.so.6', 'libhx509.so.5', 'libwind.so.0',
'libgnutls-deb0.so.28', 'libasn1.so.8', 'libgmp.so.10', 'libsasl2.so.2',
'libidn.so.11', 'librtmp.so.1', 'liblber-2.4.so.2', 'libldap_r-2.4.so.2',
'libXdmcp.so.6', 'libX11.so.6', 'libXau.so.6', 'libxcb.so.1',
'libgssapi_krb5.so.2', 'libkrb5.so.3', 'libk5crypto.so.3',
'libkrb5support.so.0', 'libicudata.so.55', 'libicuuc.so.55',
'libhdf5_serial.so.10', 'libcurl-gnutls.so.4', 'libhdf5_serial_hl.so.10',
'libtinfo.so.5', 'libgcrypt.so.20', 'libgpg-error.so.0', 'libnsl.so.1',
'libXext.so.6', 'libncursesw.so.5', 'libpanelw.so.5', 'libXrender.so.1',
'libjbig.so.0', 'libpcre.so.3', 'libglib-2.0.so.0',
'libnvidia-tls.so.352.41', 'libnvidia-glcore.so.352.41', 'libGL.so.1',
'libuuid.so.1', 'libSM.so.6', 'libICE.so.6', 'libgobject-2.0.so.0',
'libgfortran.so.1', 'liblzma.so.5', 'libXt.so.6', 'libgmodule-2.0.so.0',
'libXi.so.6', 'libgstpbutils-1.0.so.0', 'liborc-0.4.so.0',
'libgstreamer-1.0.so.0', 'libgsttag-1.0.so.0', 'libgstvideo-1.0.so.0',
'libxslt.so.1', 'libaudio.so.2', 'libjpeg.so.8', 'libgstaudio-1.0.so.0',
'libgstbase-1.0.so.0', 'libgstapp-1.0.so.0', 'libz.so.1',
'libgthread-2.0.so.0', 'libfreetype.so.6', 'libfontconfig.so.1',
'libdbus-1.so.3', 'libsystemd.so.0', 'libltdl.so.7', 'libGLU.so.1',
'libsqlite3.so.0', 'libpgm-5.1.so.0', 'libgomp.so.1', 'libxcb-render.so.0',
'libxcb-shm.so.0', 'libncurses.so.5', 'libxml2.so.2', 'libXss.so.1',
'libXft.so.2', 'libtk.so', 'libtcl.so', 'libasound.so.2',
'libharfbuzz.so.0', 'libpixman-1.so.0', 'libgio-2.0.so.0',
'libXinerama.so.1', 'libselinux.so.1', 'libXcomposite.so.1',
'libthai.so.0', 'libXdamage.so.1', 'libgdk-x11-2.0.so.0',
'libpangoft2-1.0.so.0', 'libcairo.so.2', 'libpangocairo-1.0.so.0',
'libdatrie.so.1', 'libatk-1.0.so.0', 'libXcursor.so.1', 'libXfixes.so.3',
'libgraphite2.so.3', 'libgdk_pixbuf-2.0.so.0', 'libgtk-x11-2.0.so.0',
'libquadmath.so.0', 'libpango-1.0.so.0', 'libXrandr.so.2',
'libgfortran.so.3', 'libjson-c.so.2', 'libshiboken-python2.7.so.1.1',
'libogg.so.0', 'libvorbis.so.0', 'libatlas.so.3', 'libcurl.so.4',
'libhdf5.so.9', 'libodbcinst.so.1', 'libpcap.so.0.9', 'libnetcdf.so.7',
'libblas.so.3', 'libpulse.so.0', 'libcaca.so.0', 'libgstreamer-0.10.so.0',
'libXxf86vm.so.1', 'libhdf5_hl.so.9', 'libpulse-simple.so.0',
'libasyncns.so.0', 'libwrap.so.0', 'libvorbisenc.so.2', 'libmagic.so.1',
'libssl.so.1.0.0', 'libFLAC.so.8', 'libSDL-1.2.so.0', 'libsndfile.so.1',
'libslang.so.2', 'libglapi.so.0', 'libaio.so.1',
'libgstinterfaces-0.10.so.0', 'libpulsecommon-6.0.so', 'libjpeg.so.62',
'libcrypto.so.1.0.0']


This list actually contains a fair number of false positives, so it would
need to be pruned manually. If you stare at it a little while, you might
see some libraries in there that you recognize shouldn't be part of
the base system, like libatlas.so.3.

This gist https://gist.github.com/rmcgibbo/a13e7623c38ec54fcc93 contains
some more detailed data: for each of the libraries in the list above, it
gives the names of the packages that depend on that library. For
example, for libatlas.so.3, there is only a single package which
depends on it, ["scikit-learn-0.11-np16py27_ce0"]. So, probably a bug.

"libgfortran.so.1" is also in the list. It's depended on by
["cvxopt-1.1.6-py27_0", "cvxopt-1.1.7-py27_0", "cvxopt-1.1.7-py34_0",
"cvxopt-1.1.7-py35_0", "numpy-1.5.1-py27_1", "numpy-1.5.1-py27_3",
"numpy-1.5.1-py27_4", "numpy-1.5.1-py27_ce0", "numpy-1.6.2-py27_1",
"numpy-1.6.2-py27_3", "numpy-1.6.2-py27_4", "numpy-1.6.2-py27_ce0",
"numpy-1.7.0-py27_0", "numpy-1.7.0b2-py27_ce0", "numpy-1.7.0rc1-py27_0",
"numpy-1.7.1-py27_0", "numpy-1.7.1-py27_2", "numpy-1.8.0-py27_0",
"numpy-1.8.1-py27_0", "numpy-1.8.1-py34_0", "numpy-1.8.2-py27_0",
"numpy-1.8.2-py34_0", "numpy-1.9.0-py27_0", "numpy-1.9.0-py34_0",
"numpy-1.9.1-py27_0", "numpy-1.9.1-py34_0", "numpy-1.9.2-py27_0",
"numpy-1.9.2-py34_0"].

Note that this list of numpy versions doesn't include the latest ones --
all 

Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-09 Thread Oscar Benjamin
On 8 January 2016 at 23:27, Chris Barker  wrote:
> On Fri, Jan 8, 2016 at 1:58 PM, Robert McGibbon  wrote:
>>
>> I'm not sure if this is the right path for numpy or not,
>
>
> probably not -- AFAICT, the PyPA folks aren't interested in solving the
> problems we have in the scipy community -- we can tweak around the edges,
> but we won't get there without a commitment to really solve the issues --
> and if pip did that, it would essentially be conda -- no one wants to
> re-implement conda.

I think that's a little unfair to the PyPA people. They would like to
solve all of these problems; it's just a question of priority and
expertise. As always in open source, you have to scratch your own itch,
and those guys are working on other things, like the security,
stability and scalability of the infrastructure, consistency of pip's
version handling and dependency resolution, etc.

Linux wheels is a problem that has been discussed on distutils-sig.
The reason it hasn't happened is that it's a lower priority than
wheels for OSX/Windows because:
1) Most distros already package this stuff i.e. apt-get numpy
2) On Linux it's much easier to get the appropriate compilers so that
pip can build e.g. numpy.
3) The average Linux user is more capable of solving these problems.
4) Getting binary distribution to work across all Linux distributions
is significantly harder than for Windows/OSX because of the myriad
different distros/versions.

Considering point 2, pip install numpy etc. already works for a lot of
Linux users, even if it is slow because of the time taken to compile (3
minutes on my system). Depending on your use case, that problem is
partially solved by wheel caching. So if Linux wheels were allowed but
didn't always work then that would be a regression for many users. On
OSX binary wheels for numpy are already available and work fine AFAIK.
The absence of binary numpy wheels for Windows is not down to PyPA.

Considering point 4, the idea of compiling on an old base Linux system
has been discussed on distutils-sig before and it seems likely to
work. The problem is really about the external non-libc dependencies
though. The reason progress there has stalled is not because the PyPA
folks don't want to solve it but rather because they have other
priorities and are hoping that people with more expertise in that area
will step up to address those problems. Most of the issues stem from
the scientific Python community so ideally someone from the scientific
Python community would address how to solve those problems.

Recently Nathaniel brought some suggestions to distutils-sig to
address the problem of build-requires which is a particular pain
point. I think that people there appreciated the effort from someone
who understands the needs of hard-to-build packages to improve the way
that pip/PyPI works in that area. There was a lot of confusion from
people not understanding each other's needs, but ultimately I thought
there was agreement on how to move forward. (Although what happened to
that in the end?)

The same can happen with other problems like Linux wheels. If you
guys here have a clear idea of how to solve the external dependency
problem then I'm sure they'll be receptive. Personally I think the
best approach is the pyopenblas approach: internalise the external
dependency so that pip can work with it. This is precisely what
Anaconda does and there's actually no need to make substantive changes
to the way pip/pypi/wheel works in order to achieve that. It just
needs someone to package the external dependencies as sdist/wheel (and
for PyPI to allow Linux wheels).
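
Schematically, internalising a dependency just means shipping the C
library inside an ordinary Python package; a minimal sketch with invented
names (real builds would also need to sort out linking and rpaths):

    # setup.py for a hypothetical 'openblas-py' wheel that bundles the
    # shared library. Dependent projects declare it in install_requires
    # and locate the bundled .so via the package, instead of relying on
    # a system-wide copy.
    from setuptools import setup

    setup(
        name='openblas-py',
        version='0.1',
        packages=['openblas_py'],
        package_data={'openblas_py': ['libopenblas.so.0']},
    )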

--
Oscar
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Add the function 'expand_view'

2016-01-09 Thread Sebastian Berg
On Thu, 2016-01-07 at 22:48 -0500, John Kirkham wrote:
> First off, sorry for the long turnaround in responding to these
> questions. Below I have tried to respond to everyone's questions and
> comments. I have restructured the order of the messages so that my
> responses are a little more structured. If anybody has more thoughts
> or questions, please let me know.
> 



> From a performance standpoint, `expand_dims` uses `reshape` to add
> these extra dimensions. So, it has the possibility of not returning a
> view, but a copy, which could take some time to build if the array is
> large. By using our method, we guarantee a view will always be
> returned; so, this allocation will never be encountered.

Actually, reshape can be used: if stride tricks can do the trick, then
reshape is guaranteed not to make a copy.

Still, I can't say I feel it is a worthy addition (but others may
disagree), especially since I realized we have `expand_dims` already. I'm
just not sure it would actually be used reasonably often.

But how about adding a `dims=1` argument to `expand_dims`, so that your
function becomes at least a bit easier:

newarr = np.expand_dims(arr, -1, after.ndim)
# If manual broadcast is necessary:
newarr = np.broadcast_to(arr, before.shape + arr.shape + after.shape)

Otherwise I guess we could morph `expand_dims` partially into your
function, though it would return a read-only view if broadcasting is
active, so I am a bit unsure I like it.


In other words, my currently preferred solution to get some of
this would be to add a simple `dims` argument to `expand_dims` and to add
`broadcast_to` to its "See Also" section (it could also be in the
`squeeze` "See Also" section, I think).
To further help out other people, we could maybe even mention the
combination in the examples (i.e. broadcast_to example?).
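
With today's API the combination is already expressible, e.g. (shapes
invented for the example):

    import numpy as np

    arr = np.arange(6).reshape(2, 3)
    expanded = np.expand_dims(arr, -1)           # shape (2, 3, 1)
    view = np.broadcast_to(expanded, (2, 3, 4))  # read-only view, no copy
    print(view.shape, view.flags.writeable)      # (2, 3, 4) False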

- Sebastian 


> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion