Re: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Jaime Fernández del Río
On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu  wrote:

> Dear list,
>
> This is forwarded from issue 6217
> https://github.com/numpy/numpy/issues/6217
>
> "What is the way to implement DESC ordering in the sorting routines of
> numpy?"
>
> (I am borrowing DESC/ASC from the SQL notation)
>
> For a stable DESC ordering sort, one can not  revert the sorted array via
> argsort()[::-1] .
>
> I propose the following API change to argsorts/sort. (haven't thought
> about lexsort yet) I will use argsort as an example.
>
> Currently, argsort supports sorting by keys ('order') and by 'axis'.
> These two somewhat orthonal interfaces need to be treated differently.
>
> 1. by axis.
>
> Since there is just one sorting key, a single 'reversed' keyword
> argument is sufficient:
>
> a.argsort(axis=0, kind='merge', reversed=True)
>
> Jaime suggested this can be implemented efficiently as a
> post-processing step.
> (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
> there a reference to the algorithm?
>

My thinking was that, for native types, you can stably reverse a sorted
permutation in-place by first reversing item-by-item, then reversing every
chunk of repeated entries. Sort of the way you would reverse the words in a
sentence in-place: first reverse every letter, then reverse everything
bounded by spaces:

TURN ME AROUND -> DNUORA EM NRUT -> AROUND EM NRUT -> AROUND ME NRUT ->
AROUND ME TURN

We could even add a type-specific function to do this for each of the
native types in the npy_sort library.

As I mentioned in Yu's very nice PR
, probably it is best to leave
the signature of the function alone, and have something like order='desc'
be the trigger for the proposed reversed=True.

Jaime


>
> Because all of the sorting algorithms for 'atomic' dtypes are using
> _LT functions, a post processing step seems to be the only viable
> solution, if possible.


>
> 2. by fields, ('order' kwarg)
>
> A single 'reversed' keyword argument will not work, because some keys
> are ASC but others are DESC, for example, sorting my LastName ASC,
> then Salary DESC.
>
> a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
> ('Salary', 'desc'))])
>
> The parsing rule of order is:
>
> - if an item is tuple, the first item is the fieldname, the second
> item is DESC/ASC
> - if an item is scalar, the fieldname is the item, the ordering is ASC.
>
> This part of the code already goes to VOID_compare, which walks a
> temporary copy of a.dtype to call the comparison functions.
>
> If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
> ASC/DESC flags, offsets, and comparison functions can all be
> pre-compiled and passed into VOID_compare via c_metadata of the
> temporary type-descriptor.
>
> By just looking this will actually make VOID_compare faster by
> avoiding calling several Python C-API functions. negating the return
> value of cmp seems to be a negligable overhead in such a complex
> function.


> 3. If both reversed and order is given, the ASC/DESC fields in 'order'
> are effectively reversed.
>
> Any comments?
>
> Best,
>
> Yu
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

2015-08-19 Thread Feng Yu
Dear list,

This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217

"What is the way to implement DESC ordering in the sorting routines of numpy?"

(I am borrowing DESC/ASC from the SQL notation)

For a stable DESC ordering sort, one can not  revert the sorted array via
argsort()[::-1] .

I propose the following API change to argsorts/sort. (haven't thought
about lexsort yet) I will use argsort as an example.

Currently, argsort supports sorting by keys ('order') and by 'axis'.
These two somewhat orthonal interfaces need to be treated differently.

1. by axis.

Since there is just one sorting key, a single 'reversed' keyword
argument is sufficient:

a.argsort(axis=0, kind='merge', reversed=True)

Jaime suggested this can be implemented efficiently as a
post-processing step.
(https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
there a reference to the algorithm?

Because all of the sorting algorithms for 'atomic' dtypes are using
_LT functions, a post processing step seems to be the only viable
solution, if possible.


2. by fields, ('order' kwarg)

A single 'reversed' keyword argument will not work, because some keys
are ASC but others are DESC, for example, sorting my LastName ASC,
then Salary DESC.

a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
('Salary', 'desc'))])

The parsing rule of order is:

- if an item is tuple, the first item is the fieldname, the second
item is DESC/ASC
- if an item is scalar, the fieldname is the item, the ordering is ASC.

This part of the code already goes to VOID_compare, which walks a
temporary copy of a.dtype to call the comparison functions.

If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
ASC/DESC flags, offsets, and comparison functions can all be
pre-compiled and passed into VOID_compare via c_metadata of the
temporary type-descriptor.

By just looking this will actually make VOID_compare faster by
avoiding calling several Python C-API functions. negating the return
value of cmp seems to be a negligable overhead in such a complex
function.

3. If both reversed and order is given, the ASC/DESC fields in 'order'
are effectively reversed.

Any comments?

Best,

Yu
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal to remove the Bento build.

2015-08-19 Thread David Cournapeau
On Wed, Aug 19, 2015 at 1:22 AM, Nathaniel Smith  wrote:

> On Tue, Aug 18, 2015 at 4:15 PM, David Cournapeau 
> wrote:
> > If everybody wants to remove bento, we should remove it.
>
> FWIW, I don't really have an opinion either way on bento versus
> distutils, I just feel that we shouldn't maintain two build systems
> unless we're actively planning to get rid of one of them, and for
> several years now we haven't really been learning anything by keeping
> the bento build working, nor has there been any movement towards
> switching to bento as the one-and-only build system, or even a clear
> consensus that this would be a good thing. (Obviously distutils and
> numpy.distutils are junk, so that's a point in bento's favor, but it
> isn't *totally* cut and dried -- we know numpy.distutils works and we
> have to maintain it regardless for backcompat, while bento doesn't
> seem to have any activity upstream or any other users...).
>
> So I'd be totally in favor of adding bento back later if/when such a
> plan materializes; I just don't think it makes sense to keep
> continuously investing effort into it just in case such a plan
> materializes later.
>
> > Regarding single file builds, why would it help for static builds ? I
> > understand it would make things slightly easier to have one .o per
> > extension, but it does not change the fundamental process as the exported
> > symbols are the same in the end ?
>
> IIUC they aren't: with the multi-file build we control exported
> symbols using __attribute__((visibility("hidden")) or equivalent,
> which hides symbols from the shared object export table, but not from
> other translation units that are statically linked. So if you want to
> statically link cpython and numpy, you need some other way to let
> numpy .o files see each others's symbols without exposing them to
> cpython's .o files,


It is less a problem than in shared linking because you can detect the
conflicts at linking time (instead of loading time).


and the single-file build provides one mechanism
> to do that: make the numpy symbols 'static' and then combine them all
> into a single translation unit.
>
> I would love to be wrong about this though. The single file build is
> pretty klugey :-).
>

I know, it took me a while to split the files to go out of single file
build in the first place :)

David


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion