Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

Jeff Reback Mon, 04 Jul 2016 16:31:25 -0700

This is trivial in pandas: a simple groupby.

In [5]: from pandas import DataFrame

In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3],
   ...:         ['b', 12, -329.0]]


In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0

In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0
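
One caveat, hedged: as far as I know, groupby(...).first() and .last() return the first/last non-null value of each column, so with missing data the result can mix values from different rows, and the keys come back sorted. If whole rows are wanted, a minimal sketch using drop_duplicates (my own variant, with a composite key on columns A and B added purely for illustration) might look like this:

```python
import pandas as pd

data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3], ['b', 12, -329.0]]
df = pd.DataFrame(data, columns=list('ABC'))

# Keep the first (or last) complete row for each key in column 'A'.
# The original row order and index labels are preserved.
first_rows = df.drop_duplicates(subset='A', keep='first')
last_rows = df.drop_duplicates(subset='A', keep='last')

# A composite key is just a list of columns (illustrative choice of A and B).
first_by_ab = df.drop_duplicates(subset=['A', 'B'], keep='first')

print(first_rows)
print(last_rows)
print(first_by_ab)
```

Since the key columns are compared by value, a short string as part of the key needs no hashing on the user's side.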


On Mon, Jul 4, 2016 at 7:27 PM, Skip Montanaro <skip.montan...@gmail.com>
wrote:

> > Any way that you can make your keys numeric? Then you can run np.diff on
> > that first column, and use the indices of nonzero entries (np.flatnonzero)
> > to know where values change. With a +1/-1 offset (that I am too lazy to
> > figure out right now ;) you can then index into the original rows to get
> > either the first or last occurrence of each run.
>
> I'll give it some thought, but one of the elements of the key is definitely
> a (short, < six characters) string.  Hashing it probably wouldn't work, too
> great a chance for collisions.
>
> S
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
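
For completeness, the np.diff / np.flatnonzero idea quoted above can be sketched in plain NumPy. This is only a sketch under assumptions of mine: the keys live in a non-empty 1-D array, np.unique(..., return_inverse=True) is used to turn the string keys into exact integer codes (so there is no hashing and no collision risk), and a stable sort makes equal keys contiguous, since np.diff by itself only finds runs of adjacent equal keys. The array names are made up for the example.

```python
import numpy as np

keys = np.array(['a', 'b', 'a', 'b'])
values = np.array([[27, 14.5], [12, 99.0], [17, 100.3], [12, -329.0]])

# Map each distinct key to a small integer code (exact, not a hash).
_, codes = np.unique(keys, return_inverse=True)

# Stable sort: equal keys become one contiguous run, in original order.
order = np.argsort(codes, kind='stable')
sorted_codes = codes[order]

# Positions i where sorted_codes[i] != sorted_codes[i + 1].
change = np.flatnonzero(np.diff(sorted_codes))

# A run starts one past each change (plus position 0) and ends at each
# change (plus the final position); map back to original row numbers.
first_idx = order[np.concatenate(([0], change + 1))]
last_idx = order[np.concatenate((change, [len(order) - 1]))]

print(values[first_idx])  # first row seen for each key
print(values[last_idx])   # last row seen for each key
```

So the offset is +1 for the first occurrence and 0 for the last, with the two end positions added by hand.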

