On 3/28/17, zljubi...@gmail.com <zljubi...@gmail.com> wrote: > In dataframe > > import pandas as pd > > data = {'model': ['first', 'first', 'second', 'second', 'second', 'third', > 'third'], > 'dtime': ['2017-01-01_112233', '2017-01-01_112234', > '2017-01-01_112234', '2017-01-01_112234', '2017-01-01_112234', > '2017-01-01_112235', '2017-01-01_112235'], > } > df = pd.DataFrame(data, index = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', > 'e.jpg', 'f.jpg', 'g.jpg'], columns=['model', 'dtime']) > > print(df.head(10)) > > model dtime > a.jpg first 2017-01-01_112233 > b.jpg first 2017-01-01_112234 > c.jpg second 2017-01-01_112234 > d.jpg second 2017-01-01_112234 > e.jpg second 2017-01-01_112234 > f.jpg third 2017-01-01_112235 > g.jpg third 2017-01-01_112235 > > within model, there are duplicate dtime values. > For example, rows d and e are duplicates of the c row. > Row g is duplicate of the f row. > > For each duplicate (within model) I would like to add suffix (starting from > 1) to the dtime value. Something like this: > > model dtime > a.jpg first 2017-01-01_112233 > b.jpg first 2017-01-01_112234 > c.jpg second 2017-01-01_112234 > d.jpg second 2017-01-01_112234-1 > e.jpg second 2017-01-01_112234-2 > f.jpg third 2017-01-01_112235 > g.jpg third 2017-01-01_112235-1 > > How to do that? > -- > https://mail.python.org/mailman/listinfo/python-list >
I am not expert, just played a little... This one could work: gb = df.groupby([df.model, df.dtime]) df.dtime = df.dtime + gb.cumcount().apply(lambda a:str(-a) if a else '') this one is probably more readable: df.dtime = df.dtime + [str(-a) if a else '' for a in gb.cumcount()] I don't know which one is better in memory consumption and/or speed. This small dataframe gave me: %timeit -r 5 df.dtime + gb.cumcount().apply(lambda a:str(-a) if a else '') 1000 loops, best of 5: 387 µs per loop %timeit -r 5 df.dtime + [str(-a) if a else '' for a in gb.cumcount()] 1000 loops, best of 5: 324 µs per loop PL. -- https://mail.python.org/mailman/listinfo/python-list