That's way better, thank you! I never thought I'd say this, but I miss pandas. I could write

df['cs'] = df.groupby('Species')['PetalLength'].transform('cumsum')

That's not possible in Julia because DataFrames don't have a row index.

On Wednesday, May 4, 2016 at 9:04:21 AM UTC-4, tshort wrote:
>
> Here's another way with DataFramesMeta [1]:
>
> using DataFrames, DataFramesMeta, RDatasets
> df = dataset("datasets", "iris")
> @transform(groupby(df, :Species), cs = cumsum(:PetalLength))
>
> [1] https://github.com/JuliaStats/DataFramesMeta.jl/
>
> On Wed, May 4, 2016 at 8:09 AM, Cedric St-Jean <cedric...@gmail.com> wrote:
>
>> "Do blocks" are one of my favourite things about Julia, they're explained in
>> the docs
>> <http://docs.julialang.org/en/release-0.4/manual/functions/#do-block-syntax-for-function-arguments>.
>>
>> Basically it's just a convenient way of defining and passing a function
>> (the code that comes after `do`) to another function (in this case, `by`).
>> `by` goes over the dataframe, splits it into 3 subdataframes (one for each
>> Species in the iris dataset), and calls the do-block for each of them. Then
>> their return values (the last line in the do-block) get concatenated
>> together to form the final result. The code I really wanted to write is:
>>
>> using RDatasets
>> df = dataset("datasets", "iris")
>> # For each species
>> df2 = by(df, :Species) do sub_df
>>     sub_df = copy(sub_df) # don't modify the original dataframe
>>     # Add a :cumulative_PetalLength column
>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>     # Return the new sub-dataframe
>>     sub_df
>> end
>>
>> but unfortunately, this code doesn't work with DataFrames.jl.
>>
>>
>> On Wednesday, May 4, 2016 at 4:42:41 AM UTC-4, Ben Southwood wrote:
>>>
>>> Thanks Cedric, that worked very well. I'm having a little trouble
>>> following the documentation as to how the "by ... do ..." structure
>>> actually works. Would you mind explaining what the code is doing?
>>>
>>> On Tuesday, May 3, 2016 at 10:07:10 PM UTC-4, Cedric St-Jean wrote:
>>>>
>>>> Something like
>>>>
>>>> using RDatasets
>>>> df = dataset("datasets", "iris")
>>>> df[:cumulative_PetalLength] = 0.0
>>>> by(df, :Species) do sub_df
>>>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>>>     sub_df
>>>> end
>>>>
>>>> though I hope someone can provide a more elegant solution. `sub_df` is a
>>>> SubDataFrame, and those objects can neither have a new column nor be
>>>> converted to DataFrame.
>>>>
>>>> On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>>>>>
>>>>> I have the following dataframe with values of the form
>>>>>
>>>>> date1,label1,qty1_1
>>>>> date2,label1,qty1_2
>>>>> date3,label1,qty1_3
>>>>> ....
>>>>> dateN,label1,qty1_N
>>>>> date1,label2,qty2_1
>>>>> date2,label2,qty2_2
>>>>> date3,label2,qty2_3
>>>>> ....
>>>>> dateN,label2,qty2_N
>>>>> ....
>>>>>
>>>>> I would like to take the cumulative sum of the qtys so that the running
>>>>> total accumulates separately within each label. Then I'd have
>>>>>
>>>>> date1,label1,cuml1_1
>>>>> date2,label1,cuml1_2
>>>>> date3,label1,cuml1_3
>>>>> ....
>>>>> dateN,label1,cuml1_N
>>>>> date1,label2,cuml2_1
>>>>>
>>>>> This way I can use gadfly and run the following plot
>>>>>
>>>>> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>>>>>
>>>>> and have each cuml sum have its own colouring by date. I'm stuck on
>>>>> how to do this simply without creating lookups. Any help? Thanks!
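
The split-apply-combine steps that `by ... do` performs in this thread (split the rows by label, run a function on each group, stitch the results back together) can be sketched without any dataframe library. What follows is a minimal illustration in plain Python, not DataFrames.jl or pandas code. The sample rows and the names `rows` and `grouped_cumsum` are made up for the example and only mimic the date,label,qty layout from the original question.

```python
from itertools import accumulate

# Hypothetical stand-in rows for the (date, label, qty) layout in the question.
rows = [
    ("date1", "label1", 1.0),
    ("date2", "label1", 2.0),
    ("date3", "label1", 3.0),
    ("date1", "label2", 10.0),
    ("date2", "label2", 20.0),
]


def grouped_cumsum(rows):
    """Return (date, label, running total of qty within that label), in input order."""
    # Split: collect row positions per label, keeping row order inside each group.
    groups = {}
    for i, (_, label, _) in enumerate(rows):
        groups.setdefault(label, []).append(i)

    # Apply and combine: cumulative sum per group, written back to the
    # original row positions so the output lines up with the input.
    out = [None] * len(rows)
    for label, idxs in groups.items():
        totals = accumulate(rows[i][2] for i in idxs)
        for i, total in zip(idxs, totals):
            out[i] = (rows[i][0], label, total)
    return out


result = grouped_cumsum(rows)
for row in result:
    print(row)
```

Each output row keeps its date and label next to the running total, so the three fields map directly onto the x, y and color arguments of the Gadfly `plot` call in the question.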