But still wrong... needs to be calculated in two steps because the cars are seen in multiple frames.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

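A minimal sketch of the two-step calculation, applied to the data.table approach quoted below. It is not code from the thread: the test data, the helper columns `start` and `n_frame`, and the name `vVel` are made up for illustration, and `dig.lab = 4` is added only to keep the slot labels out of scientific notation. Step 1 collapses the repeated frames to one row per vehicle per slot; step 2 counts those rows and averages the per-vehicle means.

```r
library(data.table)

# Made-up trajectories: each vehicle is observed in a run of consecutive
# frames, so one vehicle contributes many rows to the data.
set.seed(1)
n_veh <- 200L
veh <- data.table(
  vehicle = seq_len(n_veh),
  Class   = sample(1:3, n_veh, TRUE),
  start   = sample(22:8900, n_veh, TRUE),   # first frame of each vehicle (hypothetical)
  n_frame = sample(50:400, n_veh, TRUE)     # number of frames observed (hypothetical)
)
DT <- veh[, .(frame = start + seq_len(n_frame) - 1L), by = .(vehicle, Class)]
DT[, velocity := runif(.N, 15, 60)]

# 300-frame slots; include.lowest keeps frame 22 in the first slot
DT[, slot := cut(frame, seq(22, 9322, 300), include.lowest = TRUE, dig.lab = 4)]

# Step 1: one row per vehicle per slot (time-mean velocity of that vehicle)
per_vehicle <- DT[, .(vVel = mean(velocity)), by = .(slot, Class, vehicle)]

# Step 2: number of distinct vehicles and mean of the per-vehicle means
result <- per_vehicle[, .(Total = .N, mVel = mean(vVel)), keyby = .(slot, Class)]

head(result)
```

With this layout, `Total` is the number of distinct vehicles rather than the number of rows. A vehicle whose frames straddle a slot boundary is counted once in each slot it appears in.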
jim holtman <jholt...@gmail.com> wrote:
>FYI, increased it to 3M rows of data and it took 0.3 seconds to
>summarize it:
>
>> system.time({
>+     # summarize the slot & Class
>+     result <- DF[
>+         , list(Total = .N  # count of entries
>+             , mVel = mean(velocity)  # average velocity
>+             )
>+         , keyby = 'slot,Class'
>+         ]
>+ })
>   user  system elapsed
>   0.30    0.00    0.29
>> dim(DF)
>[1] 3000000       5
>> head(result, 20)
>                   slot Class Total     mVel
> 1:            [22,322]     1 32449 37.70693
> 2:            [22,322]     2 32741 37.49304
> 3:            [22,322]     3 32213 37.40232
> 4:           (322,622]     1 32257 37.38503
> 5:           (322,622]     2 31912 37.35941
> 6:           (322,622]     3 32360 37.43757
> 7:           (622,922]     1 32359 37.48244
> 8:           (622,922]     2 32505 37.57031
> 9:           (622,922]     3 31966 37.58736
>10:      (922,1.22e+03]     1 32228 37.34592
>11:      (922,1.22e+03]     2 32099 37.44663
>12:      (922,1.22e+03]     3 32054 37.51975
>13: (1.22e+03,1.52e+03]     1 32230 37.50131
>14: (1.22e+03,1.52e+03]     2 32353 37.50793
>15: (1.22e+03,1.52e+03]     3 32280 37.46094
>16: (1.52e+03,1.82e+03]     1 31957 37.51671
>17: (1.52e+03,1.82e+03]     2 32030 37.48419
>18: (1.52e+03,1.82e+03]     3 32365 37.49100
>19: (1.82e+03,2.12e+03]     1 32244 37.59558
>20: (1.82e+03,2.12e+03]     2 32309 37.49966
>>
>
>Jim Holtman
>Data Munger Guru
>
>What is the problem that you are trying to solve?
>Tell me what you want to do, not how you want to do it.
>
>
>On Sat, Jan 25, 2014 at 9:52 PM, jim holtman <jholt...@gmail.com> wrote:
>> Here is another way of doing it using the data.table package. Tried
>> using the new 'dplyr' package, but it had problems.
>>
>>> # create some test data
>>> N <- 30000  # rows of data
>>> set.seed(1)
>>> DF <- data.frame(vehicle = sample(1:5, N, TRUE)
>> +     , frame = sample(22:9322, N, TRUE)
>> +     , velocity = runif(N, 15, 60)
>> +     , Class = sample(1:3, N, TRUE)
>> +     )
>>> # break into slot (as previous post)
>>> DF$slot <- cut(DF$frame, seq(22, 9322, 300), include.lowest = TRUE)
>>> str(DF)
>> 'data.frame':   30000 obs. of  5 variables:
>>  $ vehicle : int  2 2 3 5 2 5 5 4 4 1 ...
>>  $ frame   : int  3884 1333 4277 7490 3008 8096 2318 7662 2017 5324 ...
>>  $ velocity: num  47.9 44.1 51.6 19.7 30 ...
>>  $ Class   : int  3 1 3 3 1 1 3 3 3 3 ...
>>  $ slot    : Factor w/ 31 levels "[22,322]","(322,622]",..: 13 5 15 25 10 27 8 26 7 18 ...
>>>
>>> # use the data.table package
>>> require(data.table)
>>> DF <- data.table(DF)
>>>
>>> # summarize the slot & Class
>>> result <- DF[
>> +     , list(Total = .N  # count of entries
>> +         , mVel = mean(velocity)  # average velocity
>> +         )
>> +     , keyby = 'slot,Class'
>> +     ]
>>>
>>> head(result, 20)
>>                    slot Class Total     mVel
>>  1:            [22,322]     1   317 38.68162
>>  2:            [22,322]     2   291 38.20892
>>  3:            [22,322]     3   356 38.18092
>>  4:           (322,622]     1   313 37.46468
>>  5:           (322,622]     2   326 38.35842
>>  6:           (322,622]     3   329 37.46077
>>  7:           (622,922]     1   327 37.06243
>>  8:           (622,922]     2   321 37.01083
>>  9:           (622,922]     3   351 37.12343
>> 10:      (922,1.22e+03]     1   361 36.76076
>> 11:      (922,1.22e+03]     2   343 38.48242
>> 12:      (922,1.22e+03]     3   312 37.64318
>> 13: (1.22e+03,1.52e+03]     1   304 36.89459
>> 14: (1.22e+03,1.52e+03]     2   324 36.60410
>> 15: (1.22e+03,1.52e+03]     3   343 36.11219
>> 16: (1.52e+03,1.82e+03]     1   313 38.71039
>> 17: (1.52e+03,1.82e+03]     2   305 38.29188
>> 18: (1.52e+03,1.82e+03]     3   308 38.61945
>> 19: (1.82e+03,2.12e+03]     1   308 37.69332
>> 20: (1.82e+03,2.12e+03]     2   335 37.38025
>>>
>>
>> On Sat, Jan 25, 2014 at 4:38 PM, Jeff Newmiller
>> <jdnew...@dcn.davis.ca.us> wrote:
>>> Sorry, messed up the second ddply example:
>>>
>>> dta3 <- ddply( dta2, c("slot","classf"), function(DF){data.frame( Total=nrow(DF),
>>>     MeanVelocity= mean( DF$TimeMeanVelocity ) ) } )
>>>
>>> Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>>>>While you seem to be making some progress in communicating your
>>>>problem, the format is still HTML (so it is a mess) and the subject and
>>>>approach of the question are still a poor fit for this list. We are not
>>>>here to DO your work for you, yet you seem to have an overly long list
>>>>of "needs" that suggest you want a complete solution. What you should
>>>>be looking for here are suggestions for how to solve pieces of this
>>>>task so that you can do the work of creating your own solution.
>>>>
>>>>Some tools that I find useful for this kind of problem are the cut
>>>>function, the plyr package, and the reshape2 package. Others might find
>>>>the aggregate function or the sqldf package or the data.table package or
>>>>the new dplyr package helpful. Each function and package has
>>>>documentation with examples that you should read before using them
>>>>(e.g. ?cut).
>>>>
>>>>Some example calculations are (with dta as your sample data frame):
>>>>
>>>>library(plyr)
>>>>dta$slot <- cut( dta$frame, seq(22,9322,300))
>>>>dta$classf <- factor(dta$class, levels=1:3,
>>>>    labels=c("motorcycle","car","truck"))
>>>>dta2 <- ddply( dta, c("slot","classf","vehicle"),
>>>>    function(DF){data.frame( TimeMeanVelocity=mean(DF$velocity) ) } )
>>>>dta3 <- ddply( dta2, c("slot","classf"), function(DF){data.frame(
>>>>    MeanVelocity=mean( Total=nrow(DF), DF$TimeMeanVelocity ) ) } )
>>>>
>>>>Then you need to fold the total and mean velocity into wide form using
>>>>the dcast function from the reshape2 package (read the documentation)
>>>>and merge them with the merge or cbind functions.
>>>>
>>>>Good luck, and keep working on making your questions clear.
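The wide-form step described just above (one dcast per value column, then a merge) might be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the small `dta3` is made up by hand in the shape the corrected ddply call would return (slot, classf, Total, MeanVelocity), and the column suffixes are an arbitrary choice.

```r
library(reshape2)

# Hand-made stand-in for dta3 (values are invented for illustration)
slots <- c("(22,322]", "(322,622]")
classes <- c("motorcycle", "car", "truck")
dta3 <- data.frame(
  slot         = factor(rep(slots, each = 3), levels = slots),
  classf       = factor(rep(classes, times = 2), levels = classes),
  Total        = c(2L, 10L, 3L, 1L, 12L, 4L),
  MeanVelocity = c(41.0, 37.5, 33.2, 39.8, 36.1, 31.7)
)

# One wide table per value column: rows are slots, columns are classes
tot <- reshape2::dcast(dta3, slot ~ classf, value.var = "Total")
vel <- reshape2::dcast(dta3, slot ~ classf, value.var = "MeanVelocity")

# Join on slot; the suffixes keep the two sets of class columns apart
wide <- merge(tot, vel, by = "slot", suffixes = c(".Total", ".MeanVelocity"))
print(wide)
```

If some slot has no vehicles of a given class, dcast leaves that cell as NA; passing `fill = 0` to the dcast call for the counts is one way to show zeros instead.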
>>>>
>>>>umair durrani <umairdurr...@outlook.com> wrote:
>>>>>Hello everyone,
>>>>>Here is the version using dput. I am sorry for the junk I posted
>>>>>before. I have a large vehicle trajectory data set, of which the
>>>>>following is a small part:
>>>>>
>>>>>structure(list(
>>>>>  vehicle = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
>>>>>  frame = c(221L, 222L, 223L, 224L, 115L, 116L, 117L, 118L, 119L,
>>>>>    120L, 121L),
>>>>>  globalx = c(6451259.685, 6451261.244, 6451262.831, 6451264.362,
>>>>>    6451181.179, 6451183.532, 6451185.884, 6451188.237, 6451190.609,
>>>>>    6451192.912, 6451195.132),
>>>>>  class = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
>>>>>  velocity = c(23.37, 23.16, 22.94, 22.85, 35, 35.01, 35.03, 34.92,
>>>>>    34.49, 33.66, 32.5),
>>>>>  lane = c(5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)),
>>>>>  .Names = c("vehicle", "frame", "globalx", "class", "velocity", "lane"),
>>>>>  row.names = c(85L, 86L, 87L, 88L, 447L, 448L, 449L, 450L, 451L,
>>>>>    452L, 453L),
>>>>>  class = "data.frame")
>>>>>
>>>>>Explanation of columns:
>>>>>vehicle = unique ID of vehicle. It is repeated (in column) for every
>>>>>  frame in which it was observed;
>>>>>frame = ID of the frame in which the vehicle was observed. One frame
>>>>>  is 0.1 seconds long;
>>>>>class = class of vehicle, i.e. 1=motorcycle, 2=car, 3=truck;
>>>>>velocity = velocity of vehicle in feet per second;
>>>>>lane = lane number in which vehicle is present in a particular frame;
>>>>>
>>>>>'frame' number can also repeat e.g.
>>>>>in frame 120 the example data shows vehicle 2 was observed, but in the
>>>>>original data many more vehicles might have been observed in this
>>>>>frame. Similarly, 'class' is defined above and all three classes are
>>>>>present in the original data (here example data only shows classes 2
>>>>>and 3 i.e. cars and trucks).
>>>>>
>>>>>I need to determine two things:
>>>>>1) Number of vehicles observed in every 30 seconds i.e. 300 frames
>>>>>2) Average velocity of each vehicle class in every 30 seconds
>>>>>
>>>>>This means that the first step might be to determine the minimum and
>>>>>maximum frame numbers and then divide them into slots so that every
>>>>>slot has 300 frames. In my original data I found 22 as min and 9233 as
>>>>>max frame number. This makes 31 time slots: 22-322, 322-622, ...,
>>>>>9022-9233. I need the following columns in one table as an output (note
>>>>>that the TimeSlot column should contain the time intervals as described
>>>>>before): TimeSlot, Total-Cars, Total-Trucks, Total-Motorcycles,
>>>>>MeanVelocity-Cars, MeanVelocity-Trucks, MeanVelocity-Motorcycles
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>>______________________________________________
>>>>>R-help@r-project.org mailing list
>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>PLEASE do read the posting guide
>>>>>http://www.R-project.org/posting-guide.html
>>>>>and provide commented, minimal, self-contained, reproducible code.