I am trying to read a CSV file that has a date-time field called Date. Many rows share the same date but have different times. I first want to drop the times so that all rows from the same day have the same value in Date. There is another field called Text, and I want to collapse all the records with the same date into a single record whose Text field contains all the strings from the corresponding rows. At the same time I want to create a new field holding the count of how many records were collapsed for each date. There is a third field called Tw.ID, and I was trying to use tapply on it to get that count. Later I will build a DocumentTermMatrix from this data frame with the tm package.

In the code below I have not figured out how to collapse the data so that there is only one record per date, and I don't have a good way to add the count field. Can anyone make any suggestions? Thanks.
# install.packages("tm")  # only needed once
library(tm)

y.df <- read.csv("YHOO3000.csv", header = TRUE, stringsAsFactors = FALSE)
# POSIXct rather than POSIXlt: POSIXlt is a list and is awkward as a data frame column
y.df$Date <- as.POSIXct(y.df$Date)

ysub14.df <- y.df
ysub14.df$Date <- y.df$Date - 14 * 3600  # I pushed the record times back a little here.
# Drop the time of day, keeping the calendar date
ysub14.df$Date <- as.Date(format(ysub14.df$Date, "%Y-%m-%d"))

# Might want to use
#   groups <- unstack(data.frame(ysub14.df$Text, ysub14.df$Date))
# to put all the tweets for one day into a group. This makes a list,
# I think, with the name of the list being the Date and
# the tweets for that date being stored in a vector.

# Number of records per date
countgroup2 <- tapply(ysub14.df$Tw.ID, ysub14.df$Date, length)
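
For reference, here is a minimal sketch of the result I am after, on made-up data. The three rows below are invented for illustration and are not from YHOO3000.csv; the names text.by.day, count.by.day and daily.df are hypothetical; and it assumes Date is stored as "YYYY-MM-DD HH:MM:SS". The 14-hour shift is left out to keep it short. It uses tapply twice with the same index, once to paste the Text values together and once to count the rows.

```r
# Toy stand-in for the real data; these rows are invented for illustration.
y.df <- data.frame(
  Tw.ID = c(101, 102, 103),
  Date  = c("2011-03-01 09:15:00", "2011-03-01 18:40:00", "2011-03-02 07:05:00"),
  Text  = c("first tweet", "second tweet", "third tweet"),
  stringsAsFactors = FALSE
)

# Drop the time of day so rows from the same day share one Date value.
# Assumes the first 10 characters of Date are "YYYY-MM-DD".
y.df$Date <- as.Date(substr(y.df$Date, 1, 10))

# One string per day: paste all Text values for that day together.
text.by.day <- tapply(y.df$Text, y.df$Date, paste, collapse = " ")

# Number of records collapsed into each day.
count.by.day <- tapply(y.df$Tw.ID, y.df$Date, length)

# Both tapply results are ordered by the same date levels,
# so they can be combined column by column.
daily.df <- data.frame(
  Date  = as.Date(names(text.by.day)),
  Text  = as.vector(text.by.day),
  Count = as.vector(count.by.day),
  stringsAsFactors = FALSE
)

print(daily.df)
```

If something like this is reasonable, I assume daily.df$Text could then be fed to tm via Corpus(VectorSource(daily.df$Text)) before calling DocumentTermMatrix, but I have not tried that yet.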