I am trying to read a CSV file with a date-time field called Date. Many
rows share the same date but have different times. I first want to drop
the times so that all rows from the same day have the same Date value.
There is another field called Text, and I want to collapse all the records
with the same date into a single record whose Text field contains the
strings from all the corresponding Text fields. At the same time I want to
create a new field holding the count of how many records were collapsed
for each date. There is a third field called Tw.ID, and I was trying to
use tapply on this field to get that count. Later I will create a
DocumentTermMatrix from this data frame with the tm package.
In the code below I have not figured out how to collapse the data so that
there is only one record for each date, and I don't really have a good way
to add the count as a field. Can anyone make any suggestions?
Thanks.

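install.packages("tm")
library(tm)
y.df=read.csv("YHOO3000.csv", header=TRUE)
# POSIXct is stored as a plain vector, which is safer in a data frame column
y.df$Date=as.POSIXct(y.df$Date)
ysub14.df=y.df
# I pushed the record times back a little here (14 hours, in seconds).
ysub14.df$Date=y.df$Date - 14*3600
# Drop the time of day. format() is used because the second argument of
# as.Date() on a date-time is a time zone, not a format string.
ysub14.df$Date=as.Date(format(ysub14.df$Date, "%Y-%m-%d"))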
# might want to use
#   groups <- unstack(data.frame(ysub14.df$Text,ysub14.df$Date))
# to put all the tweets for one day into a group. This makes a list
# I think, with the name of the list being the Date and
# the tweets for that date being stored in a vector.
countgroup2=tapply(ysub14.df$Tw.ID,ysub14.df$Date,length)
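
For reference, the collapse and count described above could be sketched
roughly as follows. This is only one possible approach, shown on made-up
data: toy.df, collapsed.df, counts and all the values in them are
hypothetical stand-ins, not the real YHOO3000.csv. It assumes the Date
column has already been reduced to a plain Date, uses aggregate() to paste
the Text strings together per date, and reuses the tapply() count from
above.

```r
# Hypothetical toy data standing in for the real CSV (values are made up).
toy.df <- data.frame(
  Tw.ID = 1:5,
  Date  = as.Date(c("2011-03-01", "2011-03-01", "2011-03-02",
                    "2011-03-02", "2011-03-02")),
  Text  = c("yahoo up", "buy yhoo", "yhoo down", "sell", "earnings"),
  stringsAsFactors = FALSE
)

# One row per Date; all Text strings for that date are pasted into a
# single string, separated by spaces.
collapsed.df <- aggregate(Text ~ Date, data = toy.df,
                          FUN = paste, collapse = " ")

# Number of original records per date (same idea as countgroup2 above).
counts <- tapply(toy.df$Tw.ID, toy.df$Date, length)

# Attach the counts, matching on the date so the row order does not matter.
collapsed.df$Count <- as.vector(counts[as.character(collapsed.df$Date)])

print(collapsed.df)
```

If this does what is wanted, the collapsed Text column (one string per
date) would presumably be what goes into the tm corpus before building the
DocumentTermMatrix.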
