I have over 8,000 time series that I need to analyze and forecast. Running 1,500 of them takes over 2 hours using just ETS, let alone Holt-Winters and ARIMA, so I am looking for ways to shrink the time needed to generate a 2-year forecast.
The code I am using successfully to run through the time series sequentially is below. In essence it reads data from multiple CSV files, one per data set, each containing up to 5 years of historical sales by item; it then parses each file by item, builds a time series for each item, fits an ETS model, generates a 24-month forecast, adds the item number to the forecast, and writes the forecast to an Excel file.

I am looking for guidance in two areas:

* Reading the raw data in from Excel, where each sheet is laid out with one row per series and one column per period (see the sketch after the code below):

              d1    d2    d3    d4   ...
    series 1  v11   v12   v13   v14
    series 2  v21   v22   v23   v24
    ...

* Using parallel processing to analyze the data more quickly across several cores. I have tried to use doParallel at the item level, but without success. I have annotated the code to show where I tried to insert the %dopar% aspects; a sketch of a working foreach/%dopar% approach also follows the code.

# store the current directory
initial.dir <- getwd()
# change to the new directory
setwd("~/R")

# load the necessary libraries
require(TTR)
require(forecast)
require(xlsx)
#require(doParallel)
#cl <- makeCluster(3)
#registerDoSNOW(cl)
#chunks <- getDoParWorkers()

# output plots to a file
pdf("R Plots.pdf")
# send printed output to a file
sink(file = "R Output.out", type = "output")

# the data sets to process, one CSV file per data set
files <- c("3MH", "6MH", "12MH")

for (j in 1:3) {
  cat(paste("\n\n\n Evaluation of", files[j], "- Started at", date(), "\n\n\n"))
  History <- read.csv(paste(files[j], "csv", sep = "."))

  # output forecast to XLSX
  outwb <- createWorkbook()
  sheet <- createSheet(outwb, sheetName = paste(files[j], "- ETS"))

  items <- unique(History$Item)

  for (i in seq_along(items))
  # I tried r <- foreach(i = seq_along(items), .combine = 'rbind') %dopar% at this level
  {
    cat(paste("Evaluation of item", items[i], "-", i, "of", length(items), "\n"))

    # index History$Item directly; inside subset() the column name "Item"
    # shadows the vector of unique items, which picks the wrong rows
    data <- History[History$Item == items[i], ]
    dates <- unique(data$Date)
    d <- as.Date(dates, format = "%d/%m/%Y")
    data.ts <- ts(data$Volume, frequency = 12,
                  start = c(as.numeric(format(d[1], "%Y")),
                            as.numeric(format(d[1], "%m"))))

    #try(plot(decompose(data.ts)))
    #acf(data.ts)

    try(data.ets <- ets(data.ts))
    try(fc <- forecast(data.ets, h = 24))

    ets.df <- data.frame(fc)
    ets.df$Item <- rep(items[i], 24)

    r <- 24 * (i - 1) + 2
    addDataFrame(ets.df, sheet, col.names = FALSE, startRow = r)
  }

  cat(paste("\n\n\n Evaluation of", files[j], "- Completed at", date(), "\n\n\n"))
  saveWorkbook(outwb, paste(files[j], "xlsx", sep = "."))
}

# close the output file
sink()
dev.off()
#stopCluster(cl)

# change back to the original directory
setwd(initial.dir)
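On the first point, here is a minimal sketch of reading the wide-format sheet with the xlsx package that is already loaded. The file name "3MH.xlsx", the sheet index, the assumption that column 1 holds the series name, and the start date c(2010, 1) are placeholders to replace with the real values. Because each row already holds a complete series, the ts objects can be built directly, with no per-item subset() or date parsing:

library(xlsx)

# read the wide-format sheet: one row per series, one column per month
raw <- read.xlsx("3MH.xlsx", sheetIndex = 1, header = TRUE)   # file name is a placeholder

items   <- raw[[1]]                 # assumed: first column holds the series/item identifier
volumes <- as.matrix(raw[, -1])     # remaining columns are the monthly values d1, d2, ...

# one ts per row; start = c(2010, 1) is a placeholder for the real first period
series.list <- lapply(seq_along(items), function(i) {
  ts(as.numeric(volumes[i, ]), frequency = 12, start = c(2010, 1))
})
names(series.list) <- items

Reading each workbook once and keeping a named list of series also takes the subsetting and date-parsing work out of the inner loop, which should help even before any parallelism is added.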
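On the second point, here is a minimal sketch of the item-level loop with doParallel and foreach, reusing the series.list built above (any named list of ts objects would do). The cluster size of 3 is just an example and should match the available cores:

library(doParallel)
library(forecast)

cl <- makeCluster(3)          # number of workers is an assumption
registerDoParallel(cl)        # doParallel's own registration, not registerDoSNOW()

# each worker fits ETS and returns a plain data frame;
# all Excel writing stays in the master process
results <- foreach(i = seq_along(series.list),
                   .combine  = rbind,
                   .packages = "forecast") %dopar% {
  fit <- ets(series.list[[i]])
  fc  <- forecast(fit, h = 24)
  out <- data.frame(fc)
  out$Item <- names(series.list)[i]
  out
}

stopCluster(cl)

# write the combined forecasts once, outside the parallel loop,
# using createWorkbook()/addDataFrame()/saveWorkbook() as in the sequential version

Three things in the annotated attempt work against it: registerDoSNOW() belongs to the doSNOW package (with doParallel the call is registerDoParallel()); the workers never load the forecast package unless it is named in .packages; and side effects inside %dopar%, such as cat() and addDataFrame(), are lost or act on the workers' own copies of the workbook. Returning plain data frames and writing the workbook once in the master avoids all three problems.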
Trevor Miles
Vice President, Thought Leadership, Kinaxis
O: +1.613.907.7611 | M: +1.647.248.6269 | T: @MilesAhead
http://www.kinaxis.com