I am writing a program to experiment with machine learning on a weekly time series. Every day I record the Count for each ID. I read the CSV as a dataset like the one below:

    ID,Count
    1,30    // First day
    2,33
    3,45
    4,11
    5,66
    7,88
    1,32    // Second day
    2,35
    3,55
    4,21
    5,36
    7,48
I have two arrays, X and y. I want to put the ID and Count in X, and the Count of the second day in y. The elements of X and y correspond by ID, e.g. X[0] and y[0] hold the values where ID == 1, X[1] and y[1] hold the values where ID == 2, and so on.

So X looks like this:

    array([[1, 30],
           [2, 33],
           [3, 45],
           [4, 11],
           [5, 66],
           [7, 88]])

and y looks like this:

    array([[32],
           [35],
           [55],
           [21],
           [36],
           [48]])

The program I wrote always costs O(n^2). Code:

    import numpy as np
    import pandas

    def create_data(dataset):
        uni = np.unique(dataset[:, 0])
        # Each ID with m rows yields m - 1 (day, next day) pairs.
        n_samples = len(dataset) - len(uni)
        X = np.zeros((n_samples, 2))
        y = np.zeros((n_samples, 1))
        offset = 0
        for i in uni:
            index = dataset[:, 0] == i      # boolean mask, scans every row
            data = dataset[index]
            for j in range(len(data) - 2 + 1):
                X[offset] = data[j]         # ID and Count on day j
                y[offset] = data[j + 1, 1]  # Count on the following day
                offset += 1
        return X, y

    dataframe = pandas.read_csv(path, header=0, engine='python')
    dataset = dataframe.dropna().values.astype('float64')
    X, y = create_data(dataset)

I use two for loops and estimate the complexity to be O(n^2). Is there a better way to rewrite the code and reduce the time complexity?
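One direction that might avoid the nested loops is sketched below. It rests on an assumption I have not checked against the real data: rows appear in chronological order, so a stable sort by ID keeps each ID's days in order. The names create_data_vectorized and rows are made up for this illustration. The sort should make the cost roughly O(n log n), but this is only a sketch, not a measured result.

```python
import numpy as np

def create_data_vectorized(dataset):
    """Pair each (ID, Count) row with the same ID's Count on the next day.

    Assumes rows are already in chronological order.
    """
    dataset = np.asarray(dataset, dtype='float64')
    # A stable sort groups rows by ID and preserves day order within an ID.
    order = np.argsort(dataset[:, 0], kind='stable')
    data = dataset[order]
    # A row has a "next day" when the following row carries the same ID.
    same_id = data[:-1, 0] == data[1:, 0]
    X = data[:-1][same_id]
    y = data[1:, 1][same_id].reshape(-1, 1)
    return X, y

# The sample data from this post: two days, six IDs.
rows = np.array([[1, 30], [2, 33], [3, 45], [4, 11], [5, 66], [7, 88],
                 [1, 32], [2, 35], [3, 55], [4, 21], [5, 36], [7, 48]])
X, y = create_data_vectorized(rows)
```

With the sample data, X and y should come out with six rows each, one per ID. An ID that appears on only one day produces no row at all, which may or may not be what is wanted.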