Re: [R] Different TFIDF settings in test set prevent testing model

2023-08-11 Thread James C Schopf
peats = 2, classProbs = TRUE)
model_svmRadial <- train(M2 ~ ., data = trainData, method = "svmRadial", trControl = ctrl)
From: Ivan Krylov
Sent: Saturday, August 12, 2023 12:49 AM
To: James C Schopf
Cc: r-help@r-project.org
Subject: Re: [R] Differ

Re: [R] Different TFIDF settings in test set prevent testing model

2023-08-11 Thread Ivan Krylov
On Fri, 11 Aug 2023 10:20:27 +, James C Schopf wrote:
> > train_text_dtm <-
> >   DocumentTermMatrix(Corpus(VectorSource(all_train_tokens)))
> > test_text_dtm <-
> >   DocumentTermMatrix(Corpus(VectorSource(all_test_tokens)))
I understand the need to prepare the test dataset separately (e.g.

Re: [R] Different TFIDF settings in test set prevent testing model

2023-08-11 Thread Bert Gunter
I know nothing about tf, etc., but can you not simply read the whole file into R and then randomly split it using R? The training and test sets would simply be defined by a single random sample of subscripts, each of which is either chosen or not. e.g. (simplified example -- you would be subsetting the
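The snippet above is cut off before its own example, so the following is only a minimal base-R sketch of the idea it describes (one random sample of subscripts defining both sets), not the code from the original message. The data frame `docs` and its columns are hypothetical stand-ins for the 1287-document file mentioned in the thread, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, used only to make the split reproducible

# Hypothetical stand-in for the real file; in practice this would come
# from something like read.csv() on the full, unsplit data.
docs <- data.frame(
  id   = seq_len(1287),
  text = paste("document", seq_len(1287)),
  stringsAsFactors = FALSE
)

n <- nrow(docs)

# One random sample of row subscripts: chosen rows form the training set
train_idx <- sample(n, size = floor(0.75 * n))

train <- docs[train_idx, ]   # 75% of the rows
test  <- docs[-train_idx, ]  # every row that was not chosen
```

Because both sets are cut from the same object, any preprocessing applied to `docs` before the split is shared by `train` and `test`.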

[R] Different TFIDF settings in test set prevent testing model

2023-08-11 Thread James C Schopf
Hello, I'd be very grateful for your help. I randomly separated a .csv file with 1287 documents 75%/25% into 2 csv files, one for training an algorithm and the other for testing the algorithm. I applied similar preprocessing, including TFIDF transformation, to both sets, but R won't let me
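A common reason a model cannot be tested in this situation is that the training and test matrices end up with different term columns. The base-R sketch below illustrates one general way around that, which is not necessarily what the replies in this thread propose: fix the vocabulary and the IDF weights on the training documents and reuse both for the test documents. The toy documents, the helper `count_terms`, and the simplified weighting (raw count times the natural log of inverse document frequency) are all assumptions for illustration and differ from the weighting in the tm package.

```r
# Toy tokenised documents (hypothetical data)
train_docs <- list(c("good", "movie", "good"), c("bad", "movie"))
test_docs  <- list(c("good", "film"), c("bad", "bad", "movie"))

# Vocabulary is fixed from the training documents only
vocab <- sort(unique(unlist(train_docs)))

# Count terms per document over a fixed vocabulary;
# tokens outside the vocabulary are silently dropped.
count_terms <- function(docs, vocab) {
  m <- t(vapply(docs, function(tokens) {
    as.numeric(table(factor(tokens, levels = vocab)))
  }, numeric(length(vocab))))
  colnames(m) <- vocab
  m
}

train_counts <- count_terms(train_docs, vocab)
test_counts  <- count_terms(test_docs, vocab)

# IDF is estimated on the training set and reused for the test set
idf <- log(nrow(train_counts) / colSums(train_counts > 0))

train_tfidf <- sweep(train_counts, 2, idf, `*`)
test_tfidf  <- sweep(test_counts, 2, idf, `*`)

# Both matrices now share exactly the same columns
identical(colnames(train_tfidf), colnames(test_tfidf))
```

The test-only token "film" does not appear as a column, so a model fitted on `train_tfidf` sees the same predictors at prediction time. With the tm package used in the thread, the `dictionary` entry of the `control` list passed to `DocumentTermMatrix` can play a similar role for the vocabulary.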