Just to reiterate what you said: I should fit the IDF model only on the training data and then reuse it both for the test data and, later on, for unseen data when making predictions.
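A minimal Python sketch of that fit-on-train / reuse-everywhere pattern is below. It is not Spark code: `SimpleIDF` is a hypothetical toy class that assumes the smoothed formula given in the Spark ML documentation, idf(t) = log((|D| + 1) / (DF(t, D) + 1)). Fitting stores only |D| and DF(t, D); transforming multiplies a new document's raw TF(t, d) by those frozen weights. Spark's `minDocFreq` option and hashed term indices are not modeled. In Spark ML itself, the equivalent is calling `fit` on the training DataFrame to get an `IDFModel` and then calling that model's `transform` on both splits.

```python
import math
from collections import Counter


class SimpleIDF:
    """Hypothetical toy IDF estimator (not Spark's implementation).

    Assumes the smoothed formula from the Spark ML docs:
        idf(t) = log((|D| + 1) / (DF(t, D) + 1))
    """

    def __init__(self):
        self.num_docs = 0          # |D|, learned at fit time
        self.doc_freq = Counter()  # DF(t, D), learned at fit time

    def fit(self, docs):
        """Learn |D| and DF(t, D) from tokenized training documents only."""
        self.num_docs = len(docs)
        self.doc_freq = Counter()
        for doc in docs:
            self.doc_freq.update(set(doc))  # each term counted once per document
        return self

    def idf(self, term):
        # Terms never seen during fit have DF = 0, giving log(|D| + 1).
        return math.log((self.num_docs + 1) / (self.doc_freq[term] + 1))

    def transform(self, doc):
        """Weight the new document's own TF(t, d) by the frozen IDF values."""
        tf = Counter(doc)
        return {term: count * self.idf(term) for term, count in tf.items()}


# Toy tokenized data (made up for illustration).
train = [["spark", "idf", "model"], ["spark", "test"], ["idf", "vector"]]
test = [["spark", "unseen", "unseen"]]

# Correct: fit on training data only, reuse the same model for every split.
model = SimpleIDF().fit(train)
train_vecs = [model.transform(d) for d in train]
test_vec = model.transform(test[0])       # uses |D| = 3 from training only

# Leaky: fitting on train + test lets the test split influence |D| and DF.
leaky = SimpleIDF().fit(train + test)
leaky_test_vec = leaky.transform(test[0])  # uses |D| = 4, DF includes test

print(test_vec)
print(leaky_test_vec)
```

The two test vectors differ because the leaky fit has already "seen" the test document when computing |D| and DF, so an evaluation on that vector no longer mimics truly unseen data, which is the overfitting Robin describes.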
On Tue, Nov 1, 2016 at 3:49 AM, Robin East <robin.e...@xense.co.uk> wrote:

> The point of setting aside a portion of your data as a test set is to try
> and mimic applying your model to unseen data. If you fit your IDF model to
> all your data, any evaluation you perform on your test set is likely to
> overperform compared to 'real' unseen data. Effectively, you would have
> overfit your model.
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 1 Nov 2016, at 10:15, Nirav Patel <npa...@xactlycorp.com> wrote:
>
> FYI, I do reuse the IDF model when making predictions on new unlabeled
> data, but not between the training and test data while training a model.
>
> On Tue, Nov 1, 2016 at 3:10 AM, Nirav Patel <npa...@xactlycorp.com> wrote:
>
>> I am using the IDF estimator/model (TF-IDF) to convert text features into
>> vectors. Currently, I fit the IDF model on all the sample data and then
>> transform it. I read somewhere that I should split my data into training
>> and test sets before fitting the IDF model: fit IDF only on the training
>> data and then use the same transformer to transform both the training and
>> test data.
>> This raises more questions:
>> 1) Why would you do that? What exactly does IDF learn during the fitting
>> process that it can reuse to transform any new dataset? Perhaps the idea is
>> to keep the same values for |D| and DF(t, D) while using the new TF(t, d)?
>> 2) If not, then fitting and transforming seem redundant for the IDF model.
>
> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>
> <https://www.nyse.com/quote/XNYS:XTLY> [image: LinkedIn]
> <https://www.linkedin.com/company/xactly-corporation> [image: Twitter]
> <https://twitter.com/Xactly> [image: Facebook]
> <https://www.facebook.com/XactlyCorp> [image: YouTube]
> <http://www.youtube.com/xactlycorporation>