Github user shubhamchopra commented on the issue: https://github.com/apache/spark/pull/17673

@Krimit

_Can you provide some information about the practical differences between CBOW and skip-grams?_

![Model Architectures](https://cloud.githubusercontent.com/assets/6588487/25546610/d0f95aa8-2c31-11e7-8b47-4f9d31254f0f.png)

As described in [this paper](https://arxiv.org/pdf/1301.3781.pdf), the CBOW model looks at the words around a target word and tries to predict the target word. Skip-gram does the opposite: given a target word, it tries to predict the context words around it. In both cases the prediction is done with a very simple neural network with a single hidden layer.

_Wikipedia quotes the author (I assume they mean Tomas) as saying that CBOW is faster while skip-gram is slower but does a better job for infrequent words. Has this been your experience as well? How pronounced is the difference?_

In my tests, the current CBOW with negative sampling takes almost the same time as the existing skip-gram with hierarchical softmax. The number of negative samples is tunable, and training gets slower as it increases.

_in what cases would a user choose one over the other? I'm basically seconding @hhbyyh's comment on a more in-depth comparison experiment._

There is a good amount of research with comparison experiments around this, and the answer appears to largely depend on the application the embeddings are used for. [Levy et al.](http://www.aclweb.org/anthology/Q15-1016) show how different methods perform in extensive experiments, using the embeddings for similarity, relatedness, and other tests on several open datasets. [Mikolov et al.](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) found skip-gram with negative sampling to outperform CBOW, while [Baroni et al.](http://anthology.aclweb.org/P/P14/P14-1023.pdf) found that CBOW had a slight advantage.
[Levy et al.](http://www.aclweb.org/anthology/Q15-1016) explain that while CBOW did not perform as well in their experiments, others have shown that capturing joint contexts (as CBOW does) can improve performance on word similarity tasks; they also saw CBOW perform well on analogy tasks. So again, it depends on the task being performed. [Mikolov et al.](https://arxiv.org/pdf/1309.4168.pdf) recommend skip-gram when monolingual data is small and CBOW for larger datasets.

_The fact that the original paper has both implementations is not in itself enough of a reason for Spark to do the same, IMO_

This is an active area of research, and both methods generate embeddings that perform well on different tasks. As a library providing these implementations, I think the choice is best left to the user and the application the embeddings are being used for.
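To make the directional difference concrete, here is a minimal sketch (plain Python, not Spark's implementation; function names are mine) of how the two models construct their training pairs from a token sequence. CBOW forms one many-to-one example per position (context → target), while skip-gram forms several one-to-one examples (target → each context word):

```python
def cbow_pairs(tokens, window=2):
    """CBOW: predict the target word from its surrounding context window."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))  # many-to-one
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the target word."""
    pairs = []
    for i, target in enumerate(tokens):
        for c in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((target, c))  # one-to-many
    return pairs


sentence = "spark makes big data simple".split()
print(cbow_pairs(sentence, window=1)[:2])
print(skipgram_pairs(sentence, window=1)[:3])
```

Because skip-gram emits roughly `2 * window` examples per position versus CBOW's one, each pass over the corpus does more updates, which is one intuition for why skip-gram tends to be slower but gives rarer words more training signal.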