Github user shubhamchopra commented on the issue:

    https://github.com/apache/spark/pull/17673
  
    @Krimit 
    _Can you provide some information about the practical differences between 
CBOW and skip-grams?_
    ![Model 
Architectures](https://cloud.githubusercontent.com/assets/6588487/25546610/d0f95aa8-2c31-11e7-8b47-4f9d31254f0f.png)
    As described in [this paper](https://arxiv.org/pdf/1301.3781.pdf), the CBOW 
model looks at the words around a target word and tries to predict the target 
word. Skip-gram does the opposite: given a target word, it tries to predict 
the context words around it. In both cases the prediction is made by a very 
simple neural network with a single hidden layer. 
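    
    As a rough sketch (following the notation of that paper, over a training 
corpus w_1 ... w_T with a context window of size c), the two training 
objectives are approximately:
    
    ```latex
    % Skip-gram: predict each context word from the target word
    \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)
    
    % CBOW: predict the target word from its surrounding context words
    \frac{1}{T} \sum_{t=1}^{T} \log p(w_t \mid w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c})
    ```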
    
    _Wikipedia quotes the author (I assume they mean Tomas) as saying that CBOW 
is faster while skip-gram is slower but does a better job for infrequent words. 
Has this been your experience as well? How pronounced is the difference?_ 
    In my tests, the CBOW + negative sampling implementation here takes almost 
the same time as the existing skip-gram + hierarchical softmax. The number of 
negative samples is tunable, and training gets slower as that number 
increases.
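    
    To make that knob concrete, with negative sampling ([Mikolov et 
al](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)) 
each (input word w_I, output word w_O) pair is trained against k sampled 
negatives instead of the full softmax, so the per-pair cost grows roughly 
linearly with k:
    
    ```latex
    % Negative-sampling objective for one (w_I, w_O) pair, with k negative samples
    \log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
      + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]
    ```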
    
    _in what cases would a user choose one over the other?  I'm basically 
seconding @hhbyyh's comment on a more in-depth comparison experiment._
    There is a good amount of research around this, with comparison 
experiments. It appears to largely depend on the application the embeddings 
will be used for. [Levy et al](http://www.aclweb.org/anthology/Q15-1016) show 
how the different methods perform through extensive experiments, using the 
embeddings for similarity, relatedness, and other tests on several open 
datasets.
    
    [Mikolov et al](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) 
found skip-gram with negative sampling to outperform CBOW. 
[Baroni et al](http://anthology.aclweb.org/P/P14/P14-1023.pdf) found that CBOW 
had a slight advantage. [Levy et al](http://www.aclweb.org/anthology/Q15-1016) 
explain that while CBOW did not perform as well in their experiments, others 
have shown that capturing joint contexts (which CBOW does) can improve 
performance on word-similarity tasks. They also saw CBOW perform well on 
analogy tasks. So again, it depends on the task being performed.
    
    [Mikolov et al](https://arxiv.org/pdf/1309.4168.pdf) recommend using 
skip-gram when the monolingual data is small and CBOW for larger datasets.
    
    _The fact that the original paper has both implementations is not in itself 
enough of a reason for Spark to do the same, IMO_
    This is an active area of research, and both methods generate embeddings 
that perform well on different tasks. Since Spark is a library providing these 
implementations, I think the choice is best left to the user and the 
application the embeddings are being used for.
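    
    To make that concrete, here is a minimal sketch of how a user might pick 
between the two models with `ml.feature.Word2Vec`, assuming this PR exposes a 
hypothetical `setSolver` selector (the actual parameter name and values may 
end up different in the final API):
    
    ```scala
    // Sketch only: assumes spark-shell, where `spark` (a SparkSession) is predefined.
    import org.apache.spark.ml.feature.Word2Vec
    
    val docs = spark.createDataFrame(Seq(
      "spark word2vec embeddings example".split(" "),
      "cbow predicts a word from its surrounding context".split(" "),
      "skip gram predicts the surrounding context from a word".split(" ")
    ).map(Tuple1.apply)).toDF("text")
    
    val word2Vec = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("features")
      .setVectorSize(50)
      .setMinCount(0)
      .setWindowSize(5)
      // .setSolver("cbow")  // hypothetical selector for CBOW + negative sampling;
                             // the default would remain skip-gram + hierarchical softmax
    
    val model = word2Vec.fit(docs)
    
    // Nearest neighbours in the learned embedding space.
    model.findSynonyms("word2vec", 2).show()
    ```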

