[GitHub] [singa] dcslin commented on pull request #722: cudnn lstm

GitBox Sat, 06 Jun 2020 11:44:12 -0700


dcslin commented on pull request #722:
URL: https://github.com/apache/singa/pull/722#issuecomment-640101950



   Hi @nudles, sorry for the delay. There are still issues with the demo 
model. Updates:
   0. Added gensim as the word2vec converter. It is not clear from the paper 
how the pooling part of the model is done, but the reference model reduces the 
LSTM output tensor by taking the mean over the sequence axis, which could be 
done with `autograd.reduce_mean()`.
   1. The loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` 
requires two forward passes followed by one backward pass, which is not 
supported by singa. I tried to concatenate a+ and a- into a {bs2, seq, embed} 
tensor and make the model accept input like `(q, a+, a-)`. However, this is 
confusing in the testing phase because there is no label for the answer.
   2. Tried to implement a simplified version that substitutes mseloss for the 
loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}`, so that the 
model can be trained with data in the format `<q, a+, 1>, <q, a-, 0>`, but 
there is a convergence problem.
   3. As advised by @joddiy, we could train the model with data in this 
format: one batch has two samples, `<q,a+>` and `<q,a->`, ordered alternately. 
We then modified the loss function to compute the loss for every batch of 2 
samples (batch_index 0: `pos_sim`, batch_index 1: `neg_sim`). Still checking.
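
   For reference, a minimal NumPy sketch of the idea in items 0 and 3: mean 
pooling over the sequence axis, then the margin loss computed on a batch where 
`<q,a+>` and `<q,a->` rows alternate. This is an illustration only, not singa 
code. The function names, the margin value `0.2`, and the extension to several 
pairs per batch are assumptions of this sketch, not part of the PR.

```python
import numpy as np

def mean_pool(seq_out):
    # seq_out: (batch, seq, hidden) -> (batch, hidden), mean over the sequence axis
    return seq_out.mean(axis=1)

def cosine(a, b, eps=1e-8):
    # Row-wise cosine similarity between two (n, hidden) arrays.
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def interleaved_margin_loss(q_vec, a_vec, margin=0.2):
    # Assumed layout: even rows hold <q, a+>, odd rows hold <q, a->.
    sim = cosine(q_vec, a_vec)
    pos_sim, neg_sim = sim[0::2], sim[1::2]
    # L = max{0, M - cosine(q, a+) + cosine(q, a-)}, averaged over the pairs.
    return np.maximum(0.0, margin - pos_sim + neg_sim).mean()

# Usage with random stand-ins for the LSTM outputs (2 questions, 2 answers each).
rng = np.random.default_rng(0)
q_out = np.repeat(rng.normal(size=(2, 7, 16)), 2, axis=0)  # each q appears twice
a_out = rng.normal(size=(4, 7, 16))                        # a+, a-, a+, a-
loss = interleaved_margin_loss(mean_pool(q_out), mean_pool(a_out))
```

   Because the positive and negative similarities come from a single forward 
pass over one batch, only one backward pass is needed.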
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
