All is needed

Architecture-wise, the layers themselves can be arbitrary, but for most 
sequence processing people will shout "Transformer!" and call it a day.

Most processes do follow a sorta-autoregressive pattern, and sequence modeling 
is a good way to frame the problem, so I get the jump to that.

But I want to see someone test some things with popstar Transformers like GPT-2 
and BERT:
1. Layer order dependence: swap layers with one another to determine whether 
certain layers learn independent things (sketched below)
2. Robustness to change: progressive growth, random layer insertion and removal 
(also covered in the sketch below)
3. Importance of architecture: add in random decoder layer connections and 
observe the effect on overall performance

This would unlock the questions that matter for forming an AI Brain.
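
To make 1 and 2 concrete, here is a minimal sketch of the kind of surgery I 
mean. It assumes the HuggingFace transformers library and the public "gpt2" 
checkpoint; the prompt text and the particular blocks being swapped or dropped 
are arbitrary picks for illustration, not a real evaluation setup:

    import copy
    import torch
    from torch import nn
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    base = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    text = "The quick brown fox jumps over the lazy dog."
    enc = tokenizer(text, return_tensors="pt")

    def perplexity(model):
        # Perplexity of the fixed prompt under the (possibly edited) model.
        with torch.no_grad():
            loss = model(**enc, labels=enc["input_ids"], use_cache=False).loss
        return torch.exp(loss).item()

    print("original order:", perplexity(base))

    # 1. Layer order dependence: swap two adjacent blocks.
    #    GPT-2 small keeps its 12 transformer blocks in model.transformer.h.
    swapped = copy.deepcopy(base)
    blocks = swapped.transformer.h
    blocks[5], blocks[6] = blocks[6], blocks[5]
    print("blocks 5 and 6 swapped:", perplexity(swapped))

    # 2. Robustness to change: remove a block entirely and see how much
    #    the rest of the stack compensates without any retraining.
    pruned = copy.deepcopy(base)
    pruned.transformer.h = nn.ModuleList(
        [b for i, b in enumerate(pruned.transformer.h) if i != 6]
    )
    print("block 6 removed:", perplexity(pruned))

Running the same comparison over every adjacent pair (or every single-block 
removal) would give a cheap first read on how order-dependent and how redundant 
the layers are; doing it properly would of course mean averaging perplexity over 
a held-out corpus rather than one sentence.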