Re: [FRIAM] Dissecting Recall of Factual Associations in, Auto-Regressive Language Models

2023-05-07 Thread Russ Abbott

Interesting that I downloaded this paper this morning also. Haven't yet
looked to see what I could understand about it. From the little I know
about it, this looks like it is related to the self-attention mechanism in
transformers.

-- Russ Abbott
Professor Emeritus, Computer Science
California State University, Los Angeles


On Sun, May 7, 2023 at 9:29 AM Steve Smith wrote:

> https://arxiv.org/pdf/2304.14767.pdf
>
> I am pretty much over my head in this literature, but continue to be
> fascinated as I watch people who are not try to untangle some explanatory
> power in their models...
>
> The details of this analysis or framing this as *information flow* rather
> than *static data/structure* is reminiscent of some very nascent work we
> *tried* to do 15 years ago, attempting to analyze/understand huge Systems
> Dynamics models of Critical Infrastructure joined together/coupled to try
> to predict the potential for cascading failures through these coupled
> systems.   The representation *as* SD models was natural for this framing
> but we made only the tiniest progress IMO in extracting hints of
> *explanatory* narratives. I was primarily doing visualization on those
> tasks but tried to focus on clustering of the Dual Graph/Network  to find
> structure in the *flow* during extreme events rather than in the
> engineered/designed structure of the network itself.
>
> I know there are others on this list who have worked with complex, dynamic
> networks  (I'm thinking of Frank's colleagues and Causal Discovery in
> Graphical Models,   various projects Glen has alluded to, and a wide variety
> of problems Stephen has related to me over the years, but I'm sure there
> are plenty of others)...  I'm curious if anyone else is wading in this deep
> (and more to the point, finding any traction)?
>
> From the paper:
>
> -. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives:  5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
>

Re: [FRIAM] Dissecting Recall of Factual Associations in, Auto-Regressive Language Models

2023-05-07 Thread Pieter Steenekamp

You know what endlessly fascinates me? The way large language models are
like those magic growth pills you see in cartoons. Just add some extra
data, give it a stir, and voila! Emergent abilities appear out of thin air.
It's like watching a kid turn into a superhero overnight.

I quote verbatim from https://arxiv.org/pdf/2206.07682.pdf
"Scaling up language models has been shown to predictably improve
performance and sample efficiency on a wide range of downstream tasks. This
paper instead discusses an unpredictable phenomenon that we refer to as
emergent abilities of large language models. We consider an ability to be
emergent if it is not present in smaller models but is present in larger
models. Thus, emergent abilities cannot be predicted simply by
extrapolating the performance of smaller models. The existence of such
emergence raises the question of whether additional scaling could
potentially further expand the range of capabilities of language models."

Emergent properties of large language models are abilities that are not
present in smaller models but are present in larger models. They are
unpredictable and cannot be explained simply by extrapolating the
performance of smaller models. Some examples of emergent abilities are:

Solving math problems
Answering factual questions
Generating summaries
Writing code
Playing games
Translating languages
Composing music
Drawing images
Detecting emotions
Reasoning logically


[FRIAM] Dissecting Recall of Factual Associations in, Auto-Regressive Language Models

2023-05-07 Thread Steve Smith


https://arxiv.org/pdf/2304.14767.pdf

I am pretty much over my head in this literature, but continue to be 
fascinated as I watch people who are not try to untangle some 
explanatory power in their models...


The details of this analysis or framing this as /information flow/ 
rather than /static data/structure/ is reminiscent of some very nascent 
work we *tried* to do 15 years ago, attempting to analyze/understand 
huge Systems Dynamics models of Critical Infrastructure joined 
together/coupled to try to predict the potential for cascading failures 
through these coupled systems.   The representation *as* SD models was 
natural for this framing but we made only the tiniest progress IMO in 
extracting hints of *explanatory* narratives.    I was primarily doing 
visualization on those tasks but tried to focus on clustering of the 
Dual Graph/Network  to find structure in the *flow* during extreme 
events rather than in the engineered/designed structure of the network 
itself.


I know there are others on this list who have worked with complex, 
dynamic networks  (I'm thinking of Frank's colleagues and Causal 
Discovery in Graphical Models,   various projects Glen has alluded to, 
and a wide variety of problems Stephen has related to me over the years, 
but I'm sure there are plenty of others)... I'm curious if anyone else 
is wading in this deep (and more to the point, finding any traction)?


From the paper:
