re: [tips] signal detection and ROC curves
On Thu, 28 Jan 2016 20:08:38 -0800, Carol DeVolder wrote:

>Dear TIPSters,
>I am currently teaching about the Theory of Signal Detectability, Stevens's
>Power Law, and ROC curves in my Sensation and Perception course.

I have to admit that I find your lumping together of Stevens' power law with SDT and ROC curves puzzling (or, depending upon the phenomenon being studied, MOC or memory operating characteristic curves, AOC or attention operating characteristic curves, or the more general measure AUC, the area under the curve). Given that SDT was developed in the context of detecting weak signals in the presence of noise, while the power law is supposed to represent the relationship of stimulus magnitude to sensory/subjective magnitude, I find it hard to reconcile the two theories within a single framework.

Historically, Fechner leads to Stevens (among others) for relating stimulus energies to sensation -- all above an "absolute threshold" (if one believes in such a thing). SDT does away with the concept of threshold in favor of describing a person's performance in terms of sensitivity (the ability to detect a stimulus, usually in a background of noise of some sort) and bias, or willingness to say "Yes" (in a Yes-No task; other responses in multiple-alternative tasks), which is often assumed to be independent of sensitivity (an assumption that may be wrong in certain situations). This is why simple measures of "accuracy" like "percent correct" are often misleading indicators of a person's ability to detect or discriminate stimuli.

>Do any of you have any examples that you work on in class or use to
>illustrate how to implement them?

You do understand that the types of task you would use with SDT (the ROC is just one way to represent performance on SDT tasks) would be different from those used with the power law? If you put a gun to my head and said you'd blow my brains out if I didn't come up with appropriate tasks, I'd suggest: (1) showing how the Self-Reference Effect (SRE) works -- typically a recognition memory task that uses SDT analysis; see the http://opl.apa.org website for their implementation -- and (2) showing how to use magnitude estimation procedures for various social phenomena, such as the seriousness of different crimes. If Hugh Foley is still on TIPS, he can provide more information about this type of research from when he worked with Dave Cross and others at Stony Brook back when he was in grad school (he was a cohort of mine).

>I want to do several things. First, I want to be able to explain the logic
>of SDT, the power law, and ROCs.

It is probably me, but I would have said the following instead of what you wrote above: (1) what SDT is, how it is a model of decision-making about stimuli when they are difficult to detect or discriminate (not limited to humans; animal psychophysics has also used SDT analysis), and how the ROC provides a convenient representation of performance on an SDT task (i.e., it shows the degree of sensitivity as reflected by d' or a similar measure, the effect of payoffs and probabilities of stimuli [the placement of Beta along the ROC curve], and accuracy [the area under the ROC curve]).

>Second, I want to be able to make the topics relevant and convince the
>students that these concepts are active in their daily lives.

I think you need to be a little bit more specific about which "concepts" you're referring to.
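On the point above that percent correct can mislead, here is a minimal sketch (Python with scipy; the equal-variance Gaussian observer and all numbers are illustrative assumptions, not from any study): one observer with a fixed d' whose percent correct changes substantially as only the criterion moves.

# Same sensitivity (d'), different bias, different "percent correct":
# noise ~ N(0, 1), signal ~ N(d', 1); say "yes" when the observation
# exceeds the criterion c.
from scipy.stats import norm

def rates(d_prime, criterion):
    hit = norm.sf(criterion, loc=d_prime)   # P(x > c | signal)
    fa = norm.sf(criterion, loc=0.0)        # P(x > c | noise)
    return hit, fa

d = 1.28                                    # fixed sensitivity
for c, label in [(d / 2, "unbiased"), (1.8, "very conservative")]:
    hit, fa = rates(d, c)
    pc = (hit + (1 - fa)) / 2               # percent correct, 50/50 signal trials
    print(f"{label:18s} hit={hit:.2f} fa={fa:.2f} percent correct={pc:.2f}")

Both rows come from the same d' (1.28), but percent correct drops from about .74 to about .63 as the criterion becomes conservative -- accuracy alone confounds sensitivity with bias.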
Stevens' power law is just one example of the "psychophysical law," and it has a number of problems associated with it -- see the entry on Wikipedia for a brief presentation of the objections to it:
https://en.wikipedia.org/wiki/Stevens'_power_law

Shepard has shown that what researchers want to do, when it comes to the psychophysical law, is establish the following relationship:

Sensation = f(stimulus energy)

The problem is that we cannot directly observe sensation, so we typically rely upon the following empirical relationship:

Response = f(stimulus energy)

In both cases, f(stimulus energy) is a mathematical function relating stimulus energy to sensation or response, but the function can take a variety of forms (just ask any Fechnerian ;-). Shepard, however, has pointed out that this assumes that there is a simple relationship between response and sensation, Response = g(Sensation), which cannot be ignored -- yet it has been ignored or oversimplified in Stevens' and other psychophysical functions. So, the equation that is possibly operating is:

Response = g(Sensation) = g(f(stimulus energy))

That is, the observed response on, say, a magnitude estimation task is the result of a function of a function, and each function may differ for different stimuli.

With respect to SDT: originally it was based on Wald's statistical decision theory, which we are most familiar with whenever we use the Neyman-Pearson framework for doing statistical analysis, in contrast to classical Fisherian analysis (i.e., it involves the concepts of Type II errors, statistical power, confidence intervals, etc.). So, SDT represents a model of how (some) people might make decisions in certain situations (if one were so inclined).
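To make Shepard's function-of-a-function point concrete, a small sketch in Python (the exponents are made-up illustrations, not estimates from any data set): if Sensation = I^a and the response function is itself a power function, g(s) = s^b, then a log-log fit to magnitude estimates recovers only the product a*b -- the sensory exponent alone is not identified.

# Magnitude estimates reveal only the composition g(f(I)), not f itself.
import numpy as np

intensities = np.logspace(0, 3, 20)   # stimulus energies I
a = 0.33                              # hypothetical sensory exponent: S = I**a
b = 1.5                               # hypothetical response exponent: R = S**b

sensation = intensities ** a
responses = sensation ** b            # what magnitude estimation observes

# A power-law fit is a straight line in log-log space; its slope is a*b.
slope, _ = np.polyfit(np.log(intensities), np.log(responses), 1)
print(f"fitted exponent = {slope:.3f}  (a*b = {a * b:.3f})")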
RE: [tips] signal detection and ROC curves
Carol,

E-mail in three parts:
1. The activity I use to demonstrate SDT
2. Why SDT is useful and applicable
3. Why ROC curves are better in application

PART 1

I use "the dice game" activity when teaching SDT and ROC curves and find that it helps students really grasp how shifting the criterion has no effect on estimated d' but does change estimated beta.

How the game is played: I roll three six-sided dice. Two of the dice are normal, ranging from 1-6, and the third die (called the signal die) counts as either 0 (if it shows 1-3) or 1 (if it shows 4-6). The goal of the game is to determine, based on the total of all three dice, whether the signal die is a 0 or a 1. The regular dice produce the noise in which the signal is either hidden or not. You can play the game a few times and then ask students how they decide when to say signal or no signal; most will develop a natural criterion point at which totals above some number result in saying "signal" (you may need an aside on the gambler's fallacy too). You can manipulate signal strength by making the value of the signal die larger (e.g., 0 or 3, or 0 or 6, instead of 0 or 1) and play again. They will see that the stronger the signal, the easier it is to be accurate. You can also introduce pay-off matrices in terms of points for hits vs. correct rejections and watch their criterion shift in one direction or the other.

This is all fun, but the real power of the game is in the next step. You can create the probability function for both outcomes for every dice total (and it isn't overwhelming because there are only 36 possible noise combinations and 36 possible signal+noise combinations). For example: a total of 2 must be 1-1-0 (with the last number representing the signal die value). A total of 4 can be 1-3-0, 3-1-0, or 2-2-0 for the no-signal combinations, and 1-2-1 or 2-1-1 for the signal-present combinations. I have my students draw these on graph paper, and the pattern in the number of combinations becomes obvious. Further, if they draw the distributions for two different signal strengths, they will see the s+n curve shift to the right.

Once you have these distributions you can choose any criterion (let's say at a total of 8 or higher I say "signal") and calculate the hit and false alarm rates. The hit rate will be 21/36 or 58.3% (there are 21 combinations of the two regular dice plus 1 that produce a total of 8 or higher) and the false alarm rate will be 15/36 or 41.7%. With these two values students can use a computational estimate for d' (d' = z(hit) - z(FA)). I have a spreadsheet that does this, or use this website by Ian Neath (http://memory.psych.mun.ca/models/dprime/). Thus, for the example, d' is about .42 and beta is 1 (which isn't computed on the website). Students can choose different criteria and should note that d' changes only slightly (because it is an estimate) but beta will shift. (The website uses C, which is easier to interpret because no bias corresponds to zero, with positive values for conservative criterion points -- less accepting of a Type I error -- and negative values for liberal ones -- less accepting of a Type II error.)

PART 2

The primary value of SDT is for comparing two circumstances where there is bias toward one type of error or the other and you wish to compare the two situations. For example, let's say we are designing a severe-weather indicator for small aircraft. One display results in 97% hits (correctly recognizing severe weather when it is present) but also produces 65% false alarms. Is that display better or worse than one that produces only 80% hits and 9% false alarms?
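A minimal check of this arithmetic in Python with scipy, using the d' = z(hit) - z(FA) formula above (presumably the same computation as Ian Neath's web page); the d' values for the two weather displays are worked out in the next paragraph.

# d' = z(hit) - z(FA) for the examples in this thread, under the usual
# equal-variance Gaussian assumptions.
from scipy.stats import norm

def d_prime(hit, fa):
    """Sensitivity estimate from one hit rate and one false-alarm rate."""
    return norm.ppf(hit) - norm.ppf(fa)

# Dice game, criterion "say signal at a total of 8 or higher":
print(f"dice game:  d' = {d_prime(21/36, 15/36):.2f}")   # ~0.42
# The two severe-weather displays:
print(f"display 1:  d' = {d_prime(0.97, 0.65):.2f}")     # ~1.49
print(f"display 2:  d' = {d_prime(0.80, 0.09):.2f}")     # ~2.18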
Based on SDT estimates of d', the second display is better (d' of 2.18 for the latter and 1.49 for the former). The big difference is that the two displays produce different bias in responding, and if we were to adopt the same level of bias in the second display, resulting in a 97% hit rate, we would find that the associated FA rate would be 38%. Gee, wouldn't it be nice if we could somehow visualize how that works? You can, with an ROC curve. But the even more important question is: do you want a weather display that encourages MORE risky decisions even if it is better in terms of absolute signal detection?

I use SDT analysis all the time in Human Factors applications; you'll find it (or a derivative) in medical research, and anyone who has been to the eye doctor should be able to appreciate that comparing two images repeatedly until you can't tell a difference could be considered a process of driving the d' between the two options to zero (I'll have to think about this one a little more).

PART 3

I take the example I explained in PART 2 and plot it with hit rate on the y-axis and FA rate on the x-axis. The two points are difficult to compare because one has a much better hit rate but the other has a better FA rate. Assuming we can manipulate bias in our observers, you can use instructions or incentives to generate more points and start to estimate the curve associated with each display.
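Here is a sketch of that plot (Python with matplotlib/scipy, assuming the equal-variance Gaussian model): sweeping the criterion for each display's estimated d' traces its full ROC curve, and reading off display 2's curve at the bias that yields a 97% hit rate recovers the ~38% false-alarm rate mentioned above.

# ROC curves for the two weather displays, as described in PART 3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

criteria = np.linspace(-4.0, 4.0, 200)
for d, label in [(1.49, "display 1"), (2.18, "display 2")]:
    hits = norm.sf(criteria - d)    # hit rate at each criterion
    fas = norm.sf(criteria)         # false-alarm rate at each criterion
    plt.plot(fas, hits, label=f"{label} (d' = {d})")
plt.plot([0, 1], [0, 1], "k--", label="chance diagonal")

c = 2.18 - norm.ppf(0.97)           # criterion giving hits = 0.97 when d' = 2.18
print(f"display 2 FA rate at 97% hits: {norm.sf(c):.2f}")   # ~0.38

plt.xlabel("false-alarm rate")
plt.ylabel("hit rate")
plt.legend()
plt.show()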
Re: [tips] signal detection and ROC curves
The main point I liked to make about Signal Detectability is that there is no such thing as an absolute threshold in the sense that a given stimulus has a given strength below which it cannot be detected. First you must define the response being controlled by the stimulus. We are really talking about changes in the likelihood of occurrence of a specified response given the presence of a certain stimulus situation. A particular change in the strength of a stimulus may increase the likelihood of one response enough for it to be emitted, while not that of a different response. So SDT is really about behavior under stimulus control, not just stimuli.

For my own experimental application:
Brandon, Paul K. (1981). A signal detection analysis of counting behavior. In M. Commons & J. A. Nevin (Eds.), Quantitative Analysis of Behavior, Vol. I. Ballinger.

On Jan 28, 2016, at 10:06 PM, Carol DeVolder wrote:

> Dear TIPSters,
> I am currently teaching about the Theory of Signal Detectability, Stevens's
> Power Law, and ROC curves in my Sensation and Perception course. Do any of
> you have any examples that you work on in class or use to illustrate how to
> implement them? I want to do several things. First, I want to be able to
> explain the logic of SDT, the power law, and ROCs. Second, I want to be able
> to make the topics relevant and convince the students that these concepts are
> active in their daily lives. And third, I want to give them some
> opportunities to practice. I've already talked about hits, misses, false
> alarms, and correct rejections in class, and using payoffs to manipulate
> response criteria; now I want to make it all applicable. I welcome any and all
> ideas.
>
> Thank you very much.
> Carol

Paul Brandon
Emeritus Professor of Psychology
Minnesota State University, Mankato
pkbra...@hickorytech.net
Re: [tips] signal detection and ROC curves
On Fri, 29 Jan 2016 07:47:20 -0800, Paul Brandon wrote:

>The main point I liked to make about Signal Detectability is that there
>is no such thing as an absolute threshold in the sense that a given
>stimulus has a given strength below which it cannot be detected.

Exactly right. The old idea of an absolute threshold is shown to be wrong because it is not the threshold that varies and produces a normal distribution (or other probability distribution) of sensations; rather, there is an intrinsic background level of "noise" (be it neural or a combination of factors) that serves as a reference level against which the new distribution of "signal+noise" is compared. Thus, the ratio of the signal+noise distribution to the noise distribution (i.e., the likelihood ratio) serves as the basis for making a decision. The comparison of this ratio L(S+N/N) to Beta (the criterion, a fixed value of L(S+N/N) determined by the combination of payoffs, probabilities of signals/stimuli, distributions, etc.) is what serves as the person's/organism's decision rule:

If L(S+N/N) > Beta, say "Yes" or "Stimulus present";
if L(S+N/N) < Beta, say "No" or "Stimulus absent";
if L(S+N/N) = Beta, guess. ;-)

So, unlike the old absolute-threshold notion that there is an energy level that cannot be detected, we have sensations that are produced even by weak stimuli, and the only question is whether they produce a S+N distribution of sensations that differs from noise alone. Of course, our willingness to say "Yes" is only partly determined by this, because the pay-off matrix (costs of being wrong, benefits of being right) and the probability of the stimulus play important roles.

>First you must define the response being controlled by the stimulus. We
>are really talking about changes in the likelihood of occurrence of a
>specified response given the presence of a certain stimulus situation.
>A particular change in the strength of a stimulus may increase the
>likelihood of one response enough for it to be emitted, while not that
>of a different response.

Don't forget the effect of context on the underlying noise distribution. Detecting the presence of a weak flash of light through a pinhole or a small area of a computer screen will be affected by whether you do the task in a room with bright lighting or in complete darkness. David Krantz & Co. have estimated that it might take a single quantum of light to activate a rod in the eye under conditions of pure darkness for the dark-adapted eye (remember the commercials that said one could see the light of a candle several thousand feet away on a dark night [assuming no light pollution]), but under ordinary light conditions a stimulus, even a weak one, will require many more quanta in order to produce a sensation that leads to detection -- in other words, a d-prime not equal to zero, or a Hit rate not equal to the False Alarm rate (or an AUC not equal to .50).

>So SDT is really about behavior under stimulus control, not just stimuli.
>For my own experimental application:

Your behavioristic tendencies are showing. ;-)

>Brandon, Paul K. (1981). A signal detection analysis of counting behavior.
>In M. Commons & J. A. Nevin (Eds.), Quantitative Analysis of Behavior,
>Vol. I. Ballinger.

Remember Skinner's comparison of his approach to that of Tolman that I mentioned in a previous post? Tolman asserted that certain variables operated within the organism while Skinner argued that those variables operated in the environment. The latter gives rise to notions like "stimulus control" while the former gives rise to the evaluation of evidence, an internal process.
This then raises the question of whether SDT is correctly specified or is even the correct model (perhaps Luce's choice axioms provide a better description).

-Mike Palij
New York University
m...@nyu.edu
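A minimal sketch (Python with scipy; the equal-variance Gaussian model and the d' and Beta values are illustrative assumptions) of the likelihood-ratio decision rule Mike states above: compute L(S+N/N) at the observed sensory value and compare it to Beta.

# The likelihood-ratio decision rule: noise ~ N(0, 1), signal+noise ~ N(d', 1).
from scipy.stats import norm

def decide(x, d_prime, beta):
    likelihood_ratio = norm.pdf(x, loc=d_prime) / norm.pdf(x, loc=0.0)
    if likelihood_ratio > beta:
        return "Yes"                # stimulus present
    if likelihood_ratio < beta:
        return "No"                 # stimulus absent
    return "guess"                  # L(S+N/N) = Beta ;-)

# With Beta = 1 the rule reduces to x > d'/2; raising Beta (rare signals,
# costly false alarms) demands stronger evidence before saying "Yes".
for x in (0.2, 0.8, 1.5):
    print(x, decide(x, d_prime=1.0, beta=1.0), decide(x, d_prime=1.0, beta=3.0))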
RE: [tips] signal detection and ROC curves
On Fri, 29 Jan 2016 08:49:23 -0800, Douglas Peterson wrote:
[snip]
>... SDT continues to be applicable in a number of settings, particularly
>medical tests; many use the AUC that Mike mentions, and while this isn't
>technically SDT (no z transforms) the ROC method is identical (here is a
>short and good example:
>http://www.nature.com/nmeth/journal/v12/n9/fig_tab/nmeth.3482_SF9.html )

A few points:

(1) As I mentioned in an earlier post, SDT is based on Wald's statistical theory, which serves as the basis for the Neyman-Pearson framework for statistical testing. The decision matrix originally developed is a 2 x 2 table where the rows represent the response ("yes" or "no", "present" or "absent", etc.) and the columns represent the "true state of nature", that is, whether the stimulus was presented or not presented. (This is known with absolute certainty since the stimuli are selected by the researcher; given that the "true state" is known, the question that remains is how well the responses or decisions match the true state. If the Hit rate is 100% and the Correct Rejection rate is 100%, then the False Alarm rate = 0.00 and the Miss rate = 0.00; in other words, performance is perfect, which with weak stimuli in psychophysics rarely if ever occurs.)

(2) I am puzzled by Peterson's statement that AUC is not really SDT, given that its equivalent A' was developed by memory researchers as early as the 1960s and has been shown to be part of SDT. In the http://opl.apa.org experiment on the "Self Reference Effect", the dependent variable is a version of A' that represents the area under the "curve" created by a single pair of Hit and False Alarm rates. One reference on this point is the following:

Macmillan, N. A., & Creelman, C. D. (1996). Triangles in ROC space: History and theory of "nonparametric" measures of sensitivity and response bias. Psychonomic Bulletin & Review, 3(2), 164-170.

Given that the ROC/MOC/AOC is presented in a unit square -- the x-axis represents the probability of a false alarm, limited to the range 0.00 to 1.00, and the y-axis represents the probability of a Hit, which also ranges from 0.00 to 1.00 -- chance performance is represented by the diagonal line where P(Hit) = P(FA). In traditional SDT this implies that d-prime is zero. It also implies that the area under the performance curve is 0.50, which can be interpreted as a measure of accuracy; in this case, it represents chance performance (hence the term "chance diagonal"). In most Yes-No recognition memory experiments, only one hit rate and one false alarm rate are obtained. For nonrandom performance, this provides a single point above the chance diagonal, forming a triangle with the chance diagonal as its base. The sum of the area of the triangle and the area under the chance diagonal (i.e., 0.50) becomes a measure of accuracy. As the Hit rate increases and the False Alarm rate decreases, the area of the triangle increases -- in the limit, when the Hit rate is 1.00 and the False Alarm rate is zero, the triangle fills the upper space and A' or AUC is 1.00, the entire area of the unit square. Thus, perfect performance is represented by A' = AUC = 1.00.
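A sketch of the area measures described above (plain Python; the hit and false-alarm rates are illustrative). The trapezoidal area under the single-point "ROC" is the chance 0.50 plus the triangle, as in the text; the curvilinear A' uses Pollack and Norman's formula, whose history the Macmillan & Creelman (1996) paper cited above discusses.

# Area measures from a single (hit, false-alarm) pair.
def auc_single_point(hit, fa):
    """Area under the chance diagonal (0.50) plus the triangle above it:
    the trapezoid through (0,0), (fa,hit), (1,1)."""
    return (1 + hit - fa) / 2

def a_prime(hit, fa):
    """The 'nonparametric' A' (Pollack & Norman's formula, for hit >= fa)."""
    return 0.5 + (hit - fa) * (1 + hit - fa) / (4 * hit * (1 - fa))

print(auc_single_point(0.5, 0.5))   # 0.50: chance performance
print(auc_single_point(0.8, 0.2))   # 0.80: trapezoidal estimate
print(a_prime(0.8, 0.2))            # 0.875: A' bows above the trapezoid
print(a_prime(1.0, 0.0))            # 1.00: perfect performance fills the square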
(3) In making a medical diagnosis or interpreting a medical test, the same reasoning as above is employed but the terms differ:

Hit rate becomes True Positive Rate = "Sensitivity"
Correct Rejection rate becomes True Negative Rate = "Specificity"

For more on these ideas and how they are used to determine how good your usual medical test is, see the Wikipedia entry:
https://en.wikipedia.org/wiki/Sensitivity_and_specificity

This entry eventually leads to d-prime, but go to the Wikipedia entry on ROC curves for alternative measures, including AUC:
https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve

This is my third post to TiPS today, so no more till the morrow.

-Mike Palij
New York University
m...@nyu.edu
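A small sketch tying the two vocabularies together (Python with scipy; the counts come from a hypothetical diagnostic test, not real data):

# Rows of the 2 x 2 table: test result; columns: true disease state.
from scipy.stats import norm

tp, fn = 90, 10   # disease present: true positives (hits), false negatives (misses)
fp, tn = 15, 85   # disease absent: false positives (false alarms), true negatives

sensitivity = tp / (tp + fn)   # true positive rate = hit rate
specificity = tn / (tn + fp)   # true negative rate = correct rejection rate
d_prime = norm.ppf(sensitivity) - norm.ppf(1 - specificity)   # FA rate = 1 - specificity
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} d'={d_prime:.2f}")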
RE: [tips] signal detection and ROC curves
No argument here; just me not being clear. A' and AUC are valid measures for comparing two systems and are much more interpretable than other SDT measures, given the parameters, as Mike explains -- but they are not direct measures of SDT parameters as typically explained. Pastore, Crawley, Berens, and Skelley (2003) present a good discussion of the issues, including the advantages and disadvantages of A'. Specifically, A' is not independent of bias and is actually a poorer estimate when performance is nearer to perfect in terms of hits or false alarms. For the three of us who care about this issue, estimates of d' aren't much good in those extremes either. Macmillan and Creelman (1991) suggest adjusting hit rates of 100% to 1 - 1/(2n) and false alarm rates of 0% to 1/(2n); I don't have any reason to doubt that, I just don't see it used very often.

The use of sensitivity/specificity reporting doesn't capture both sensitivity and response bias as explained in the SDT examples (e.g., an estimate of the distance between the two distributions). I believe this is the reason that the Wikipedia entry is careful to distinguish the sensitivity index, d', from sensitivity as the true positive rate. The two approaches might be considered two sides of the same coin, but they are not the same side of the same coin.

Macmillan, N. A., & Creelman, C. D. (1991). Detection Theory: A User's Guide. New York: Cambridge University Press.

Pastore, R. E., Crawley, E. J., Berens, M. S., & Skelley, M. A. (2003). "Nonparametric" A' and other modern misconceptions about signal detection theory. Psychonomic Bulletin & Review, 10(3), 556-569.

Doug Peterson, PhD
Associate Professor of Psychology
The University of South Dakota
Vermillion SD 57069
605.677.5295

From: Mike Palij [m...@nyu.edu]
Sent: Friday, January 29, 2016 12:05 PM
To: Teaching in the Psychological Sciences (TIPS)
Subject: RE: [tips] signal detection and ROC curves

[snip -- quoted text as in the message above]
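A sketch of the Macmillan and Creelman (1991) adjustment Doug mentions (Python with scipy; the trial counts are made up): perfect hit or false-alarm rates make the z-transform infinite, so they are nudged inward by half a trial before computing d'.

# Adjust extreme proportions before taking z-scores, per Macmillan & Creelman.
from scipy.stats import norm

def adjusted_rate(count, n):
    """Proportion with 0 replaced by 1/(2n) and 1 by 1 - 1/(2n)."""
    if count == 0:
        return 1 / (2 * n)
    if count == n:
        return 1 - 1 / (2 * n)
    return count / n

n_signal, n_noise = 20, 20           # illustrative trial counts
hit = adjusted_rate(20, n_signal)    # 20/20 hits -> 0.975
fa = adjusted_rate(0, n_noise)       # 0/20 false alarms -> 0.025
print(f"adjusted d' = {norm.ppf(hit) - norm.ppf(fa):.2f}")   # finite, not infinite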
[tips] signal detection and ROC curves
Thank you all for your great responses. Mike, I knew I could count on you, and yes, I read your message in its entirety. :) Why I lumped all of that together is that it is all lumped together in the unit we are on. I talked about each separately, but since my students tend to be math-phobes, I wanted not only to convey how each procedure is carried out, but also to find some mundane examples in addition to practical ones. And Annette, I am reading through the information on Wixted's page. Thanks again to all; I appreciate your help.

--
Carol DeVolder, Ph.D.
Professor of Psychology
St. Ambrose University
518 West Locust Street
Davenport, Iowa 52803
563-333-6482
Re: [tips] signal detection and ROC curves
On Jan 29, 2016, at 10:54 AM, Mike Palij wrote:

>> So SDT is really about behavior under stimulus control, not just stimuli.
>> For my own experimental application:
>
> Your behavioristic tendencies are showing. ;-)

I'll take that as a compliment ;-).

>> Brandon, Paul K. (1981). A signal detection analysis of counting behavior.
>> In M. Commons & J. A. Nevin (Eds.), Quantitative Analysis of Behavior,
>> Vol. I. Ballinger.
>
> Remember Skinner's comparison of his approach to that of Tolman
> that I mentioned in a previous post? Tolman asserted that certain
> variables operated within the organism while Skinner argued that
> those variables operated in the environment. The latter gives rise to
> notions like "stimulus control" while the former gives rise to the
> evaluation of evidence, an internal process. This then raises the
> question of whether SDT is correctly specified or even the correct
> model (perhaps Luce's choice axioms provide a better description).
>
> -Mike Palij
> New York University
> m...@nyu.edu

As I read Skinner (and I've read most of it), he never denied the existence of immediate causation (internal mediating processes) -- but he doubted that the state of neurology during his time was adequate to account for behavior at the level of internal mechanisms. So we're not talking about the same variables here: Tolman was talking about intervening variables (a mechanism mediating between environmental variables and behavior), while Skinner was talking about independent, directly observable variables (environment, history) as better predictors of behavior.

Paul Brandon
Emeritus Professor of Psychology
Minnesota State University, Mankato
pkbra...@hickorytech.net