Re: [R] Stringr / Regular Expressions advice
Sarah,

Yes, I modified the code that you provided and it worked quite well. Here is the revised code:

    accel_data <- data

    # pattern to be identified
    v.to.match <- c(438, 454, 459)

    # re-run the line below whenever v.to.match changes so the match is updated
    v.matches <- apply(accel_data, 1, function(x) all(x == v.to.match))

    which(v.matches)
    [1] 405

    sum(v.matches)
    [1] 1

Again, here is the dataset:

    dput(head(accel_data, 20))
    structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
    448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
    439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
    505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
    469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
    446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
    455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
    "z_reading"), row.names = c(NA, 20L), class = "data.frame")

My next goal is to extend the range for each column. For instance:

    v.to.match <- c(438:445, 454:460, 459:470)

Your thoughts?

Many thanks,
Vincent

On Fri, Jun 27, 2014 at 5:51 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Hi,
>
> It's a good idea to copy back to the list, not just to me, to keep the discussion all in one place.
>
> Did you try the code I provided? It does what I think you're looking for.
>
> Sarah
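A note on the range idea above: c() concatenates its arguments into one flat vector, so the three per-column ranges lose their boundaries, whereas a list keeps one range per column. A minimal sketch of the difference (the names flat and per_column are illustrative only and do not appear in the thread):

```r
# c() flattens the three ranges into a single vector of 8 + 7 + 12 values
flat <- c(438:445, 454:460, 459:470)
flat_length <- length(flat)

# a list keeps the ranges separate, one element per column
per_column <- list(438:445, 454:460, 459:470)
range_sizes <- lengths(per_column)

# membership of a single x reading in the x range only
x_ok  <- 440 %in% per_column[[1]]   # 440 lies in 438:445
x_bad <- 456 %in% per_column[[1]]   # 456 does not, although it is in `flat`

print(flat_length)
print(range_sizes)
```

With the flat vector, 456 %in% flat is TRUE even for the x column, which is why the ranges need to stay separate.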
Re: [R] Stringr / Regular Expressions advice
#or

    res <- mapply(`%in%`, accel_data, v.to.match)
    res1 <- sapply(seq_len(ncol(accel_data)), function(i)
        accel_data[i] <= tail(v.to.match[[i]], 1) &
        accel_data[i] >= v.to.match[[i]][1])
    all.equal(res, res1, check.attributes=FALSE)
    #[1] TRUE

A.K.

On Tuesday, July 1, 2014 10:56 PM, arun smartpink...@yahoo.com wrote:
> Hi Vincent,
>
> You could try:
>
>     v.to.match <- list(438:445, 454:460, 459:470)
>     sapply(seq_len(ncol(accel_data)), function(i)
>         accel_data[i] <= tail(v.to.match[[i]], 1) &
>         accel_data[i] >= v.to.match[[i]][1])
>
>     #or use ?cut or ?findInterval
>
> A.K.
>
> On Tuesday, July 1, 2014 2:23 PM, VINCENT DEAN BOYCE vincentdeanbo...@gmail.com wrote:
> > My next goal is to extend the range for each column. For instance:
> >
> >     v.to.match <- c(438:445, 454:460, 459:470)
> >
> > Your thoughts?
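The per-column range test discussed above yields one TRUE/FALSE per cell; to find rows where all three readings fall inside their ranges, the columns still have to be combined. A minimal sketch, assuming the 20-row sample posted in the thread and closed ranges given as lower/upper bounds (the names ranges, in_range and row_matches are illustrative, not from the thread):

```r
# 20-row sample as posted in the thread
accel_data <- data.frame(
  x_reading = c(455L, 451L, 458L, 463L, 462L, 460L, 448L, 449L, 450L, 451L,
                445L, 440L, 439L, 445L, 448L, 447L, 440L, 439L, 440L, 434L),
  y_reading = c(502L, 503L, 502L, 502L, 495L, 505L, 480L, 483L, 489L, 488L,
                489L, 456L, 497L, 476L, 470L, 474L, 469L, 482L, 484L, 477L),
  z_reading = c(454L, 454L, 452L, 452L, 446L, 459L, 456L, 451L, 451L, 455L,
                438L, 462L, 437L, 455L, 470L, 455L, 460L, 463L, 458L, 458L))

# one (lower, upper) pair per column, in column order
ranges <- list(c(438, 445), c(454, 460), c(459, 470))

# logical matrix: is each reading inside its own column's range?
in_range <- mapply(function(col, r) col >= r[1] & col <= r[2],
                   accel_data, ranges)

# a row matches only if all three columns are in range
row_matches <- rowSums(in_range) == ncol(in_range)

matching_rows <- unname(which(row_matches))
n_matches <- sum(row_matches)
print(matching_rows)
print(n_matches)
```

In this sample only row 12 (440, 456, 462) satisfies all three ranges; on the full dataset the count would of course differ.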
Re: [R] Stringr / Regular Expressions advice
Hi,

It's a good idea to copy back to the list, not just to me, to keep the discussion all in one place.

On Thursday, June 26, 2014, VINCENT DEAN BOYCE vincentdeanbo...@gmail.com wrote:
> Sarah,
>
> Great feedback and direction. Here is the data I am working with*:
>
>     dput(head(data_log, 20))
>     structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
>     448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
>     439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
>     505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
>     469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
>     446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
>     455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
>     "z_reading"), row.names = c(NA, 20L), class = "data.frame")
>
> *however, I am unsure why the letter L has been appended to each numerical string.

It denotes values stored as integers, and is nothing you need to worry about.

> In any event, as you can see there are three columns of data named x_reading, y_reading and z_reading. I would like to detect patterns among them.
>
> For instance, let's say the pattern I wish to detect is 455, 502, 454 across the three columns respectively. As you can see in the data, this is found in the first row. This particular combination recurs numerous times within the dataset, and that is what I wish to quantify: how many times the combination 455, 502, 454 appears.
>
> Your thoughts?

Did you try the code I provided? It does what I think you're looking for.

Sarah

> Many thanks,
> Vincent
>
> On Thu, Jun 26, 2014 at 4:46 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
> > That's easy enough:
> >
> >     v.to.match <- c(150, 200, 300)
> >     v.matches <- apply(fakedata, 1, function(x) all(x == v.to.match))

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
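To illustrate the two points in the reply above (the L suffix and the row match), here is a minimal sketch using the 20-row sample from the thread. The L only marks integer storage; == compares integers and doubles by value, so the apply() approach works unchanged. The helper names int_vs_double, same_value and matching_rows are illustrative only.

```r
# 20-row sample as posted in the thread (integer columns, hence the L suffix)
data_log <- data.frame(
  x_reading = c(455L, 451L, 458L, 463L, 462L, 460L, 448L, 449L, 450L, 451L,
                445L, 440L, 439L, 445L, 448L, 447L, 440L, 439L, 440L, 434L),
  y_reading = c(502L, 503L, 502L, 502L, 495L, 505L, 480L, 483L, 489L, 488L,
                489L, 456L, 497L, 476L, 470L, 474L, 469L, 482L, 484L, 477L),
  z_reading = c(454L, 454L, 452L, 452L, 446L, 459L, 456L, 451L, 451L, 455L,
                438L, 462L, 437L, 455L, 470L, 455L, 460L, 463L, 458L, 458L))

# 455L is an integer, 455 is a double: different storage, same value
int_vs_double <- identical(455L, 455)   # FALSE, the types differ
same_value    <- 455L == 455            # TRUE, the values are equal

# pattern to look for across the three columns
v.to.match <- c(455, 502, 454)
v.matches <- apply(data_log, 1, function(x) all(x == v.to.match))

matching_rows <- unname(which(v.matches))  # row numbers that match
n_matches <- sum(v.matches)                # how many rows match
print(matching_rows)
print(n_matches)
```

Within these 20 rows the pattern occurs once, in the first row; the count on the full file has to be computed on the full data.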
[R] Stringr / Regular Expressions advice
Hello,

Using R, I've loaded a .csv file comprising several hundred rows and 3 columns of data. The data within maps the output of a triaxial accelerometer, a sensor which measures an object's acceleration along the x, y and z axes. The data for each respective column sequentially oscillates, and ranges numerically from 100 to 500.

I want to create a function that parses the data and detects patterns across the three columns.

For instance, I would like to detect instances when the values for the x, y and z columns equal 150, 200, 300 respectively. Additionally, when a match is detected, I would like to know how many times the pattern appears.

I have been successful using str_detect to provide a Boolean, however it seems to only work on a single value, i.e., 400, not a range of values, i.e., 400 - 450. See below:

    # this works
    vals <- str_detect(string = data_log$x_reading, pattern = "400")

    # this also works, but doesn't detect the particular range, rather the existence of the numbers
    vals <- str_detect(string = data_log$x_reading, pattern = "[400-450]")

Also, it appears that I can only apply it to a single column, not to all three columns. However I may be mistaken.

Any advice on my current approach or alternatives I should consider is greatly appreciated.

Many thanks,
Vincent
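The behaviour described above follows from regular-expression syntax: "[400-450]" is a character class, i.e. a set of single characters (4, 0, the span 0-4, 5, 0), so it matches any string containing one of the digits 0 to 5. A minimal sketch contrasting it with a numeric comparison; base R's grepl() is used here as a stand-in for stringr's str_detect(), and the readings vector is made up for illustration:

```r
# made-up readings for illustration
readings <- c(400, 425, 450, 345, 912, 789)

# character class: TRUE whenever any digit 0-5 occurs anywhere in the string
regex_hits <- grepl("[400-450]", as.character(readings))

# numeric comparison: TRUE only for values between 400 and 450 inclusive
numeric_hits <- readings >= 400 & readings <= 450

print(data.frame(readings, regex_hits, numeric_hits))
```

Here 345 and 912 pass the regex test even though they lie outside 400-450, while the numeric comparison rejects them.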
Re: [R] Stringr / Regular Expressions advice
You could define a simple function to detect whether a value is within a given range. For example,

    inrange <- function(vec, range) {
        !is.na(vec) & vec >= range[1] & vec <= range[2]
    }
    x <- 1:30
    inrange(x, c(5, 20))

If you wanted to apply this function to all three columns at once, you could use apply(). For example,

    # the range must be passed through apply(); 400-450 is the range from the question
    apply(data_log, 2, inrange, range = c(400, 450))

Jean

On Thu, Jun 26, 2014 at 11:17 AM, VINCENT DEAN BOYCE vincentdeanbo...@gmail.com wrote:
> I have been successful using str_detect to provide a Boolean, however it seems to only work on a single value, i.e., 400, not a range of values, i.e., 400 - 450.
>
> Also, it appears that I can only apply it to a single column, not to all three columns. However I may be mistaken.
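As a small illustration of the range helper described above, the sketch below applies it column by column with sapply() and counts the hits per column. The readings data frame, including its NA, is made up for the example, and the 440-450 range is an arbitrary choice.

```r
# TRUE where a value is present and lies within [range[1], range[2]]
inrange <- function(vec, range) {
  !is.na(vec) & vec >= range[1] & vec <= range[2]
}

# made-up readings; the NA shows that missing values count as "not in range"
readings <- data.frame(
  x_reading = c(455, 440, 439, NA),
  y_reading = c(502, 456, 445, 450),
  z_reading = c(454, 462, 437, 441))

# logical matrix with one column per reading column
hits <- sapply(readings, inrange, range = c(440, 450))

# number of in-range values per column
hits_per_column <- colSums(hits)
print(hits)
print(hits_per_column)
```

Using sapply() on the data frame keeps each column's own type, whereas apply() first converts the whole data frame to a matrix; for all-numeric data the results agree.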
Re: [R] Stringr / Regular Expressions advice
Hi,

On Thu, Jun 26, 2014 at 12:17 PM, VINCENT DEAN BOYCE vincentdeanbo...@gmail.com wrote:
> Hello,
>
> Using R, I've loaded a .csv file comprising several hundred rows and 3 columns of data. The data within maps the output of a triaxial accelerometer, a sensor which measures an object's acceleration along the x, y and z axes. The data for each respective column sequentially oscillates, and ranges numerically from 100 to 500.

If your data are numeric, why are you using stringr?

It would be easier to provide you with an answer if we knew what your data looked like. Run

    dput(head(yourdata, 20))

and paste the output into your non-HTML email.

> I want to create a function that parses the data and detects patterns across the three columns.
>
> For instance, I would like to detect instances when the values for the x, y and z columns equal 150, 200, 300 respectively. Additionally, when a match is detected, I would like to know how many times the pattern appears.

That's easy enough:

    fakedata <- data.frame(matrix(c(
        100, 100, 200,
        150, 200, 300,
        100, 350, 100,
        400, 200, 300,
        200, 500, 200,
        150, 200, 300,
        150, 200, 300),
        ncol=3, byrow=TRUE))

    v.to.match <- c(150, 200, 300)

    v.matches <- apply(fakedata, 1, function(x) all(x == v.to.match))

    # which rows match
    which(v.matches)

    # how many rows match
    sum(v.matches)

> I have been successful using str_detect to provide a Boolean, however it seems to only work on a single value, i.e., 400, not a range of values, i.e., 400 - 450. See below:

This is where I get confused, and where we need sample data. Are your data numeric, as you state above, or some other format? If your data are character, and like "400 - 450", you can still match them with the code I suggested above.

>     # this works
>     vals <- str_detect(string = data_log$x_reading, pattern = "400")
>
>     # this also works, but doesn't detect the particular range, rather the existence of the numbers
>     vals <- str_detect(string = data_log$x_reading, pattern = "[400-450]")

Are you trying to match any numeric value in the range 400-450? Again, actual data.

> Also, it appears that I can only apply it to a single column, not to all three columns. However I may be mistaken.

You answer your own question unwittingly - apply().

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
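Since the original request was for a function, the apply() approach above can be wrapped into one. This is only a sketch: the name count_pattern and its return format are hypothetical, and it assumes all columns are numeric (apply() converts the data frame to a matrix first).

```r
# Returns the matching row numbers and their count.
# `df` must have one numeric column per element of `pattern`.
count_pattern <- function(df, pattern) {
  stopifnot(length(pattern) == ncol(df))
  # isTRUE() treats rows containing NA as non-matching
  hits <- apply(df, 1, function(row) isTRUE(all(row == pattern)))
  list(rows = unname(which(hits)), n = sum(hits))
}

# the example data from the reply above
fakedata <- data.frame(matrix(c(
  100, 100, 200,
  150, 200, 300,
  100, 350, 100,
  400, 200, 300,
  200, 500, 200,
  150, 200, 300,
  150, 200, 300),
  ncol = 3, byrow = TRUE))

result <- count_pattern(fakedata, c(150, 200, 300))
print(result$rows)
print(result$n)
```

On this example the pattern is found in rows 2, 6 and 7, so the count is 3.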
Re: [R] Stringr / Regular Expressions advice
Hi,

Maybe you can use ?cut or ?findInterval for the range:

    dat1 <- read.table(text="100, 100, 200
    250, 300, 350
    100, 350, 100
    400, 250, 300
    200, 450, 200
    150, 501, 300
    150, 250, 300", sep=",", header=FALSE)

    sapply(dat1, findInterval, c(400,500))==1
    #        V1    V2    V3
    #[1,] FALSE FALSE FALSE
    #[2,] FALSE FALSE FALSE
    #[3,] FALSE FALSE FALSE
    #[4,]  TRUE FALSE FALSE
    #[5,] FALSE  TRUE FALSE
    #[6,] FALSE FALSE FALSE
    #[7,] FALSE FALSE FALSE

A.K.

On Thursday, June 26, 2014 4:11 PM, VINCENT DEAN BOYCE vincentdeanbo...@gmail.com wrote:
> I have been successful using str_detect to provide a Boolean, however it seems to only work on a single value, i.e., 400, not a range of values, i.e., 400 - 450.
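For readers unfamiliar with findInterval(), used in the message above: it returns, for each value, how many breakpoints are less than or equal to it, so with breakpoints c(400, 500) a result of 1 means the value lies in [400, 500). A minimal sketch with made-up values showing the boundary behaviour, including the rightmost.closed argument for making the upper bound inclusive:

```r
breaks <- c(400, 500)
vals <- c(399, 400, 450, 500, 501)   # made-up values around the boundaries

# default: intervals are closed on the left, open on the right,
# so 500 itself falls into interval 2 and is excluded by "== 1"
default_idx <- findInterval(vals, breaks)

# rightmost.closed = TRUE makes the last interval [400, 500] closed,
# so 500 is now assigned to interval 1
closed_idx <- findInterval(vals, breaks, rightmost.closed = TRUE)

print(default_idx == 1)
print(closed_idx == 1)
```

Whether the upper bound should count as inside the range is a choice to make explicitly; the thread's 400 - 450 example does not say.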