Re: [R] splitting a vector of strings

2016-07-21 Thread Michael Dewey

Dear Eric

I think you are looking for sub or gsub.

Without an example of the input and the output you want I cannot be sure, but
you would need to define a regular expression which matches your separator (;)
followed by any characters up to the end of the line. If you have trouble
with that, someone here will no doubt write the pattern for you, but
learning about regular expressions is well worthwhile.
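
A minimal sketch of the kind of pattern described above. The vector `x` is invented for illustration, since the original post gave no sample data. The pattern `;.*` matches the first semicolon and everything after it, so replacing the match with an empty string leaves only the first piece.

```r
# Hypothetical input; the thread does not include the real data.
x <- c("a;b;c", "d;e", "foo;g;h;i")

# ";.*" matches the first ";" and everything up to the end of the string.
# sub() replaces that match with "", leaving the first piece.
first <- sub(";.*", "", x)
first
#[1] "a"   "d"   "foo"
```

A string with no semicolon is returned unchanged, which may or may not be what is wanted.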


On 21/07/2016 12:54, Eric Elguero wrote:

> Hi everybody,
>
> I have a vector of character strings.
> Each string has the same pattern and I want
> to split them into pieces and get a vector made
> of the first piece of each string.
>
> The problem is that strsplit returns a list.
>
> All I found is
>
> uu <- matrix(unlist(strsplit(x, ";")), ncol = 3, byrow = TRUE)[, 1]
>
> where x is the vector, ";" is the delimiting character,
> and I know that each string will be cut into 3 pieces.
>
> That works for my problem but I would prefer a
> more elegant solution. Besides, it would not
> work if the strings didn't all have the same
> number of pieces.
>
> Does someone have a better solution?
>
> Sorry if this topic was discussed recently.
> There is too much traffic on the r-help list,
> I cannot catch up.



--
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings

2016-07-21 Thread Ben Tupper
Hi,

I'm not sure about a more general solution, but how about this for a
start?


x <- c("a;b;c", "d;e", "foo;g;h;i")
x
#[1] "a;b;c" "d;e"   "foo;g;h;i"

sapply(strsplit(x, ";", fixed = TRUE), "[", 1)
#[1] "a"   "d"   "foo"
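
A possible variation on the call above, offered as a sketch rather than as part of the original reply. `vapply` declares the result type up front, so it either returns a character vector or stops with an error, whereas `sapply` can silently fall back to a list when it cannot simplify.

```r
x <- c("a;b;c", "d;e", "foo;g;h;i")

# Split on a literal ";" (fixed = TRUE skips regex interpretation).
pieces <- strsplit(x, ";", fixed = TRUE)

# The number of pieces differs per string, which is why the
# matrix(..., ncol = 3) approach breaks on input like this.
n_pieces <- lengths(pieces)

# Take element 1 of each piece vector; character(1) asserts that
# every result is exactly one string.
first <- vapply(pieces, `[`, character(1), 1)
first
#[1] "a"   "d"   "foo"
```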

If you want elegance then I suggest you take a look at the stringr package. 

https://cran.r-project.org/web/packages/stringr/index.html

Cheers,
Ben


Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Report Gulf of Maine jellyfish sightings to jellyf...@bigelow.org or tweet them 
to #MaineJellies -- include date, time, and location, as well as any 
descriptive information such as size or type.  Learn more at 
https://www.bigelow.org/research/srs/nick-record/nick-record-laboratory/mainejellies/


[R] splitting a vector of strings

2016-07-21 Thread Eric Elguero

Hi everybody,

I have a vector of character strings.
Each string has the same pattern and I want
to split them into pieces and get a vector made
of the first piece of each string.

The problem is that strsplit returns a list.

All I found is

uu <- matrix(unlist(strsplit(x, ";")), ncol = 3, byrow = TRUE)[, 1]

where x is the vector, ";" is the delimiting character,
and I know that each string will be cut into 3 pieces.

That works for my problem but I would prefer a
more elegant solution. Besides, it would not
work if the strings didn't all have the same
number of pieces.

Does someone have a better solution?

Sorry if this topic was discussed recently.
There is too much traffic on the r-help list,
I cannot catch up.

--
Eric Elguero

MIVEGEC. - UMR (CNRS/IRD/UM) 5290
Maladies Infectieuses et Vecteurs, Génétique, Evolution et Contrôle
Institut de Recherche pour le Développement (IRD)
911, Avenue Agropolis
BP 64501
34394 Montpellier Cedex 5, France


Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
The following works: the double backslash escapes the |, which removes
its "or" (alternation) meaning in a regex.  (Bill Dunlap showed that you
don't need sapply for it to work.)

xs <- "this is | string"
xsv <- paste(xs, 1:10)
strsplit(xsv, "\\|")
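
For the " | " separator (space, pipe, space) raised in this thread, one option is sketched here; it is an illustration, not code posted by anyone on the list. In a regular expression an unescaped `|` means alternation, so `" | *"` reads as "a space, or zero or more spaces" rather than a literal pipe. Either escape the pipe or pass `fixed = TRUE`:

```r
xs <- "this is | string"
xsv <- paste(xs, 1:3)

# Escaped pipe, with the surrounding spaces included in the pattern.
by_regex <- strsplit(xsv, " \\| ")

# Same split, treating the separator as a literal string.
by_fixed <- strsplit(xsv, " | ", fixed = TRUE)

identical(by_regex, by_fixed)
#[1] TRUE
by_fixed[[1]]
#[1] "this is"  "string 1"
```

Including the spaces in the separator avoids the trailing and leading blanks that splitting on the bare pipe would leave in each piece.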


On Oct 23, 3:50 pm, Jonathan Greenberg  wrote:
> William et al:
>
>     Thanks!  I think I have a somewhat more complicated issue due to the
> type of string I'm using -- the split is " | " (space pipe space) -- how
> do I code that based on your sub code below?  Using " | *" doesn't seem
> to be working.  Thanks!
>
> --j



Re: [R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg

William et al:

   Thanks!  I think I have a somewhat more complicated issue due to the 
type of string I'm using -- the split is " | " (space pipe space) -- how 
do I code that based on your sub code below?  Using " | *" doesn't seem 
to be working.  Thanks!


--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307



Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
xs <- "this is string"
xsv <- paste(xs, 1:10)
sapply(xsv, function(x) strsplit(x, '\\sis\\s'))

This splits each string in the vector "xsv" at the word 'is' wherever it
has a whitespace character immediately before and after it.
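
Since `strsplit` is already vectorised over its first argument, the `sapply` wrapper is optional. The sketch below assumes each string contains the separator exactly once, and shows one way to turn the resulting list into the two vectors the original question asked for. The names `parts`, `left` and `right` are made up for this example.

```r
xs <- "this is string"
xsv <- paste(xs, 1:10)

# strsplit() accepts the whole vector; the result is a list with one
# character vector per input string.
parts <- strsplit(xsv, "\\sis\\s")

# Stack the pieces row-wise. This is valid only because every string
# splits into exactly two pieces.
m <- do.call(rbind, parts)
left  <- m[, 1]
right <- m[, 2]
```

If some strings could yield a different number of pieces, `rbind` would recycle values with a warning, so checking `lengths(parts)` first is a reasonable precaution.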



On Oct 23, 1:34 pm, Jonathan Greenberg  wrote:
> Quick question -- if I have a vector of strings that I'd like to split
> into two new vectors based on a substring that is inside of each string,
> what is the most efficient way to do this?  The substring that I want to
> split on is multiple characters, if that matters, and it is contained in
> every element of the character vector.


Re: [R] splitting a vector of strings...

2009-10-22 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Thursday, October 22, 2009 7:35 PM
> To: r-help
> Subject: [R] splitting a vector of strings...
> 
> Quick question -- if I have a vector of strings that I'd like 
> to split 
> into two new vectors based on a substring that is inside of 
> each string, 
> what is the most efficient way to do this?  The substring 
> that I want to 
> split on is multiple characters, if that matters, and it is 
> contained in 
> every element of the character vector.

strsplit and sub can both be used for this.  If you know
the string will be split into 2 parts then 2 calls to sub
with slightly different patterns will do it.  strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large.  strsplit's output often needs to
be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with
  x <- paste("X", sample(1e5), sep = "")
  y <- sub("X", "Y", x)
  xy <- paste(x, y, sep = "qaz")
and split them by the 'qaz' in two ways:
  system.time(ret1 <- list(x = sub("qaz.*", "", xy), y = sub(".*qaz", "", xy)))
  #    user  system elapsed
  #    0.22    0.00    0.21
  system.time({
    tmp <- strsplit(xy, "qaz")
    ret2 <- list(x = unlist(lapply(tmp, `[`, 1)),
                 y = unlist(lapply(tmp, `[`, 2)))
  })
  #    user  system elapsed
  #    2.42    0.00    2.20
  identical(ret1, ret2)
  #[1] TRUE
  identical(ret1$x, x) && identical(ret1$y, y)
  #[1] TRUE
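
The two-call `sub` approach relies on each string having exactly two parts. A small sketch with invented strings shows what happens otherwise: because `.*` is greedy, the second pattern keeps only what follows the last separator, while `strsplit` keeps every piece.

```r
# Invented strings; the second contains the separator twice.
s <- c("X1qazY1", "X2qazY2qazZ2")

# Everything before the first "qaz".
head_part <- sub("qaz.*", "", s)

# ".*" is greedy, so this keeps only what follows the LAST "qaz".
tail_part <- sub(".*qaz", "", s)

# strsplit() keeps every piece, so the middle one is not lost.
n_parts <- lengths(strsplit(s, "qaz", fixed = TRUE))

head_part
#[1] "X1" "X2"
tail_part
#[1] "Y1" "Z2"
n_parts
#[1] 2 3
```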

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 



[R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg
Quick question -- if I have a vector of strings that I'd like to split 
into two new vectors based on a substring that is inside of each string, 
what is the most efficient way to do this?  The substring that I want to 
split on is multiple characters, if that matters, and it is contained in 
every element of the character vector.


--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
