Re: [R] splitting a vector of strings

2016-07-21 Thread Michael Dewey

Dear Eric

I think you are looking for sub or gsub

Without an example set of input and output I am not quite sure but you 
would need to define an expression which matches your separator (;) 
followed by any characters up to the end of line. If you have trouble 
with that then someone here will no doubt write the pattern for you but 
learning about regular expressions is well worthwhile


On 21/07/2016 12:54, Eric Elguero wrote:

Hi everybody,

I have a vector of character strings.
Each string has the same pattern and I want
to split them in pieces and get a vector made
of the first pieces of each string.

The problem is that strsplit returns a list.

All I found is

uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]

where x is the vector ";" is the delimiting character
and I know that each string will be cut in 3 pieces.

That works for my problem but I would prefer a
more elegant solution. Besides, it would not
work if all the string didn't have the same
number of pieces.

does someone have a better solution?

sorry if that topic was discussed recently.
There is too much traffic on the r-help list,
I cannot catch up.



--
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings

2016-07-21 Thread Ben Tupper
Hi,

I'm not sure about the more generalized solution, but how about this for a 
start.


x <- c("a;b;c", "d;e", "foo;g;h;i")
x
#[1] "a;b;c" "d;e"   "foo;g;h;i"

sapply(strsplit(x, ";",fixed = TRUE), '[',1)
#[1] "a"   "d"   "foo"

If you want elegance then I suggest you take a look at the stringr package. 

https://cran.r-project.org/web/packages/stringr/index.html

Cheers,
Ben


> On Jul 21, 2016, at 7:54 AM, Eric Elguero  wrote:
> 
> Hi everybody,
> 
> I have a vector of character strings.
> Each string has the same pattern and I want
> to split them in pieces and get a vector made
> of the first pieces of each string.
> 
> The problem is that strsplit returns a list.
> 
> All I found is
> 
> uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]
> 
> where x is the vector ";" is the delimiting character
> and I know that each string will be cut in 3 pieces.
> 
> That works for my problem but I would prefer a
> more elegant solution. Besides, it would not
> work if all the string didn't have the same
> number of pieces.
> 
> does someone have a better solution?
> 
> sorry if that topic was discussed recently.
> There is too much traffic on the r-help list,
> I cannot catch up.
> 
> -- 
> Eric Elguero
> 
> MIVEGEC. - UMR (CNRS/IRD/UM) 5290
> Maladies Infectieuses et Vecteurs, Génétique, Evolution et Contrôle
> Institut de Recherche pour le Développement (IRD)
> 911, Avenue Agropolis
> BP 64501
> 34394 Montpellier Cedex 5, France
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Report Gulf of Maine jellyfish sightings to jellyf...@bigelow.org or tweet them 
to #MaineJellies -- include date, time, and location, as well as any 
descriptive information such as size or type.  Learn more at 
https://www.bigelow.org/research/srs/nick-record/nick-record-laboratory/mainejellies/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] splitting a vector of strings

2016-07-21 Thread Eric Elguero

Hi everybody,

I have a vector of character strings.
Each string has the same pattern and I want
to split them in pieces and get a vector made
of the first pieces of each string.

The problem is that strsplit returns a list.

All I found is

uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]

where x is the vector ";" is the delimiting character
and I know that each string will be cut in 3 pieces.

That works for my problem but I would prefer a
more elegant solution. Besides, it would not
work if all the string didn't have the same
number of pieces.

does someone have a better solution?

sorry if that topic was discussed recently.
There is too much traffic on the r-help list,
I cannot catch up.

--
Eric Elguero

MIVEGEC. - UMR (CNRS/IRD/UM) 5290
Maladies Infectieuses et Vecteurs, Génétique, Evolution et Contrôle
Institut de Recherche pour le Développement (IRD)
911, Avenue Agropolis
BP 64501
34394 Montpellier Cedex 5, France

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg
Quick question -- if I have a vector of strings that I'd like to split 
into two new vectors based on a substring that is inside of each string, 
what is the most efficient way to do this?  The substring that I want to 
split on is multiple characters, if that matters, and it is contained in 
every element of the character vector.


--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings...

2009-10-22 Thread William Dunlap
 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
 Sent: Thursday, October 22, 2009 7:35 PM
 To: r-help
 Subject: [R] splitting a vector of strings...
 
 Quick question -- if I have a vector of strings that I'd like 
 to split 
 into two new vectors based on a substring that is inside of 
 each string, 
 what is the most efficient way to do this?  The substring 
 that I want to 
 split on is multiple characters, if that matters, and it is 
 contained in 
 every element of the character vector.

strsplit and sub can both be used for this.  If you know
the string will be split into 2 parts then 2 calls to sub
with slightly different patterns will do it.  strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large.  strsplit's output often needs to
be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with
  x-paste(X,sample(1e5),sep=)
  y-sub(X,Y,x)
  xy-paste(x,y,sep=qaz)
and split them by the 'qaz' in two ways:
  system.time(ret1-list(x=sub(qaz.*,,xy),y=sub(.*qaz,,xy)))
  # user  system elapsed 
  # 0.220.000.21 
 
system.time({tmp-strsplit(xy,qaz);ret2-list(x=unlist(lapply(tmp,`[`,
1)),y=unlist(lapply(tmp,`[`,2)))})
   user  system elapsed 
  # 2.420.002.20 
  identical(ret1,ret2)
  #[1] TRUE
  identical(ret1$x,x)  identical(ret1$y,y)
  #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
 --j
 
 -- 
 
 Jonathan A. Greenberg, PhD
 Postdoctoral Scholar
 Center for Spatial Technologies and Remote Sensing (CSTARS)
 University of California, Davis
 One Shields Avenue
 The Barn, Room 250N
 Davis, CA 95616
 Phone: 415-763-5476
 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
xs - this is string
xsv - paste(xs, 1:10)
sapply(xsv, function(x) strsplit(x, '\\sis\\s'))

This will split the vector of string xsv on the word 'is' that has a
space immediately before and after it.



On Oct 23, 1:34 pm, Jonathan Greenberg greenb...@ucdavis.edu wrote:
 Quick question -- if I have a vector of strings that I'd like to split
 into two new vectors based on a substring that is inside of each string,
 what is the most efficient way to do this?  The substring that I want to
 split on is multiple characters, if that matters, and it is contained in
 every element of the character vector.

 --j

 --

 Jonathan A. Greenberg, PhD
 Postdoctoral Scholar
 Center for Spatial Technologies and Remote Sensing (CSTARS)
 University of California, Davis
 One Shields Avenue
 The Barn, Room 250N
 Davis, CA 95616
 Phone: 415-763-5476
 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg

William et al:

   Thanks!  I think I have a somewhat more complicated issue due to the 
type of string I'm using -- the split is  |  (space pipe space) -- how 
do I code that based on your sub code below?  Using  | * doesn't seem 
to be working.  Thanks!


--j

William Dunlap wrote:

-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg

Sent: Thursday, October 22, 2009 7:35 PM
To: r-help
Subject: [R] splitting a vector of strings...

Quick question -- if I have a vector of strings that I'd like 
to split 
into two new vectors based on a substring that is inside of 
each string, 
what is the most efficient way to do this?  The substring 
that I want to 
split on is multiple characters, if that matters, and it is 
contained in 
every element of the character vector.



strsplit and sub can both be used for this.  If you know
the string will be split into 2 parts then 2 calls to sub
with slightly different patterns will do it.  strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large.  strsplit's output often needs to
be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with
  x-paste(X,sample(1e5),sep=)
  y-sub(X,Y,x)
  xy-paste(x,y,sep=qaz)
and split them by the 'qaz' in two ways:
  system.time(ret1-list(x=sub(qaz.*,,xy),y=sub(.*qaz,,xy)))
  # user  system elapsed 
  # 0.220.000.21 
 
system.time({tmp-strsplit(xy,qaz);ret2-list(x=unlist(lapply(tmp,`[`,

1)),y=unlist(lapply(tmp,`[`,2)))})
   user  system elapsed 
  # 2.420.002.20 
  identical(ret1,ret2)

  #[1] TRUE
  identical(ret1$x,x)  identical(ret1$y,y)
  #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

  

--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
the following works - double backslash to remove the or
functionality of | in a regex.  (Bill Dunlap showed that you don't
need sapply for it to work)

xs - this is | string
xsv - paste(xs, 1:10)
strsplit(xsv, \\|)


On Oct 23, 3:50 pm, Jonathan Greenberg greenb...@ucdavis.edu wrote:
 William et al:

     Thanks!  I think I have a somewhat more complicated issue due to the
 type of string I'm using -- the split is  |  (space pipe space) -- how
 do I code that based on your sub code below?  Using  | * doesn't seem
 to be working.  Thanks!

 --j



 William Dunlap wrote:
  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
  Sent: Thursday, October 22, 2009 7:35 PM
  To: r-help
  Subject: [R] splitting a vector of strings...

  Quick question -- if I have a vector of strings that I'd like
  to split
  into two new vectors based on a substring that is inside of
  each string,
  what is the most efficient way to do this?  The substring
  that I want to
  split on is multiple characters, if that matters, and it is
  contained in
  every element of the character vector.

  strsplit and sub can both be used for this.  If you know
  the string will be split into 2 parts then 2 calls to sub
  with slightly different patterns will do it.  strsplit requires
  less fiddling with the pattern and is handier when the number
  of parts is variable or large.  strsplit's output often needs to
  be rearranged for convenient use.

  E.g., I made 100,000 strings with a 'qaz' in their middles with
    x-paste(X,sample(1e5),sep=)
    y-sub(X,Y,x)
    xy-paste(x,y,sep=qaz)
  and split them by the 'qaz' in two ways:
    system.time(ret1-list(x=sub(qaz.*,,xy),y=sub(.*qaz,,xy)))
    # user  system elapsed
    # 0.22    0.00    0.21

  system.time({tmp-strsplit(xy,qaz);ret2-list(x=unlist(lapply(tmp,`[`,
  1)),y=unlist(lapply(tmp,`[`,2)))})
     user  system elapsed
    # 2.42    0.00    2.20
    identical(ret1,ret2)
    #[1] TRUE
    identical(ret1$x,x)  identical(ret1$y,y)
    #[1] TRUE

  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com

  --j

  --

  Jonathan A. Greenberg, PhD
  Postdoctoral Scholar
  Center for Spatial Technologies and Remote Sensing (CSTARS)
  University of California, Davis
  One Shields Avenue
  The Barn, Room 250N
  Davis, CA 95616
  Phone: 415-763-5476
  AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

  __
  r-h...@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 --

 Jonathan A. Greenberg, PhD
 Postdoctoral Scholar
 Center for Spatial Technologies and Remote Sensing (CSTARS)
 University of California, Davis
 One Shields Avenue
 The Barn, Room 250N
 Davis, CA 95616
 Phone: 415-763-5476
 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307

 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.