Re: [R] splitting First 10 words in a string

steven mosher Tue, 02 Nov 2010 15:35:25 -0700

Line should be:

first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=nrow(sent))


Sorry, cut-and-paste error.
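
For reference, here is a minimal self-contained sketch of the whole brute-force loop with that line corrected. It is only an illustration: the toy data frame `lit` is made up and stands in for litologija.csv, and the helper vector `cols` is a hypothetical convenience that is not in the code I sent.

```r
# Toy stand-in for read.csv("litologija.csv", sep=";", dec=".") (made-up data)
lit <- data.frame(
  Opis = c("DROBEN MELJAST PESEK, GOST, SIVORJAV",
           "this is a sentence with ten words or maybe more than ten words"),
  stringsAsFactors = TRUE
)

# Opis is a factor here, so convert it to character before splitting
sent <- data.frame(sentence = as.character(lit$Opis), stringsAsFactors = FALSE)

# Add ten empty character columns, one per word (hypothetical helper vector)
cols <- c("first", "second", "third", "fourth", "fifth",
          "sixth", "seventh", "eighth", "ninth", "tenth")
sent[cols] <- NA_character_

# Crude brute-force way: fill one row at a time
for (j in seq_len(nrow(sent))) {
  sent[j, cols] <- strsplit(sent[j, "sentence"], " ")[[1]][1:10]
}

sent[2, "first"]   # first word of the second sentence
sent[1, "sixth"]   # NA, because the first sentence has only five words
```

Rows with fewer than ten words are padded with NA, since indexing a character vector past its end returns NA.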

On Tue, Nov 2, 2010 at 3:32 PM, steven mosher <mosherste...@gmail.com> wrote:

>  That's easy. You are misapplying the dummy code I sent.
>
>  Do this:
>
>  lit<-read.csv("litologija.csv", sep=";", dec=".")
> sent <-data.frame(sentence=as.character(lit$Opis),stringsAsFactors=FALSE)  # Opis is a factor; strsplit needs character
>
> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=nrow(
> sent)
>
> I set the vector length to 10 only because the dummy problem had 10 rows.
>
> Then do this:
>
> for(j in 1:nrow(sent)) {
>
>   sent[j,2:11]<-strsplit(sent[j,1]," ")[[1]][1:10]
>
> }
>
>
> That will get you a result the crude, brute-force way.
>
> Try that.
>
> Then you can learn the sapply way. But first you need to learn R data
> structures.
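
A rough sketch of what the sapply way could look like, using made-up sentences in place of the real Opis column. It relies on strsplit() returning a list with one character vector per row.

```r
# Made-up example sentences standing in for sent$sentence
sent <- data.frame(
  sentence = c("one two three",
               "this is a sentence with ten words or maybe more than ten words"),
  stringsAsFactors = FALSE
)

# strsplit() gives a list with one character vector per row;
# "[" with 1:10 keeps the first ten words and pads short rows with NA
words <- t(sapply(strsplit(sent$sentence, " "), "[", 1:10))

dim(words)      # one row per sentence, ten columns
words[1, 4]     # NA: the first sentence has only three words
words[2, 10]    # tenth word of the second sentence
```

The result is a character matrix, not a data frame, which is one reason the data structures are worth learning first.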
>
>
>
>
>
> On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič
> <matevz.pav...@gi-zrmk.si> wrote:
>
>> Hi Steven,
>>
>>
>>
>> Thank you for the help. I get an error, though, when I do this:
>>
>>
>>
>> >lit<-read.csv("litologija.csv", sep=";", dec=".")
>>
>> >sent <-data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
>>
>> >str(sent)
>>
>> >sentV<-rep(sent,10)
>>
>> >str(sentV)
>>
>>
>>
>>
>> >first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>>
>> >DF <-data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>>
>>
>>
>> »Error in data.frame(Sentence = sent, first, second, third, fourth,
>> fifth,  :
>>
>> arguments imply differing number of rows: 22928, 10«
>>
>>
>>
>> What am I doing wrong?
>>
>>
>>
>> Thanks, m
>>
>>
>>
>>
>>
>>
>>
>> *From:* steven mosher [mailto:mosherste...@gmail.com]
>> *Sent:* Tuesday, November 02, 2010 8:45 PM
>> *To:* David Winsemius
>> *Cc:* Matevž Pavlič; Gaj Vidmar; r-h...@stat.math.ethz.ch
>> *Subject:* Re: [R] spliting first 10 words in a string
>>
>>
>>
>>  Thanks, David.
>>
>>
>>
>>   Matevz, maybe I can help explain by doing a very simple, brute-force
>> approach
>>
>> as opposed to the way David did it. But you should learn his methods.
>>
>>
>>
>> I will just do a subset of your problem and if you understand how it works
>> then you should
>>
>> be able to get something done and then make it more elegant.
>>
>>
>>
>> First, I simplify the problem by separating out the "sentence" column.
>>
>>
>>
>> You can do this with your data frame by simply doing this
>>
>>
>>
>> MySentence <-data.frame(sentence=as.character(yourbigDF$Opis),stringsAsFactors=FALSE)  # as.character() because Opis is a factor
>>
>>
>>
>> so I take your original data.frame (yourbigDF) and I just create a copy of
>> that one column
>>
>>  $Opis
>>
>>
>>
>> Later we can merge the two back together after I add 10 columns for the
>> words
>>
>>
>>
>>
>>
>> Let's make some dummy data with just 10 rows
>>
>>
>>
>>
>>
>>
>>
>>  sentence<- "this is a sentence with ten words or maybe more than ten words"
>>
>>  sentV<-rep(sentence,10)
>>
>> # now I just made 10 rows of the same sentence
>>
>> # NEXT, because I am going to create 10 new columns of 10 rows, I create
>>
>> # 10 vectors; each is named and each has 10 elements, one for each row.
>>
>> # They have NO DATA in them
>>
>>
>>
>>
>>  first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>>
>>
>>
>> # Next I create a data frame with Sentence in the first column and 10 blank columns.
>>
>> # NOTE I use stringsAsFactors=FALSE
>>
>>
>>
>>  DF <-data.frame(Sentence=sentV,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>>
>>
>>
>> # This is what it would look like ( the first row)
>>
>> DF[1,]
>>
>>
>>
>>                                                         Sentence first second third fourth fifth sixth seventh eighth ninth tenth
>> 1 this is a sentence with ten words or maybe more than ten words FALSE  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
>>
>>
>>
>> Next, I will show you how to assign the first ten words to the 10 blank
>> columns
>>
>>
>>
>> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>>
>>
>>
>> # DF[1,2:11] selects columns 2-11 of the first row
>>
>> # strsplit(...)[[1]][1:10] returns the first 10 words, which are placed in columns 2-11
>>
>>
>>
>> If you want to do this the slow way you can just loop through your
>> dataframe row by row
>>
>> or you can probably use apply.
>>
>>
>>
>> Make more sense?
>>
>> > DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>>
>> > DF[1,]
>>
>>                                                         Sentence first second third   fourth fifth sixth seventh eighth ninth tenth
>> 1 this is a sentence with ten words or maybe more than ten words  this     is     a sentence  with   ten   words     or maybe  more
>>
>> > DF[1,"first"]
>>
>> [1] "this"
>>
>>
>>
>> On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>>
>> On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
>>
>> Hi all,
>>
>> Thanks for all the help. I managed to do it with what Gaj suggested (Excel
>> :().
>>
>> The last solution from David is also great; I just don't understand why R
>>  put the words in 14 columns and three rows?
>>
>>
>>
>> Because the maximum number of words was 14 and the fill argument was TRUE.
>> There were three rows because there were three items in the supplied
>> character vector.
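
A small sketch that makes the 14-by-3 shape visible, using the same three-element `words` vector from earlier in the thread. count.fields() reports how many whitespace-separated fields each input line has.

```r
words <- c("I have a columnn with text that has quite a few words in it.",
           "I would like to split these words in separate columns",
           "but just first ten words in the string. Is that possible in R?")

# Number of whitespace-separated fields on each input line
con <- textConnection(words)
n_fields <- count.fields(con)
close(con)
n_fields        # one count per element of words

# fill = TRUE pads the shorter rows with empty fields up to the widest row
con <- textConnection(words)
wide <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)
dim(wide)       # rows = length(words), columns = max(n_fields)
```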
>>
>>
>>
>> I would like it to put just the first 10 words in the source field into 10
>> different destination fields, but in the same row. And so on... is that
>> possible?
>>
>>
>>
>> I don't know what a destination field might be. Those are not R data
>> types.
>>
>> This would trim the extra columns (in this example set to those greater
>> than 8) by adding a lot of "NULL"'s to the end of a colClasses specification
>> .... at the expense of a warning message which can be ignored:
>>
>> > read.table(textConnection(words), fill=T, colClasses =
>> c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
>>
>>
>>   V1    V2    V3      V4    V5    V6    V7      V8
>>
>> 1   I  have     a columnn  with  text  that     has
>>
>> 2   I would  like      to split these words      in
>>
>> 3 but  just first     ten words    in   the string.
>>
>> Warning message:
>> In read.table(textConnection(words), fill = T, colClasses =
>> c(rep("character",  :
>>  cols = 14 != length(data) = 38
>>
>>
>> If you want to assign the first column to a variable then just:
>> > first8 <- read.table(textConnection(words), fill=T, colClasses =
>> c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
>> > var1 <- first8[[1]]
>> > var1
>> [1] "I"   "I"   "but"
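
An alternative sketch, not the approach shown above: read every column with fill=TRUE and then subset, which avoids both the colClasses padding and the warning. The names `all_words` and `keep` are made up for illustration.

```r
words <- c("I have a columnn with text that has quite a few words in it.",
           "I would like to split these words in separate columns",
           "but just first ten words in the string. Is that possible in R?")

con <- textConnection(words)
all_words <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)

# Keep at most the first 8 columns (fewer if the input is narrower)
keep <- seq_len(min(8, ncol(all_words)))
first8 <- all_words[, keep]

first8[[1]]     # first word of each line
```

One caveat: read.table() decides the number of columns from the first five lines of input, so with real data a longer sentence further down may need an explicit col.names or colClasses.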
>>
>> --
>> David.
>>
>>
>>
>>
>> Thank you, m
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of David Winsemius
>> Sent: Tuesday, November 02, 2010 3:47 PM
>> To: Gaj Vidmar
>> Cc: r-h...@stat.math.ethz.ch
>> Subject: Re: [R] spliting first 10 words in a string
>>
>>
>> On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
>>
>> Though <forbidden> in this list, in Excel it's just (literally!)
>> five clicks
>> away!
>> (with the column in question selected)
>> Data -> Text to Columns -> Delimited -> tick Space -> Finish
>> Pa je! (~Voila in Slovenian)
>> (then import back to R, keeping only the first 10 columns if so
>> desired)
>>
>>
>> You could do the same thing without needing to leave R. Just
>> read.table( textConnection(..), header=FALSE, fill=TRUE)
>>
>> read.table(textConnection(words), fill=T)
>>
>>    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
>> 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
>> 2   I would  like      to split these words      in separate columns
>> 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
>>
>>
>> Regards,
>> Assist. Prof. Gaj Vidmar, PhD
>> University Rehabilitation Institute, Republic of Slovenia
>>
>> Irrelevant P.S. Long ago, before embarking on what eventually ended
>> mainly
>> in statistics,
>> I did two years of geology, so (and also because of knowing what the
>> poster's institute does)
>> I even kinda imagine what these data are.
>>
>> "Matevž Pavlič" <matevz.pav...@gi-zrmk.si> wrote in message
>> news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
>>
>> Hi,
>>
>> I am sorry, will try to be more exact from now on...
>>
>> I have a data.frame with a field called Opis. It contains
>> sentences that
>> I would like to split into words, or fields in the data.frame... when I say
>> columns I mean as in an Excel table. I would like to split "Opis" into
>> ten
>> fields from the first ten words in the Opis field.
>> Here is an example of my data.frame.
>>
>> 'data.frame':   22928 obs. of  12 variables:
>> $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
>> $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
>> $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>> $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>> $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST
>> PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884
>> 9123 2500
>> 4756 ...
>> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..:
>> 154 125
>> 101 101 NA 106 125 80 106 101 ...
>> $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>> $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>> $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53
>> 53 56
>> 53 53 53 53 53 ...
>> $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
>> $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
>> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1
>> 1 1 26
>> 1 1 1 1 1 ...
>>
>> Hope that explains better...
>> Thank you, m
>>
>> -----Original Message-----
>> From: David Winsemius [mailto:dwinsem...@comcast.net]
>> Sent: Monday, November 01, 2010 10:13 PM
>> To: Matevž Pavlič
>> Cc: r-help@r-project.org
>> Subject: Re: [R] spliting first 10 words in a string
>>
>>
>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>>
>> Hi all,
>>
>>
>>
>> I have a columnn with text that has quite a few words in it. I would
>> like to split these words in separate columns, but just first ten
>> words in the string. Is that possible in R?
>>
>>
>> Not sure what a column means to you. It's not a precisely defined R
>> type or class. (And you are requested to offer a concrete example
>> rather than making us guess.)
>>
>> words <-"I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
>>
>> strsplit(words, " ")[[1]][1:10]
>>
>> [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"    "has"     "quite"   "a"
>>
>>
>> Or if in a dataframe:
>>
>> words <-c("I have a columnn with text that has quite a few words in it.",
>>           "I would like to split these words in separate columns",
>>           "but just first ten words in the string. Is that possible in R?")
>>
>> worddf <- data.frame(words=words, stringsAsFactors=FALSE)  # keep words as character so strsplit works
>>
>>
>>
>> t(sapply(strsplit(worddf$words, " "), "[", 1:10) )
>>
>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
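
To get from that matrix back to one row per sentence with ten word columns, something along these lines might work. It is a sketch only: the column names word1 to word10 and the object names `first10` and `result` are invented for the example.

```r
worddf <- data.frame(
  words = c("I have a columnn with text that has quite a few words in it.",
            "I would like to split these words in separate columns"),
  stringsAsFactors = FALSE
)

# Character matrix with one row per sentence and ten columns
first10 <- t(sapply(strsplit(worddf$words, " "), "[", 1:10))
colnames(first10) <- paste0("word", 1:10)   # hypothetical column names

# Original text plus the ten word columns, side by side
result <- cbind(worddf, as.data.frame(first10, stringsAsFactors = FALSE))

names(result)
result$word10   # tenth word of each sentence
```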
>>
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>


