That's easy: you are mixing up the dummy code I sent with your real data. Do this:
lit <- read.csv("litologija.csv", sep = ";", dec = ".")
# Opis was read as a factor; stringsAsFactors = FALSE does not convert an
# existing factor, so turn it into character explicitly before splitting
sent <- data.frame(sentence = as.character(lit$Opis), stringsAsFactors = FALSE)
# one placeholder vector per word column, each as long as your data (22928 rows)
first = second = third = fourth = fifth = sixth = seventh = eighth = ninth = tenth <- vector(length = nrow(sent))
DF <- data.frame(Sentence = sent$sentence, first, second, third, fourth, fifth,
                 sixth, seventh, eighth, ninth, tenth, stringsAsFactors = FALSE)

I set the length of the vectors to 10 only for the dummy problem. With your data each vector must have nrow(sent) elements, which is why data.frame() complained that the arguments imply differing numbers of rows (22928 vs. 10). Then do this:

for (j in seq_len(nrow(DF))) {
  DF[j, 2:11] <- strsplit(DF[j, 1], " ")[[1]][1:10]
}

That will get you a result the crude, brute-force way. Try that. Then you can learn the sapply way, but first you need to learn R data structures.

On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič <matevz.pav...@gi-zrmk.si> wrote:
> Hi Steven,
>
> Thank you for the help. I get an error, though, when I do this:
>
> lit<-read.csv("litologija.csv", sep=";", dec=".")
> sent <-data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
> str(sent)
> sentV<-rep(sent,10)
> str(sentV)
> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
> DF <-data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>
> »Error in data.frame(Sentence = sent, first, second, third, fourth, fifth,  :
>   arguments imply differing number of rows: 22928, 10«
>
> What am I doing wrong?
>
> Thanks, m
>
> *From:* steven mosher [mailto:mosherste...@gmail.com]
> *Sent:* Tuesday, November 02, 2010 8:45 PM
> *To:* David Winsemius
> *Cc:* Matevž Pavlič; Gaj Vidmar; r-h...@stat.math.ethz.ch
> *Subject:* Re: [R] spliting first 10 words in a string
>
> Thanks, David.
>
> Matevz, maybe I can help explain by taking a very simple, brute-force approach, as opposed to the way David did it. But you should learn his methods.
>
> I will just do a subset of your problem, and if you understand how it works, you should be able to get something done and then make it more elegant.
>
> First, I simplify the problem by separating out the "sentence" column.
>
> You can do this with your data frame by simply doing this:
>
> MySentence <- data.frame(sentence = as.character(yourbigDF$Opis), stringsAsFactors = FALSE)
>
> So I take your original data.frame (yourbigDF) and just create a copy of that one column, $Opis.
>
> Later we can merge the two back together after I add 10 columns for the words.
>
> Let's make some dummy data with just 10 rows:
>
> sentence <- "this is a sentence with ten words or maybe more than ten words"
> sentV <- rep(sentence, 10)
> # now I just made 10 rows of the same sentence
> # NEXT, because I am going to create 10 new columns of 10 rows, I create
> # 10 vectors; each is named and each has 10 elements, one per row.
> # They hold only placeholder FALSE values, no real data.
>
> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>
> # Next I create a data frame with Sentence in the first column and 10 blank columns.
> # NOTE: I use stringsAsFactors=FALSE
>
> DF <- data.frame(Sentence=sentV,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>
> # This is what it would look like (the first row):
> DF[1,]
>
>   Sentence first second third fourth fifth sixth seventh eighth ninth tenth
> 1 this is a sentence with ten words or maybe more than ten words FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> Next, I will show you how to assign the first ten words to the 10 blank columns:
>
> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>
> # DF[1,2:11] selects columns 2-11 of the first row
> # strsplit() splits the sentence at each space and returns a list; [[1]] takes
> # the words of this one sentence and [1:10] keeps the first 10, which are
> # placed in columns 2-11
>
> If you want to do this the slow way, you can just loop through your data frame row by row, or you can probably use apply.
>
> Make more sense?
>
> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
> DF[1,]
>   Sentence first second third fourth fifth sixth seventh eighth ninth tenth
> 1 this is a sentence with ten words or maybe more than ten words this is a sentence with ten words or maybe more
> DF[1,"first"]
> [1] "this"
>
> On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> > On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
> >
> > > Hi all,
> > >
> > > Thanks for all the help. I managed to do it with what Gaj suggested (Excel :().
> > >
> > > The last solution from David is also great; I just don't understand why R put the words in 14 columns and three rows?
> >
> > Because the maximum number of words was 14 and the fill argument was TRUE. There were three rows because there were three items in the supplied character vector.
> >
> > > I would like it to put just the first 10 words of the source field into 10 different destination fields, but in the same row. And so on... is that possible?
> >
> > I don't know what a destination field might be. Those are not R data types.
> >
> > This would trim the extra columns (in this example, those beyond 8) by adding a lot of "NULL"s to the end of a colClasses specification... at the expense of a warning message, which can be ignored:
> >
> > read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
> >    V1    V2    V3      V4    V5    V6    V7      V8
> > 1   I  have     a columnn  with  text  that     has
> > 2   I would  like      to split these words      in
> > 3 but  just first     ten words    in   the string.
> > Warning message:
> > In read.table(textConnection(words), fill = T, colClasses = c(rep("character",  :
> >   cols = 14 != length(data) = 38
> >
> > If you want to assign the first column to a variable, then just:
> >
> > first8 <- read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
> > var1 <- first8[[1]]
> > var1
> > [1] "I"   "I"   "but"
> >
> > --
> > David.
> >
> > > Thank you, m
> > >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> > > Sent: Tuesday, November 02, 2010 3:47 PM
> > > To: Gaj Vidmar
> > > Cc: r-h...@stat.math.ethz.ch
> > > Subject: Re: [R] spliting first 10 words in a string
> > >
> > > On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
> > >
> > > > Though <forbidden> in this list, in Excel it's just (literally!) five clicks away!
> > > > (with the column in question selected)
> > > > Data -> Text to Columns -> Delimited -> tick Space -> Finish
> > > > Pa je! (~Voila in Slovenian)
> > > > (Then import back to R, keeping only the first 10 columns if so desired.)
> > >
> > > You could do the same thing without needing to leave R. Just use read.table(textConnection(..), header=FALSE, fill=TRUE):
> > >
> > > read.table(textConnection(words), fill=T)
> > >    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
> > > 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
> > > 2   I would  like      to split these words      in separate columns
> > > 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
> > >
> > > > Regards,
> > > > Assist. Prof. Gaj Vidmar, PhD
> > > > University Rehabilitation Institute, Republic of Slovenia
> > > >
> > > > Irrelevant P.S. Long ago, before embarking on what eventually ended mainly in statistics, I did two years of geology, so (and also because of knowing what the poster's institute does) I even kinda imagine what these data are.
> > > >
> > > > "Matevž Pavlič" <matevz.pav...@gi-zrmk.si> wrote in message news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am sorry, I will try to be more exact from now on...
> > > > >
> > > > > I have a data.frame with a field called Opis. It contains sentences that I would like to split into words, i.e. into fields of the data.frame... when I say columns I mean as in an Excel table. I would like to split "Opis" into ten fields holding the first ten words of the Opis field.
> > > > > Here is an example of my data.frame:
> > > > >
> > > > > 'data.frame': 22928 obs. of 12 variables:
> > > > >  $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
> > > > >  $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
> > > > >  $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> > > > >  $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> > > > >  $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
> > > > >  $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
> > > > >  $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> > > > >  $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> > > > >  $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
> > > > >  $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
> > > > >  $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
> > > > >  $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
> > > > >
> > > > > Hope that explains it better...
> > > > > Thank you, m
> > > > >
> > > > > -----Original Message-----
> > > > > From: David Winsemius [mailto:dwinsem...@comcast.net]
> > > > > Sent: Monday, November 01, 2010 10:13 PM
> > > > > To: Matevž Pavlič
> > > > > Cc: r-help@r-project.org
> > > > > Subject: Re: [R] spliting first 10 words in a string
> > > > >
> > > > > On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have a column with text that has quite a few words in it. I would like to split these words into separate columns, but just the first ten words in the string. Is that possible in R?
> > > > >
> > > > > Not sure what a column means to you. It's not a precisely defined R type or class.
> > > > > (And you are requested to offer a concrete example rather than making us guess.)
> > > > >
> > > > > words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
> > > > >
> > > > > strsplit(words, " ")[[1]][1:10]
> > > > > [1] "I" "have" "a" "columnn" "with" "text" "that" "has" "quite" "a"
> > > > >
> > > > > Or, if in a data frame:
> > > > >
> > > > > words <- c("I have a columnn with text that has quite a few words in it.", "I would like to split these words in separate columns", "but just first ten words in the string. Is that possible in R?")
> > > > > worddf <- data.frame(words=words, stringsAsFactors=FALSE)
> > > > > t(sapply(strsplit(worddf$words, " "), "[", 1:10))
> > > > >      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
> > > > > [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
> > > > > [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
> > > > > [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
> > > > >
> > > > > --
> > > > > David Winsemius, MD
> > > > > West Hartford, CT
> > > > >
> > > > > ______________________________________________
> > > > > R-help@r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > David Winsemius, MD
> > > West Hartford, CT
> >
> > David Winsemius, MD
> > West Hartford, CT
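A note on the stringsAsFactors point that runs through this thread: str() shows Opis as a factor, and stringsAsFactors = FALSE in data.frame() only controls whether *character* vectors are turned into factors; it does not convert a column that is already a factor. The sketch below uses one level copied from the str() output plus an empty string (stand-ins, not the real litologija.csv data) to show that strsplit() rejects the factor but accepts the as.character() version.

```r
# Stand-in values: one Opis level from the str() output above, plus ""
opis <- factor(c("(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV", ""))

# stringsAsFactors = FALSE leaves an existing factor column as a factor
d1 <- data.frame(sentence = opis, stringsAsFactors = FALSE)
print(is.factor(d1$sentence))        # TRUE

# strsplit() only accepts character input, so this call fails
# ("non-character argument")
res <- try(strsplit(d1$sentence, " "), silent = TRUE)
print(inherits(res, "try-error"))    # TRUE

# Converting explicitly gives a character column that strsplit() accepts
d2 <- data.frame(sentence = as.character(opis), stringsAsFactors = FALSE)
print(strsplit(d2$sentence[1], " ")[[1]])
# six words: "(MIVKA)" "DROBEN" "MELJAST" "PESEK," "GOST," "SIVORJAV"
```

Reading the file with read.csv(..., stringsAsFactors = FALSE) would avoid the factor in the first place; as.character() is just the smallest change to the code already in the thread.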
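For all 22928 rows the row-by-row loop works but is slow; the sapply() idea from David's first reply can be applied to the whole column at once. Below is a minimal sketch, assuming a data frame shaped like lit with a factor column Opis. The two-row lit is made-up stand-in data, and first_words() is a hypothetical helper name, not something from the thread. It differs from the thread's code in two deliberate ways: it splits on runs of whitespace ("\\s+") rather than a single space, so double spaces do not produce empty "words", and it uses vapply(), which checks that every row yields exactly n strings (NA-padded for short descriptions).

```r
# Hypothetical helper: first n words of each string as a data frame with
# n character columns (word1 ... wordn), padded with NA for short strings.
first_words <- function(x, n = 10) {
  x <- trimws(as.character(x))   # factor -> character, drop leading/trailing blanks
  words <- strsplit(x, "\\s+")   # split on runs of whitespace
  # `[` with seq_len(n) returns exactly n elements, NA past the last word;
  # vapply() enforces a character vector of length n for every string
  flat <- vapply(words, `[`, character(n), seq_len(n))
  # vapply() stacks results column by column, so fill the matrix by row
  m <- matrix(flat, ncol = n, byrow = TRUE)
  colnames(m) <- paste0("word", seq_len(n))
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Made-up two-row stand-in for the real lit data frame
lit <- data.frame(
  VrtinaID = c(1L, 2L),
  Opis = factor(c("(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",
                  "one two three four five six seven eight nine ten eleven"))
)

# Bind the ten word columns back onto the original data frame
lit2 <- cbind(lit, first_words(lit$Opis, 10))
str(lit2)
```

The result keeps the original columns and adds word1 ... word10; rename them to first ... tenth with names() if you prefer the names used earlier in the thread.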