The line should be:

first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=nrow(sent))
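For anyone following along, here is a minimal sketch of how that line fits into the loop approach. The two sentences are made up to stand in for lit$Opis (litologija.csv is not reproduced here), so treat the data as an assumption:

```r
# Made-up stand-in for lit$Opis; the real litologija.csv is not available here.
sent <- data.frame(
  sentence = c("this is a sentence with ten words or maybe more than ten words",
               "a short one"),
  stringsAsFactors = FALSE
)

# One placeholder vector per new column, each as long as the data frame.
first <- second <- third <- fourth <- fifth <- sixth <- seventh <-
  eighth <- ninth <- tenth <- vector(length = nrow(sent))

# Bind the ten placeholder columns onto the sentence column.
sent <- data.frame(sent, first, second, third, fourth, fifth, sixth,
                   seventh, eighth, ninth, tenth, stringsAsFactors = FALSE)

# Fill columns 2-11 row by row with the first ten words of each sentence.
for (j in seq_len(nrow(sent))) {
  sent[j, 2:11] <- strsplit(sent[j, 1], " ")[[1]][1:10]
}
```

Rows with fewer than ten words end up with NA in the remaining columns. The same words can also be had without a loop via t(sapply(strsplit(sent$sentence, " "), "[", 1:10)), which is the approach David shows further down the thread.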
Sorry, cut-and-paste error.

On Tue, Nov 2, 2010 at 3:32 PM, steven mosher <mosherste...@gmail.com> wrote:

> That's easy: you are confusing the dummy code I sent.
>
> Do this:
>
> lit<-read.csv("litologija.csv", sep=";", dec=".")
> sent <-data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
>
> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=nrow(
> sent)
>
> I put the length of the vector to 10 just to do a dummy problem.
>
> Then do this:
>
> for(j in 1:nrow(sent)) {
>
> sent[j,2:11]<-strsplit(sent[j,1]," ")[[1]][1:10]
>
> }
>
> That will get you a result the crude brute-force way.
>
> Try that.
>
> Then you can learn the sapply way. But first you need to learn R data
> structures.
>
> On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič
> <matevz.pav...@gi-zrmk.si> wrote:
>
>> Hi Steven,
>>
>> Thank you for the help. I get an error though when I do this:
>>
>> >lit<-read.csv("litologija.csv", sep=";", dec=".")
>> >sent <-data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
>> >str(sent)
>> >sentV<-rep(sent,10)
>> >str(sentV)
>> >first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>> >DF <-data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>>
>> »Error in data.frame(Sentence = sent, first, second, third, fourth,
>> fifth, :
>> arguments imply differing number of rows: 22928, 10«
>>
>> What am I doing wrong?
>>
>> Thanks, m
>>
>> *From:* steven mosher [mailto:mosherste...@gmail.com]
>> *Sent:* Tuesday, November 02, 2010 8:45 PM
>> *To:* David Winsemius
>> *Cc:* Matevž Pavlič; Gaj Vidmar; r-h...@stat.math.ethz.ch
>> *Subject:* Re: [R] spliting first 10 words in a string
>>
>> Thanks david.
>>
>> Matevz, maybe I can help explain by doing a very simple and brute force
>> approach as opposed to the way david did it. But you should learn his methods.
>>
>> I will just do a subset of your problem, and if you understand how it works
>> then you should be able to get something done and then make it more elegant.
>>
>> First, I simplify the problem by separating out the "sentence" column.
>>
>> You can do this with your data frame by simply doing this:
>>
>> MySentence <-data.frame(sentence=yourbigDF$Opis,stringsAsFactors=FALSE)
>>
>> So I take your original data.frame (yourbigDF) and I just create a copy of
>> that one column, $Opis.
>>
>> Later we can merge the two back together after I add 10 columns for the
>> words.
>>
>> Let's make some dummy data with just 10 rows:
>>
>> sentence<- "this is a sentence with ten words or maybe more than ten words"
>> sentV<-rep(sentence,10)
>>
>> # now I just made 10 rows of the same sentence
>> # NEXT because I am going to create 10 new columns of 10 rows I create
>> # 10 vectors; each is named and each has 10 elements for the rows.
>> # they have NO DATA in them
>>
>> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>>
>> # Next I create a dataframe with Sentence in the first column and 10 blank
>> # columns.
>> # NOTE I use stringsAsFactors=FALSE
>>
>> DF <-data.frame(Sentence=sentence,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>>
>> # This is what it would look like (the first row)
>>
>> DF[1,]
>>
>> Sentence first second third fourth fifth sixth seventh eighth ninth tenth
>> 1 this is a sentence with ten words or maybe more than ten words FALSE
>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>
>> Next, I will show you how to assign the first ten words to the 10 blank
>> columns:
>>
>> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>>
>> #DF[1,2:11] selects columns 2-11 of the first row
>> #strsplit splits the sentence into words, [1:10] keeps the first 10,
>> #and they are placed in columns 2-11
>>
>> If you want to do this the slow way you can just loop through your
>> dataframe row by row, or you can probably use apply.
>>
>> Make more sense?
>>
>> > DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>> > DF[1,]
>> Sentence first second third fourth fifth sixth seventh eighth ninth tenth
>> 1 this is a sentence with ten words or maybe more than ten words this
>> is a sentence with ten words or maybe more
>> > DF[1,"first"]
>> [1] "this"
>>
>> On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>> On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
>>
>> Hi all,
>>
>> Thanks for all the help. I managed to do it with what Gaj suggested (Excel :().
>>
>> The last solution from David is also great, I just don't understand why R
>> put the words in 14 columns and three rows?
>>
>> Because the maximum number of words was 14 and the fill argument was TRUE.
>> There were three rows because there were three items in the supplied
>> character vector.
>>
>> I would like it to put just the first 10 words in the source field into 10
>> different destination fields, but in the same row. And so on... is that
>> possible?
>>
>> I don't know what a destination field might be. Those are not R data
>> types.
>>
>> This would trim the extra columns (in this example set to those greater
>> than 8) by adding a lot of "NULL"'s to the end of a colClasses specification
>> .... at the expense of a warning message which can be ignored:
>>
>> > read.table(textConnection(words), fill=T, colClasses =
>> c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
>>
>>    V1    V2    V3      V4    V5    V6    V7      V8
>> 1   I  have     a columnn  with  text  that     has
>> 2   I would  like      to split these words      in
>> 3 but  just first     ten words    in   the string.
>> Warning message:
>> In read.table(textConnection(words), fill = T, colClasses =
>> c(rep("character", :
>> cols = 14 != length(data) = 38
>>
>> If you want to assign the first column to a variable then just:
>>
>> > first8 <- read.table(textConnection(words), fill=T, colClasses =
>> c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
>> > var1 <- first8[[1]]
>> > var1
>> [1] "I" "I" "but"
>>
>> --
>> David.
>>
>> Thank you, m
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of David Winsemius
>> Sent: Tuesday, November 02, 2010 3:47 PM
>> To: Gaj Vidmar
>> Cc: r-h...@stat.math.ethz.ch
>> Subject: Re: [R] spliting first 10 words in a string
>>
>> On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
>>
>> Though <forbidden> in this list, in Excel it's just (literally!)
>> five clicks away!
>> (with the column in question selected)
>> Data -> Text to Columns -> Delimited -> tick Space -> Finish
>> Pa je! (~Voila in Slovenian)
>> (then import back to R, keeping only the first 10 columns if so
>> desired)
>>
>> You could do the same thing without needing to leave R.
>> Just read.table( textConnection(..), header=FALSE, fill=TRUE)
>>
>> > read.table(textConnection(words), fill=T)
>>
>>    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
>> 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
>> 2   I would  like      to split these words      in separate columns
>> 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
>>
>> Regards,
>> Assist. Prof. Gaj Vidmar, PhD
>> University Rehabilitation Institute, Republic of Slovenia
>>
>> Irrelevant P.S. Long ago, before embarking on what eventually ended
>> mainly in statistics, I did two years of geology, so (and also because
>> of knowing what the poster's institute does) I even kinda imagine what
>> these data are.
>>
>> "Matevž Pavlič" <matevz.pav...@gi-zrmk.si> wrote in message
>> news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
>>
>> Hi,
>>
>> I am sorry, will try to be more exact from now on...
>>
>> I have a data.frame with a field called Opis. It contains sentences that
>> I would like to split into words or fields in the data.frame... when I say
>> columns I mean as in an Excel table. I would like to split "Opis" into ten
>> fields from the first ten words in the Opis field.
>> Here is an example of my data.frame.
>>
>> 'data.frame': 22928 obs. of 12 variables:
>>  $ VrtinaID        : int 1 1 1 1 2 2 2 2 2 2 ...
>>  $ ZapStev         : int 1 2 3 4 1 2 3 4 5 6 ...
>>  $ GlobinaOd       : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>  $ GlobinaDo       : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>  $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
>>  $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
>>  $ GeolNastOd      : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>  $ GeolNastDo      : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>  $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
>>  $ NacinVrtanjaOd  : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
>>  $ NacinVrtanjaDo  : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
>>  $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>>
>> Hope that explains better...
>> Thank you, m
>>
>> -----Original Message-----
>> From: David Winsemius [mailto:dwinsem...@comcast.net]
>> Sent: Monday, November 01, 2010 10:13 PM
>> To: Matevž Pavlič
>> Cc: r-help@r-project.org
>> Subject: Re: [R] spliting first 10 words in a string
>>
>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>>
>> Hi all,
>>
>> I have a columnn with text that has quite a few words in it. I would
>> like to split these words in separate columns, but just first ten
>> words in the string. Is that possible in R?
>>
>> Not sure what a column means to you. It's not a precisely defined R
>> type or class. (And you are requested to offer a concrete example
>> rather than making us guess.)
>>
>> words <-"I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
>>
>> strsplit(words, " ")[[1]][1:10]
>>
>> [1] "I" "have" "a" "columnn" "with" "text" "that" "has" "quite" "a"
>>
>> Or if in a dataframe:
>>
>> words <-c("I have a columnn with text that has quite a few words in it.",
>>           "I would like to split these words in separate columns",
>>           "but just first ten words in the string. Is that possible in R?")
>>
>> # stringsAsFactors=FALSE keeps the column as character so strsplit accepts it
>> worddf <- data.frame(words=words, stringsAsFactors=FALSE)
>>
>> t(sapply(strsplit(worddf$words, " "), "[", 1:10) )
>>
>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.