Let us say the maximum spacing is two spaces, and the output should not be fixed field but preferably a CSV file.
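One way to read that rule, offered only as a sketch: if adjacent columns are never more than two spaces apart, each run of blanks can be cut into separators of at most two spaces, and any extra separators mark skipped columns. The base R code below does this by turning every one or two spaces into a comma before calling read.csv. Three things here are assumptions and not from the thread: the spacing in `raw` is invented for illustration (the real spacing did not survive the mail), each run of blanks is taken to hide the fewest missing columns consistent with the rule, and `out_file` is a placeholder path.

```r
# Illustrative input only: the spacing below is made up so that the
# "at most two spaces between adjacent columns" rule yields the desired table.
raw <- c(
  "x1 x2 x3 x4",
  "1 B12",
  "2   C23",
  "322 B32   D34",
  "4     D44",
  "51    D53",
  "60  D62"
)

# Drop trailing blanks so they are not counted as empty fields.
trimmed <- trimws(raw, which = "right")

# Greedy matching consumes two spaces when available, so a run of n spaces
# becomes ceiling(n / 2) commas; consecutive commas are empty fields.
as_csv <- gsub(" {1,2}", ",", trimmed)

# Empty fields and the cells padded by fill = TRUE are read as NA.
dat <- read.csv(text = as_csv, header = TRUE, fill = TRUE,
                na.strings = "", stringsAsFactors = FALSE)

# Write a real CSV file; out_file is a placeholder location.
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE, na = "")
```

Under this rule a run of three or four spaces counts as one missing column and a run of five or six as two. If the real file can have missing columns separated by only single spaces, this rule cannot recover them.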
On Mon, Feb 22, 2021 at 8:05 PM jim holtman <jholt...@gmail.com> wrote:
>
> Messed up; I did not see your 'desired' output, which will be hard since there is
> not a consistent number of spaces that would represent the desired column
> number. Do you have any hint as to how to interpret the spacing, especially
> if you have several hundred more lines? Is the output supposed to be 'fixed'
> field?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Mon, Feb 22, 2021 at 5:00 PM jim holtman <jholt...@gmail.com> wrote:
>>
>> Try this:
>>
>> > library(tidyverse)
>>
>> > text <- "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 D44 \n51 D53\n60 D62 "
>>
>> > # read in the data as characters and replace multiple blanks with a single blank
>> > input <- read_lines(text)
>>
>> > input <- str_replace_all(input, ' +', ' ')
>>
>> > mydata <- read_delim(input, ' ', col_names = TRUE)
>> Warning: 5 parsing failures.
>> row col  expected    actual         file
>>   1  -- 4 columns 3 columns literal data
>>   2  -- 4 columns 3 columns literal data
>>   4  -- 4 columns 3 columns literal data
>>   5  -- 4 columns 2 columns literal data
>>   6  -- 4 columns 3 columns literal data
>>
>> > mydata
>> # A tibble: 6 x 4
>>      x1 x2    x3    x4
>>   <dbl> <chr> <chr> <lgl>
>> 1     1 B12   NA    NA
>> 2     2 C23   NA    NA
>> 3   322 B32   D34   NA
>> 4     4 D44   NA    NA
>> 5    51 D53   NA    NA
>> 6    60 D62   NA    NA
>> >
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Mon, Feb 22, 2021 at 4:49 PM Val <valkr...@gmail.com> wrote:
>>>
>>> That is my problem. The spacing between columns is not consistent. It
>>> may be a single space or multiple spaces (two or three).
>>>
>>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap <williamwdun...@gmail.com>
>>> wrote:
>>> >
>>> > You said the column values were separated by space characters.
>>> > Copying the text from gmail shows that some column names and column
>>> > values are separated by single spaces (e.g., between x1 and x2) and
>>> > some by multiple spaces (e.g., between x3 and x4). Did the mail mess
>>> > up the spacing or is there some other way to tell where the omitted
>>> > values are?
>>> >
>>> > -Bill
>>> >
>>> > On Mon, Feb 22, 2021 at 2:54 PM Val <valkr...@gmail.com> wrote:
>>> > >
>>> > > I tried that one and it did not work. Please see the error message:
>>> > > Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 D44 \n51 D53\n60 D62 ", :
>>> > >   more columns than column names
>>> > >
>>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap <williamwdun...@gmail.com>
>>> > > wrote:
>>> > > >
>>> > > > Since the columns in the file are separated by a space character, " ",
>>> > > > add the read.table argument sep=" ".
>>> > > >
>>> > > > -Bill
>>> > > >
>>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val <valkr...@gmail.com> wrote:
>>> > > > >
>>> > > > > Hi all, I am trying to read a messy data set but am facing difficulty.
>>> > > > > The data has several columns separated by blank space(s). Each column
>>> > > > > value may have different lengths across the rows. The first
>>> > > > > row (header) has four columns. However, each row may not have all
>>> > > > > four column values. For instance, the first data row has only the first
>>> > > > > two column values. The fourth data row has the first and last column
>>> > > > > values; the second and the third column values are missing for this
>>> > > > > row. How do I read this data set correctly? Here is my sample data
>>> > > > > set, output and desired output. To make each data point clear,
>>> > > > > I have added the row and column numbers. I cannot use fixed width
>>> > > > > format reading because each row may have a different length for a
>>> > > > > given column.
>>> > > > >
>>> > > > > dat<-read.table(text="x1 x2 x3 x4
>>> > > > > 1 B12
>>> > > > > 2 C23
>>> > > > > 322 B32 D34
>>> > > > > 4 D44
>>> > > > > 51 D53
>>> > > > > 60 D62 ",header=TRUE, fill=TRUE,na.strings=c("","NA"))
>>> > > > >
>>> > > > > Output
>>> > > > >    x1  x2   x3 x4
>>> > > > > 1   1 B12 <NA> NA
>>> > > > > 2   2 C23 <NA> NA
>>> > > > > 3 322 B32  D34 NA
>>> > > > > 4   4 D44 <NA> NA
>>> > > > > 5  51 D53 <NA> NA
>>> > > > > 6  60 D62 <NA> NA
>>> > > > >
>>> > > > >
>>> > > > > Desired output
>>> > > > >    x1   x2   x3  x4
>>> > > > > 1   1  B12 <NA>  NA
>>> > > > > 2   2 <NA>  C23  NA
>>> > > > > 3 322  B32   NA D34
>>> > > > > 4   4 <NA>   NA D44
>>> > > > > 5  51 <NA>  D53  NA
>>> > > > > 6  60  D62 <NA>  NA
>>> > > > >
>>> > > > > Thank you,
>>> > > > >
>>> > > > > ______________________________________________
>>> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > > PLEASE do read the posting guide
>>> > > > > http://www.R-project.org/posting-guide.html
>>> > > > > and provide commented, minimal, self-contained, reproducible code.