Re: [R] field values from text file to dataframe
I imagine that the FieldStateOption lines are irrelevant, so you might be able to create a data.frame like this:

library(tidyr)

fl <- readLines("pdf_dump.txt")
# Drop the repeated FieldStateOption lines
fl <- grep("FieldStateOption", fl, value = TRUE, invert = TRUE)

# Number the records: every "---" separator starts a new field
field_number <- vector(mode = "integer", length = length(fl))
tmpid <- 0L
for (i in seq_along(fl)) {
  if (fl[i] == "---") {
    tmpid <- tmpid + 1L
  }
  field_number[i] <- tmpid
}

# Split each line at the first colon only, so values that contain
# spaces or punctuation stay in one piece
data.frame(field_number, file_line = fl) %>%
  subset(file_line != "---") %>%
  separate(file_line, into = c("field_name", "field_value"),
           sep = ":\\s*", extra = "merge") %>%
  spread(key = "field_name", value = "field_value")

The field_number is there to make each row in the final data.frame unique (without it, `spread` complains).

HTH,
Ulrik

On Mon, 13 Mar 2017 at 09:28 Jim Lemon wrote:

> Hi Vijayan,
> You have a bit of a problem with repeated field names. While you can
> mangle the field names to do something like this, I don't see how you
> are going to make sense of multiple "FieldStateOption" fields. The
> strategy I would take is to collect all of the field names and then
> set up rows with the unique field names, but the multiple field names
> will make a mess of that.
>
> Jim
>
>
> On Sun, Mar 12, 2017 at 2:13 AM, Vijayan Padmanabhan wrote:
> > Dear r-help group
> > I have a text file which is a data dump of a pdf form as given below..
> > I want it to be converted into a data frame with field name as column names
> > and the field value as the row value for each field.
> > I might have different pdf forms with different field name value pairs to
> > process. so the script should not require reference to specific field names
> > in the extraction of data frame.
> > Where the field value for a given field is empty or where Field Value
> > doesn't appear.. the dataframe can record them as NA against that field
> > name column
> >
> > Will someone know how to get this accomplished using R?
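For anyone following the thread, the record-numbering idea in the reply above can also be sketched in base R without a loop. This is only a minimal sketch under some assumptions: the function name `parse_field_dump` and the inline sample vector are hypothetical (the sample stands in for `readLines("pdf_dump.txt")`), FieldStateOption lines are dropped as suggested in the reply, and empty values are recorded as NA as the original question asked.

```r
# Hypothetical sample standing in for readLines("pdf_dump.txt")
fl <- c("---", "FieldType: Choice", "FieldName: P1", "FieldFlags: 4849664",
        "FieldValue: P1", "FieldValueDefault: P1", "FieldJustification: Left",
        "FieldStateOption: ", "FieldStateOption: P1",
        "---", "FieldType: Choice", "FieldName: P2", "FieldFlags: 4849664",
        "FieldValue: ", "FieldValueDefault: P2", "FieldJustification: Left",
        "FieldStateOption: ", "FieldStateOption: P2")

# parse_field_dump is an illustrative name, not part of any package
parse_field_dump <- function(lines) {
  lines <- trimws(lines)

  # Every "---" separator starts a new record, so a running count of
  # separators gives each line its record number
  record <- cumsum(lines == "---")

  # Keep only "name: value" lines and drop the repeated FieldStateOption lines
  keep <- lines != "---" &
    !grepl("^FieldStateOption", lines) &
    grepl(":", lines, fixed = TRUE)
  lines <- lines[keep]
  record <- record[keep]

  # Split at the first colon only; empty values become NA
  key <- sub(":.*$", "", lines)
  value <- trimws(sub("^[^:]*:", "", lines))
  value[value == ""] <- NA_character_

  # Fill a record-by-field matrix; fields missing from a record stay NA.
  # If a field name repeats within one record, the last value wins.
  keys <- unique(key)
  ids <- unique(record)
  out <- matrix(NA_character_, nrow = length(ids), ncol = length(keys),
                dimnames = list(NULL, keys))
  out[cbind(match(record, ids), match(key, keys))] <- value

  data.frame(field_number = ids, out, stringsAsFactors = FALSE)
}

res <- parse_field_dump(fl)
print(res)
```

Because the column names are collected from the file itself, the sketch does not refer to any specific field name, which was one of the requirements in the original question.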
Re: [R] field values from text file to dataframe
Hi Vijayan,
You have a bit of a problem with repeated field names. While you can
mangle the field names to do something like this, I don't see how you
are going to make sense of multiple "FieldStateOption" fields. The
strategy I would take is to collect all of the field names and then
set up rows with the unique field names, but the multiple field names
will make a mess of that.

Jim


On Sun, Mar 12, 2017 at 2:13 AM, Vijayan Padmanabhan wrote:
> Dear r-help group
> I have a text file which is a data dump of a pdf form as given below..
> I want it to be converted into a data frame with field name as column names
> and the field value as the row value for each field.
> I might have different pdf forms with different field name value pairs to
> process. so the script should not require reference to specific field names
> in the extraction of data frame.
> Where the field value for a given field is empty or where Field Value
> doesn't appear.. the dataframe can record them as NA against that field
> name column
>
> Will someone know how to get this accomplished using R?
>
>
> Regards
> VP
>
> ---
> FieldType: Choice
> FieldName: P1
> FieldFlags: 4849664
> FieldValue: P1
> FieldValueDefault: P1
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P1
> ---
> FieldType: Choice
> FieldName: P2
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault: P2
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P2
> ---
> FieldType: Choice
> FieldName: P3
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault: P3
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P3
> ---
> FieldType: Choice
> FieldName: P4
> FieldFlags: 4849664
> FieldValue: P2
> FieldValueDefault: P2
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P2
> ---
> FieldType: Choice
> FieldName: P5
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P5
> ---
> FieldType: Choice
> FieldName: P6
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P6
> ---
> FieldType: Choice
> FieldName: P7
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P7
> ---
> FieldType: Choice
> FieldName: P8
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P8
> ---
> FieldType: Choice
> FieldName: P1IDS
> FieldFlags: 4849664
> FieldValue: 2
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PDS
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1IIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1IPU
> FieldFlags: 4849664
> FieldValue: 3
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PPU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IDS
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2PIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IPU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOptio