Re: [R] field values from text file to dataframe

2017-03-13 Thread Ulrik Stervbo
I imagine that the FieldStateOption lines are irrelevant, so you might be
able to create a data.frame like this:

library(tidyr)

fl <- readLines("pdf_dump.txt")

fl <- grep("FieldStateOption", fl, value = TRUE, invert = TRUE)

# Number the records: every "---" separator starts a new one
field_number <- vector(mode = "integer", length = length(fl))
tmpid <- 0L
for (i in seq_along(fl)) {
  if (fl[i] == "---") {
    tmpid <- tmpid + 1L
  }
  field_number[i] <- tmpid
}

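data.frame(field_number, file_line = fl, stringsAsFactors = FALSE) %>%
  subset(file_line != "---") %>%
  # split on the first colon only, so values keep any punctuation
  separate(file_line, into = c("field_name", "field_value"),
           sep = ":\\s*", extra = "merge", fill = "right") %>%
  spread(key = "field_name", value = "field_value")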

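The field_number records which "---" block each line came from, so that
every row in the final data.frame is uniquely identified. Without it, the
same field_name appears once per block and `spread` complains about
duplicate identifiers.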
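
If you would rather avoid the tidyr dependency, roughly the same result can
be sketched in base R. This is only a sketch under assumptions: the `fl`
vector below is a made-up sample standing in for readLines("pdf_dump.txt"),
the names `record_id`, `long` and `wide` are mine, and every remaining line
is assumed to contain a colon. Empty values are turned into NA, as asked for
in the original post.

```r
# Hypothetical sample standing in for readLines("pdf_dump.txt")
fl <- c("---", "FieldType: Choice", "FieldName: P1", "FieldValue: P1",
        "FieldStateOption: ", "FieldStateOption: P1",
        "---", "FieldType: Choice", "FieldName: P2", "FieldValue: ",
        "FieldStateOption: ", "FieldStateOption: P2")

fl <- trimws(fl)
fl <- grep("^FieldStateOption", fl, value = TRUE, invert = TRUE)

# Each "---" starts a new record, so a running count labels the records
record_id <- cumsum(fl == "---")
keep <- fl != "---"

# Split on the first colon only, so values containing ":" stay intact
field_name  <- sub(":.*$", "", fl[keep])
field_value <- trimws(sub("^[^:]*:", "", fl[keep]))
field_value[field_value == ""] <- NA

long <- data.frame(record_id = record_id[keep], field_name, field_value,
                   stringsAsFactors = FALSE)

# One row per record, one column per field name
wide <- reshape(long, idvar = "record_id", timevar = "field_name",
                direction = "wide")
names(wide) <- sub("^field_value\\.", "", names(wide))
```

No field name is hard-coded here apart from the FieldStateOption filter, so
forms with other fields should go through the same steps.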

HTH,
Ulrik





On Mon, 13 Mar 2017 at 09:28 Jim Lemon wrote:

> Hi Vijayan,
> You have a bit of a problem with repeated field names. While you can
> mangle the field names to do something like this, I don't see how you
> are going to make sense of multiple "FieldStateOption" fields. The
> strategy I would take is to collect all of the field names and then
> set up rows with the unique field names, but the multiple field names
> will make a mess of that.
>
> Jim
>
>
> On Sun, Mar 12, 2017 at 2:13 AM, Vijayan Padmanabhan wrote:
> > Dear r-help group,
> > I have a text file which is a data dump of a PDF form, as given below.
> > I want to convert it into a data frame with the field names as column names
> > and the field values as the row values for each field.
> > I might have different PDF forms with different field name/value pairs to
> > process, so the script should not refer to specific field names
> > when building the data frame.
> > Where the value for a given field is empty, or where FieldValue
> > doesn't appear, the data frame can record NA in that field's
> > column.
> >
> > Does anyone know how to accomplish this using R?
> >
> >
> > Regards
> > VP
> >
> > ---
> > FieldType: Choice
> > FieldName: P1
> > FieldFlags: 4849664
> > FieldValue: P1
> > FieldValueDefault: P1
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P1
> > ---
> > FieldType: Choice
> > FieldName: P2
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault: P2
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P2
> > ---
> > FieldType: Choice
> > FieldName: P3
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault: P3
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P3
> > ---
> > FieldType: Choice
> > FieldName: P4
> > FieldFlags: 4849664
> > FieldValue: P2
> > FieldValueDefault: P2
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P2
> > ---
> > FieldType: Choice
> > FieldName: P5
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P5
> > ---
> > FieldType: Choice
> > FieldName: P6
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P6
> > ---
> > FieldType: Choice
> > FieldName: P7
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P7
> > ---
> > FieldType: Choice
> > FieldName: P8
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: P8
> > ---
> > FieldType: Choice
> > FieldName: P1IDS
> > FieldFlags: 4849664
> > FieldValue: 2
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: 1
> > FieldStateOption: 2
> > FieldStateOption: 3
> > FieldStateOption: 4
> > FieldStateOption: 5
> > ---
> > FieldType: Choice
> > FieldName: P1PDS
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: 1
> > FieldStateOption: 2
> > FieldStateOption: 3
> > FieldStateOption: 4
> > FieldStateOption: 5
> > ---
> > FieldType: Choice
> > FieldName: P1IIU
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: 1
> > FieldStateOption: 2
> > FieldStateOption: 3
> > FieldStateOption: 4
> > FieldStateOption: 5
> > ---
> > FieldType: Choice
> > FieldName: P1PIU
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: 1
> > FieldStateOption: 2
> > FieldStateOption: 3
> > FieldStateOption: 4
> > FieldStateOption: 5
> > ---
> > FieldType: Choice
> > FieldName: P1IPU
> > FieldFlags: 4849664
> > FieldValue: 3
> > FieldValueDefault:
> > FieldJustification: Left
> > FieldStateOption:
> > FieldStateOption: 1
> > FieldStateOption: 2
> > FieldStateOption: 3
> > FieldStateOption: 4
> > FieldStateOption: 5
> > ---
> > FieldType: Choice
> > FieldName: P1PPU
> > FieldFlags: 4849664
> > FieldValue:
> > FieldValueDefault:

Re: [R] field values from text file to dataframe

2017-03-13 Thread Jim Lemon
Hi Vijayan,
You have a bit of a problem with repeated field names. While you can
mangle the field names to do something like this, I don't see how you
are going to make sense of multiple "FieldStateOption" fields. The
strategy I would take is to collect all of the field names and then
set up rows with the unique field names, but the multiple field names
will make a mess of that.
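
One possible way around the repeated names, assuming it is acceptable to
paste the repeated values of a field into a single string, is sketched
below. The `fl` sample, the "|" separator and the names `long`, `collapsed`
and `wide` are made up for illustration and are not from the original dump.

```r
# Hypothetical sample standing in for the lines of the text dump
fl <- c("---", "FieldName: P1IDS", "FieldValue: 2",
        "FieldStateOption:", "FieldStateOption: 1", "FieldStateOption: 2",
        "---", "FieldName: P1PDS", "FieldValue:",
        "FieldStateOption:", "FieldStateOption: 1")

# Label each line with the record it belongs to
record_id <- cumsum(fl == "---")
keep <- fl != "---"
long <- data.frame(
  record_id   = record_id[keep],
  field_name  = sub(":.*$", "", fl[keep]),
  field_value = trimws(sub("^[^:]*:", "", fl[keep])),
  stringsAsFactors = FALSE
)

# Drop empty values, then paste repeated names within a record together
long <- long[long$field_value != "", ]
collapsed <- aggregate(field_value ~ record_id + field_name, data = long,
                       FUN = paste, collapse = "|")

# Unique field names become columns; fields missing from a record become NA
wide <- reshape(collapsed, idvar = "record_id", timevar = "field_name",
                direction = "wide")
names(wide) <- sub("^field_value\\.", "", names(wide))
```

Each record then has a single FieldStateOption cell holding all of its
options, which can be split again with strsplit() if they are needed
individually.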

Jim


On Sun, Mar 12, 2017 at 2:13 AM, Vijayan Padmanabhan wrote:
> Dear r-help group,
> I have a text file which is a data dump of a PDF form, as given below.
> I want to convert it into a data frame with the field names as column names
> and the field values as the row values for each field.
> I might have different PDF forms with different field name/value pairs to
> process, so the script should not refer to specific field names
> when building the data frame.
> Where the value for a given field is empty, or where FieldValue
> doesn't appear, the data frame can record NA in that field's
> column.
>
> Does anyone know how to accomplish this using R?
>
>
> Regards
> VP
>
> ---
> FieldType: Choice
> FieldName: P1
> FieldFlags: 4849664
> FieldValue: P1
> FieldValueDefault: P1
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P1
> ---
> FieldType: Choice
> FieldName: P2
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault: P2
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P2
> ---
> FieldType: Choice
> FieldName: P3
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault: P3
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P3
> ---
> FieldType: Choice
> FieldName: P4
> FieldFlags: 4849664
> FieldValue: P2
> FieldValueDefault: P2
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P2
> ---
> FieldType: Choice
> FieldName: P5
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P5
> ---
> FieldType: Choice
> FieldName: P6
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P6
> ---
> FieldType: Choice
> FieldName: P7
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P7
> ---
> FieldType: Choice
> FieldName: P8
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: P8
> ---
> FieldType: Choice
> FieldName: P1IDS
> FieldFlags: 4849664
> FieldValue: 2
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PDS
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1IIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1IPU
> FieldFlags: 4849664
> FieldValue: 3
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P1PPU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IDS
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2PIU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOption: 2
> FieldStateOption: 3
> FieldStateOption: 4
> FieldStateOption: 5
> ---
> FieldType: Choice
> FieldName: P2IPU
> FieldFlags: 4849664
> FieldValue:
> FieldValueDefault:
> FieldJustification: Left
> FieldStateOption:
> FieldStateOption: 1
> FieldStateOptio