I get to J little enough these days so I'm a bit rusty when it comes to the interesting stuff, and I'm stuck on a particular problem.
I start with a PDF report. I run it through pdftotext and then format/zulu's a2b to get a file that is mostly of the form value attribute value attribute value . . . value value attribute value . . .

The first value of each entry has no explicit attribute name, although "entry name" would be a suitable attribute name. Some attributes span multiple rows, and attributes may be of any reasonable length and do include whitespace. I know the set of attribute names, and some include whitespace, too. Some entries don't use all attributes.

There's one other complication: one attribute (call it 'location,' if you will) has multiple rows that indicate multiple locations. I need to duplicate the full entry for each location listed in that entry.

For others' use, I want to output a csv file that has one entry per row and each attribute in a separate column, with empty cells where the attribute wasn't used. I can then sort, search, and aggregate inside J, as I wish, to process further myself.

Here's an example bit of data:

d1=: 0 : 0
alpha
Attribute 1
bravo
Attribute 2
charlie
delta
Location
echo
foxtrot
golf
Attribute 3
hotel
india
Attribute 1
juliet
Attribute 2
kilo
Location
lima
)

Here's what I think I want it to look like at an intermediate step:

d2 =: 0 : 0
Attribute 0: alpha
Attribute 1: bravo
Attribute 2: charlie delta
Location: echo
Attribute 3: hotel
Attribute 0: alpha
Attribute 1: bravo
Attribute 2: charlie delta
Location: foxtrot
Attribute 3: hotel
Attribute 0: alpha
Attribute 1: bravo
Attribute 2: charlie delta
Location: golf
Attribute 3: hotel
Attribute 0: india
Attribute 1: juliet
Attribute 2: kilo
Location: lima
Attribute 3:
)

Attribute 0 is always a one-liner, so I detect its value by backing up one from 'Attribute 1'. (I didn't pick the file format. :-) )

There are about 20-40 lines at the start that I need to drop--everything before the first instance of a value for Attribute 0.
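To make the transformation concrete, here is a rough sketch in J of the cutting I have in mind. I have not run it against the real file. It assumes d1 is the sample noun defined above, that the attribute names are exactly the four listed, and that each line matches a name exactly (no stray blanks from pdftotext). The names `names`, `join`, `row`, `lines`, `start` and `table` are just placeholders for illustration, and the sketch goes straight to a boxed table rather than through the d2 text form.

```j
NB. Sketch only, under the assumptions stated above.
names =: 'Attribute 1';'Attribute 2';'Location';'Attribute 3'

NB. boxed lines -> one blank-separated string (empty list -> empty string)
join =: 3 : '}: ; ,&'' ''&.> y'

NB. y: boxed lines of one entry, entry name first.
NB. result: boxed table, one row per location, columns = entry name , names
row =: 3 : 0
  isname =. y e. names                        NB. 1 at attribute-name lines
  keys   =. isname # y                        NB. names present in this entry
  vals   =. isname <@}.;.1 y                  NB. value lines after each name
  v      =. (keys i. names) { vals , <0$a:    NB. names order; empty if absent
  base   =. ({. y) , join&.> v                NB. entry name, one cell per attribute
  col    =. names i. <'Location'
  locs   =. > col { v                         NB. one box per location line
  if. 0 = # locs do. locs =. , a: end.        NB. keep entries with no location
  locs (< a: ; 1 + col)} (# locs) # ,: base   NB. one copy of the entry per location
)

lines =: <;._2 d1                             NB. box each LF-terminated line
start =: 1 |.!.0 lines = <'Attribute 1'       NB. entry name is the line before 'Attribute 1'
table =: ; start <@row;.1 lines               NB. ;.1 also drops lines before the first entry
```

If this behaves as intended, `table` for the sample has one row per location and one column per attribute, with an empty cell where an entry omits an attribute. Cutting with `;.1` on `start` is what discards the leading lines, since nothing before the first fret is kept.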
The final result, ready for analysis, would look something like

d3 =: 4 5 $ <;._2 d2

Better, it would look like that with everything up to and including the first ':' elided (the value entries can include multiple colons) and with the attributes as a header row. I can manage the header, and I'm pretty sure I can manage stripping out attribute names.

I've looked at JfC chapter 23 as a potentially useful spot, but I haven't yet seen the light. Suggestions of fruitful paths forward?

Thanks,

Bill
-- 
Bill Harris
http://facilitatedsystems.com/weblog/

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
