At first I thought Numbers (".) would be close. However, it finds only 3 values per human-readable line, I suppose because the linefeeds aren't recognized as whitespace. So I wrote:

   5 4$(#~ _&~:) _".' 'I.@:(LF&=)@]}T

The Amend dictionary entries seem strange; I only figured them out yesterday. Essentially you can have 3, 2, or 1 verbs before }, but multiple verbs are packed into a gerund, which is why the dictionary has multiple entries.

   modified_noun =: ' 'I.@:(LF&=)@]}T

replaces the linefeeds in T with spaces. Apparently there is a library function for string replace (or regular expressions would work). I'm not yet well versed in the J libraries.

   numerical_data =: _".modified_noun

converts to numerical data, substituting infinity for unrecognized words.

   (#~ _&~:) numerical_data

is (at the least related to) the standard filtering technique, and removes the non-numbers.

   5 4$

is an overly direct reshaping according to the request.

Dave

On Mon, 2010-12-20 at 06:04 +0800, [email protected] wrote:
> From: Bo Jacoby <[email protected]>
> Subject: [Jprogramming] Extracting data from a text
> To: Programming forum <[email protected]>
>
> I need elementary advice. Please help me.
>
> I have a text T, retrieved by cut and paste from a pdf-document, like
> this:
>
> T =. 0 : 0
> SN 1990O . . . . . . 9065 134.7 67.3 2.3
> SN 1990T . . . . . . . 12012 158.9 75.6 3.1
> SN 1990af . . . . . . 15055 198.6 75.8 2.8
> SN 1991S . . . . . . . 16687 238.9 69.8 2.8
> SN 1991U . . . . . . 9801 117.1 83.7 3.4
> )
>
> How to produce a 5 row 4 column array containing the numeric data from
> T?
>
> - Bo
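
For reference, the one-liner in the reply can also be run one step at a time, which may make each stage easier to inspect. This is only a sketch under the same assumptions as the reply (T holds exactly the five lines quoted, and none of the real data values is infinity). The names spaced, nums and data are made up here for illustration and do not appear in the original messages.

```j
NB. The text from the question, as a script-style noun definition.
T =: 0 : 0
SN 1990O . . . . . . 9065 134.7 67.3 2.3
SN 1990T . . . . . . . 12012 158.9 75.6 3.1
SN 1990af . . . . . . 15055 198.6 75.8 2.8
SN 1991S . . . . . . . 16687 238.9 69.8 2.8
SN 1991U . . . . . . 9801 117.1 83.7 3.4
)

NB. Step 1: amend T, putting a space at every index that holds a linefeed.
spaced =: ' ' (I. T = LF)} T

NB. Step 2: dyadic ". converts each word; words that are not numbers become _ .
nums =: _ ". spaced

NB. Step 3: keep only the items that differ from _ , then reshape to 5 rows by 4 columns.
data =: 5 4 $ (nums ~: _) # nums
```

Typing $ data afterwards should report 5 4. The tacit form in the reply does the same work without naming the intermediate results.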
