Re: [Jprogramming] Extracting data from a text

David Ward Lambert Sun, 19 Dec 2010 16:24:23 -0800

At first I thought that Numbers (dyadic ".) would come close.  However,
it finds only 3 values per human-readable line, I suppose because the
linefeeds aren't recognized as white space.  So I wrote:

   5 4$(#~ _&~:) _".' 'I.@:(LF&=)@]}T

The Dictionary entries for Amend (}) seemed strange until I figured
them out yesterday.  Essentially, you can have one, two, or three verbs
before }; two or three verbs are packed into a gerund, which is why
Amend has several entries in the Dictionary.
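A small J sketch of the two simplest Amend forms, on made-up strings (not from Bo's data): a noun of indices before }, and a verb before } that computes the indices from x and y. The second line mirrors the shape of the verb used in my one-liner.

```j
NB. Noun form  x m} y : m is the list of indices to replace.
'X' 1 3 } 'abcde'                  NB. aXcXe
NB. Verb form  x u} y : the indices are computed as x u y.
NB. @: (not @) makes I. see the whole boolean list at once;
NB. with @ it would be applied atom by atom at the rank of =.
'X' I.@:('b'&=)@] } 'abcba'        NB. aXcXa
```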

modified_noun =: ' 'I.@:(LF&=)@]}T   replaces the linefeeds in T with
spaces.  Apparently there is a library function for string replacement
(regular expressions would also work), but I'm not yet well versed in
the J libraries.
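A sketch comparing the verb form of Amend used here with the equivalent noun form, where the LF indices are computed up front with I. (the short string s is made up for illustration). If the standard library is loaded, I believe `T rplc LF;' '` is the string-replace verb alluded to above; it is left out here so that only primitives are used.

```j
s =: 'SN 9065 134.7',LF,'SN 12012 158.9',LF
NB. the amend from the reply: a verb computes the LF indices
viaverb =: ' ' I.@:(LF&=)@] } s
NB. the same replacement with the indices computed first (noun form)
vianoun =: ' ' (I. LF = s) } s
viaverb -: vianoun                 NB. 1
```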

numerical_data =: _".modified_noun    converts the words to numbers,
substituting infinity (_) for words that are not valid numbers.
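To illustrate the conversion step on one line of Bo's text (already free of linefeeds): dyadic ". splits its right argument at blanks and converts each word, and the left argument _ stands in for every word that isn't a valid J number.

```j
NB. SN, 1990O and the dots are not J numbers, so each becomes _
_ ". 'SN 1990O . . 9065 134.7 67.3 2.3'
NB. _ _ _ _ 9065 134.7 67.3 2.3
```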

(#~ _&~:) numerical_data   is (or is at least closely related to) the
standard filtering idiom; it removes the non-numbers, i.e. the infinities.
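The filter is a hook: (#~ v) y means y #~ v y, i.e. (v y) # y, so it keeps the items for which v gives 1. A sketch on a made-up list, with Less (-.) shown as an alternative I'd consider equivalent here. Note that either way a genuine infinity in the data would also be dropped.

```j
nums =: _ 9065 _ 134.7 67.3 _
(#~ _&~:) nums                     NB. 9065 134.7 67.3
nums -. _                          NB. same result using Less
```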

5 4$   reshapes the result into the requested 5 rows and 4 columns,
which is a rather literal reading of the request, since the shape is
hard-coded.
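If the row count shouldn't be hard-coded, one option (my suggestion, not part of the one-liner above) is _4 ]\ , which cuts a list into non-overlapping rows of 4 items. A sketch of the whole pipeline on Bo's T:

```j
T =: 0 : 0
SN 1990O . . . . . . 9065 134.7 67.3 2.3
SN 1990T . . . . . . . 12012 158.9 75.6 3.1
SN 1990af . . . . . . 15055 198.6 75.8 2.8
SN 1991S . . . . . . . 16687 238.9 69.8 2.8
SN 1991U . . . . . . 9801 117.1 83.7 3.4
)
NB. LF -> blank, words -> numbers (_ if invalid), drop _, rows of 4
data =: _4 ]\ (#~ _&~:) _ ". ' ' I.@:(LF&=)@] } T
$ data                             NB. 5 4
```

With 20 numbers the rows come out exactly 5 by 4; a count that isn't a multiple of 4 would leave a short last row padded with 0.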

Dave

On Mon, 2010-12-20 at 06:04 +0800, [email protected] wrote:
> From: Bo Jacoby <[email protected]>
> Subject: [Jprogramming]  Extracting data from a text
> To: Programming forum <[email protected]>
> 
> I need elementary advice. Please help me.
> 
> I have a text T, retrieved by cut and paste from a PDF document, like
> this:
> 
>    T =. 0 : 0
> SN 1990O . . . . . . 9065 134.7 67.3 2.3
> SN 1990T . . . . . . . 12012 158.9 75.6 3.1
> SN 1990af . . . . . . 15055 198.6 75.8 2.8
> SN 1991S . . . . . . . 16687 238.9 69.8 2.8
> SN 1991U . . . . . . 9801 117.1 83.7 3.4
> )
> 
> How can I produce a 5-row, 4-column array containing the numeric data
> from T?
> 
> - Bo
> 
> 

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
