On 19/05/2007 3:14 PM, Paddy wrote:
> On May 19, 12:07 am, py_genetic <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm importing large text files of data using csv. I would like to add
>> some more auto-sensing abilities. I'm considering sampling the data
>> file and doing some fuzzy-logic scoring on the attributes (columns in a
>> database/CSV file, e.g. height, weight, income etc.) to determine the
>> most efficient 'type' to convert each attribute column into for further
>> processing and efficient storage...
>>
>> Example row from sampled file data:
>> [ ['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000'], [next row...] ....]
>>
>> Aside from a missing attribute designator, we can assume that the same
>> type of data continues through a column. For example, a string, int8,
>> int16, float etc.
>>
>> 1. What is the most efficient way in Python to test whether a string
>> can be converted into a given numeric type, or left alone if it's
>> really a string like 'A' or 'hello'? Speed is key. Any thoughts?
>>
>> 2. Is there anything out there already which deals with this issue?
>>
>> Thanks,
>> Conor
>
> You might try investigating what can generate your data. With luck,
> it could turn out that the data generator is methodical and column
> data types are consistent and easily determined by testing the
> first or second row. At worst, you will get to know how much you
> must check for human errors.
>
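For question 1, a common idiomatic approach in Python is simply to try the conversion and catch ValueError, then widen each column's type over a sample of rows. The sketch below assumes only three target types (int, float, str) and treats empty cells as "missing" so they don't vote; the names `classify` and `sniff_column_types` are hypothetical. Narrowing further to int8/int16 would need an extra range check that is not shown, and no claim is made here about which approach is fastest.

```python
import csv
import io

# Order of "width": a column only ever widens int -> float -> str.
RANK = {"int": 0, "float": 1, "str": 2}


def classify(value):
    """Return the narrowest type name that `value` converts to, or None if empty."""
    if value.strip() == "":
        return None  # missing attribute: don't let it influence the column type
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "str"  # e.g. 'A', 'hello there', or '100,000,000,000' (commas defeat int/float)


def sniff_column_types(rows):
    """Widen each column's type across all sampled rows."""
    types = []
    for row in rows:
        for i, cell in enumerate(row):
            if i >= len(types):
                types.append(None)
            kind = classify(cell)
            if kind is None:
                continue
            if types[i] is None or RANK[kind] > RANK[types[i]]:
                types[i] = kind
    return types


SAMPLE = '8,2.33,A,BB,hello there,"100,000,000,000"\n12,7,C,DD,bye,5\n'
rows = list(csv.reader(io.StringIO(SAMPLE)))
print(sniff_column_types(rows))
```

Note that `'100,000,000,000'` comes out as str: whether thousands separators should be stripped before converting is exactly the kind of decision that depends on knowing the data source.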
Here you go, Paddy: the following has been generated very methodically. What data type is the first column? What is the value in the first column of the 6th row likely to be?

"$39,082.00","$123,456.78"
"$39,113.00","$124,218.10"
"$39,141.00","$124,973.76"
"$39,172.00","$125,806.92"
"$39,202.00","$126,593.21"

N.B. I've kindly given you five lines instead of one or two :-)

Cheers,
John

--
http://mail.python.org/mailman/listinfo/python-list
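One possible reading of the sample, offered as an assumption rather than as John's stated answer: a type sniffer would happily call the first column "currency", but the numbers line up with Excel-style serial day numbers (1900 date system, where serial 0 corresponds to 1899-12-30 for dates after February 1900). Under that assumption every row falls on a month end. The helper names below are hypothetical.

```python
import csv
import io
from datetime import date, timedelta

# John's five rows, verbatim.
DATA = "\n".join([
    '"$39,082.00","$123,456.78"',
    '"$39,113.00","$124,218.10"',
    '"$39,141.00","$124,973.76"',
    '"$39,172.00","$125,806.92"',
    '"$39,202.00","$126,593.21"',
])

# Assumption: Excel 1900 date system; serial 0 == 1899-12-30 (valid after Feb 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def parse_currency(text):
    """Strip the '$' and thousands separators, returning a float."""
    return float(text.replace("$", "").replace(",", ""))


def excel_serial_to_date(serial):
    """Interpret a whole-number serial as a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


rows = list(csv.reader(io.StringIO(DATA)))
for first, second in rows:
    d = excel_serial_to_date(parse_currency(first))
    print(first, "->", d, "| month end:", (d + timedelta(days=1)).day == 1)
```

If that reading is right, the first column is a date formatted as money, and the gaps between rows (31, 28, 31, 30) are month lengths, so no amount of inspecting the first one or two rows would reveal the real type.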