On Mar 8, 1:53 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> On Mar 6, 9:17 pm, Luis M. González <[EMAIL PROTECTED]> wrote:
> > On 6 mar, 11:27, Pierre Quentel <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I would like to know if there is a module that converts a string to a
> > > value of the "most probable type" ; for instance :
> > > - if the string is "abcd" the value is the same string "abcd"
> > > - string "123" : value = the integer 123
> > > - string "-1.23" (or "-1,23" if the locale for decimals is ,) : value
> > > = the float -1.23
> > > - string "2008/03/06" (the format is also locale-dependant) : value =
> > > datetime.date(2008,03,06)
> > >
> > > Like in spreadsheets, special prefixes could be used to force the
> > > type : for instance '123 would be converted to the *string* "123"
> > > instead of the *integer* 123
> > >
> > > I could code it myself, but this wheel is probably already invented
> > >
> > > Regards,
> > > Pierre
> >
> > >>> def convert(x):
> >         if '.' in x:
> >             try: return float(x)
> >             except ValueError: return x
> >         else:
> >             try: return int(x)
> >             except: return x
> >
> > >>> convert('123')
> > 123
> > >>> convert('123.99')
> > 123.98999999999999
> > >>> convert('hello')
> > 'hello'
>
> Neat solution. The real challenge though is whether to support
> localised dates, these are all valid:
> 20/10/01
> 102001
> 20-10-2001
> 20011020

The neat solution doesn't handle dots used as date separators, e.g. 20.10.01 [they are used in dates in some locales, and the location of . on the numeric keypad is easier on the pinkies than / or -].

I'm a bit dubious about the utility of "most likely format" for ONE input.

I've used a brute-force approach when inspecting largish CSV files (with a low but non-zero rate of typos etc.) with the goal of determining the most likely type of data in each column. E.g. 102001 could be a valid MMDDYY date, but not a valid DDMMYY or YYMMDD date. 121212 could be all of those. Both qualify as int, float and text. A column with 100% of entries qualifying as text, 99.999% as float, 99.99% as integer, 99.9% as DDMMYY, and much lower percentages as MMDDYY and YYMMDD would be tagged as DDMMYY. The general rule is: pick the highest-priority type whose score exceeds a threshold. Priorities: date > int > float > text.

Finding the date order works well with things like date of birth, where there is a wide distribution of days and years. However, a field (e.g. date interest credited to a bank account) where the day is always 01 and the year is in 01 to 08 would give the same scores for each of the 3 date orders ... eye-balling the actual data never goes astray.
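
For what it's worth, here is a rough sketch of the per-column scoring idea, not the code I actually used. All the names (is_date, qualifies, column_scores, guess_column_type) are made up for illustration. It assumes 6-digit dates with no separators, that two-digit years mean 20YY, and a threshold of 0.99, none of which is gospel.

```python
from datetime import date

# Hypothetical names throughout. Assumption: two-digit years mean 20YY.
DATE_ORDERS = ("DDMMYY", "MMDDYY", "YYMMDD")
PRIORITY = DATE_ORDERS + ("int", "float", "text")  # highest priority first


def is_date(s, order):
    """True if the 6-digit string s is a valid date when read in this order."""
    if len(s) != 6 or not s.isdigit():
        return False
    try:
        # Map each two-letter field of the order to its two digits in s.
        parts = {order[i:i + 2]: int(s[i:i + 2]) for i in (0, 2, 4)}
        date(2000 + parts["YY"], parts["MM"], parts["DD"])
    except ValueError:
        return False
    return True


def qualifies(s, kind):
    """True if the string s can be read as the given kind."""
    if kind == "text":
        return True
    if kind in DATE_ORDERS:
        return is_date(s, kind)
    try:
        (int if kind == "int" else float)(s)
    except ValueError:
        return False
    return True


def column_scores(values):
    """Fraction of the column's values that qualify as each kind."""
    values = [v.strip() for v in values]
    if not values:
        return {kind: 0.0 for kind in PRIORITY}
    return {
        kind: sum(qualifies(v, kind) for v in values) / len(values)
        for kind in PRIORITY
    }


def guess_column_type(values, threshold=0.99):
    """Highest-priority kind whose score reaches the threshold."""
    scores = column_scores(values)
    for kind in PRIORITY:
        if scores[kind] >= threshold:
            return kind
    return "text"


print(guess_column_type(["102001", "121212"]))  # MMDDYY
print(guess_column_type(["1.5", "2", "3"]))     # float
```

Note that when several date orders tie (the day-is-always-01 case), this sketch simply returns whichever order is listed first in DATE_ORDERS, so print column_scores() and look at the data before trusting the answer.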