On 22Sep2019 07:39, Albert-Jan Roskam <[email protected]> wrote:
> On 22 Sep 2019 04:27, Cameron Simpson <[email protected]> wrote:
>> On 21Sep2019 20:42, Markos <[email protected]> wrote:
>>> I have a table.csv file with the following structure:
>>>
>>> , Polyarene conc ,, mg L-1 ,,,,,,,
>>> Spectrum, Py, Ace, Anth,
>>> 1, "0,456", "0,120", "0,168"
>>> 2, "0,456", "0,040", "0,280"
>>> 3, "0,152", "0,200", "0,280"
>>>
>>> I open as dataframe with the command:
>>>
>>> data = pd.read_csv ('table.csv', sep = ',', skiprows = 1)
>> [...]
>>> And the data_array variable gets the fields in string format:
>>>
>>> [['0,456' '0,120' '0,168']
>> [...]
>>
>> Please see the documentation for the read_csv function here:
>>
>> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
>
> Do you think it's a deliberate design choice that decimal and thousands were used here as params, and not a 'locale' param? It seems nice to be able to specify e.g. locale='dutch' and then all the right lc_numeric, lc_monetary, lc_time were used. Or even locale='nl_NL.1252' and you also wouldn't need 'encoding' as a separate param. Or might that be bad on windows where there's no locale-gen? Just wondering...
Locales are tricky; I don't know enough.

A locale parameter might be convenient for some things, but such things are table driven. From an arbitrary Linux box nearby:

    % locale -a
    C
    C.UTF-8
    POSIX
    en_AU.utf8

No "dutch" or similar there.

I doubt pandas would ship with such a thing. And the OP probably doesn't know the originating locale anyway. Nor do _we_ know that those values themselves were driven from some well known locale table.
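
To illustrate the table-driven point, here is a minimal sketch using Python's standard locale module. The helper name atof_in_locale and the locale name "nl_NL.UTF-8" are assumptions chosen for illustration; whether that locale exists depends entirely on the machine, which is rather the point.

```python
import locale


def atof_in_locale(text, name):
    """Hypothetical helper: parse text as a float under the named locale.

    Returns None when the locale is not installed on this machine.
    """
    # Remember the current LC_NUMERIC setting so it can be restored.
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
    except locale.Error:
        # No such entry in this system's locale tables.
        return None
    try:
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# The "C" locale is always present and uses "." as the decimal point.
c_value = atof_in_locale("0.456", "C")

# A Dutch locale may or may not be installed; expect None if it is not.
dutch_value = atof_in_locale("0,456", "nl_NL.UTF-8")
print(c_value, dutch_value)
```

On a box like the one above, with no Dutch locale installed, the second call would return None, so the result depends on the system rather than on the data.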
The advantage of specific decimal= and thousands= parameters is that they do exactly what they say, rather than looking up a locale and hoping for a specific side effect. So the specific parameters offer better control.
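
For the OP's data, a minimal sketch of using decimal= might look like the following. The in-memory TABLE_CSV string is a stand-in for table.csv, with the spaces after the commas dropped so that the quoted fields parse cleanly; that layout is an assumption about the real file.

```python
import io

import pandas as pd

# Stand-in for the OP's table.csv (an assumption about its exact layout).
TABLE_CSV = ''',Polyarene conc,,mg L-1,,,,,,,
Spectrum,Py,Ace,Anth
1,"0,456","0,120","0,168"
2,"0,456","0,040","0,280"
3,"0,152","0,200","0,280"
'''

# decimal="," tells read_csv to treat "," inside a field as the decimal point.
data = pd.read_csv(io.StringIO(TABLE_CSV), sep=",", skiprows=1, decimal=",")

# The concentration columns should now be floats rather than strings.
data_array = data[["Py", "Ace", "Anth"]].to_numpy()
print(data.dtypes)
print(data_array)
```

No locale needs to be installed or guessed for this to work; the parameter only describes the text in the file.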
The thousands= itself is a little parochial (for example, in India a factor of 100 is a common division point[1]), but it may merely be used to strip this character from the left portion of the number.
[1] https://en.wikipedia.org/wiki/Indian_numbering_system

So while I am not a pandas person, I would expect that decimal= and thousands= are useful parameters for specific lexical situations (like the OP's CSV data) and work regardless of any locale knowledge.
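
As a rough illustration of what "purely lexical" means here, the sketch below strips a grouping character and swaps the decimal character before calling float(). The parse_number helper is hypothetical and is not how pandas implements these parameters; it only shows that the approach does not care where the grouping characters fall.

```python
def parse_number(text, decimal=".", thousands=None):
    """Hypothetical helper: lexical number parsing with no locale lookup."""
    cleaned = text.strip()
    if thousands:
        # Drop every grouping character, wherever it appears.
        cleaned = cleaned.replace(thousands, "")
    if decimal != ".":
        # Normalise the decimal character to the "." that float() expects.
        cleaned = cleaned.replace(decimal, ".")
    return float(cleaned)


# Groups of three, Indian-style grouping, and "." grouping with "," decimal.
western = parse_number("1,234,567.5", thousands=",")
indian = parse_number("12,34,567.5", thousands=",")
european = parse_number("1.234.567,5", decimal=",", thousands=".")
print(western, indian, european)
```

All three strings come out as the same value, because the grouping character is simply discarded rather than interpreted.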

Cheers,
Cameron Simpson <[email protected]>
