On 01/15/2014 11:25 AM, Daπid wrote:
> On 15 January 2014 11:12, Hedieh Ebrahimi <hedieh.ebrah...@amphos21.com> wrote:
>
>     I try to print my fileContent array after I read it and it looks
>     like this :
>
>     ["b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile1.txt'"
>     "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile2.txt'"
>     "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile3.txt'"]
>
>     Why is this happening and how can I prevent it ?
>     Also if I have a line that starts like this in my file, python will
>     crash on me. how can i fix this ?
>
>
> What is wrong with this case? If you are concerned about the multiple
> backslashes, they are there because they are special symbols, and so
> they have to be escaped (you actually want a backslash, not whatever
> else they could mean).
>
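To make the escaping in the quoted output concrete, here is a small sketch in plain Python 3. The path is the one from the question; the variable names are only for illustration, and it assumes the bytes were turned into text with str() rather than decoded, which is a reading of the printed output and not something the original code confirms.

```python
# One path as raw bytes, the way it would come out of a file opened in
# binary mode. Each "\\" in this literal is a single backslash byte.
raw = b"C:\\Users\\Documents\\Project\\mytextfile1.txt"

# In Python 3, str() on bytes does NOT decode. It returns the repr of the
# bytes object, so the b'...' wrapper and the escaped backslashes become
# part of the text itself.
as_text = str(raw)
print(as_text)        # b'C:\\Users\\Documents\\Project\\mytextfile1.txt'

# Printing a container (list, numpy array) shows the repr of each element,
# which escapes every backslash once more. This matches the quoted output.
print(repr(as_text))  # "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile1.txt'"

# Decoding explicitly gives the string that was actually wanted.
decoded = raw.decode("ascii")
print(decoded)        # C:\Users\Documents\Project\mytextfile1.txt
```

So the data on disk is fine; the extra characters come from taking the representation of a bytes object instead of decoding it.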
You are seeing the bytes representation, with the backslashes escaped a second time. It's due to unicode strings in Python 3. A workaround that only works for ASCII is:

    np.loadtxt(file, dtype=bytes).astype(str)

For non-ASCII data I guess you should use Python directly, as numpy would also require a Python loop with explicit decoding.

Currently, handling strings in Python 3 with numpy is even worse than before: you always have to go through bytes and do explicit decodes to get Python strings out of ASCII data.

What we might need in numpy is new string dtypes specifying encodings, to allow sane conversion to Python 3 strings without the excessive memory usage of 4-byte unicode (UCS-4).

E.g. if it's ASCII, reuse 'a' (which currently maps to bytes):

    np.loadtxt(file, dtype='a')

and for UTF-8 data:

    d = np.loadtxt(file, dtype='utf8')

so that type(d[0]) is unicode and not bytes, as is currently the case if you don't want to store your arrays in 4 bytes per character.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
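For the non-ASCII case, "use Python directly" could look roughly like the sketch below. It uses only the standard library; read_paths, the sample paths and the UTF-8 default are assumptions made for the example, not anything taken from the original poster's code.

```python
import os
import tempfile


def read_paths(filename, encoding="utf-8"):
    """Return the non-empty lines of a file as Python 3 str objects."""
    with open(filename, "rb") as handle:       # binary mode yields bytes
        raw_lines = handle.read().splitlines()
    # Decode each line explicitly; this is the step str(bytes) never does.
    return [line.decode(encoding) for line in raw_lines if line.strip()]


# Demo with a throwaway file; the second path is made up and contains
# non-ASCII characters on purpose.
sample = [
    "C:\\Users\\Documents\\Project\\mytextfile1.txt",
    "C:\\Users\\Documents\\Projet\\données.txt",
]
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(sample).encode("utf-8"))
try:
    paths = read_paths(tmp.name)
finally:
    os.remove(tmp.name)

print(paths[0])   # C:\Users\Documents\Project\mytextfile1.txt
```

The resulting list holds ordinary str objects. Whether to wrap it in a numpy array afterwards depends on whether the 4 bytes per character of a unicode array matter for the data at hand.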