Hi, pandas requires you to know the encoding and delimiter in advance when you use pd.read_csv(filepath, encoding=" ", delimiter=" "), so I think it is the same problem.
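For reference, here is a minimal sketch of guessing both values before parsing, using only the Python standard library rather than pandas or chardet. The helper name read_uploaded_csv, the list of candidate encodings, and the latin-1 fallback are assumptions made for illustration; the delimiter set is the one mentioned in this thread. It is only an educated guess and can be wrong for some files.

```python
import csv
import io

# Assumption: try UTF-8 (with or without BOM) first, then fall back to latin-1.
# latin-1 accepts any byte sequence, so it never fails, but the resulting
# characters may be wrong if the file actually uses another encoding.
CANDIDATE_ENCODINGS = ("utf-8-sig", "latin-1")

# The three delimiters mentioned in this thread.
CANDIDATE_DELIMITERS = ",;|"


def read_uploaded_csv(raw: bytes):
    """Hypothetical helper: decode raw CSV bytes, guess the delimiter,
    and return the rows as a list of dicts keyed by the header row."""
    text = None
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    if text is None:
        raise ValueError("could not decode with any candidate encoding")

    try:
        # Sniff only the first few KB; restrict the guess to known delimiters.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=CANDIDATE_DELIMITERS)
        delimiter = dialect.delimiter
    except csv.Error:
        delimiter = ","  # assumption: default to a comma when sniffing fails

    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

In a Django view this could be called as read_uploaded_csv(request.FILES['file'].read()), assuming the file input is named "file" and the upload is small enough to read into memory.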
On Fri, 24 July 2020 at 3:42 PM, Jani Tiainen <[email protected]> wrote:

> Hi,
>
> I can highly recommend using pandas to read CSV files. It does a pretty good job
> of guessing a lot of things without extra config.
>
> Of course, it's one more dependency.
>
> On Fri, 24 July 2020 at 17:09, Ronaldo Mata <[email protected]> wrote:
>
>> Yes, I will try it. If anything comes up, I will let you know.
>>
>> On Wed, 22 July 2020 at 12:24 PM, Liu Zheng <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Are you sure that the file used for detection is the same as the file
>>> that was opened and decoded and gave you incorrect information?
>>>
>>> By the way, ASCII is a proper subset of UTF-8. If chardet said it is ASCII,
>>> decoding it using UTF-8 should always work.
>>>
>>> If your file contains non-ASCII UTF-8 bytes, maybe it's a bug in
>>> chardet? You can try it directly, without mixing it with Django's requests
>>> first. Make sure you can detect and decode the file locally in a test
>>> program. Then put it into the app.
>>>
>>> If you share the file, I'm also glad to help you try it.
>>>
>>> On Thu, 23 Jul 2020 at 12:04 AM, Ronaldo Mata <[email protected]>
>>> wrote:
>>>
>>>> Hi Kovy, this is not solved. Liu Zheng, using
>>>> chardet.detect(request.FILES['file'].read()) returns the encoding "ascii", which is
>>>> not correct. For example, I uploaded a file encoded as UTF-7 and the
>>>> result is wrong. Then I tried
>>>> request.FILES['file'].read().decode('ascii') and it does not work; it returns bad data.
>>>> For example, the string "@" comes back as "+AEA-".
>>>>
>>>> On Wed, 22 Jul 2020 at 11:16, Kovy Jacob (<[email protected]>) wrote:
>>>>
>>>>> I'm confused. I don't know if I can help.
>>>>>
>>>>> On Jul 22, 2020, at 11:11 AM, Liu Zheng <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi, glad you solved the problem. Yes, both the request.FILES['file']
>>>>> and the chardet file handler are binary handlers. A binary handler presents
>>>>> the raw data.
>>>>> chardet takes a sequence of raw data and then detects the
>>>>> encoding format. With its prediction, if you want to open that piece of
>>>>> data in text mode, you can use the .decode(<encoding format>) method of the
>>>>> bytes object to get a Python string.
>>>>>
>>>>> On Wed, 22 Jul 2020 at 11:04 PM, Kovy Jacob <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> That's probably not the proper answer, but that's the best I can do.
>>>>>> Sorry :-(
>>>>>>
>>>>>>
>>>>>> On Jul 22, 2020, at 10:46 AM, Ronaldo Mata <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Yes, the problem here is that the files will be uploaded by users,
>>>>>> so I don't know what delimiter I will receive. This is not a base command
>>>>>> that I am using; it is logic that I want to incorporate in a view.
>>>>>>
>>>>>> On Wed, 22 Jul 2020 at 10:43, Kovy Jacob (<[email protected]>) wrote:
>>>>>>
>>>>>>> Ah, so is the problem that you don't always know what the delimiter
>>>>>>> is when you read it? If yes, what is the use case for this? You might not
>>>>>>> need a universal solution; maybe just put all the info into a CSV yourself,
>>>>>>> manually.
>>>>>>>
>>>>>>> On Jul 22, 2020, at 10:39 AM, Ronaldo Mata <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Kovy, I'm using the csv module, but I need to handle the delimiters
>>>>>>> of the files: sometimes they come separated by ",", others by ";", and rarely
>>>>>>> by "|".
>>>>>>>
>>>>>>> On Wed, 22 Jul 2020 at 10:28, Kovy Jacob (<[email protected]>) wrote:
>>>>>>>
>>>>>>>> Could you just use the standard Python csv module?
>>>>>>>>
>>>>>>>> On Jul 22, 2020, at 10:25 AM, Ronaldo Mata <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Liu, thanks for your answer.
>>>>>>>>
>>>>>>>> This has been a headache. I am trying to read the file using
>>>>>>>> csv.DictReader. Initially I had an error trying to get the dict keys when
>>>>>>>> iterating over the rows, and I thought it could be the encoding (which is why I
>>>>>>>> wanted to prepare the view to use the correct encoding). That is why I
>>>>>>>> asked my question.
>>>>>>>>
>>>>>>>> 1) Your first approach doesn't work: if I send a UTF-8 file, chardet
>>>>>>>> returns ascii as the encoding. It seems request.FILES['file'].read() returns
>>>>>>>> binary data with that encoding.
>>>>>>>>
>>>>>>>> 2) In the end I realized that the problem was the delimiter of the
>>>>>>>> CSV, but predicting it is another problem.
>>>>>>>>
>>>>>>>> Anyway, it was a task that I had to do and that was my
>>>>>>>> limitation. I think there must be a library that does all this; uploading a
>>>>>>>> CSV file is common practice in many web apps.
>>>>>>>>
>>>>>>>> On Tue, 21 Jul 2020 at 13:47, Liu Zheng (<[email protected]>) wrote:
>>>>>>>>
>>>>>>>>> Hi. First of all, I think it's impossible to perfectly detect
>>>>>>>>> encoding without further information. See the answer in this SO post:
>>>>>>>>> https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text
>>>>>>>>> There are many packages and tools to help detect the encoding format, but keep in
>>>>>>>>> mind that they only give educated guesses. (Most of the time, the
>>>>>>>>> guess is correct, but do check the dev page to see whether there are known
>>>>>>>>> issues related to your problem.)
>>>>>>>>>
>>>>>>>>> Now let's say you have decided to use chardet. Check its doc page
>>>>>>>>> for the usage:
>>>>>>>>> https://chardet.readthedocs.io/en/latest/usage.html#usage
>>>>>>>>> You'll have more than one solution. Here are some examples:
>>>>>>>>>
>>>>>>>>> 1. If the files uploaded to your server are all expected to be
>>>>>>>>> small CSV files (less than a few MB, and not many users upload
>>>>>>>>> concurrently), you can do the following:
>>>>>>>>>
>>>>>>>>> # In the view that handles the uploaded file (assume the file input name
>>>>>>>>> # is just "file"):
>>>>>>>>> import chardet
>>>>>>>>>
>>>>>>>>> file_content = request.FILES['file'].read()  # raw bytes
>>>>>>>>> result = chardet.detect(file_content)  # dict; the guess is in result['encoding']
>>>>>>>>>
>>>>>>>>> 2. Also, chardet seems to support incremental (line-by-line)
>>>>>>>>> detection:
>>>>>>>>> https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally
>>>>>>>>>
>>>>>>>>> Given this, we can also read from request.FILES line by line and
>>>>>>>>> pass each line to chardet:
>>>>>>>>>
>>>>>>>>> from chardet.universaldetector import UniversalDetector
>>>>>>>>>
>>>>>>>>> # Somewhere in a view function:
>>>>>>>>> detector = UniversalDetector()
>>>>>>>>> file_handle = request.FILES['file']
>>>>>>>>> for line in file_handle:
>>>>>>>>>     detector.feed(line)
>>>>>>>>>     if detector.done:
>>>>>>>>>         break
>>>>>>>>> detector.close()
>>>>>>>>> # The result is available as a dict at detector.result
>>>>>>>>>
>>>>>>>>> On Tuesday, July 21, 2020 at 7:09:35 AM UTC+8, Ronaldo Mata wrote:
>>>>>>>>>>
>>>>>>>>>> How to deal with encoding when you try to read a CSV file in a view.
>>>>>>>>>>
>>>>>>>>>> I have a view to upload a CSV file; in this view I read the file and
>>>>>>>>>> save each row as a new record.
>>>>>>>>>>
>>>>>>>>>> My bug appears when I try to upload a CSV file with a
>>>>>>>>>> different encoding (not UTF-8).
>>>>>>>>>>
>>>>>>>>>> How do I handle this in Django (using request.FILES)? I was
>>>>>>>>>> researching and I found chardet, but I don't know how to pass it
>>>>>>>>>> request.FILES. I need help, please.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Django users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/django-users/64307441-0e65-45a2-b917-ece15a4ea729o%40googlegroups.com
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAP%3DoziRCr_GBFvfE-FWW3v%3Dd2CV_G3Lr1JwGc%2BYR40y69ufcyw%40mail.gmail.com.

