Re: [Mayan EDMS: 2463] Re: Error with OCR in Spanish - Mayan 2.7

Roberto Rosario Fri, 11 May 2018 13:35:13 -0700

Thanks for the update, Pablo. This will help fix the issue faster.

On Fri, May 11, 2018, 4:18 PM Pablo Castro <pabloscastro1...@gmail.com>
wrote:


> UPDATE
>
> I was able to get the Spanish OCR working by simply deleting the
> mayan-edms Docker container and running it again. This successfully
> installed tesseract-ocr-spa.deb.
>
>
>
> On Friday, 11 May 2018 12:37:39 UTC-5, Pablo Castro wrote:
>>
>> Hello,
>>
>> I installed Mayan with the following guide:
>> https://www.mayan-edms.com/post/deploy-mayan-docker-mysql/
>>
>> This means I have two Docker containers, Mayan EDMS and MySQL, running
>> on an Ubuntu box.
>>
>> I tried the OCR function but was getting the following error in the OCR
>> errors log:
>>
>> (1366, "Incorrect string value: '\\xEF\\xAC\\x81\\x0A21...' for column
>> 'content' at row 1")
>>
>> Tried with a different document and got a similar error:
>>
>> (1366, "Incorrect string value: '\\xEF\\xAC\\x81eio...' for column
>> 'content' at row 1")
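
The escaped bytes in both errors can be decoded to see which character MySQL rejected. The sketch below assumes the bytes are UTF-8 (the thread does not say), in which case `\xEF\xAC\x81` is the single character U+FB01, the "fi" ligature that can turn up in OCR output. That the `content` column uses a narrower character set such as latin1 is also only a guess, used here to show how such a character could fail to store.

```python
import unicodedata

# Leading bytes quoted in both MySQL 1366 errors above.
rejected = b"\xef\xac\x81"

# Assumption: the OCR text was sent to the database as UTF-8.
char = rejected.decode("utf-8")
name = unicodedata.name(char)
print(hex(ord(char)), name)

# Assumption: the 'content' column uses latin1 (not stated in the thread).
# A character outside that charset cannot be encoded, which would be
# consistent with an "Incorrect string value" error.
try:
    char.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print("fits latin1:", fits_latin1)
```

If the column character set does turn out to be the cause, the document language setting would not be related to this particular error, and a UTF-8 character set on the database would be the thing to check.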
>>
>> I assumed it was because the documents were being uploaded with "English"
>> as the document language, so I changed the default document language as
>> follows:
>>
>>
>> I modified the local.py file under
>> var/lib/docker/volumes/mayan_data/_data/settings and added the following
>> lines:
>>
>> DOCUMENTS_LANGUAGE_CHOICES = (
>>     ('deu', 'Deutsch'),
>>     ('eng', 'English'),
>>     ('spa', 'Spanish'),
>> )
>> DOCUMENTS_LANGUAGE = 'spa'
>>
>> This worked fine: the default language when adding a new document is
>> now Spanish, and the list contains just Spanish, English and German.
>>
>> Afterwards, I modified the envfile to install the Spanish tesseract package:
>>
>> # MySQL container
>> MYSQL_ROOT_PASSWORD=********
>> MYSQL_PASSWORD=*********
>> MYSQL_DATABASE=mayan_db
>> MYSQL_USER=mayan_user
>>
>> # Mayan container
>> MAYAN_DATABASE_DRIVER=django.db.backends.mysql
>> MAYAN_DATABASE_NAME=mayan_db
>> MAYAN_DATABASE_USER=mayan_user
>> MAYAN_DATABASE_PASSWORD=********
>> MAYAN_DATABASE_HOST=mayan-mysql
>> MAYAN_DATABASE_PORT=3306
>> MAYAN_APT_INSTALLS=libsasl2-dev python-dev libldap2-dev libssl-dev tesseract-ocr-spa
>> MAYAN_PIP_INSTALLS=python-ldap==2.4.41 django-auth-ldap==1.2.14
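
One way to confirm that the package name really ended up in the value of `MAYAN_APT_INSTALLS` is to parse the file and inspect the list. This is a minimal sketch: `parse_env` is a hypothetical helper, not part of Mayan or Docker, and it assumes the simple one-line `KEY=VALUE` format with `#` comments used in the envfile above.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as
        # 'python-ldap==2.4.41' keep their own '=' characters.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Trimmed copy of the envfile above, with the package list on one line.
envfile = """\
# Mayan container
MAYAN_DATABASE_HOST=mayan-mysql
MAYAN_APT_INSTALLS=libsasl2-dev python-dev libldap2-dev libssl-dev tesseract-ocr-spa
MAYAN_PIP_INSTALLS=python-ldap==2.4.41 django-auth-ldap==1.2.14
"""

env = parse_env(envfile)
apt_packages = env["MAYAN_APT_INSTALLS"].split()
print("tesseract-ocr-spa" in apt_packages)
```

In this sketch a line without `=` is skipped, so a package name wrapped onto its own line would not be picked up. Whether the real envfile was wrapped, or only the email, is not clear from the thread.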
>>
>> I assumed this should be enough for OCR to work in Spanish, so I
>> restarted the Docker container and uploaded a document for OCR.
>>
>> OCR is still not working, and there's no error log under the OCR errors
>> tool.
>>
>> I checked the docker logs for the mayan-edms container and found this:
>>
>> Error opening data file /usr/share/tesseract-ocr/tessdata/spa.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to the
>> parent directory of your "tessdata" directory.
>> Failed loading language 'spa'
>> Tesseract couldn't load any languages!
>> [2018-05-11 16:55:37,489: ERROR/MainProcess] Task
>> ocr.tasks.task_do_ocr[fb11d940-faaa-4d51-8eb1-a20227ced574] raised
>> unexpected: WorkerLostError('Worker exited prematurely: signal 11
>> (SIGSEGV).',)
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line
>> 1175, in mark_as_worker_lost
>>     human_status(exitcode)),
>> WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).
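
The log shows Tesseract looking for `spa.traineddata` and not finding it, which suggests the language data was never installed rather than `TESSDATA_PREFIX` being wrong. A small check like the one below can list which configured languages have no data file. It is only a sketch: `missing_traineddata` is a hypothetical helper, and it is written for Python 3, run standalone rather than inside Mayan.

```python
from pathlib import Path
import tempfile


def missing_traineddata(language_codes, tessdata_dir):
    """Return the codes that have no <code>.traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [
        code for code in language_codes
        if not (tessdata / f"{code}.traineddata").is_file()
    ]


# Demonstration against a throwaway directory that mimics an image with
# only German and English data installed.
with tempfile.TemporaryDirectory() as tmp:
    for code in ("deu", "eng"):
        (Path(tmp) / f"{code}.traineddata").touch()
    missing = missing_traineddata(["deu", "eng", "spa"], tmp)
    print(missing)
```

Inside the container, the same function could be pointed at the directory named in the log, for example `missing_traineddata(['deu', 'eng', 'spa'], '/usr/share/tesseract-ocr/tessdata')`.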
>>
>>
>> Has anyone experienced something similar? I am still searching for ways
>> to modify the TESSDATA_PREFIX environment variable, but my experience
>> with Docker is limited.
>>
>> Any help is appreciated.
>>
>>
>> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Mayan EDMS" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mayan-edms+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

