Thanks for the update, Pablo. This will help fix the issue faster.

On Fri, May 11, 2018, 4:18 PM Pablo Castro <[email protected]>
wrote:

> UPDATE
>
> I was able to get the Spanish OCR working by simply deleting the
> mayan-edms Docker container and running it again. This successfully
> installed tesseract-ocr-spa.deb
>
>
>
> On Friday, 11 May 2018 12:37:39 UTC-5, Pablo Castro wrote:
>>
>> Hello,
>>
>> I installed Mayan with the following guide:
>> https://www.mayan-edms.com/post/deploy-mayan-docker-mysql/
>>
>> Which means I have 2 docker containers with Mayan-EDMS and MySQL running
>> in an Ubuntu box.
>>
>> I tried the OCR function but was getting the following error in the OCR
>> errors log:
>>
>> (1366, "Incorrect string value: '\\xEF\\xAC\\x81\\x0A21...' for column
>> 'content' at row 1")
>>
>> Tried with a different document and got a similar error:
>>
>> (1366, "Incorrect string value: '\\xEF\\xAC\\x81eio...' for column
>> 'content' at row 1")
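
A note on the 1366 error itself: the escaped bytes in both messages, \xEF\xAC\x81, are the UTF-8 encoding of U+FB01, the "fi" ligature that OCR output often contains. The small Python 3 sketch below decodes them and shows that the character has no latin-1 representation. That the 'content' column (or the database connection) uses latin1 or a similar single-byte character set is an assumption on my part, not something the log confirms. If it holds, the document language setting would not be the cause. The helper name fits_in is made up for illustration.

```python
import unicodedata

# Bytes copied from the MySQL error message quoted above.
raw = b"\xef\xac\x81"

char = raw.decode("utf-8")
name = unicodedata.name(char)  # official Unicode name of the character


def fits_in(text, encoding):
    """Return True if `text` can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(repr(char), name)
    print("latin-1:", fits_in(char, "latin-1"))
    print("utf-8:", fits_in(char, "utf-8"))
```

If the assumption is right, any OCR text containing this ligature would be rejected regardless of which language is selected for the document.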
>>
>> I assumed it was because the documents were being uploaded with "English"
>> as the document language, so I changed the default document language as
>> follows:
>>
>>
>> I modified the local.py file under
>> var/lib/docker/volumes/mayan_data/_data/settings and added the following
>> lines:
>>
>> DOCUMENTS_LANGUAGE_CHOICES = (
>>     ('deu', 'Deutsch'), ('eng', 'English'), ('spa', 'Spanish'),
>> )
>> DOCUMENTS_LANGUAGE = 'spa'
>>
>> This worked fine, and now the default language when adding a new document
>> is Spanish and the list contains just Spanish, English and German.
>>
>> Afterwards, I modified the env file to install the Spanish Tesseract package:
>>
>> # MySQL container
>> MYSQL_ROOT_PASSWORD=********
>> MYSQL_PASSWORD=*********
>> MYSQL_DATABASE=mayan_db
>> MYSQL_USER=mayan_user
>>
>> # Mayan container
>> MAYAN_DATABASE_DRIVER=django.db.backends.mysql
>> MAYAN_DATABASE_NAME=mayan_db
>> MAYAN_DATABASE_USER=mayan_user
>> MAYAN_DATABASE_PASSWORD=********
>> MAYAN_DATABASE_HOST=mayan-mysql
>> MAYAN_DATABASE_PORT=3306
>> MAYAN_APT_INSTALLS=libsasl2-dev python-dev libldap2-dev libssl-dev tesseract-ocr-spa
>> MAYAN_PIP_INSTALLS=python-ldap==2.4.41 django-auth-ldap==1.2.14
>>
>> I assumed this would be enough for OCR to work in Spanish, so I
>> restarted the Docker container and uploaded a document for OCR.
>>
>> OCR is still not working, and there is no entry in the OCR errors
>> tool.
>>
>> I checked the docker logs for the mayan-edms container and found this:
>>
>> Error opening data file /usr/share/tesseract-ocr/tessdata/spa.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to the
>> parent directory of your "tessdata" directory.
>> Failed loading language 'spa'
>> Tesseract couldn't load any languages!
>> [2018-05-11 16:55:37,489: ERROR/MainProcess] Task
>> ocr.tasks.task_do_ocr[fb11d940-faaa-4d51-8eb1-a20227ced574] raised
>> unexpected: WorkerLostError('Worker exited prematurely: signal 11
>> (SIGSEGV).',)
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line
>> 1175, in mark_as_worker_lost
>>     human_status(exitcode)),
>> WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).
>>
>>
>> Has anyone experienced something similar? I am still searching for ways
>> to modify the TESSDATA_PREFIX environment variable but my experience with
>> docker is limited.
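
The Tesseract message above names the file it could not open, so one quick check is to look for that file directly inside the container. Below is a minimal Python 3 sketch. The function name find_traineddata is hypothetical, the default directory is the one from the log, and the TESSDATA_PREFIX handling follows the log's own wording (the prefix is the parent of "tessdata"), which I am assuming applies to the Tesseract version in the image.

```python
import os

# Directory taken from the error message in the container log.
DEFAULT_TESSDATA_DIR = "/usr/share/tesseract-ocr/tessdata"


def find_traineddata(lang, tessdata_dir=None, environ=None):
    """Return the path to <lang>.traineddata, or None if it is missing.

    Hypothetical helper for illustration; not part of Mayan EDMS or Tesseract.
    """
    environ = os.environ if environ is None else environ
    if tessdata_dir is None:
        prefix = environ.get("TESSDATA_PREFIX")
        # Per the log message, the prefix is the parent of "tessdata".
        tessdata_dir = (
            os.path.join(prefix, "tessdata") if prefix else DEFAULT_TESSDATA_DIR
        )
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    for lang in ("eng", "spa"):
        found = find_traineddata(lang)
        print(lang, "->", found or "missing")
```

If "spa" shows up as missing, the language package was never installed in that container. That would be consistent with the update at the top of this thread, where recreating the container got tesseract-ocr-spa installed.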
>>
>> Any help is appreciated.
>>
>>
>> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Mayan EDMS" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
