You may want to consider a different package or python function.
Here is a way to extract only the text from a docx without using python-docx:
http://etienned.github.io/posts/extract-text-from-word-docx-simply/

There is also another script that appears to be more accurate by taking headers 
and footers into account.
https://github.com/ankushshah89/python-docx2txt

I am interested in your question because I am interested in examining Word 
documents through automation as well.  My database used to be in Microsoft 
Access before I moved it to Django/PostgreSQL, so I’m quite familiar with VBA 
and modifying Word documents through VBA in the COM interface.

I would prefer pywin32 and following this tutorial:
http://new.galalaly.me/2011/09/use-python-to-parse-microsoft-word-documents-using-pywin32-library/

And in VBA, I would be able to use this command to get the Word Count:
Word.ActiveDocument.Words.Count

It’s just a matter of converting that into Python.
I hope you find this helpful.


From: django-users@googlegroups.com [mailto:django-users@googlegroups.com] On 
Behalf Of agoulouzego...@gmail.com
Sent: Tuesday, February 14, 2017 11:35 PM
To: Django users
Subject: counting words in word document works in python but not in django

Hello everyone,

I have been struggling with a function for days. The following function opens a 
word document, count the number of words in it and returns it. but it just 
doesn't work in my django app. please help me

import os
import re
import docx

from docx import Document

cwd = os.getcwd()  # Get the current working directory (cwd)
files = os.listdir(cwd)  # Get all the files in that directory
# print("Files in '%s': %s" % (cwd, files))

def doc_size(str):
    mon_fichier = open(str, "rb")
    document = Document(mon_fichier)

    compteur = 0
    for para in document.paragraphs:
        content= para.text
        countage= len(content.split())
        compteur = compteur + countage
    return compteur

fichier = "essai.docx"
print(doc_size(fichier))

Here starts my django app
my urls

from django.conf.urls import url, include
from django.contrib import admin
from django.conf.urls.static import static
from django.conf import settings
from . import views
from django.views.generic import TemplateView

urlpatterns = [
    url(r'^$', views.home, name="home"),
    url(r'^soumettre$',TemplateView.as_view(template_name = 
'francais/soumettre.html'), name='soumettre'),
    url(r'^saved$', views.newSubmit, name="saved"),
    url(r'^About$', views.aboutUs, name="about"),
    url(r'^login$',views.formView, name="login"),
    url(r'^loggedin$', views.login, name= "loggedin"),


]


Here is my model. It is a model to upload and save a file

class SubmitDoc(models.Model):
    firstName = models.CharField(max_length =100)
    lastName = models.CharField(max_length =100)
    email = models.EmailField()
    uploadDoc = models.FileField(upload_to="documents/%Y/%m/%d/")
   #
    comment = models.TextField()
    date = models.DateTimeField(auto_now_add=True, verbose_name = "Date of 
creation")

    def obtain_text(self):
        compteur =0
        with open(self.uploadDoc) as data:
            document = Document(data)
            for para in document.paragraphs:
                content = para.text
                countage= len(content.split())
                compteur=countage + compteur

            return compteur

here is the form that goes with the model



class SubmitDocForm(forms.Form):
    firstName = forms.CharField()
    lastName = forms.CharField()
    email = forms.EmailField()
    uploadDoc = forms.FileField()
#     CHOICES = (("Document Professionel", 'Document Professionel'), ("Document 
Scolaire", 'Document Scolaire'))
#     like = forms.TypedChoiceField(choices=CHOICES, widget=forms.RadioSelect)
    comment = forms.CharField(widget = forms.Textarea)


here is the template used to upload the file


{% extends "base.html" %}

{% block content%}

  <div class="w3-container w3-orange">
  <h2>Document Submission Form</h2>
 </div>
 <form name = "form" enctype = "multipart/form-data" class="w3-container" 
action = "{% url "saved" %}" method = "POST" >
   {% csrf_token %}

  <p>
  <input class="w3-input" type="text" name="firstName">
  <label>First Name</label></p>

  <p>
  <input class="w3-input" type="text" name="lastName">
  <label>Last Name</label></p>

  <p>
  <input class="w3-input" type="text" name="email">
  <label>Email</label></p>
  <p>

  <p>  </p>

  <br>
  <br>

  <div class="w3-container ">

   <input type="file" name="uploadDoc" id="uploadDoc">
   <label class="w3-label">load File here</label>

   </div>
   <br>
   <br>

  <div class="w3-row-padding">
  <div class="w3-third">
  <input class="w3-radio" type="radio" name="like" value="Document Scolaire">
  <label class="w3-validate">Document Scolaire</label>
  </div>
  <div class="w3-third">
  <input class="w3-radio w3-half" type="radio" name="like" value="Document 
Professionel">
  <label class="w3-validate">Document Professionnel</label>
  </div>
  </div>


   <div class="w3-container ">

   <textarea class="w3-input" name="comment" required></textarea>
            <label class="w3-label">Comments</label>

   </div>

 </p>



 <p style="text-align:center">
  <button class="w3-btn w3-orange" >Submit</button></p>

</form>

{% endblock%}


Here is my view


from django.shortcuts import render, render_to_response
from django.shortcuts import HttpResponse, HttpResponseRedirect
from .forms import SubmitDocForm, LoginForm
from .models import Login, SubmitDoc
from django.template import context, RequestContext
from _datetime import date
from datetime import datetime
import os
import re
import docx
from docx import Document
from django.http.response import HttpResponseRedirect
import logging


def newSubmit(request):
    save= False
    form = SubmitDocForm(request.POST, request.FILES)
    com =0
    compteur=""
    message ="bon"
    full =" "
    if form.is_valid():
        submitDoc = SubmitDoc()
        submitDoc.firstName = form.cleaned_data['firstName']
        submitDoc.lastName = form.cleaned_data['lastName']
        submitDoc.email = form.cleaned_data['email']
        submitDoc.uploadDoc = form.cleaned_data['uploadDoc']
        submitDoc.like = form.cleaned_data["like"]
        submitDoc.comment = form.cleaned_data['comment']
        submitDoc.save()
        data = request.FILES["uploadDoc"]

        save = True
        compteur = doc_size_handle(data)

    else:
        form = SubmitDocForm()

    return render(request, 'francais/saved.html', {"form" : form, "compteur": 
compteur, "save": save,
                                                   "com": com, "message" : 
message, "object": object})




def doc_size_handle(f):
    mon_fichier = f.open("rb")
    document = Document(mon_fichier)
    content =""
    compteur = 0
    for para in document.paragraphs: # I FEEL LIKE THERE IS A PROBLEM WITH THIS 
LOOP BUT WHY?
        content = para.text
        countage= len(content.split())
        compteur = compteur + countage
    f.close()

    return content


HERE IS THE TEMPLATE MEANT TO DISPLAY THE NUMBER OF WORDS IN THE FILE. However, 
this "compteur" variable does not get updated, please help me



      {% if save %}

         <strong>
                             your document has been received. We will review it 
and get back to you.

                             <p> current directory  </p>
                             <p> size: {{compteur}} </p>
                             <p>  </p>
                             <p> </p>


                             </strong>

      {% endif %}

      {% if not save %}
         <strong>Your document was not received. Please try again with the 
appropriate format.</strong>
      {% endif %}






--
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
django-users+unsubscr...@googlegroups.com<mailto:django-users+unsubscr...@googlegroups.com>.
To post to this group, send email to 
django-users@googlegroups.com<mailto:django-users@googlegroups.com>.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/7576d9c9-f759-40b2-b0c9-a636a9c4f35f%40googlegroups.com<https://groups.google.com/d/msgid/django-users/7576d9c9-f759-40b2-b0c9-a636a9c4f35f%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/d94d46b926f4403c8e5159a63e474ebe%40ISS1.ISS.LOCAL.
For more options, visit https://groups.google.com/d/optout.

Reply via email to