Hala Gamal wrote:

> thank you :)it worked well for small file but when i enter big file,, i
> obtain this error: "Traceback (most recent call last):
>   File "D:\Python27\yarab (4).py", line 46, in <module>
>     writer.add_document(**doc)
>   File "build\bdist.win32\egg\whoosh\filedb\filewriting.py", line 369, in add_document
>     items = field.index(value)
>   File "build\bdist.win32\egg\whoosh\fields.py", line 466, in index
>     return [(txt, 1, 1.0, '') for txt in self._tiers(num)]
>   File "build\bdist.win32\egg\whoosh\fields.py", line 454, in _tiers
>     yield self.to_text(num, shift=shift)
>   File "build\bdist.win32\egg\whoosh\fields.py", line 487, in to_text
>     return self._to_text(self.prepare_number(x), shift=shift,
>   File "build\bdist.win32\egg\whoosh\fields.py", line 476, in prepare_number
>     x = self.type(x)
> UnicodeEncodeError: 'decimal' codec can't encode characters in position
> 0-4: invalid decimal Unicode string" i don't know realy where is the
> problem?
>
> On Friday, February 22, 2013 4:55:22 PM UTC+2, Hala Gamal wrote:
>> my code works well with english file but when i use text file
>> encodede"utf-8" "my file contain some arabic letters" it doesn't work.

I guess that one of the fields you require to be NUMERIC contains non-digit
characters. Replace the line

>> writer.add_document(**doc)

with something similar to

    try:
        writer.add_document(**doc)
    except UnicodeEncodeError:
        print "Skipping malformed line", repr(i)

This will allow you to inspect the lines your script cannot handle, and if
they are indeed "malformed", as I am guessing, you can fix your input data.

i is a terrible name for a line in a file, btw. Also, you should avoid
readlines(), which reads the whole file into memory, and instead iterate
over the file object directly:

    with codecs.open("tt.txt", encoding='utf-8-sig') as textfile:
        for line in textfile:  # no readlines(), can handle
                               # text files of arbitrary size
            ...
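
If you would rather find the offending lines before handing anything to the writer, here is a minimal sketch of the same idea without whoosh. It assumes that each line holds just the value destined for the NUMERIC field and that the field converts with int(); partition_numeric and the sample data are made-up names for illustration, not part of whoosh or of your script.

```python
import io


def partition_numeric(lines, convert=int):
    """Split lines into (good, bad) by whether convert() accepts them.

    Hypothetical helper: it only mimics the number conversion that seems
    to fail in the traceback, it does not touch whoosh at all.
    """
    good, bad = [], []
    # Iterate lazily; nothing here needs the whole file in memory.
    for lineno, line in enumerate(lines, 1):
        text = line.strip()
        try:
            good.append((lineno, convert(text)))
        except ValueError:
            # UnicodeEncodeError is a subclass of ValueError, so this also
            # covers the "'decimal' codec can't encode" case.
            bad.append((lineno, text))
    return good, bad


# Stand-in for the real file: two plain numbers, one number written with
# Arabic-Indic digits, and one line of Arabic letters.
sample = io.StringIO(u"12\n\u0663\u0664\n\u0645\u0631\u062d\u0628\u0627\n7\n")
good, bad = partition_numeric(sample)

for lineno, text in bad:
    print("Skipping malformed line %d: %r" % (lineno, text))
```

Note that Arabic-Indic digits convert fine; only the line made of letters ends up in bad. With the real data you would pass the file object from codecs.open(...) instead of sample.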