On Sat, Oct 25, 2014 at 12:47 PM, heba abukaff <habukaf...@yahoo.com.dmarc.invalid> wrote:
> i have a trouble using the tokenizer to find the frequency list for URL using
> arabic text.and iam using python 2.7.2 on winXP,I tried this code but every
> time i run the code appears error with first line

I'm seeing two problems here. One of them may not actually be a problem in your code, but just in how you're posting: your text has all been rewrapped. Post the exact code, as plain text (not HTML); you should be able to do this, but if you can't with Yahoo, try a different email provider. Make sure we can see exactly where your code begins and ends, so we can understand what "first line" you're looking at - and if you copy and paste the actual error you get, that would be extremely helpful, too. (Even if it's in Arabic. There'll be parts we can understand.)

The second problem is that you're trying to work with non-English text in Python 2.7. This is harder than it needs to be. Install the latest Python (3.4) and use that instead of 2.7; the NLTK module is compatible with 3.2+, so it should work fine. I can't be sure that you're having trouble with bytes vs strings, because I can't see what your code's doing (due to the wrap/indent problem), but in any case, shifting to Python 3 gives you a much better chance of getting things right.

All you'll need to do, I suspect, is change your print statements into function calls:

# Old style:
print "word with highest count: %s" % (fd.max())
# New style:
print("word with highest count: %s" % (fd.max()))

Easy! And only slightly harder when you send it to a different destination:

# Old style:
print>>outfile, '%s\t%d' % (t, len(t))
# New style:
print('%s\t%d' % (t, len(t)), file=outfile)

With those changes, your code will probably (I can't test it) work on Python 3.4.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list