Hi all,

The API converts everything to lowercase anyway, so there is no need for the extra effort. All punctuation will be converted into a pattern too, so there is no need to filter it out either.
Francisco

On 12.09.2013, at 23:29, Marek Otahal wrote:

> I did not yet read up all in this thread, so sorry if completely wrong.. How
> about going all lowercase, and removing non-aplha(numeric) characters if
> necessary?
>
> regards, breznak
>
>
> On Thu, Aug 29, 2013 at 5:05 AM, James Tauber <[email protected]> wrote:
> I pushed a Python 3 script to my repo that does a bunch of calculations.
>
> Here are the results of that script. Let me know what you'd like to see next.
> I can already see one problem in the tokenization where 'No was not split.
>
> FILENAME                          BYTES  TOKEN  TYPE
> -----------------------------------------------------
> 01_the_ugly_duckling.txt           3143    782   207
> 02_the_little_pine_tree.txt        1635    388   104
> 03_the_little_match_girl.txt       3065    701   218
> 04_little_red_riding_hood.txt      2168    509   159
> 05_the_apples_of_idun.txt          3923    934   244
> 06_how_thor_got_the_hammer.txt     5857   1373   318
> 07_the_hammer_lost_and_found.txt   4260   1010   258
> 08_the_story_of_the_sheep.txt      1265    304   129
> 09_the_good_ship_argo.txt           889    209   107
> 10_jason_and_the_harpies.txt       2187    495   173
> 11_the_brass_bulls.txt             3487    786   239
> 12_jason_and_the_dragon.txt        1867    427   180
> -----------------------------------------------------
> COLLECTION                        33746   7918   882
>
> Unique to 01_the_ugly_duckling.txt:
> {'spring', 'hid', 'summer', 'dears', 'lake', 'swans', 'own', 'eggs', 'lay',
> 'still', 'eating', 'pond', 'duckling', 'yard', 'Soon', 'egg', 'bug', 'cat',
> 'bushes', 'does', 'those', 'fun', 'winter', 'duck', 'Ugly', 'lovely',
> 'woman', 'hens', 'swim', 'While', 'swan', 'sang', 'nest', 'corner', 'bread',
> 'Splash', 'because', 'mother', 'growl', 'ducks', 'An', 'Let', 'noise', 'hen',
> 'ducklings', 'Only', 'Stay', 'Duckling'}
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
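For anyone who still wants to see how the counts change with case folding, here is a minimal sketch of the "all lowercase, drop non-alphanumeric" idea from the quoted messages. It is not James's script and it says nothing about what the API does internally; the names `tokenize` and `corpus_stats` and the sample sentence are made up for illustration.

```python
import re

# Assumption: a token is a run of ASCII letters or digits after lowercasing.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its alphanumeric runs as tokens."""
    return TOKEN_RE.findall(text.lower())


def corpus_stats(text):
    """Return (bytes, tokens, types) for one text, like the table columns."""
    tokens = tokenize(text)
    return len(text.encode("utf-8")), len(tokens), len(set(tokens))


# Hypothetical sample with a leading quote mark, as in the 'No case.
sample = "'No,' said the Duckling. 'No, no!'"
tokens = tokenize(sample)
stats = corpus_stats(sample)
print(tokens)
print(stats)
```

With this approach 'No, No and no all fall into one type, and pairs such as 'Duckling' and 'duckling' in the unique-word list would merge. The catch is that contractions get split as well (don't becomes "don" and "t"), so the regex would need an apostrophe rule if that matters.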
