Re: From JoyceUlysses.txt -- words occurring exactly once
On 5/30/2024 4:03 PM, HenHanna via Python-list wrote:
> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) Python program that'd
> give me a list of all words occurring exactly once?
>
> -- Also, a list of words occurring once, twice or 3 times
>
> re: hyphenated words (you can treat it anyway you like)
>
> but ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.

You will probably get a thousand different suggestions, but here's a fairly direct and readable one in Python:

    s1 = 'Is this word is the only word repeated in this string'
    counts = {}
    for w in s1.lower().split():
        counts[w] = counts.get(w, 0) + 1
    print(sorted(counts.items()))
    # [('in', 1), ('is', 2), ('only', 1), ('repeated', 1), ('string', 1),
    #  ('the', 1), ('this', 2), ('word', 2)]

Of course you can adjust the definition of what constitutes a word, handle punctuation and so on, and tinker with the output format to suit yourself. You would replace s1.lower().split() with, e.g., my_custom_word_splitter(s1).

--
https://mail.python.org/mailman/listinfo/python-list
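One possible way to fill in the hypothetical my_custom_word_splitter is a regular expression that keeps internal hyphens and apostrophes, so that editor-in-chief and can't each count as one word. This is only a sketch: the pattern, the helper names count_words and words_with_count, and the sample sentence are assumptions made for illustration, and the ASCII-only [a-z] class would need widening for accented letters.

```python
import re

# Assumed definition of a "word": runs of ASCII letters, optionally
# joined by single hyphens or apostrophes (editor-in-chief, can't).
WORD_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")

def my_custom_word_splitter(text):
    """Return the lowercase words in text, keeping hyphenated words whole."""
    return WORD_RE.findall(text.lower())

def count_words(text):
    """Count words with the same dict.get idiom as the reply above."""
    counts = {}
    for w in my_custom_word_splitter(text):
        counts[w] = counts.get(w, 0) + 1
    return counts

def words_with_count(counts, low, high):
    """Sorted words whose count lies between low and high, inclusive."""
    return sorted(w for w, n in counts.items() if low <= n <= high)

sample = "The editor-in-chief can't go; the go-ahead came. Go!"
counts = count_words(sample)
print(words_with_count(counts, 1, 1))   # words occurring exactly once
print(words_with_count(counts, 1, 3))   # once, twice or 3 times
```

For the novel itself, the sample string would be replaced by the contents of the file, read with open(...).read().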
Re: From JoyceUlysses.txt -- words occurring exactly once
HenHanna wrote at 2024-5-30 13:03 -0700:
>
>Given a text file of a novel (JoyceUlysses.txt) ...
>
>could someone give me a pretty fast (and simple) Python program that'd
>give me a list of all words occurring exactly once?

Your task can be split into several subtasks:

 * Parse the text into words.
   This depends on your notion of "word". In the simplest case, a word is any maximal sequence of non-whitespace characters. In this case, you can use `split` for this task.

 * Make the list unique -- you can use `set` for this.

> -- Also, a list of words occurring once, twice or 3 times

For this you count the number of occurrences of each word in the `list`. You can use the `count` method of lists for this.

All individual subtasks are simple. I am confident that you will be able to solve them by yourself (if you are willing to invest a bit of time).
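The subtasks above can be sketched directly with `split`, `set` and `list.count`. The sample text and variable names are illustrative assumptions. Calling `count` once per unique word rescans the whole list each time, so this version is simple rather than fast; a dict or `collections.Counter` does the same counting in a single pass.

```python
text = "a rose is a rose is a rose said the poet"

words = text.split()   # words = maximal runs of non-whitespace characters
unique = set(words)    # each distinct word once

# list.count walks the full list for every unique word
once = sorted(w for w in unique if words.count(w) == 1)
up_to_three = sorted(w for w in unique if words.count(w) <= 3)

print(once)
print(up_to_three)
```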
Re: From JoyceUlysses.txt -- words occurring exactly once
On 2024-05-31, Pieter van Oostrum via Python-list wrote:
> HenHanna writes:
>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd
>> give me a list of all words occurring exactly once?
>>
>> -- Also, a list of words occurring once, twice or 3 times
>>
>> re: hyphenated words(you can treat it anyway you like)
>>
>> but ideally, i'd treat [editor-in-chief]
>> [go-ahead] [pen-knife]
>> [know-how] [far-fetched] ...
>> as one unit.
>>
>
> That is a famous Unix task : (Sorry, no Python)
>
> grep -o '\w*' JoyceUlysses.txt | sort | uniq -c | sort -n

Yep, that's what came to my mind (though I couldn't remember the exact grep option without looking it up). However, I assume that doesn't get you very many points on a homework assignment from an "Introduction to Python" class.
Lprint = ( Lisp-style printing ( of lists and strings (etc.) ) in Python )
;;; Pls tell me about little tricks you use in Python or Lisp.

    [('the', 36225), ('and', 17551), ('of', 16759), ('i', 16696),
     ('a', 15816), ('to', 15722), ('that', 11252), ('in', 10743),
     ('it', 10687)]

    ((the 36225) (and 17551) (of 16759) (i 16696) (a 15816) (to 15722)
     (that 11252) (in 10743) (it 10687))

i think the latter is easier-to-read, so i use this code (by Peter Norvig)

    def lispstr(exp):
        "Convert a Python object back into a Lisp-readable string."
        # Tuple check added: with list alone, the (word, count) tuples
        # above would print as ('the', 36225) rather than (the 36225).
        if isinstance(exp, (list, tuple)):
            return '(' + ' '.join(map(lispstr, exp)) + ')'
        else:
            return str(exp)

    def Lprint(x):
        print(lispstr(x))
Re: From JoyceUlysses.txt -- words occurring exactly once
On 5/30/2024 2:18 PM, dn wrote:
> On 31/05/24 08:03, HenHanna via Python-list wrote:
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd
>> give me a list of all words occurring exactly once?
>>
>> -- Also, a list of words occurring once, twice or 3 times
>>
>> re: hyphenated words (you can treat it anyway you like)
>>
>> but ideally, i'd treat [editor-in-chief]
>> [go-ahead] [pen-knife]
>> [know-how] [far-fetched] ...
>> as one unit.
>
> Split into words - defined as you will.
> Use Counter.
>
> Show some (of your) code and we'll be happy to critique...

hard to decide what to do with hyphens and apostrophes (I'd, he's, can't, haven't, A's and B's)

2-step-Process

  1. make a file listing all words (one word per line)
  2. then, doing the counting, using
         from collections import Counter

Related code (for 1) that i'd used before:

    PUNCT = ";:,'\"[]()*&^%$#@!,./<>?_-+="

    # open both files in one with-statement so both get closed
    with open("JoyceUlysses.txt", 'r') as Rfile, open('Out.txt', 'w') as fo:
        for line in Rfile:
            for w in line.split():
                # strip punctuation from both ends of the word
                w = w.strip(PUNCT)
                # skip tokens that were nothing but punctuation
                if w != "":
                    fo.write(w.lower())
                    fo.write('\n')
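Step 2 of the two-step process could look roughly like the sketch below, which counts a one-word-per-line input with `collections.Counter` and keeps the words whose count falls in a given range. The function name words_by_frequency, its low/high parameters and the in-memory sample are assumptions made for illustration.

```python
from collections import Counter

def words_by_frequency(lines, low=1, high=1):
    """Return sorted words whose count lies in [low, high].

    lines is any iterable of strings holding one word per line,
    such as an open file; blank lines are ignored.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(w for w, n in counts.items() if low <= n <= high)

# In-memory stand-in for the one-word-per-line file
sample_lines = ["stately\n", "plump\n", "buck\n", "mulligan\n", "buck\n", "\n"]
print(words_by_frequency(sample_lines))         # words occurring exactly once
print(words_by_frequency(sample_lines, 1, 3))   # once, twice or 3 times
```

Because an open file is itself an iterable of lines, the same function can be given the file object directly, e.g. inside `with open('Out.txt') as f:`.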
Re: From JoyceUlysses.txt -- words occurring exactly once
HenHanna writes:

> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) Python program that'd
> give me a list of all words occurring exactly once?
>
> -- Also, a list of words occurring once, twice or 3 times
>
> re: hyphenated words(you can treat it anyway you like)
>
> but ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.
>

That is a famous Unix task : (Sorry, no Python)

    grep -o '\w*' JoyceUlysses.txt | sort | uniq -c | sort -n

--
Pieter van Oostrum
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
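For comparison, a rough Python counterpart of that pipeline is sketched below. It is an approximation rather than an exact port: Python's `\w` is Unicode-aware, the order of words with equal counts may differ from what `sort -n` prints, and like the grep version it does not fold case and splits hyphenated words at the hyphen. The function name and the sample string are assumptions.

```python
import re
from collections import Counter

def word_counts_ascending(text):
    # Roughly mirrors: grep -o '\w*' | sort | uniq -c | sort -n
    counts = Counter(re.findall(r"\w+", text))
    # Sort by count, then by word, so the rarest words come first
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

sample = "yes I said yes I will Yes"
for word, n in word_counts_ascending(sample):
    print(n, word)
```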