Re: From JoyceUlysses.txt -- words occurring exactly once

2024-05-31 Thread Thomas Passin via Python-list

On 5/30/2024 4:03 PM, HenHanna via Python-list wrote:


> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) Python program that'd
> give me a list of all words occurring exactly once?
>
>    -- Also, a list of words occurring once, twice or 3 times
>
> re: hyphenated words    (you can treat it any way you like)
>
>     but ideally, I'd treat  [editor-in-chief]
>     [go-ahead]  [pen-knife]
>     [know-how]  [far-fetched] ...
>     as one unit.


You will probably get a thousand different suggestions, but here's a 
fairly direct and readable one in Python:


s1 = 'Is this word is the only word repeated in this string'

counts = {}
for w in s1.lower().split():
    counts[w] = counts.get(w, 0) + 1
print(sorted(counts.items()))
# [('in', 1), ('is', 2), ('only', 1), ('repeated', 1), ('string', 1),
#  ('the', 1), ('this', 2), ('word', 2)]


Of course you can adjust the definition of what constitutes a word, 
handle punctuation and so on, and tinker with the output format to suit 
yourself.  You would replace s1.lower().split() with, e.g., 
my_custom_word_splitter(s1).
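One possible my_custom_word_splitter, sketched under an assumed definition of "word" (a run of letters or digits, optionally joined to further runs by a single hyphen or apostrophe) so that editor-in-chief and can't each count as one unit. The regex is an illustrative choice, not a standard; only the function name comes from the message above.

```python
import re

# Hypothetical word definition (an assumption, not a standard): letters/digits,
# optionally joined by single internal hyphens or apostrophes (ASCII ' or U+2019).
# [^\W_] means "word character except underscore", i.e. letters and digits.
WORD_RE = re.compile(r"[^\W_]+(?:[-'\u2019][^\W_]+)*")

def my_custom_word_splitter(text):
    """Return the lowercased words of text, keeping hyphenated words whole."""
    # The group is non-capturing, so findall returns the whole matches.
    return [w.lower() for w in WORD_RE.findall(text)]

counts = {}
for w in my_custom_word_splitter("The editor-in-chief can't say; the go-ahead's given."):
    counts[w] = counts.get(w, 0) + 1
print(sorted(counts.items()))
```

Leading or trailing hyphens and apostrophes (e.g. a dangling "pen-") are dropped, while "A's" comes out as "a's"; whether to strip possessives is left open.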



--
https://mail.python.org/mailman/listinfo/python-list


Re: From JoyceUlysses.txt -- words occurring exactly once

2024-05-31 Thread Dieter Maurer via Python-list
HenHanna wrote at 2024-5-30 13:03 -0700:
>
>Given a text file of a novel (JoyceUlysses.txt) ...
>
>could someone give me a pretty fast (and simple) Python program that'd
>give me a list of all words occurring exactly once?

Your task can be split into several subtasks:
 * Parse the text into words.

   This depends on your notion of "word".
   In the simplest case, a word is any maximal sequence of non-whitespace
   characters. In this case, you can use `split` for this task.

 * Make a list unique -- you can use `set` for this.
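The two subtasks above, sketched on a made-up sample string (in practice the text would come from the novel's file):

```python
text = "the cat saw the other cat"

# Subtask 1: simplest notion of "word", a maximal run of non-whitespace characters
words = text.split()

# Subtask 2: make the list unique with set (a set has no order, so sort for display)
unique_words = set(words)
print(sorted(unique_words))
```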

>   -- Also, a list of words occurring once, twice or 3 times

To do this, count the occurrences of each word in the `list` of words.
You can use the `count` method of lists for this.
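A minimal sketch of the `list.count` approach, grouping words that occur once, twice or 3 times (the sample sentence is made up). Note that each `count` call scans the whole list, so this is quadratic in practice; for a whole novel, `collections.Counter` does the same job in a single pass.

```python
words = "the cat saw the other cat and the dog".split()

# Buckets for words occurring exactly 1, 2 or 3 times.
by_count = {1: [], 2: [], 3: []}
for w in sorted(set(words)):      # each distinct word once, alphabetically
    n = words.count(w)            # full scan of the list per distinct word
    if n in by_count:
        by_count[n].append(w)

print(by_count)
```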

All individual subtasks are simple. I am confident that
you will be able to solve them by yourself (if you are willing
to invest a bit of time).


Re: From JoyceUlysses.txt -- words occurring exactly once

2024-05-31 Thread Grant Edwards via Python-list
On 2024-05-31, Pieter van Oostrum via Python-list wrote:
> HenHanna writes:
>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd
>> give me a list of all words occurring exactly once?
>>
>>   -- Also, a list of words occurring once, twice or 3 times
>>
>> re: hyphenated words (you can treat it any way you like)
>>
>>    but ideally, I'd treat  [editor-in-chief]
>>    [go-ahead]  [pen-knife]
>>    [know-how]  [far-fetched] ...
>>    as one unit.
>>
>
> That is a famous Unix task: (Sorry, no Python)
>
> grep -o '\w*' JoyceUlysses.txt | sort | uniq -c | sort -n

Yep, that's what came to my mind (though I couldn't remember the exact
grep option without looking it up).  However, I assume that doesn't
get you very many points on a homework assignment from an "Introduction
to Python" class.



Lprint = ( Lisp-style printing ( of lists and strings (etc.) ) in Python )

2024-05-31 Thread HenHanna via Python-list



 ;;;  Pls tell me about little tricks you use in Python or Lisp.


[('the', 36225), ('and', 17551), ('of', 16759), ('i', 16696), ('a', 15816), ('to', 15722), ('that', 11252), ('in', 10743), ('it', 10687)]


((the 36225) (and 17551) (of 16759) (i 16696) (a 15816) (to 15722) (that 11252) (in 10743) (it 10687))



I think the latter is easier to read, so I use this code (by Peter Norvig):

def lispstr(exp):
    "Convert a Python object back into a Lisp-readable string."
    if isinstance(exp, list):
        return '(' + ' '.join(map(lispstr, exp)) + ')'
    else:
        return str(exp)

def Lprint(x): print(lispstr(x))
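The (word, count) pairs shown above are tuples (as returned, e.g., by Counter.most_common()), but lispstr as written only recurses into lists, so on that data it would print (('the', 36225) ...). A small variant that also treats tuples as Lisp lists; this is a tweak for illustration, not Norvig's original code:

```python
def lispstr(exp):
    "Convert a Python object into a Lisp-style string; lists AND tuples become (...)."
    if isinstance(exp, (list, tuple)):
        return '(' + ' '.join(map(lispstr, exp)) + ')'
    else:
        return str(exp)   # strings and numbers print bare, without quotes

def Lprint(x):
    print(lispstr(x))

Lprint([('the', 36225), ('and', 17551), ('of', 16759)])
# ((the 36225) (and 17551) (of 16759))
```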


Re: From JoyceUlysses.txt -- words occurring exactly once

2024-05-31 Thread HenHanna via Python-list

On 5/30/2024 2:18 PM, dn wrote:

> On 31/05/24 08:03, HenHanna via Python-list wrote:
>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd
>> give me a list of all words occurring exactly once?
>>
>>    -- Also, a list of words occurring once, twice or 3 times
>>
>> re: hyphenated words    (you can treat it any way you like)
>>
>>     but ideally, I'd treat  [editor-in-chief]
>>     [go-ahead]  [pen-knife]
>>     [know-how]  [far-fetched] ...
>>     as one unit.
>
> Split into words - defined as you will.
> Use Counter.
>
> Show some (of your) code and we'll be happy to critique...



It's hard to decide what to do with hyphens and apostrophes
(I'd, he's, can't, haven't, A's and B's).


A 2-step process:

  1. Make a file listing all words (one word per line).

  2. Then do the counting, using
       from collections import Counter
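Step 2 might look like the following sketch. It assumes the step-1 output has one word per line (the file name Out.txt is taken from the code below; the helper name words_by_frequency is hypothetical). Taking any iterable of lines keeps it easy to try without a file.

```python
from collections import Counter

def words_by_frequency(lines, max_count=3):
    """Group one-word-per-line input by how often each word occurs.

    Returns {1: [...], 2: [...], ..., max_count: [...]}, each list sorted.
    """
    # Count non-blank lines; strip() removes the trailing newline.
    counts = Counter(line.strip() for line in lines if line.strip())
    groups = {n: [] for n in range(1, max_count + 1)}
    for word, n in counts.items():
        if n <= max_count:
            groups[n].append(word)
    for n in groups:
        groups[n].sort()
    return groups

# Usage with the step-1 file:
# with open('Out.txt') as f:
#     groups = words_by_frequency(f)
# print(groups[1])   # words occurring exactly once
```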


Related code (for step 1) that I'd used before:

with open("JoyceUlysses.txt", 'r') as Rfile, open('Out.txt', 'w') as fo:
    for line in Rfile:
        for w in line.split():
            # Strip leading/trailing punctuation; internal hyphens and
            # apostrophes (editor-in-chief, can't) are left alone.
            w = w.strip(";:,'\"[]()*&^%$#@!,./<>?_-+=")
            if w:  # skip tokens that were nothing but punctuation
                fo.write(w.lower())
                fo.write('\n')



Re: From JoyceUlysses.txt -- words occurring exactly once

2024-05-31 Thread Pieter van Oostrum via Python-list
HenHanna writes:

> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) Python program that'd
> give me a list of all words occurring exactly once?
>
>   -- Also, a list of words occurring once, twice or 3 times
>
>
>
> re: hyphenated words (you can treat it any way you like)
>
>    but ideally, I'd treat  [editor-in-chief]
>    [go-ahead]  [pen-knife]
>    [know-how]  [far-fetched] ...
>    as one unit.
>

That is a famous Unix task: (Sorry, no Python)

grep -o '\w*' JoyceUlysses.txt | sort | uniq -c | sort -n
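For comparison, a rough Python analogue of that pipeline (a sketch, not an exact equivalent). It uses `\w+` rather than `\w*` because `re.findall` would otherwise return empty matches. Like the grep version it is case-sensitive and splits hyphenated words. Python's `\w` is Unicode-aware by default while grep's depends on the locale, so counts may differ on non-ASCII text. The function name is hypothetical.

```python
import re
from collections import Counter

def word_counts_ascending(text):
    r"""Approximate: grep -o '\w*' FILE | sort | uniq -c | sort -n

    Returns (count, word) pairs, least frequent first; ties are ordered by word.
    """
    counts = Counter(re.findall(r"\w+", text))
    return sorted((n, w) for w, n in counts.items())

# Usage:
# with open("JoyceUlysses.txt") as f:
#     pairs = word_counts_ascending(f.read())
# once = [w for n, w in pairs if n == 1]   # words occurring exactly once
```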


-- 
Pieter van Oostrum 
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]