Dear all,
I wrote a simple SQL query to count co-occurrences between words, but it
runs very slowly on large datasets. So it's time to do it with Perl.
I just need a short tip to get started: which data structure should I use
to count all co-occurrences between words (represented here by the letters
A, B and C) within each document number? My dataset looks like this:
1 A
1 B
1 C
1 B
2 A
2 A
2 B
2 C
etc., up to document number 100,000
The result file should then look like this:
A B 4 ### 2 co-occurrences in doc. number 1 + 2 co-occurrences in doc. number 2
A C 3 ### 1 co-occurrence in doc. number 1 + 2 co-occurrences in doc. number 2
B C 3 ### 2 co-occurrences in doc. number 1 + 1 co-occurrence in doc. number 2
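A hash of hashes keyed first by document number and then by word seems a natural fit. The expected totals imply that, within one document, a pair's count is the product of the two words' frequencies (doc. 1 has A once and B twice, giving A B 2). The sketch below assumes whitespace-separated "doc word" lines as in the sample; the sub name count_pairs and the script name cooc.pl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "<doc_number> <word>" lines from $fh and return a hash ref mapping
# "WORD1 WORD2" (WORD1 lt WORD2) to the summed co-occurrence count.
# count_pairs is a hypothetical name chosen for this sketch.
sub count_pairs {
    my ($fh) = @_;

    # Hash of hashes: $freq{$doc}{$word} = occurrences of $word in $doc.
    my %freq;
    while (my $line = <$fh>) {
        my ($doc, $word) = split ' ', $line;   # also drops the newline
        next unless defined $word;             # skip blank or malformed lines
        $freq{$doc}{$word}++;
    }

    # Within each document, every unordered pair of distinct words
    # contributes freq(w1) * freq(w2) to that pair's total.
    my %pair;
    for my $doc (keys %freq) {
        my $f     = $freq{$doc};
        my @words = sort keys %$f;
        for my $i (0 .. $#words - 1) {
            for my $j ($i + 1 .. $#words) {
                my ($w1, $w2) = @words[$i, $j];
                $pair{"$w1 $w2"} += $f->{$w1} * $f->{$w2};
            }
        }
    }
    return \%pair;
}

# Usage (hypothetical script name): perl cooc.pl data.txt > result.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $pair = count_pairs($in);
    close $in;
    print "$_ $pair->{$_}\n" for sort keys %$pair;
}
```

On the sample data this should print "A B 4", "A C 3" and "B C 3". The sketch keeps every document in memory; since the input appears to be grouped by document number, a variant could flush the per-document counts whenever the number changes, but that relies on sorted input, which the sketch does not assume.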
Thanks in advance for any pointers.
Best, Andrej