Dear all,

I wrote a simple SQL query to count co-occurrences between words, but it runs very slowly on large datasets. So it's time to do it with Perl. I just need a short tip to get started: which data structure should I use to count all possible co-occurrences between letters (e.g. A, B and C) within each document number? My dataset looks like the following:

1 A
1 B
1 C
1 B
2 A
2 A
2 B
2 C
etc., up to doc. number 100,000

The result file should then look similar to:
A B 4 ### 2 co-occurrences under doc. number 1 + 2 co-occurrences under doc. number 2
A C 3 ### 1 co-occurrence under doc. number 1 + 2 co-occurrences under doc. number 2
B C 3 ### 2 co-occurrences under doc. number 1 + 1 co-occurrence under doc. number 2
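
Judging by the totals above, each pair seems to contribute the product of the two letters' counts within a document (in doc. number 1, A appears once and B twice, so A B gets 1 * 2 = 2). Under that reading, one possible structure is a hash of hashes keyed by document number and then by letter. The following is only a rough sketch: the sub name count_cooccurrences is made up for illustration, and it assumes each input line is a document number and a letter separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines of the form "doc letter" and returns a
# hash ref mapping "X Y" (with X sorted before Y) to the total count.
sub count_cooccurrences {
    my @lines = @_;

    # First pass: $count{$doc}{$item} = how often $item occurs in $doc.
    my %count;
    for my $line (@lines) {
        my ($doc, $item) = split ' ', $line;
        next unless defined $item;    # skip blank or malformed lines
        $count{$doc}{$item}++;
    }

    # Second pass: within each document, every unordered pair of distinct
    # items contributes count(x) * count(y) to that pair's total.
    my %pairs;
    for my $doc (keys %count) {
        my @items = sort keys %{ $count{$doc} };
        for my $i (0 .. $#items - 1) {
            for my $j ($i + 1 .. $#items) {
                my ($x, $y) = @items[$i, $j];
                $pairs{"$x $y"} += $count{$doc}{$x} * $count{$doc}{$y};
            }
        }
    }
    return \%pairs;
}

# The eight sample rows from the dataset above.
my @sample = ('1 A', '1 B', '1 C', '1 B', '2 A', '2 A', '2 B', '2 C');
my $pairs = count_cooccurrences(@sample);
print "$_ $pairs->{$_}\n" for sort keys %$pairs;
```

On the sample rows this prints A B 4, A C 3 and B C 3. The sketch holds all per-document counts in memory at once; if the input is already sorted by document number, the same idea could presumably be applied one document at a time instead.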

Thanks in advance for any pointers.

Best, Andrej