Re: Number of distinct substrings of a string [continuation]
My Py solution: = import psyco psyco.full() def foo(s): n = len(s) s = s + ' ' a = [[] for i in xrange(128)] ans = 0 for i in xrange(n - 1, -1, -1): lev = 0 for st in xrange(len(a[ord(s[i])]) - 1, -1, -1): j = a[ord(s[i])][st] if (n - j = lev): break if (s[j + lev] != s[i + lev]): continue if (s[j + lev / 2] != s[i + lev / 2]): continue k = 0 while (s[j + k] == s[i + k]): k += 1 if (k lev): lev = k a[ord(s[i])] += [i] ans += n - i - lev return ans from time import time t = time() import sys sys.stdin = open('D:/88.txt', 'rt') f = sys.stdin.read().split() sys.stdin.close() z = open('D:/99.txt', 'wt') for tc in range(int(f[0])): s = f[tc + 1] print z, foo(s) print z, time() - t z.close() -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
This worked out in 5.28s Imo it's not that *much* slower (of course, Psyco can't help here) === import itertools def subs(s): return len(set(itertools.chain( s[i:j] for i in xrange(len(s)) for j in xrange(i, len(s)+1 - 1 from time import time t = time() import sys sys.stdin = open('D:/88.txt', 'rt') f = sys.stdin.read().split() sys.stdin.close() z = open('D:/99.txt', 'wt') for tc in range(int(f[0])): s = f[tc + 1] print z, subs(s) print z, time() - t z.close() == -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m: This worked out in 5.28s Imo it's not that *much* slower (of course, Psyco can't help here) There's no need of a chain here, so you can rewrite this: import itertools def subs(s): return len(set(itertools.chain( s[i:j] for i in xrange(len(s)) for j in xrange(i, len(s)+1 - 1 as: def subs(s): return len(set(s[i:j] for i in xrange(len(s)) for j in xrange(i, len(s)+1))) - 1 Psyco helps a little (about 10% on my PC) if you use it like this: from time import time import psyco; psyco.full() def main(): t = time() fin = open(input.txt) fout = open(output.txt, w) fin.next() for line in fin: r = set() s = line.rstrip() len_s = len(s) for i in xrange(len_s): for j in xrange(i, len_s + 1): r.add(s[i : j]) print fout, len(r) - 1 fout.close() print time() - t main() Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m: My Py solution: ... Cute. You can replace: a[ord(s[i])] += [i] With: a[ord(s[i])].append(i) If you want to micro-optimize this with Psyco may be a little faster: (lev 1) Than: lev // 2 But to increase speed it's usually better to reduce memory allocations. So you can try to find a way to pull this out of the function: a = [[] for i in xrange(128)] And to avoid a true append: a[ord(s[i])] += [i] (There are many ways to avoid an append. One of them is to have an already allocated list/array and just bump forward an index variable that starts from 0. This is worth doing especially when you use Psyco). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m: my home tests proved Python is a fellow fast beast of C++. Quite unexpected (I expected Py would be by ~10 times slower). PS Both my codes are identical in their algorithms. = 0.016 0.0150001049042 --- exec times Maybe in your C++ code there's something that can be improved, this is a 1:1 translation to D (V.1) language (using dlibs) and it's about 2.2 times faster than the Psyco version: http://codepad.org/MQLj0ydB Using a smarter usage of memory (that is avoiding all or most memory allocations inside all loops), the performance difference will surely grow. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
On Nov 29, 3:15 pm, Bearophile bearophileh...@lycos.com wrote: Maybe in your C++ code there's something that can be improved, this is a 1:1 translation to D (V.1) language (using dlibs) and it's about 2.2 times faster than the Psyco version:http://codepad.org/MQLj0ydB Using a smarter usage of memory (that is avoiding all or most memory allocations inside all loops), the performance difference will surely grow. Very interesting. Thanks. D code looks pretty neat. Btw D lang is among accepted langs there. Even if Py by 4x *slower* -- it's still a perfect Ok for it (C# will be much (much) slower than Python). Micro-optimizations. Of course, the best optimization would be to implement Suffix Tree: http://en.wikipedia.org/wiki/Trie Currently I hardly understand/know/read/etc its core idea. My algo is plainly stupid as soldier muddy boots. My C++ code: BEGIN #include stdlib.h #include stdio.h //#include string.h //#include ctype.h //#include math.h #include time.h #include iostream #include vector #include string using namespace std; int main() { clock_t start_time = clock(); freopen(88.txt, rt, stdin); freopen(99.txt, wt, stdout); int tcs; string s; cin tcs; while (tcs-- 0) { cin s; int n = s.size(); s = s + ' '; vector vectorint a(128); int ans = 0; for (int i = n - 1; i = 0; --i) { int lev = 0; for (int st = (int)a[s[i]].size() - 1; st = 0; --st) { int j = a[s[i]][st]; if (n - j = lev) break; if (s[j + lev] != s[i + lev]) continue; if (s[j + lev / 2] != s[i + lev / 2]) continue; int k = 0; while (s[j + k] == s[i + k]) ++k; if (k lev) lev = k; } a[s[i]].push_back(i); ans += n - i - lev; } cout ans endl; } cout (clock() - start_time) / CLOCKS_PER_SEC endl; return 0; } END -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
http://en.wikipedia.org/wiki/Suffix_tree Looks not very friendly appealing :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
On Sun, Nov 29, 2009 at 9:10 AM, n00m n...@narod.ru wrote: Even if Py by 4x *slower* -- it's still a perfect Ok for it (C# will be much (much) slower than Python). How do you figure? As far as I know C# is many, many times faster than Python. (i was disappointed to find out that even IronPython is dozens of times slower than other .net langs) -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
Tested both my codes against a random string of length = 1. === from random import choice s = '' for i in xrange(1): s += choice(('a','b','c','d','e','f')) === C++: ~0.28s Python: ~0.48s PS I suspect that building of Suffix Tree would be a big exec.time-consuming overhead -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m: I suspect that building of Suffix Tree would be a big exec.time-consuming overhead In C/D/C++ there are ways to allocate memory in smarter ways, using pools, arenas, stacks, freelists, etc. With that, using a compact encoding of tree nodes (there are many different tricks to reduce space used by a suffix tree), you can go fast enough. Probably a basic implementation too will suffice in many cases :-) Anyway, you may try a pure-Python2.x implementation: http://suffixtree.googlecode.com/files/suffixtree-0.1.py If you find time will to try to use that (or similar code) to improve your Python solution to the problem 99, you can show us the code here... Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
On Nov 29, 11:43 pm, Bearophile bearophileh...@lycos.com wrote: Anyway, you may try a pure-Python2.x implementation:http://suffixtree.googlecode.com/files/suffixtree-0.1.py Ouch, Bearie... 214 lines of code are there. Ok.. I tested it. Firstly I replaced all print with pass##print (aiming to avoid intensive printing from inside of the code). Then I left in its final part only building of suffix trees. == ... ... ... from time import time t = time() import sys sys.stdin = open('D:/88.txt', 'rt') f = sys.stdin.read().split() sys.stdin.close() z = open('D:/99.txt', 'wt') for tc in range(int(f[0])): s = f[tc + 1] test_str = s + '$' POSITIVE_INFINITY = len(test_str) - 1 suffix_tree = SuffixTree(test_str) print z, 'len(s) =', len(s) print z, 'time =', time() - t z.close() Output: len(s) = 1000 len(s) = 1000 len(s) = 1 time = 0.64132425 0.64s 0.48s (of my algo) I.e. the code can't help in my very special and narrow case. But of course it is worthy to be studied and memoized. E.g.: from huge string s we built its Suffix Tree. Now we are given arbitrary string ss. Task: to find the largest its prefix occured in s, traversing its Suffix Tree. -- http://mail.python.org/mailman/listinfo/python-list