Hi all,
The default regex-normalize.xml currently strips out PHP session ids.
I'm wondering whether it would also make sense to strip the anchor
(the #fragment part) from URLs. For example, these two URLs are
currently treated as different:
http://www.dina.kvl.dk/~sestoft/gcsharp/index.html#wordindex
and
http://www.dina.kvl.dk/~sestoft/gcsharp/index.html
Is it safe to always strip # followed by (valid anchor characters) at
the end of a URL?
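
To make the question concrete, here is a rough sketch in Java of the
kind of stripping I mean. The class and method names are made up for
illustration, and this is not the actual normalizer code; it only
shows the regex substitution that a rule in regex-normalize.xml would
presumably perform.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of Nutch).
final class FragmentStripper {
    // Matches the first '#' and everything after it, up to the end
    // of the string.
    private static final Pattern FRAGMENT = Pattern.compile("#.*$");

    // Returns the URL with any trailing "#..." part removed.
    // URLs without a '#' are returned unchanged.
    static String strip(String url) {
        return FRAGMENT.matcher(url).replaceFirst("");
    }
}
```

With this, both URLs above would normalize to the same string. One
caveat I can think of: a page that uses the part after # to pick
content in client-side script would be collapsed into a single URL.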
Thanks,
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200