Hi all,

The default regex-normalize.xml currently strips out PHP session ids.

I'm wondering whether it would also make sense to strip anchors (the trailing #fragment part) from URLs. For example, these two URLs are currently treated as different:

http://www.dina.kvl.dk/~sestoft/gcsharp/index.html#wordindex

and

http://www.dina.kvl.dk/~sestoft/gcsharp/index.html

Is it safe to always strip a # followed by valid anchor characters when it appears at the end of a URL?
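
To make the question concrete, here is a rough sketch of the kind of rule I have in mind, written in Python purely for illustration. The pattern and the strip_fragment helper are my own assumptions, not anything that exists in regex-normalize.xml today, and the pattern is deliberately loose: it drops the first # and everything after it rather than checking for valid anchor characters.

```python
import re

# Hypothetical pattern: the first '#' and everything after it,
# up to the end of the URL. Not taken from regex-normalize.xml.
FRAGMENT = re.compile(r"#.*$")


def strip_fragment(url: str) -> str:
    """Return the URL with any trailing #fragment removed."""
    return FRAGMENT.sub("", url, count=1)


with_anchor = "http://www.dina.kvl.dk/~sestoft/gcsharp/index.html#wordindex"
without_anchor = "http://www.dina.kvl.dk/~sestoft/gcsharp/index.html"

# After stripping, both URLs should compare equal.
print(strip_fragment(with_anchor) == strip_fragment(without_anchor))
```

With a rule like this the two URLs above would normalize to the same string. What I can't judge is whether some sites put meaningful state after the #, in which case stripping it would merge pages that are really different.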

Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
