Thanks for your input.
I finally managed to remove jsessionid with the following:
<regex>
<pattern>;jsessionid=(.*)</pattern>
<substitution></substitution>
</regex>
Cheers,
cha
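
For anyone comparing the two rule sets in this thread, here is a minimal sketch in Python (not Nutch itself) of what each one does to a URL. The patterns are copied from the messages. Everything else is an assumption for illustration: the example.com URLs are made up, normalize() is a hypothetical helper, and Python's re module only stands in for the regex engine Nutch uses, so "$1" is written as "\1" here.

```python
import re

# Patterns copied from the thread; the URLs further down are made-up examples.
STRIP_ALL = [(r";jsessionid=(.*)", "")]
KEEP_QUERY = [
    (r";jsessionid=[^=\?/\&]+$", ""),
    (r";jsessionid=[^=\?/\&]+(\?.*)", r"\1"),  # "$1" in the XML config
]


def normalize(url, rules):
    """Hypothetical helper: apply each (pattern, substitution) rule in order."""
    for pattern, substitution in rules:
        url = re.sub(pattern, substitution, url)
    return url


plain = "http://example.com/page.jsp;jsessionid=0123456789ABCDEF"
with_query = "http://example.com/page.jsp;jsessionid=0123456789ABCDEF?id=7"

print(normalize(plain, STRIP_ALL))        # session id removed
print(normalize(with_query, STRIP_ALL))   # session id AND query string removed
print(normalize(with_query, KEEP_QUERY))  # session id removed, query string kept
```

Under these assumptions, the single-rule version drops everything after ";jsessionid=", including any query string, while the two-rule version keeps the query string. Which one is preferable depends on whether the crawled URLs need their query parameters.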
Espen Amble Kolstad-2 wrote:
>
> Hi,
>
> I've been using:
> <regex>
> <pattern>;jsessionid=[^=\?/\&]+$</pattern>
> <substitution></substitution>
> </regex>
> <regex>
> <pattern>;jsessionid=[^=\?/\&]+(\?.*)</pattern>
> <substitution>$1</substitution>
> </regex>
>
> which seems to work. I've never seen ?jsessionid= or &jsessionid=
> only ;jsessionid=
>
> - Espen
>
> On Friday 23 March 2007 06:43:03 cha wrote:
>> Hi,
>>
>> I am not able to remove jsessionid while I crawl the web.
>>
>> I have tried the following:
>>
>> <regex>
>> <pattern>(\?|\&|\&)jsessionid=[a-zA-Z0-9]{32}$</pattern>
>> <substitution></substitution>
>> </regex>
>> <regex>
>> <pattern>(\?|\&|\&)jsessionid=[a-zA-Z0-9]{32}(\&|\&)(.*)</pattern>
>> <substitution></substitution>
>> </regex>
>>
>> I am missing something.
>>
>> Cheers,
>> Cha
>
--
View this message in context:
http://www.nabble.com/removing-jsessionid-tf3451965.html#a9693940
Sent from the Nutch - User mailing list archive at Nabble.com.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general