Thanks for your input.

I finally managed to remove the jsessionid with the following:

<regex>
  <pattern>;jsessionid=(.*)</pattern>
  <substitution></substitution>
</regex>
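
For anyone comparing this rule with Espen's two rules quoted below, here is a rough sketch of how they differ on a URL that also carries a query string. It is only an illustration, not Nutch code: it uses Python's re module rather than the Java regex engine Nutch runs on, it assumes the normalizer replaces every match the way re.sub does, and the normalize helper and the example.com URLs are made up for the demo.

```python
import re

# My rule: drop ";jsessionid=" and everything that follows it.
DROP_ALL = [
    (re.compile(r";jsessionid=(.*)"), ""),
]

# Espen's two rules, with the XML entity &amp; decoded to a plain "&".
# Java's "$1" back-reference is written "\1" in Python.
KEEP_QUERY = [
    (re.compile(r";jsessionid=[^=\?/\&]+$"), ""),
    (re.compile(r";jsessionid=[^=\?/\&]+(\?.*)"), r"\1"),
]


def normalize(url, rules):
    """Apply each (pattern, substitution) pair in order (hypothetical helper)."""
    for pattern, substitution in rules:
        url = pattern.sub(substitution, url)
    return url


if __name__ == "__main__":
    for url in (
        "http://example.com/page.jsp;jsessionid=ABC123",
        "http://example.com/page.jsp;jsessionid=ABC123?id=7",
    ):
        print(url)
        print("  drop all  :", normalize(url, DROP_ALL))
        print("  keep query:", normalize(url, KEEP_QUERY))
```

In this sketch my single rule also throws away any query string that comes after the session id, while Espen's pair keeps it. That is fine for the URLs I am crawling, but it may matter for sites that put parameters after the jsessionid.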

Cheers,
cha


Espen Amble Kolstad-2 wrote:
> 
> Hi,
> 
> I've been using:
> <regex>
>   <pattern>;jsessionid=[^=\?/\&amp;]+$</pattern>
>   <substitution></substitution>
> </regex>
> <regex>
>   <pattern>;jsessionid=[^=\?/\&amp;]+(\?.*)</pattern>
>   <substitution>$1</substitution>
> </regex>
> 
> which seems to work. I've never seen ?jsessionid= or &jsessionid=,
> only ;jsessionid=
> 
> - Espen
> 
> On Friday 23 March 2007 06:43:03 cha wrote:
>> hi,
>>
>> I am not able to remove the jsessionid while I crawl the web.
>>
>> I have tried the following:
>>
>>  <regex>
>>   <pattern>(\?|\&|\&amp;)jsessionid=[a-zA-Z0-9]{32}$</pattern>
>>   <substitution></substitution>
>>  </regex>
>>  <regex>
>>   <pattern>(\?|\&|\&amp;)jsessionid=[a-zA-Z0-9]{32}(\&|\&amp;)(.*)</pattern>
>>   <substitution></substitution>
>>  </regex>
>>  <regex>
>>
>> I am missing something.
>>
>> Cheers,
>> Cha
> 
> 
> 
> 


