Today I am using SgmlLinkExtractor with process_value to convert relative 
links to absolute URLs. 

rules = (
    Rule(SgmlLinkExtractor(tags="a", attrs="href",
                           allow=r'#day=2013-12-24&Id=33',
                           process_value=my_process_value),
         callback='my_parser', follow=False),
)


def my_process_value(value):
    print '---->' + value
    return

When I run the spider, every link in the response is printed. This is the output:

---->#Day=2013-12-24&Id=33
---->#Day=2013-12-24&Id=1269753
---->#Day=2013-12-24&Id=1269753
---->#Day=2013-12-24&Id=1269772
---->#Day=2013-12-24&Id=1269772

I want only the first relative link; it's as if the allow parameter has no 
effect. The output should be this:

---->#Day=2013-12-24&Id=33

Do you know the reason?
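
One workaround I am considering, as a minimal sketch only: do the filtering inside process_value itself. As far as I understand, process_value is called for every extracted attribute value before the allow pattern is applied, which would explain why all links get printed, and the Scrapy docs say that returning None from it makes the extractor ignore the link. Note also that my allow pattern says "day" while the links say "Day", and regular expressions are case-sensitive by default. BASE_URL and WANTED below are hypothetical placeholders, not values from my spider.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Hypothetical values for illustration; replace with the real page URL and pattern.
BASE_URL = "http://example.com/schedule"
WANTED = re.compile(r"^#day=2013-12-24&Id=33$", re.IGNORECASE)


def my_process_value(value):
    """Return an absolute URL for the wanted fragment link, otherwise None."""
    if WANTED.match(value) is None:
        # Per the Scrapy docs, returning None tells the extractor to skip the link.
        return None
    # Joining a fragment-only href onto the base keeps the base path
    # and appends the fragment.
    return urljoin(BASE_URL, value)


if __name__ == "__main__":
    for href in ["#Day=2013-12-24&Id=33", "#Day=2013-12-24&Id=1269753"]:
        print("%s -> %s" % (href, my_process_value(href)))
```

The function can be tried on its own, without Scrapy, to check which href values survive before it is passed as process_value.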


On Tuesday, 24 December 2013 23:31:43 UTC+1, Roberto López wrote:
>
> Hi.
>
> I have to extract and follow links like this:
>
> <a href="#date=2013-12-24&Id=1269282">Tynwald Titan</a>
>
> The following rule doesn't work; no links are found:
>
> rules = ( Rule(SgmlLinkExtractor(allow=r''), callback='parse_item', follow=True), )
>
> Do you know how I can do it?
>
> Best regards
>
>
