At what rate are these queries arriving? Are they all from the same IP
address? Are they scanning your site or hitting one or two pages over and
over? A single request doesn't tell us much.

Assuming this is just a poorly behaved bot and not a DoS attack, the
simplest solution is to install a servlet filter at the top of your stack.
Inspect the request, and if you don't like something about it (IP address,
user agent, etc.), return a blank page (or goatse, or whatever). Short of
a real DDoS, this will convert your expensive 1100ms requests into
almost-free <10ms requests and mitigate the issue.
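
A rough sketch of the check such a filter could run, kept free of the
servlet API so it compiles on its own. The class name BotGate, the
shouldBlock method, and both block lists are made up for illustration; the
address is a documentation-range placeholder, not a real crawler IP, so
fill the lists from your own logs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Decides whether a request should get the cheap, empty response. */
final class BotGate {
    // Hypothetical block lists: replace with values taken from your request logs.
    private static final List<String> BLOCKED_AGENT_FRAGMENTS =
            Arrays.asList("ahrefsbot");
    private static final Set<String> BLOCKED_ADDRESSES =
            new HashSet<>(Arrays.asList("203.0.113.7")); // placeholder address

    private BotGate() {}

    /**
     * Returns true if the remote address is on the block list, or if the
     * User-Agent header contains a blocked fragment (case-insensitive).
     */
    static boolean shouldBlock(String userAgent, String remoteAddr) {
        if (remoteAddr != null && BLOCKED_ADDRESSES.contains(remoteAddr)) {
            return true;
        }
        if (userAgent == null) {
            return false; // a missing header alone is not treated as hostile here
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String fragment : BLOCKED_AGENT_FRAGMENTS) {
            if (ua.contains(fragment)) {
                return true;
            }
        }
        return false;
    }
}

// Wiring it up (not compiled here, needs the servlet API on the classpath):
// inside Filter.doFilter, cast the request to HttpServletRequest and call
//   BotGate.shouldBlock(req.getHeader("User-Agent"), req.getRemoteAddr())
// If it returns true, set a status on the response and return without
// calling chain.doFilter(...), so the expensive page code never runs.
```

Matching on the user agent only stops bots that identify themselves
honestly; the IP list is there for the ones that don't.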

Jeff

On Wed, Apr 22, 2015 at 11:42 PM, Ashutosh Mishra <
ashutosh.narayan3...@gmail.com> wrote:

> Hi Jeff,
>
> Thanks for your harshly worded suggestion. Please have a look at the
> attached log file snapshot; it shows the IP of the ahref bot, which comes
> to my site regularly. I think this real information will be enough, so
> please let me know a concrete solution to overcome this problem.
>
> On Thursday, April 23, 2015 at 11:43:08 AM UTC+5:30, Jeff Schnitzer wrote:
>>
>> I'm calling bullshit.
>>
>> You have a website developed on GAE/Java but you don't understand what
>> .htaccess is and why it doesn't apply? If you're having problems with your
>> website, why don't you ask the people who developed it? I don't get it. The
>> advice you have been offered here (all of which is reasonable) requires
>> more technical sophistication than you exhibit.
>>
>> Possibly this is a bot doing normal things. Possibly this is a real DOS
>> attack of some kind. Post some real information like IP addresses and the
>> actual rate of requests, and maybe we can help you with an appropriate
>> mitigation strategy.
>>
>> You have said a bunch of technically dumb things with an accusatory tone
>> of voice (spam bots are attacking me!). This happens a lot, and usually it
>> means _you_ just screwed something up. If you want help, post more
>> information and be less arrogant about it. You don't know what you think
>> you know.
>>
>> Jeff
>>
>> On Wed, Apr 22, 2015 at 7:08 AM, Barry Hunter <barryb...@gmail.com>
>> wrote:
>>
>>> Have you cross checked the IP(s) of the bot?
>>>
>>> The User-Agent is easily spoofed; it might be some other bot just
>>> pretending to be ahrefbot.
>>>
>>>
>>> Regardless, as already mentioned, you can put handlers in your code to
>>> 'trap' bad actors. Check the user agent, and do something different.
>>> (You can't totally block this way, but you can minimise damage: make the
>>> requests very quick/short. And by not returning further links, you stop
>>> them finding yet more pages to index.)
>>>
>>> ... or use an external service to 'firewall' such requests - as already
>>> mentioned Cloudflare offer this.
>>>
>>>
>>>
>>>
>>> On 22 April 2015 at 15:02, Ashutosh Mishra <ashutosh.n...@gmail.com>
>>> wrote:
>>>
>>>> Hi Vinny,
>>>>
>>>> Thanks for your comment. I have made the changes in the
>>>> myhotelcar.com/bobots.txt file as you mentioned, but the issue is
>>>> still not resolved. As per my analysis, the bot hits, specifically from
>>>> ahref, have increased day by day, and now the issue seems critical.
>>>>
>>>> Please help me get out of this situation. I will be happy to have your
>>>> advice on this.
>>>>
>>>> On Tuesday, April 21, 2015 at 10:07:28 PM UTC+5:30, Vinny P wrote:
>>>>>
>>>>> On Mon, Apr 20, 2015 at 11:32 PM, Ashutosh Mishra <
>>>>> ashutosh.n...@gmail.com> wrote:
>>>>>>
>>>>>> I have also searched a lot, and I found that the Ahref bot doesn't
>>>>>> obey the robots.txt rules.
>>>>>> Many people have suggested that I can block it via an htaccess file,
>>>>>> but I can't go that way because I didn't find an htaccess file in
>>>>>> Google App Engine hosting. So please provide me a way to filter out
>>>>>> these spam bots.
>>>>>>
>>>>>
>>>>>
>>>>> The .htaccess file isn't supported in App Engine.
>>>>>
>>>>> If this is the real Ahref bot, it should support robots.txt. I looked
>>>>> in your robots.txt file: I see you disallowing Baidu, Yandex and a
>>>>> wildcard disallow, but not specifically ahrefbot. Try adding the
>>>>> following to your robots file:
>>>>>
>>>>> user-agent: AhrefsBot
>>>>> disallow: /
>>>>>
>>>>> According to the ahrefbot robot page, you can also email them directly
>>>>> to ask them to stop; see https://ahrefs.com/robot
>>>>>
>>>>>
>>>>> On Mon, Apr 20, 2015 at 11:36 PM, Ashutosh Mishra <
>>>>> ashutosh.n...@gmail.com> wrote:
>>>>>
>>>>>> I think you have identified the issue correctly: they are regularly
>>>>>> hitting a particular set of pages, the dynamically generated hotel
>>>>>> pages. You are also correct about the RSS and sitemap feeds.
>>>>>> So please tell me how to overcome this issue, as these spam bots,
>>>>>> especially the ahref bot, are consuming a lot of my server bandwidth
>>>>>> unnecessarily. I want a good solution so that I will not face any
>>>>>> spam bot hurdle in future.
>>>>>>
>>>>>
>>>>>
>>>>> This happens to a lot of websites with a large set of dynamically
>>>>> generated pages.
>>>>>
>>>>> Honestly the best solution would be to sign up for Cloudflare (
>>>>> https://www.cloudflare.com/google ) and use their tools to help
>>>>> filter incoming traffic. You can also do what Barry suggested earlier, and
>>>>> start blocking the IPs that ahrefsbot is using.
>>>>>
>>>>> If you're willing to do some coding, you can write a filter into your
>>>>> application to check the user agent and kick back a 429 HTTP status
>>>>> code (Too Many Requests) if traffic is too high:
>>>>> http://tools.ietf.org/html/rfc6585#page-3
>>>>>
>>>>>
>>>>>
>>>>> -----------------
>>>>> -Vinny P
>>>>> Technology & Media Consultant
>>>>> Chicago, IL
>>>>>
>>>>> App Engine Code Samples: http://www.learntogoogleit.com
>>>>>
>>>>>  --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to google-appengi...@googlegroups.com.
>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/google-appengine.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/google-appengine/181d93e6-b9e8-40e6-8a24-d883a2e315f8%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/google-appengine/181d93e6-b9e8-40e6-8a24-d883a2e315f8%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>
