On Tue, 12 May 2020 17:41:09 +0200
Peter Kovacs <pe...@apache.org> wrote:

> Okay, I had a short debug session with Dave and Humbedooh.
> 
> We are now sure that the crawlers are not blocked. The 301 Response 
> comes from the fact that Yandex still defaults to http and not https.


This post on the User Forum might be relevant:
https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756

Rory
> 
> After I added https to the URL, everything worked fine.
> 
> Wave also did a curl request, which worked fine as well.
> 
> 
> We have now agreed that I will pass the ball back to Google, with the 
> feedback that this looks like a Google-internal issue.
> 
> The robots.txt has not been changed for 11 years. Yandex can crawl the 
> URL and we can curl the web page, so we think it is a Google issue.
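
The reasoning quoted above (a 301 from http to https on the same host is a normal upgrade, not a block) can be written down as a small check. This is only a sketch: the function name classify_response and its labels are made up for illustration, and it only inspects a status code and a Location header that you have already obtained, for example from the curl -D output.

```python
from urllib.parse import urlsplit

# Status codes that carry a Location header pointing elsewhere.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_response(request_url, status, location=None):
    """Label one HTTP response: 'ok', 'https-upgrade', 'redirect' or 'other'.

    Hypothetical helper for illustration; it performs no network access.
    """
    if 200 <= status < 300:
        return "ok"
    if status in REDIRECT_CODES and location:
        src, dst = urlsplit(request_url), urlsplit(location)
        same_host = src.hostname == dst.hostname
        # Same host, only the scheme changes from http to https.
        if src.scheme == "http" and dst.scheme == "https" and same_host:
            return "https-upgrade"
        return "redirect"
    return "other"


# The two cases discussed in this thread.
print(classify_response("http://forum.openoffice.org/", 301,
                        "https://forum.openoffice.org/"))
print(classify_response("https://forum.openoffice.org/", 200))
```

With the values reported in this thread, the first call is labelled as an https upgrade and the second as a normal response, which matches what Peter and Dave saw.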
> 
> 
> I very much appreciated the quick session. Thanks.
> 
> 
> All the best
> 
> Peter
> 
> On 12.05.20 at 17:24, Dave Fisher wrote:
> > It’s not an IP Ban. Infra tells me that would not be a 301.
> >
> > Ah-ha - here is the 301:
> >
> > % curl -D headers http://forum.openoffice.org/
> > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > <html><head>
> > <title>301 Moved Permanently</title>
> > </head><body>
> > <h1>Moved Permanently</h1>
> > <p>The document has moved <a 
> > href="https://forum.openoffice.org/">here</a>.</p>
> > </body></html>
> >
> > Surprising that they cannot shift from HTTP to HTTPS via a 301!
> >
> > Regards,
> > Dave
> >
> >> On May 12, 2020, at 8:04 AM, Dave Fisher <w...@apache.org> wrote:
> >>
> >> Information about Infra IP Bans is here: 
> >> https://infra.apache.org/infra-ban.html
> >>
> >> Please direct the Google engineer to that resource.
> >>
> >> Regards,
> >> Dave
> >>
> >>> On May 12, 2020, at 7:55 AM, Dave Fisher <w...@apache.org> wrote:
> >>>
> >>> Are you sure you weren’t using forums.openoffice.org instead of 
> >>> forum.openoffice.org?
> >>>
> >>> curl -D headers https://forum.openoffice.org/ does return the correct 
> >>> page.
> >>>
> >>> The robots.txt is this:
> >>>
> >>> curl -D headers https://forum.openoffice.org/robots.txt
> >>> User-agent: *
> >>> Crawl-delay: 1
> >>> Disallow: /en/forum/common.php
> >>> Disallow: /en/forum/config.php
> >>> Disallow: /en/forum/con.php
> >>> Disallow: /en/forum/faq.php
> >>> Disallow: /en/forum/mcp.php
> >>> Disallow: /en/forum/memberlist.php
> >>> Disallow: /en/forum/posting.php
> >>> Disallow: /en/forum/report.php
> >>> Disallow: /en/forum/search.php
> >>> Disallow: /en/forum/style.php
> >>> Disallow: /en/forum/ucp.php
> >>> Disallow: /en/forum/viewonline.php
> >>> Disallow: /en/forum/adm
> >>> Disallow: /en/forum/cache
> >>> Disallow: /en/forum/docs
> >>> Disallow: /en/forum/files
> >>> Disallow: /en/forum/images
> >>> Disallow: /en/forum/includes
> >>> Disallow: /en/forum/language
> >>> Disallow: /en/forum/store
> >>> Disallow: /en/forum/styles
> >>> Disallow: /es/forum/common.php
> >>> Disallow: /es/forum/config.php
> >>> Disallow: /es/forum/con.php
> >>> Disallow: /es/forum/faq.php
> >>> Disallow: /es/forum/mcp.php
> >>> Disallow: /es/forum/memberlist.php
> >>> Disallow: /es/forum/posting.php
> >>> Disallow: /es/forum/report.php
> >>> Disallow: /es/forum/search.php
> >>> Disallow: /es/forum/style.php
> >>> Disallow: /es/forum/ucp.php
> >>> Disallow: /es/forum/viewonline.php
> >>> Disallow: /es/forum/adm
> >>> Disallow: /es/forum/cache
> >>> Disallow: /es/forum/docs
> >>> Disallow: /es/forum/files
> >>> Disallow: /es/forum/images
> >>> Disallow: /es/forum/includes
> >>> Disallow: /es/forum/language
> >>> Disallow: /es/forum/store
> >>> Disallow: /es/forum/styles
> >>> Disallow: /fr/forum/common.php
> >>> Disallow: /fr/forum/config.php
> >>> Disallow: /fr/forum/con.php
> >>> Disallow: /fr/forum/faq.php
> >>> Disallow: /fr/forum/mcp.php
> >>> Disallow: /fr/forum/memberlist.php
> >>> Disallow: /fr/forum/posting.php
> >>> Disallow: /fr/forum/report.php
> >>> Disallow: /fr/forum/search.php
> >>> Disallow: /fr/forum/style.php
> >>> Disallow: /fr/forum/ucp.php
> >>> Disallow: /fr/forum/viewonline.php
> >>> Disallow: /fr/forum/adm
> >>> Disallow: /fr/forum/cache
> >>> Disallow: /fr/forum/docs
> >>> Disallow: /fr/forum/files
> >>> Disallow: /fr/forum/images
> >>> Disallow: /fr/forum/includes
> >>> Disallow: /fr/forum/language
> >>> Disallow: /fr/forum/store
> >>> Disallow: /fr/forum/styles
> >>> Disallow: /fr/ci-joint
> >>> Disallow: /hu/forum/common.php
> >>> Disallow: /hu/forum/config.php
> >>> Disallow: /hu/forum/con.php
> >>> Disallow: /hu/forum/faq.php
> >>> Disallow: /hu/forum/mcp.php
> >>> Disallow: /hu/forum/memberlist.php
> >>> Disallow: /hu/forum/posting.php
> >>> Disallow: /hu/forum/report.php
> >>> Disallow: /hu/forum/search.php
> >>> Disallow: /hu/forum/style.php
> >>> Disallow: /hu/forum/ucp.php
> >>> Disallow: /hu/forum/viewonline.php
> >>> Disallow: /hu/forum/adm
> >>> Disallow: /hu/forum/cache
> >>> Disallow: /hu/forum/docs
> >>> Disallow: /hu/forum/files
> >>> Disallow: /hu/forum/images
> >>> Disallow: /hu/forum/includes
> >>> Disallow: /hu/forum/language
> >>> Disallow: /hu/forum/store
> >>> Disallow: /hu/forum/styles
> >>> Disallow: /ja/forum/common.php
> >>> Disallow: /ja/forum/config.php
> >>> Disallow: /ja/forum/con.php
> >>> Disallow: /ja/forum/faq.php
> >>> Disallow: /ja/forum/mcp.php
> >>> Disallow: /ja/forum/memberlist.php
> >>> Disallow: /ja/forum/posting.php
> >>> Disallow: /ja/forum/report.php
> >>> Disallow: /ja/forum/search.php
> >>> Disallow: /ja/forum/style.php
> >>> Disallow: /ja/forum/ucp.php
> >>> Disallow: /ja/forum/viewonline.php
> >>> Disallow: /ja/forum/adm
> >>> Disallow: /ja/forum/cache
> >>> Disallow: /ja/forum/docs
> >>> Disallow: /ja/forum/files
> >>> Disallow: /ja/forum/images
> >>> Disallow: /ja/forum/includes
> >>> Disallow: /ja/forum/language
> >>> Disallow: /ja/forum/store
> >>> Disallow: /ja/forum/styles
> >>> Disallow: /test
> >>> Disallow: /nl/forum/common.php
> >>> Disallow: /nl/forum/config.php
> >>> Disallow: /nl/forum/con.php
> >>> Disallow: /nl/forum/faq.php
> >>> Disallow: /nl/forum/mcp.php
> >>> Disallow: /nl/forum/memberlist.php
> >>> Disallow: /nl/forum/posting.php
> >>> Disallow: /nl/forum/report.php
> >>> Disallow: /nl/forum/search.php
> >>> Disallow: /nl/forum/style.php
> >>> Disallow: /nl/forum/ucp.php
> >>> Disallow: /nl/forum/viewonline.php
> >>> Disallow: /nl/forum/adm
> >>> Disallow: /nl/forum/cache
> >>> Disallow: /nl/forum/docs
> >>> Disallow: /nl/forum/files
> >>> Disallow: /nl/forum/images
> >>> Disallow: /nl/forum/includes
> >>> Disallow: /nl/forum/language
> >>> Disallow: /nl/forum/store
> >>> Disallow: /nl/forum/styles
> >>> Disallow: /vi/forum/common.php
> >>> Disallow: /vi/forum/config.php
> >>> Disallow: /vi/forum/con.php
> >>> Disallow: /vi/forum/faq.php
> >>> Disallow: /vi/forum/mcp.php
> >>> Disallow: /vi/forum/memberlist.php
> >>> Disallow: /vi/forum/posting.php
> >>> Disallow: /vi/forum/report.php
> >>> Disallow: /vi/forum/search.php
> >>> Disallow: /vi/forum/style.php
> >>> Disallow: /vi/forum/ucp.php
> >>> Disallow: /vi/forum/viewonline.php
> >>> Disallow: /vi/forum/adm
> >>> Disallow: /vi/forum/cache
> >>> Disallow: /vi/forum/docs
> >>> Disallow: /vi/forum/files
> >>> Disallow: /vi/forum/images
> >>> Disallow: /vi/forum/includes
> >>> Disallow: /vi/forum/language
> >>> Disallow: /vi/forum/store
> >>> Disallow: /vi/forum/styles
> >>> Disallow: /zh/forum/common.php
> >>> Disallow: /zh/forum/config.php
> >>> Disallow: /zh/forum/con.php
> >>> Disallow: /zh/forum/faq.php
> >>> Disallow: /zh/forum/mcp.php
> >>> Disallow: /zh/forum/memberlist.php
> >>> Disallow: /zh/forum/posting.php
> >>> Disallow: /zh/forum/report.php
> >>> Disallow: /zh/forum/search.php
> >>> Disallow: /zh/forum/style.php
> >>> Disallow: /zh/forum/ucp.php
> >>> Disallow: /zh/forum/viewonline.php
> >>> Disallow: /zh/forum/adm
> >>> Disallow: /zh/forum/cache
> >>> Disallow: /zh/forum/docs
> >>> Disallow: /zh/forum/files
> >>> Disallow: /zh/forum/images
> >>> Disallow: /zh/forum/includes
> >>> Disallow: /zh/forum/language
> >>> Disallow: /zh/forum/store
> >>> Disallow: /zh/forum/styles
> >>>
> >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 
> >>> 23:40:14 GMT
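
One way to see what rules like the ones quoted above permit is to feed them to Python's standard urllib.robotparser. This is a rough sketch under assumptions: it uses only a short excerpt of the English-forum rules rather than the full file, the helper name build_parser is made up, and the standard-library parser does plain prefix matching, which is not necessarily how Googlebot itself evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules quoted above (English forum only), not the full file.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/adm
"""


def build_parser(robots_text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://forum.openoffice.org"

# A topic page is not covered by any Disallow line in the excerpt.
topic_allowed = parser.can_fetch(
    "Googlebot", base + "/en/forum/viewtopic.php?f=50&t=102021")
# The search script is listed explicitly.
search_allowed = parser.can_fetch("Googlebot", base + "/en/forum/search.php")

print("viewtopic.php allowed:", topic_allowed)
print("search.php allowed:", search_allowed)
print("crawl delay:", parser.crawl_delay("Googlebot"))
```

Under these rules the topic URL comes out as allowed and the search script as disallowed.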
> >>>
> >>> Forum search uses phpBB
> >>>
> >>> We haven’t allowed search engines to crawl forum.openoffice.org since 
> >>> before the Oracle donation to the ASF.
> >>>
> >>> Crawlers' IP addresses might be blocked by ASF Infra if their use is 
> >>> excessive. That could give the 301.
> >>>
> >>> Regards,
> >>> Dave
> >>>
> >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <leg...@posteo.de> wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>>
> >>>> What I found is that the URL forum.openoffice.org is not reachable 
> >>>> from the Google search tool.
> >>>>
> >>>> So I checked with DuckDuckGo (my preferred search engine); they don't 
> >>>> use a crawler and point to the infra of Google, Bing and Yandex.
> >>>>
> >>>> I then checked with Bing, but could not figure out how to check the 
> >>>> bot's feedback on a URL, so I moved on.
> >>>>
> >>>> I checked with Yandex. They have a search URL test page. I entered 
> >>>> forum.openoffice.org there.
> >>>>
> >>>> The Response is:
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> >>>> * Server: Apache/2.4.18 (Ubuntu)
> >>>> * Location: https://forum.openoffice.org/
> >>>> * Content-Length: 237
> >>>> * Keep-Alive: timeout=15, max=100
> >>>> * Connection: Keep-Alive
> >>>> * Content-Type: text/html; charset=iso-8859-1
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> HTTP status code         301 Moved Permanently
> >>>> Server response time     133 ms
> >>>> IP address       54.84.201.130
> >>>> Encoding         UTF-8(unicode-1-1-utf-8, UTF8)
> >>>> Page size        237 B
> >>>>
> >>>>
> >>>> I am not sure what that means. The HTTP status code "Moved Permanently" 
> >>>> looks wrong. I just don't know whether this is the return code from our 
> >>>> web server or a response code from the crawler.
> >>>> I will try to get hold of someone from Infra, or I'll open a ticket.
> >>>>
> >>>>
> >>>> All the best
> >>>> Peter
> >>>>
> >>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
> >>>>> Hi Kay,
> >>>>>
> >>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
> >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> >>>>>>> Hi Kay,
> >>>>>>>
> >>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
> >>>>>>>> Hi Peter...
> >>>>>>>>
> >>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
> >>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
> >>>>>>>> ANY work with the Google Search apis on these sites in quite some time.
> >>>>>>>>
> >>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
> >>>>>>>> Search until I saw this.
> >>>>>>> I think I added it to the list when we had a discussion about outdated
> >>>>>>> information regarding SourceForge found by Google Search.
> >>>>>>>
> >>>>>>> But I don't have access to forum.openoffice.org, so I could never
> >>>>>>> complete the step.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>>    Matthias
> >>>>>> OK. In the top level of the website source, there is a file called
> >>>>>> "skeleton.html" which references the following bit of code --
> >>>>>>
> >>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> >>>>>>
> >>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
> >>>>>> forgot), but this is an example of the Google Analytics code snippet
> >>>>>> that is used. Basically, this needs to be included in the site you
> >>>>>> want analytics to be used on by putting it in the (header) files that
> >>>>>> generate the site. And you might take a look at recent instructions
> >>>>>> from Google. Things change.
> >>>>>>
> >>>>>> https://support.google.com/analytics/answer/1008080
> >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
> >>>>> forum...
> >>>>> The procedure for the Google Search Console is the same, it needs access
> >>>>> to the root directory.
> >>>>>
> >>>>> Maybe Andrea can help if he is available again?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>   Matthias
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Kay
> >>>>>>
> >>>>>>>> One of the Google Search admins for forum.openoffice.org could check
> >>>>>>>> the current Google search apis that are in use on that site. Changes
> >>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
> >>>>>>>> robots.txt for that site is causing this. I don't think it requires a
> >>>>>>>> response, but maybe some investigation.
> >>>>>>>>
> >>>>>>>> Just some ideas...
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>>
> >>>>>>>> Kay
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I have received the following mail, probably because I am listed on
> >>>>>>>>> the Google Analytics page.
> >>>>>>>>>
> >>>>>>>>> Does this contain any action items? What can we answer Mr John Mueller?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> All the Best
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -------- Forwarded Message --------
> >>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google Search
> >>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
> >>>>>>>>> From:     John Mueller <joh...@google.com>
> >>>>>>>>> To:     morsei...@gmail.com, kay.sch...@gmail.com, legi...@gmail.com
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
> >>>>>>>>>
> >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> >>>>>>>>> attention to a critical issue with your website, and how it's
> >>>>>>>>> available for Google's web search.
> >>>>>>>>>
> >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
> >>>>>>>>> out of Google's search results, and will prevent new pages from being
> >>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
> >>>>>>>>> accidentally blocking these pages from Google Search due to a server
> >>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
> >>>>>>>>> website, we'd recommend using the robots.txt file instead.
> >>>>>>>>>
> >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
> >>>>>>>>> can use a reverse IP lookup to do so:
> >>>>>>>>> https://support.google.com/webmasters/answer/80553
> >>>>>>>>>
> >>>>>>>>> Should you have any questions, feel free to contact me directly. For
> >>>>>>>>> verification purposes, we are sending a copy of this message to your
> >>>>>>>>> site's Search Console account.
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>> John Mueller (joh...@google.com <mailto:joh...@google.com>)
> >>>>>>>>> Webmaster Trends Analyst
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@openoffice.apache.org
> >>>>>>>>


-- 
Rory O'Farrell <ofarr...@iol.ie>
