Re: robots.txt in quickgit.kde.org
On Wednesday, December 30, 2015 12:57:23 PM Ben Cooksley wrote:
> On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk wrote:
> > Wouldn't it be possible to let robots index https://lxr.kde.org/source/
> > instead? We have the infrastructure...
>
> We'll give it a shot.

Just to stress again: this would be *really* useful to have.

I answered a post on SO:
  http://stackoverflow.com/a/34612692/592636

I tried to link kwallet's FindGpgme.cmake in the answer, and there's *no* easy way to quickly get a link to KDE infrastructure serving the file via Google (not even api.kde.org).

Try googling for "kwallet findgpgme.cmake" (a very specific search, after all):
  https://www.google.de/search?q=kwallet+findgpgme.cmake

-> First result: GitHub..., rest: mildly interesting

Different issue I just noticed: there's no way to get the plain-text (raw) representation of a given file on LXR, is there? That would be useful as well.

Cheers,
Kevin

--
Kevin Funk | kf...@kde.org | http://kfunk.org

>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<
Re: robots.txt in quickgit.kde.org
On Wed, Jan 6, 2016 at 3:17 AM, Kevin Funk wrote:
> Different issue I just noticed: There's no way to get the plain-text (raw)
> representation of a given file on LXR, is there? Would be useful as well.

There isn't a link in our templates, but my Google-fu (and subsequent tests) confirm that adding the parameter "_raw=1" to an LXR source view URL will return the file without any HTML around it.

Regards,
Ben
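As a rough illustration of the "_raw=1" tip above, here is a minimal Python sketch that appends the parameter to an LXR source view URL. It assumes the parameter behaves as Ben describes; the helper name lxr_raw_url and the example path are hypothetical and not part of LXR itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def lxr_raw_url(source_url: str) -> str:
    """Return source_url with the query parameter _raw=1 set.

    Any existing _raw parameter is replaced; other query parameters
    are kept in their original order.
    """
    parts = urlsplit(source_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_raw"
    ]
    query.append(("_raw", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical path, used only to show the shape of the resulting URL.
    print(lxr_raw_url("https://lxr.kde.org/source/some/path/FindGpgme.cmake"))
```

The resulting URL could then be pasted into an answer or fetched with any HTTP client to get the plain file contents.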
Re: robots.txt in quickgit.kde.org
On Tue, Dec 29, 2015 at 11:16 PM, Kevin Funk wrote:
> On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
>> I've now provisioned https://sources.kde.org/
>
> I'm not sure this is super useful, to be honest (as mentioned in
> #kde-sysadmins already).
>
> This is really just plain file serving, with no cross-references to either LXR
> (or apidocs). This is basically a dead-end when you follow a result on Google.
>
> Wouldn't it be possible to let robots index https://lxr.kde.org/source/
> instead? We have the infrastructure...

We'll give it a shot.

> Of course we need to blacklist all the pages allowing to actively *search* LXR
> for robots, in order to avoid abuse.

Note that despite robots.txt, many spiders (including Google, Yahoo and Bing) will actively disregard the instructions in there.
While they may not return the results - or omit snippets of the page content - they have all been guilty (at least in the past) of disregarding our restrictions, resulting in downtime (which has in some cases necessitated full host reboots to fix) for numerous KDE.org subsites.

This is why QuickGit and WebSVN have extremely restrictive robots.txt policies, in addition to blacklist rules within our web server configurations.

Regards,
Ben
Re: robots.txt in quickgit.kde.org
On Tue, Dec 29, 2015 at 7:59 AM, Lydia Pintscher wrote:
> I think making our sourcecode available to search engines is pretty
> important for the reasons already mentioned by others. Do you need
> help for it? If you write down what's needed I can help find someone
> to do it.

I've now provisioned https://sources.kde.org/

Regards,
Ben
Re: robots.txt in quickgit.kde.org
On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
> I've now provisioned https://sources.kde.org/

You rock :)

Regards,
Kåre
Re: robots.txt in quickgit.kde.org
On Tuesday, December 29, 2015 10:39:01 PM Ben Cooksley wrote:
> I've now provisioned https://sources.kde.org/

I'm not sure this is super useful, to be honest (as mentioned in #kde-sysadmins already).

This is really just plain file serving, with no cross-references to either LXR or apidocs. It is basically a dead end when you follow a result on Google.

Wouldn't it be possible to let robots index https://lxr.kde.org/source/ instead? We have the infrastructure...

Of course, we need to blacklist for robots all the pages that allow actively *searching* LXR, in order to avoid abuse.

Cheers,
Kevin
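The idea above (let robots index the source views, but keep them away from the search pages) can be sketched with Python's standard urllib.robotparser. This is only an illustration under stated assumptions: the robots.txt rules and the /search and /ident paths are hypothetical placeholders, not LXR's actual URL layout, and "ExampleBot" is a made-up crawler name. As noted elsewhere in this thread, robots.txt is advisory, so a crawler is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the (assumed) search endpoints, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /ident
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "https://lxr.kde.org/source/",             # source view: should be allowed
        "https://lxr.kde.org/search?_string=foo",  # hypothetical search page: blocked
    ):
        verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
        print(url, "->", verdict)
```

A well-behaved crawler would make the same check before fetching a page; crawlers that ignore robots.txt would still have to be handled by rules in the web server configuration.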
Re: robots.txt in quickgit.kde.org
On Monday 28 December 2015, at 16:37:47, Thomas Lübking wrote:
> On Monday, 28 December 2015 11:35:23 CEST, Albert Vaca wrote:
> > Lxr can't search across every open source project in the world, so that's
> > a point for Google.
>
> Presuming google could, why would I. Or anyone?

You want to use QComplicatedClass, can't figure out how, go to Google and search for it. You see that KDE uses it, and since you know KDE is an amazing group of developers, you read their code and now understand how to use QComplicatedClass. You write your code and then realize the KDE people missed a subtle corner case, so you send a code review. Everyone wins.

Cheers,
  Albert

> I dig for certain variables or strings in specific code, but why would I
> look for m_foo or "SomeSetting" "across every open source project in the
> world"? Lower STR? Search who copied "my" code?
>
> Don't get me wrong, I don't care if anyone can - fine. But the inability
> isn't exactly like shooting yourself.
>
> Shrug,
> Thomas
Re: robots.txt in quickgit.kde.org
On Monday, December 28, 2015 04:37:47 PM Thomas Lübking wrote:
> On Monday, 28 December 2015 11:35:23 CEST, Albert Vaca wrote:
> > Lxr can't search across every open source project in the world, so that's
> > a point for Google.
>
> Presuming google could, why would I. Or anyone?
> I dig for certain variables or strings in specific code, but why would I
> look for m_foo or "SomeSetting" "across every open source project in the
> world"? Lower STR? Search who copied "my" code?

Are you aware that not even every KDE developer knows about LXR? I constantly have to tell people about it.

Now consider someone outside of KDE trying to quickly figure out where KAwesomeClass is defined, or how it is implemented -> bummer.

(Right now there's api.kde.org, which at least gives results about signatures, but the contents of the .cpp or .h files are not indexed, afaics.)

So having KDE source code indexed by Google would definitely be a win for everyone. Having it linked back to LXR would be even more so (=> people learn LXR exists when googling for KDE source code).

Cheers,
Kevin
Re: robots.txt in quickgit.kde.org
On Monday, December 28, 2015 04:37:47 PM Thomas Lübking wrote:
> Presuming google could, why would I. Or anyone?
> I dig for certain variables or strings in specific code, but why would I
> look for m_foo or "SomeSetting" "across every open source project in the
> world"? Lower STR? Search who copied "my" code?
>
> Don't get me wrong, I don't care if anyone can - fine. But the inability
> isn't exactly like shooting yourself.

Potential new contributors do not find our code, because they do not know to search in LXR / EBN. When they don't find the code through Google, it does not exist in their world, and thus they will not contribute.

But if it were possible to find even only git master in LXR through Google, it would be great :)

/Kåre
Re: robots.txt in quickgit.kde.org
On Monday, 28 December 2015 18:09:32 CEST, Kevin Funk wrote:
> Are you aware that not even every KDE developer knows about LXR? I
> constantly have to tell people about it.

Yes, and I'm as well aware of the "if it's not in google, it doesn't exist" phenomenon, BUT: that's not gonna work.
If you search for QComplicatedClass, you'll find the Qt API docs, maybe some bug and most likely some stackoverflow entry - not a token in some thousand lines of code (leaving aside that mindless "i copy what I don't understand" is not the best of all approaches) - that will be on page [where no one has gone before]

> Now consider someone outside of KDE even, trying to figure out where
> KAwesomeClass is defined

But you already know it exists?

As mentioned I don't care, but for devs, one should ensure they become aware of lxr.kde.org - that's *far* better than having them enter "m_foo" into google and click until they find a result in KDE.

The idea that someone finds a solution to his problem in the KDE sources is a bit far off to me.

No idea what it would take, but ideally google would simply forward to lxr.kde.org when you search for "kde" and something else.

Stillshrug,
Thomas
Re: robots.txt in quickgit.kde.org
On Sun, Dec 27, 2015 at 12:35 PM, Ben Cooksley wrote:
>> Is there some place where search engines can easily index our source
>> code or are we shooting ourselves in the foot here?
>
> We could probably make it available by publishing the source trees
> used by LXR / EBN.
> This would only have the main branches obviously rather than everything
> though.
>
> I haven't checked, but LXR may already make its copy of the code
> accessible...

I think making our source code available to search engines is pretty important for the reasons already mentioned by others. Do you need help for it? If you write down what's needed I can help find someone to do it.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
KDE e.V. Board of Directors / KDE Community Working Group
http://kde.org - http://open-advice.org
robots.txt in quickgit.kde.org
Hi everyone,

"quickgit.kde.org" contains a robots.txt[0] which disallows search engines from fetching the project repos. I just wanted to know if this is intentional or not?

If I recall correctly, the mirror of KDE repositories on GitHub was created just because it wasn't being indexed by the search engines.

[0] https://quickgit.kde.org/robots.txt

--
Regards,
Ashish Bansal
http://ashish-bansal.in
Re: robots.txt in quickgit.kde.org
On Mon, Dec 28, 2015 at 12:15 AM, Lydia Pintscher <ly...@kde.org> wrote:
> On Sun, Dec 27, 2015 at 12:08 PM, Ben Cooksley <bcooks...@kde.org> wrote:
>> On Sun, Dec 27, 2015 at 11:53 PM, Ashish Bansal
>> <bansal.ashish...@gmail.com> wrote:
>>> "quickgit.kde.org" contains robots.txt[0] which is disallowing search
>>> engines to fetch the project repos. I just wanted to know if this is
>>> intentional or not?
>>
>> This is intentional, and is done to reduce the server load created by
>> indexers such as Google on the system hosting quickgit.kde.org.
>> (Generation of the pages, including the main index is substantially
>> more expensive than it appears due to the disk access required by
>> Git/SVN to return the needed information).
>
> Is there some place where search engines can easily index our source
> code or are we shooting ourselves in the foot here?

We could probably make it available by publishing the source trees used by LXR / EBN.
This would only have the main branches, obviously, rather than everything.

I haven't checked, but LXR may already make its copy of the code accessible...

Regards,
Ben
Re: robots.txt in quickgit.kde.org
On Sunday, 27 December 2015 12:35:51 CEST, Ben Cooksley wrote:
> We could probably make it available by publishing the source trees
> used by LXR / EBN.

Because if it's not in google, it doesn't exist?
We have LXR, which is a dedicated and *far* superior way to search our code, so what exactly is the purpose of finding "m_fooBar = new KFoo::Bar()" via google? (let alone bing ;-P )

Cheers,
sorry if I sound stupid.
Thomas