Re: Guidance required for GSoC project PTS rewrite in Django

2013-04-25 Thread Stefano Zacchiroli
On Wed, Apr 24, 2013 at 10:28:07PM +0530, Pankaj Kumar Sharma wrote:
> In the present system the content is loaded explicitly via cron.  What
> confuses me is which methodology we should use in the upcoming Django
> project: should the data be loaded at the time someone asks for it, or
> should it already be present in the database?

The short answer is: we'll have to experiment with that :)

Some of the information that the PTS exposes (e.g. that related to the
status of the archive) is "fairly static", meaning that it changes at
most 4 times a day. Other information is "very dynamic" (e.g. bug
information) and ideally should really be live, as it could be really
confusing for a user to see that, say, a package has 1 RC bug, click on
the bugs link, and discover that that's not true. It's not true
*anymore*, but the random user would have no way of understanding that
and would think it's a bug. This kind of inconsistency has been an
endless source of (bogus) bug reports throughout the life of the PTS.

A separate question is how to make all this efficient, in terms of
caching. Obviously, the current solution with static HTML pages is very
fast and is also easy to mirror in case of need.  A purely dynamic
solution would be at the opposite end of the spectrum in terms of
performance. We will probably need to stay somewhere in the middle, and
benchmark the scalability of the new solution (as mentioned in the
project description).

Ideally, we should cache heavily, either by using Django caching, or by
producing actual HTML pages via Django templates (as mentioned by Paul
in this thread), and add on top of that aggressive cache invalidation
mechanisms for live information, like bugs.  Alternatively, we might
want to cache only the information that is seldom updated and be
entirely dynamic on the live information.
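As a rough illustration of the "cache heavily, invalidate on change"
idea, here is a minimal sketch in plain Python. It is a stand-in for
whatever cache the project ends up using (Django's cache framework would
be the obvious candidate), not a proposed implementation. The class, the
function names and the TTL values are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry time-to-live and explicit invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute, ttl):
        """Return the cached value for key, calling compute() if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry so that the next read recomputes it."""
        self._entries.pop(key, None)


# Hypothetical TTLs: archive data changes a few times a day, bug data is "live".
ARCHIVE_TTL = 6 * 60 * 60
BUGS_TTL = 60

cache = TTLCache()


def rc_bug_count(package, fetch):
    """Cached RC bug count; fetch is any callable that asks the real data source."""
    return cache.get_or_compute(("rc-bugs", package), lambda: fetch(package), BUGS_TTL)


def on_bug_event(package):
    """Call this when a bug changes so that stale counts are never served."""
    cache.invalidate(("rc-bugs", package))
```

The short TTL bounds how stale bug data can get even if an invalidation
event is missed, while the explicit invalidation covers the common case
where we are told about the change.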

Regarding where the data comes from, my dream would be to develop a
Python abstraction layer over all the data that the PTS uses, and then
have various implementations ("backends") of it. One can for instance
access UDD directly, another can access a local cache updated by cron
(as in the current PTS deployment), another can be entirely live, and
yet another can use mixed solutions. That would allow us to experiment
more easily with the different solutions.
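To make the "abstraction layer with backends" idea a bit more concrete,
a sketch could look like the following. Every class and method name here
is made up for illustration, and the sample data is invented; the real
interface would have to be worked out during the project.

```python
from abc import ABC, abstractmethod


class PackageDataSource(ABC):
    """Interface the web front end would code against (names are hypothetical)."""

    @abstractmethod
    def versions(self, source_package):
        """Return a mapping of suite name to version string."""

    @abstractmethod
    def rc_bug_count(self, source_package):
        """Return the number of release-critical bugs."""


class SnapshotSource(PackageDataSource):
    """Backend reading a local snapshot, e.g. one refreshed by cron.

    The snapshot is a plain dict here; a UDD backend would run SQL instead.
    """

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def versions(self, source_package):
        return dict(self._snapshot[source_package]["versions"])

    def rc_bug_count(self, source_package):
        return self._snapshot[source_package]["rc_bugs"]


class MixedSource(PackageDataSource):
    """Slow-changing data from one backend, live data from another."""

    def __init__(self, static_source, live_source):
        self._static = static_source
        self._live = live_source

    def versions(self, source_package):
        return self._static.versions(source_package)

    def rc_bug_count(self, source_package):
        return self._live.rc_bug_count(source_package)


# Made-up sample data, only to show that callers never see which backend is used.
cron_snapshot = SnapshotSource({"dpkg": {"versions": {"unstable": "1.16.10"}, "rc_bugs": 1}})
live_snapshot = SnapshotSource({"dpkg": {"versions": {}, "rc_bugs": 0}})
source = MixedSource(cron_snapshot, live_snapshot)
```

Since the views would only ever talk to `PackageDataSource`, swapping
one backend for another while benchmarking would not require touching
the rest of the code.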

Hope this explains that we don't yet have answers written in stone to
your question, and that finding out, via experiments, the right
trade-offs will be part of the actual project.

Cheers.
-- 
Stefano Zacchiroli  . . . . . . .  z...@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader  . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »




Re: PTS: host static files on static.d.o?

2013-04-25 Thread Peter Palfrader
On Thu, 25 Apr 2013, Paul Wise wrote:

> Most of the PTS is currently static HTML/etc files, that can change
> approximately every 6 hours, depending on changes in the archive and in
> external package checkers (lintian etc). The PTS is a fairly essential
> service for package maintainers and it would be nice to take advantage
> of its static HTML generation to provide information to maintainers
> even when the main host is down or disconnected. Since DSA implemented
> static.d.o for the purpose of distributing static content from multiple
> hosts, I thought we might take advantage of it for the PTS.
> 
> I'm not familiar with how static.d.o works, could any DSA folks comment
> on that?
> 
> Do DSA folks think the data produced by the PTS is suitable for serving
> from static.d.o?

I'm positively inclined but would still like to know more before
agreeing:

How much data are we talking about (how many files, how many gigs)?
What's the expected churn per update?  You already answered update
frequency.

What's left on the current packages.qa.d.o that would not be moved?

Cheers,
weasel
-- 
                           |  .''`.       ** Debian **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/


-- 
To UNSUBSCRIBE, email to debian-qa-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130425143137.gq23...@anguilla.noreply.org



Re: PTS: host static files on static.d.o?

2013-04-25 Thread Paul Wise
On Thu, 2013-04-25 at 16:31 +0200, Peter Palfrader wrote:

> How much data are we talking about (how many files, how many gigs)?

pabs@quantz:~$ find /srv/packages.qa.debian.org/www/web/ -type f | wc -l
638509
pabs@quantz:~$ du -h --summarize /srv/packages.qa.debian.org/www/web/
4.3G	/srv/packages.qa.debian.org/www/web/

> What's the expected churn per update?

I'll measure this tomorrow and get back to you.

> What's left on the current packages.qa.d.o that would not be moved?

There are some CGIs:

  * pts.cgi: used for subscriptions/unsubscriptions/etc
  * error404.cgi: a custom 404 handler that redirects from binary
    package names to source package names
  * soap-alpha.cgi: a SOAP interface to the PTS
  * set-csspref.cgi: obsolete IIRC

The 404 handler might be problematic unless we can distribute it and its
1.7 M plain text data file. Another possibly problematic thing is the
mod_rewrite rules that redirect from /srcpkg to /s/srcpkg.html.
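For readers unfamiliar with what those two pieces do, here is a small
Python sketch of the combined logic. It is not the actual error404.cgi
or the real rewrite rules. The data file format (one "binary source"
pair per line) and the reading of "/s/" as the first letter of the
source package name are assumptions made for the example.

```python
def load_binary_to_source(lines):
    """Parse 'binary source' pairs, one per line (assumed file format)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def page_path(source_package):
    """Map a source package name to its static page, e.g. dpkg -> /d/dpkg.html.

    Assumes the directory is simply the first character of the name.
    """
    return "/{}/{}.html".format(source_package[0], source_package)


def resolve(path, known_sources, binary_to_source):
    """Return the static page for a request path, or None for a real 404."""
    name = path.strip("/")
    if not name:
        return None
    if name in known_sources:
        # What the rewrite rule does: /srcpkg -> /s/srcpkg.html
        return page_path(name)
    source = binary_to_source.get(name)
    if source is not None:
        # What the 404 handler does: binary package name -> source package page
        return page_path(source)
    return None
```

Both steps are pure lookups against data that changes only when the
archive does, so in principle they could be shipped alongside the static
files, provided the mirrors can run something equivalent.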

-- 
bye,
pabs

http://wiki.debian.org/PaulWise




Re: PTS rewrite in Django

2013-04-25 Thread Raphael Hertzog
On Sat, 20 Apr 2013, Pankaj Kumar Sharma wrote:
> I have understood the "web" part of PTS. Now I want to use the PTS
> email interface [2]. I could not figure out how to set up a similar
> interface on my local machine using the codebase of PTS.

Debian servers use exim with a configuration that supports .forward-*
and aliases files to define the various addresses available on the virtual
domain:
http://anonscm.debian.org/viewvc/qa/trunk/pts/mail/

The .forward-default is the most important one that receives almost all
incoming mails (it's the catchall address). .forward-_control is
_cont...@packages.qa.debian.org aka p...@qa.debian.org.
.forward-_news is _n...@packages.qa.debian.org which is subscribed to
debian-(devel-)?chan...@lists.debian.org

That said, you don't need to set up everything like this for your own
tests. Having scripts that accept mails on stdin is enough in most cases.
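For instance, a test script along these lines only needs Python's
standard email module. This is a sketch, not the PTS control bot: the
function name is made up, and treating each non-empty body line as one
command is an assumption for the example.

```python
import io
from email import message_from_binary_file, policy


def parse_mail(stream):
    """Read one raw mail from a binary stream; return (sender, body lines)."""
    msg = message_from_binary_file(stream, policy=policy.default)
    sender = str(msg["From"]) if msg["From"] is not None else None
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    # Assumption: every non-empty line of the body is one command.
    commands = [line.strip() for line in text.splitlines() if line.strip()]
    return sender, commands


# Demonstration with an in-memory message. When the script is fed by a
# pipe (from a .forward file or a shell redirect), pass sys.stdin.buffer.
sample = b"From: alice@example.org\nSubject: test\n\nsubscribe dpkg\n"
sender, commands = parse_mail(io.BytesIO(sample))
print(sender, commands)
```

With `parse_mail(sys.stdin.buffer)` as the entry point, a saved mail can
be replayed with a plain shell redirect, with no mail server involved.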

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Get the Debian Administrator's Handbook:
→ http://debian-handbook.info/get/





Re: PTS: host static files on static.d.o?

2013-04-25 Thread Raphael Hertzog
Hi Paul,

On Thu, 25 Apr 2013, Paul Wise wrote:
> Most of the PTS is currently static HTML/etc files, that can change
> approximately every 6 hours, depending on changes in the archive and in
> external package checkers (lintian etc). The PTS is a fairly essential
> service for package maintainers and it would be nice to take advantage
> of its static HTML generation to provide information to maintainers
> even when the main host is down or disconnected.

Please don't implement this for now. The planned GSOC project concerning
the PTS is likely to change this initial design decision of having almost
only static content.

High availability is important for the PTS because it is a central part of
the communication infrastructure too, and that can't be done with
static.d.o. So we should probably investigate other ways to make it fault
tolerant rather than a simple redundancy of static HTML pages.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Get the Debian Administrator's Handbook:
→ http://debian-handbook.info/get/





Re: Guidance required for GSoC project PTS rewrite in Django

2013-04-25 Thread Raphael Hertzog
Hi Paul,

thank you Paul for having shared your ideas. They are all very
interesting in some ways but most are probably out of scope
for the GSOC project that Stefano and I are mentoring (I'm saying
this just to avoid confusion in the students' heads).

On Thu, 25 Apr 2013, Paul Wise wrote:
> The Debian archive presently only exports an apt archive and sends
> events out as emails, mainly aimed at humans not machines. This year
> there is a planned GSoC project to change this and make dak spit out
> AMQP events using fedmsg, enabling Debian and other services to get
> realtime updates about package uploads etc.

Interesting indeed, but it's impossible to plan a GSOC based on something
that's going to be developed in parallel.

> Then there is UDD (Universal Debian Database), which is a project to
> import all sources of data about Debian into an SQL database
> (currently PostgreSQL). It has a lot of data but the data in UDD is
> different to the PTS (some extra, some missing).
> 
> http://wiki.debian.org/UltimateDebianDatabase
> 
> The PTS was created before UDD. It pulls data in from a variety of
> sources (update_incoming.sh), converts that to XML (excuses_to_xml.py,
> other_to_xml.py, sources_to_xml.py) and converts that to HTML
> (generate_html.sh) using XSLT. Some data is also pushed to it via
> email (the news mainly).

Using UDD as one possible source of data is certainly desirable
in the planned rewrite. But expanding the scope of UDD is not really
planned (unless we have UDD maintainers who are very responsive).

I'm not expecting the students to contribute code to UDD.

> Realtime updates so that developers can get correct information. The
> current situation is suboptimal.

Agreed. We should at least be prepared for it.

> Static HTML since it is faster to load and means we can distribute the
> content to multiple hosts in case of downtime (not done yet).

Static HTML is not a requirement that I plan to impose. At least
not down to the pre-generation of all HTML pages.

On the contrary, I see the PTS evolving into something much more
interactive and dynamic. As an example of a future extension (outside of
the scope of this GSOC), it would be nice if people could fill in missing
debtags directly in the PTS and then package maintainers + DD could
approve them. Or if people could fill in upstream metadata (upstream VCS,
BTS, etc.) that we don't want to store in debian/control currently.

> doesn't need logins/admins/etc. No need for passwords since we don't
> have them currently. Passwords are also a less than ideal
> authentication mechanism, if we need authentication at all we should
> use something better like OpenPGP keys or client-side SSL
> certificates.

It's still interesting to authenticate people so that we can subscribe
them to packages more easily (for a better integration with the mail
part).

And later to handle other kinds of operations (as suggested above).

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Get the Debian Administrator's Handbook:
→ http://debian-handbook.info/get/





Re: PTS: host static files on static.d.o?

2013-04-25 Thread Martin Zobel-Helas
Hi, 

On Thu Apr 25, 2013 at 20:56:08 +0200, Raphael Hertzog wrote:
> Hi Paul,
> 
> On Thu, 25 Apr 2013, Paul Wise wrote:
> > Most of the PTS is currently static HTML/etc files, that can change
> > approximately every 6 hours, depending on changes in the archive and in
> > external package checkers (lintian etc). The PTS is a fairly essential
> > service for package maintainers and it would be nice to take advantage
> > of its static HTML generation to provide information to maintainers
> > even when the main host is down or disconnected.
> 
> Please don't implement this for now. The planned GSOC project concerning
> the PTS is likely to change this initial design decision of having almost
> only static content.
> 
> High availability is important for the PTS because it is a central part of
> the communication infrastructure too, and that can't be done with
> static.d.o. So we should probably investigate other ways to make it fault
> tolerant rather than a simple redundancy of static HTML pages.

what is the reason to move away from working static webpages? I also
wonder where the CPU cycles for dynamic web pages will come from...

Not amused, 
zobel

-- 
 Martin Zobel-Helas Debian System Administrator
 Debian & GNU/Linux Developer   Debian Listmaster
 http://about.me/zobel   Debian Webmaster
 GPG Fingerprint:  6B18 5642 8E41 EC89 3D5D  BDBB 53B1 AC6D B11B 627B 





Re: PTS: host static files on static.d.o?

2013-04-25 Thread Paul Wise
On Thu, 2013-04-25 at 21:44 +0200, Martin Zobel-Helas wrote:

> what is the reason to move away from working static webpages?

Probably due to a lack of research into ways to do that with django.
Based on some quick research into it, there are a number of ways that
this can be done.

I personally have advocated for continuing to generate static pages. 

> I also wonder where the CPU cycles for dynamic web pages will come
> from...

Same place as the static ones, quantz :)
 
-- 
bye,
pabs

http://wiki.debian.org/PaulWise




Re: Guidance required for GSoC project PTS rewrite in Django

2013-04-25 Thread Paul Wise
On Fri, Apr 26, 2013 at 3:28 AM, Raphael Hertzog wrote:

> thank you Paul for having shared your ideas. They are all very
> interesting in some ways but most are probably out of scope
> for the GSOC project that Stefano and I are mentoring (I'm saying
> this just to avoid confusion in the students' heads).

Thanks for the clarification.

> Interesting indeed, but it's impossible to plan a GSOC based on something
> that's going to be developed in parallel.

Agreed :(

> Using UDD as one possible source of data is certainly desirable
> in the planned rewrite. But expanding the scope of UDD is not really
> planned (unless we have UDD maintainers who are very responsive).
>
> I'm not expecting the students to contribute code to UDD.

Stefano is an admin in collab-qa and a mentor for this project, so
that should be enough?

> On the contrary, I see the PTS evolving into something much more
> interactive and dynamic. As an example of a future extension (outside of
> the scope of this GSOC), it would be nice if people could fill in missing
> debtags directly in the PTS and then package maintainers + DD could
> approve them. Or if people could fill in upstream metadata (upstream VCS,
> BTS, etc.) that we don't want to store in debian/control currently.

I expect this move will be unpopular; one of the members of DSA has
already registered their disapproval in the other thread. IIRC Enrico
is planning to add ways for other folks to approve debtags changes.
The upstream metadata, apart from the homepage, is stored in
debian/upstream:

http://wiki.debian.org/debian/upstream
http://wiki.debian.org/UpstreamMetadata
http://dep.debian.net/deps/dep12

Perhaps a hybrid approach is appropriate, distributed static files for
information display and main site for modifications etc.

> It's still interesting to authenticate people so that we can subscribe
> them to packages more easily (for a better integration with the mail
> part).

Hmm, I suppose so.

> And later to handle other kinds of operations (as suggested above).

None of the user submission sites (debtags, screenshots, description
translations, watch files) require logins and I suggest we continue
that tradition if you want to move those to the PTS instead of having
them on their own domains.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise

