[Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Roan Kattouw
Yesterday, the selection of GSoC projects was officially announced.
For MediaWiki, the following projects have been accepted:

* Niklas Laxström (Nikerabbit), mentored by Siebrand, will be working
on improving localization and internationalization in MediaWiki, as
well as improving the Translate extension used on translatewiki.net
* Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the
Apache servers any more
* Jeroen de Dauw, mentored by Yaron Koren, will be improving the
Semantic Layers extension and merging it into the Semantic Google Maps
extension
* Gerardo Antonio Cabero, mentored by Michael Dale (mdale), will be
improving the Cortado applet for video playback (I'm a bit fuzzy on
the details for this one)

The official list with links to (parts of) the proposals can be found
at the Google website [1]; lists for other organizations can be
reached through the list of participating organizations [2].

The next event on the GSoC timeline [3] is the community bonding
period [4], during which the students are supposed to get to know
their mentors and the community. This period lasts until May 23rd,
when the students actually begin coding.

Starting now and continuing at least until the end of GSoC in August,
you will probably see and hear from the students on IRC and the
mailing lists and hear about the projects they're working on. To
repeat the crux of an earlier thread on this list [5]: be nice to
these special newcomers, make them feel welcome and comfortable, and
try not to bite them :)

To the mentors and students: have fun!

Roan Kattouw (Catrope)

[1] http://socghop.appspot.com/org/home/google/gsoc2009/wikimedia
[2] http://socghop.appspot.com/program/accepted_orgs/google/gsoc2009
[3] http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline
[4] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
[5] http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/041964.html

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Marco Schuster
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw wrote:

> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
> thumbnailing daemon, so image manipulation won't have to happen on the
> Apache servers any more


Wow, I'm lookin' forward to this. Might be worth a try to give the uploader
the ability to choose non-standard resizing filters or so... or even full-fledged
image manipulation, something like a wiki-style Photoshop.

Marco


-- 
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Michael Dale
I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop" 
would be cool ... but not in the scope of that soc project ;)

--michael

Marco Schuster wrote:
> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw wrote:
>
>   
>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>> thumbnailing daemon, so image manipulation won't have to happen on the
>> Apache servers any more
>> 
>
>
> Wow, I'm lookin' forward to this. Might be worth a try to give the uploader
> the ability to choose non-standard resizing filters or so... or even full-fledged
> image manipulation, something like a wiki-style Photoshop.
>
> Marco
>
>
>   


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread David Gerard
2009/4/22 Michael Dale :
> Marco Schuster wrote:
>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw wrote:

>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>> Apache servers any more

>> Wow, I'm lookin' forward to this. Might be worth a try to give the uploader
>> the ability to choose non-standard resizing filters or so... or even full-fledged
>> image manipulation, something like a wiki-style Photoshop.

> I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop"
> would be cool ... but not in the scope of that soc project ;)


You can do pretty much anything with ImageMagick. Trouble is that it's
not the fastest at *anything*. Depends how much that affects
performance in practice - something that *just* thumbnails could be
all sorts of more efficient, but you'd need a new program for each
function, and most Unix users of MediaWiki thumbnail with ImageMagick
already so it'll be there.


- d.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Chad
On Tue, Apr 21, 2009 at 8:16 PM, David Gerard  wrote:
> 2009/4/22 Michael Dale :
>> Marco Schuster wrote:
>>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw 
>>> wrote:
>
>>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>>> Apache servers any more
>
>>> Wow, I'm lookin' forward to this. Might be worth a try to give the uploader
>>> the ability to choose non-standard resizing filters or so... or even full-fledged
>>> image manipulation, something like a wiki-style Photoshop.
>
>> I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop"
>> would be cool ... but not in the scope of that soc project ;)
>
>
> You can do pretty much anything with ImageMagick. Trouble is that it's
> not the fastest at *anything*. Depends how much that affects
> performance in practice - something that *just* thumbnails could be
> all sorts of more efficient, but you'd need a new program for each
> function, and most Unix users of MediaWiki thumbnail with ImageMagick
> already so it'll be there.
>
>
> - d.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

The main issue with the daemon idea (which was discussed at length in
#mediawiki a few weeks ago) is that it requires a major change in how we
handle images.

Right now, the process involves rendering on-demand, rather than at-leisure.
This has the benefit of always producing an ideal thumb'd image at the end
of every parse. However, the major drawbacks are an increase in parsing
time (while we wait for ImageMagick to do its thing) and an increased load on
the app servers. The only time we can sidestep this is if someone uses a
thumb dimension for which we already have a thumb rendered.

In order for this to work, we'd need to shift to a style of "render when you get
a chance, but give me the best fit for now." Basically, we'd begin parsing and
find that we need a thumbnailed copy of some image, but we don't have the
ideal size just yet. Instead, we could return the best-fitting thumbnail so far
and use that until the daemon has given us the right image.
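To make the idea concrete, a minimal sketch of that "best fit for now"
lookup; the flat thumbs/ naming scheme and the function name are invented
for illustration, not existing MediaWiki code:

<?php
// Sketch only: find the best already-rendered thumb for a request.
// Assumes thumbs are stored flat as "<width>px-<name>".
function bestFitThumb( $name, $requestedWidth, $thumbDir ) {
    $bestWidth = null;
    foreach ( glob( "$thumbDir/*px-$name" ) as $path ) {
        if ( preg_match( '/^(\d+)px-/', basename( $path ), $m ) ) {
            $w = intval( $m[1] );
            // Prefer the smallest rendered thumb that is still at
            // least as wide as the requested size.
            if ( $w >= $requestedWidth &&
                ( $bestWidth === null || $w < $bestWidth )
            ) {
                $bestWidth = $w;
            }
        }
    }
    // null means "nothing usable yet": the caller shows a placeholder
    // and asks the daemon for the real render.
    return $bestWidth === null ? null : "$thumbDir/{$bestWidth}px-$name";
}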

Not an easy task, but I certainly hope some progress can be made on
it over the summer :)

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Aryeh Gregor
On Tue, Apr 21, 2009 at 7:54 PM, Marco Schuster
 wrote:
> Wow, I'm lookin' forward to this. Might be worth a try to give the uploader
> the ability to choose non-standard resizing filters or so... or even full-fledged
> image manipulation, something like a wiki-style Photoshop.

That seems to be orthogonal to the proposed project.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Aryeh Gregor
On Tue, Apr 21, 2009 at 8:34 PM, Chad  wrote:
> The main issue with the daemon idea (which was discussed at length in
> #mediawiki a few weeks ago) is that it requires a major change in how we
> handle images.
>
> Right now, the process involves rendering on-demand, rather than at-leisure.
> This has the benefit of always producing an ideal thumb'd image at the end
> of every parse. However, the major drawbacks are an increase in parsing
> time (while we wait for ImageMagick to do its thing) and an increased load on
> the app servers. The only time we can sidestep this is if someone uses a
> thumb dimension for which we already have a thumb rendered.
>
> In order for this to work, we'd need to shift to a style of "render when you
> get a chance, but give me the best fit for now." Basically, we'd begin parsing and
> find that we need a thumbnailed copy of some image, but we don't have the
> ideal size just yet. Instead, we could return the best-fitting thumbnail so
> far and use that until the daemon has given us the right image.

I'm not clear on why we don't just make the daemon synchronously
return a result the way ImageMagick effectively does.  Given the level
of reuse of thumbnails, it seems unlikely that the latency is a
significant concern -- virtually no requests will ever actually wait
on it.
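For concreteness, a sketch of what such a synchronous call could look
like. The daemon, its one-line text protocol, the port, and the function
name are all assumptions for illustration:

<?php
// Sketch of the synchronous variant: block until the daemon answers,
// just like shelling out to ImageMagick does today.
function thumbSync( $src, $width, $dst, $timeoutSec = 25 ) {
    $fp = fsockopen( '127.0.0.1', 8190, $errno, $errstr, 2 );
    if ( !$fp ) {
        return false; // daemon down: fall back to resizing locally
    }
    stream_set_timeout( $fp, $timeoutSec );
    fwrite( $fp, "THUMB $src $width $dst\n" );
    $reply = fgets( $fp ); // blocks here until the daemon finishes
    fclose( $fp );
    return is_string( $reply ) && trim( $reply ) === 'OK';
}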

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Michael Dale
Aryeh Gregor wrote:
> I'm not clear on why we don't just make the daemon synchronously
> return a result the way ImageMagick effectively does.  Given the level
> of reuse of thumbnails, it seems unlikely that the latency is a
> significant concern -- virtually no requests will ever actually wait
> on it.
>   
( I basically outlined these issues on the SoC page, but here they are
again with a bit more clarity )

I recommended that the image daemon run semi-synchronously, since the
changes needed to maintain multiple states, return non-cached
place-holder images, and manage updates and page purges for when the
updated images become available within the Wikimedia server architecture
probably won't be completed in the Summer of Code time-line. If the
student is up for it, the concept would be useful for other components
like video transformation / transcoding, sequence flattening, etc., but
it's not what I would recommend for the Summer of Code time-line.

== per issues outlined in bug 4854 ==
I don't think it's a good idea to invest a lot of energy into a separate
Python-based image daemon. It won't avoid all the problems listed in bug 4854.

Shell-character-exploit issues should be checked against anyway (since
not everyone is going to install the daemon).

Other people using MediaWiki won't add a Python- or Java-based image
resizer and resolve its Python or Java dependencies and libraries. It
won't be easier to install than ImageMagick or "php-gd", which are
repository-hosted applications already present in shared hosting
environments.

Once you start integrating other libs like (Java) Batik, it becomes
difficult to resolve dependencies (Java, Python, etc.), and to install you
have to push out a "new program" that is not integrated into the
application repository managers of the various distributions.

The potential to isolate CPU and memory usage should be considered in the
core MediaWiki image resize support anyway, i.e. we don't want to crash
other people's servers running MediaWiki by not checking the upper
bounds of image transforms. Instead we should make the core image
transform smarter: maybe have a configuration var that /attempts/ to bound
the upper memory for spawned processing, and take that into account
before issuing the shell command for a given large image transformation
with a given shell application.

== what the image resize efforts should probably focus on ==

(1) making the existing system "more robust" and (2) better taking
advantage of multi-threaded servers.

(1) Right now the system chokes on large images. We should deploy support
for an in-place image resize, maybe something like vips
(http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use).
The system should intelligently call vips to transform the image to a
reasonable size at upload time, then use that derivative for just-in-time
thumbs for articles. (If vips is unavailable we don't transform,
and we don't crash the Apache node.) A sketch of that upload-time step
follows below.
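A minimal sketch of the upload-time step, using MediaWiki's real
wfShellExec() and wfEscapeShellArg() helpers. It is shown with
ImageMagick's convert, whose CLI is stable; the exact vips invocation
would need checking against the vips docs, but a vips call would slot
into the same place:

<?php
// Sketch: make a bounded derivative once, at upload time. The trailing
// ">" in the geometry means convert only ever shrinks, never enlarges.
function makeUploadDerivative( $src, $dst, $maxWidth = 1024 ) {
    $cmd = 'convert ' . wfEscapeShellArg( $src ) .
        ' -resize ' . wfEscapeShellArg( $maxWidth . 'x' . $maxWidth . '>' ) .
        ' ' . wfEscapeShellArg( $dst );
    $retval = 1;
    wfShellExec( $cmd, $retval );
    // Per the paragraph above: if the resizer is unavailable or fails,
    // we just skip the derivative rather than crash the Apache node.
    return $retval === 0;
}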

(2) Maybe spinning out the image transform process early on in the
parsing of the page, with a place-holder and callback, so that by the time
all the templates and links have been looked up the image is ready for
output. (Maybe another function wfShellBackgroundExec($cmd,
$callback_function), perhaps using pcntl_fork, then a normal wfShellExec,
then pcntl_waitpid, then the callback function... which sets some var in
the parent process so that pageOutput knows it's good to go.)
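As a sketch only: wfShellBackgroundExec() does not exist in MediaWiki,
and the pcntl extension is generally unavailable under Apache mod_php,
which is a practical obstacle to this approach; still, the
fork/exec/wait dance could look like this:

<?php
// Fork, let the child run the transform, keep parsing in the parent,
// and reap the child just before page output.
function wfShellBackgroundExec( $cmd ) {
    $pid = pcntl_fork();
    if ( $pid === -1 ) {
        return false; // fork failed: caller falls back to wfShellExec()
    }
    if ( $pid === 0 ) {
        // Child: do the resize and exit with the command's status.
        $retval = 1;
        wfShellExec( $cmd, $retval );
        exit( $retval );
    }
    return $pid; // parent continues parsing templates and links
}

// Later, e.g. just before page output:
function wfShellBackgroundWait( $pid, $callback ) {
    $status = 0;
    pcntl_waitpid( $pid, $status );
    // The callback sets some var so pageOutput knows it's good to go.
    call_user_func( $callback, pcntl_wexitstatus( $status ) );
}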

If operationally the "daemon" should be on a separate server, we should
still more or less run synchronously, as mentioned above. If
possible the daemon should be PHP-based so we don't explode the
dependencies for deploying robust image handling with MediaWiki.

peace,
--michael

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Brion Vibber
Thanks for taking care of the announce mail, Roan! I spent all day
yesterday at the dentist's... whee :P

I've taken the liberty of reposting it on the tech blog: 
http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/

I'd love for us to get the students set up on the blog to keep track of 
their project progress and raise visibility... :D

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Magnus Manske
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster
 wrote:
> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw wrote:
>
>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>> thumbnailing daemon, so image manipulation won't have to happen on the
>> Apache servers any more
>
>
> Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper
> the ability to choose non-standard resizing filters or so... or full-fledged
> image manipulation, something like a wiki-style photoshop.

On a semi-related note: What's the status of the management routines
that handle "thrwoaway" things like math PNGs?
Is this a generic system, so it can be used e.g. for jmol PNGs in the future?
Is it integrated with the image thumbnail handling?
Should it be?

Magnus

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Brion Vibber
On 4/22/09 11:13 AM, Magnus Manske wrote:
> On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster
>   wrote:
>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw wrote:
>>
>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>> Apache servers any more
>>
>> Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper
>> the ability to choose non-standard resizing filters or so... or full-fledged
>> image manipulation, something like a wiki-style photoshop.
>
> On a semi-related note: What's the status of the management routines
> that handle "throwaway" things like math PNGs?

There is no management for this yet, it's done ad-hoc in each such 
system. :(

> Is this a generic system, so it can be used e.g. for jmol PNGs in the future?
> Is it integrated with the image thumbnail handling?
> Should it be?

We do need a central management system for this, which can handle:

1) Storage backends other than raw filesystem

We want to migrate off of NFS to something whose failover and other
characteristics we can better control. Not having to implement the
interface a second, third, fourth, etc. time for math, timeline, etc.
would be nice.


2) Garbage collection / expiration of no-longer-used items

Right now math and timeline renderings just get stored forever and ever...


3) Sensible purging/expiration/override of old renderings when renderer 
behavior changes

When we fix a bug in, upgrade, or expand capabilities of texvc etc we 
need to be able to re-render the new, corrected images. Preferably in a 
way that's friendly to caching, and that doesn't kill our servers with a 
giant immediate crush of requests.


4) Rendering server isolation

Being able to offload rendering to a subcluster with restricted resource 
limits can help avoid bringing down the entire site when there's a 
runaway process (like all those image resizing problems we've seen with 
giant PNGs and animated GIFs).

It may also help to do some privilege separation for services we might 
not trust quite as much (shelling out to an external program with 
user-supplied data? What could go wrong? :)
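On the resource-limits point, a sketch of the sort of wrapper that helps
here, in the spirit of MediaWiki's real $wgMaxShellMemory setting; the
limits shown are arbitrary examples:

<?php
// Sketch: wrap a render command so a runaway resize exhausts its own
// ulimit budget instead of the server's memory and CPU.
function shellWithLimits( $cmd, $memKB = 262144, $cpuSec = 30 ) {
    $wrapped = 'ulimit -v ' . intval( $memKB ) .
        ' -t ' . intval( $cpuSec ) . '; ' . $cmd;
    $retval = 1;
    $out = wfShellExec( '/bin/bash -c ' . wfEscapeShellArg( $wrapped ),
        $retval );
    return array( $out, $retval );
}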

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread K. Peachey
On Thu, Apr 23, 2009 at 2:30 AM, Brion Vibber  wrote:
> Thanks for taking care of the announce mail, Roan! I spent all day
> yesterday at the dentists... whee :P
>
> I've taken the liberty of reposting it on the tech blog:
> http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/
>
> I'd love for us to get the students set up on the blog to keep track of
> their project progress and raise visibility... :D
>
> -- brion
Maybe a nice little install of WordPress MU might be in order so they
each have a little blog which they can update.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-23 Thread Wu Zhe
Michael Dale  writes:

> I recommended that the image daemon run semi-synchronously, since the
> changes needed to maintain multiple states, return non-cached
> place-holder images, and manage updates and page purges for when the
> updated images become available within the Wikimedia server architecture
> probably won't be completed in the Summer of Code time-line. If the
> student is up for it, the concept would be useful for other components
> like video transformation / transcoding, sequence flattening, etc., but
> it's not what I would recommend for the Summer of Code time-line.

I may have problems understanding the concept "semi-synchronously": does
it mean that when MW parses a page that contains thumbnail images, the
parser sends requests to the daemon, which would reply twice to each
request, once immediately with a best fit or a place holder
(synchronously), and once later on when the thumbnail is ready
(asynchronously)?

> == what the image resize efforts should probably focus on ==
>
> (1) making the existing system "more robust" and (2) better taking
> advantage of multi-threaded servers.
>
> (1) Right now the system chokes on large images. We should deploy support
> for an in-place image resize, maybe something like vips
> (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use).
> The system should intelligently call vips to transform the image to a
> reasonable size at upload time, then use that derivative for just-in-time
> thumbs for articles. (If vips is unavailable we don't transform,
> and we don't crash the Apache node.)

Wow, vips sounds great; still reading its documentation. How is its
performance on relatively small (not huge, a few hundred pixels in
width/height) images compared with traditional single-threaded resizing
programs?

> (2) Maybe spinning out the image transform process early on in the
> parsing of the page, with a place-holder and callback, so that by the time
> all the templates and links have been looked up the image is ready for
> output. (Maybe another function wfShellBackgroundExec($cmd,
> $callback_function), perhaps using pcntl_fork, then a normal wfShellExec,
> then pcntl_waitpid, then the callback function... which sets some var in
> the parent process so that pageOutput knows it's good to go.)

An asynchronous daemon doesn't make much sense if the page purge occurs
on the server side, but what if we put off the page purge to the browser?
It works like this:

1. the MW parser sends a request to the daemon
2. the daemon finds the work non-trivial and replies *immediately* with a
   best fit or just a place holder
3. the browser renders the page, finds it's not final, and sends a
   request to the daemon directly using AJAX
4. the daemon replies to the browser when the thumbnail is ready
5. the browser replaces the temporary best fit / place holder with the
   new thumb using JavaScript

The daemon now has to deal with two kinds of clients: MW servers and
browsers (the browser-facing half is sketched below).

Letting the browser wait instead of the MW server has the benefit of
reduced latency for users, who still have an acceptable page to read
before the image replacing takes place and a perfect page after that. For
most users, it's likely that the replacing occurs as soon as page loading
ends, since transferring the page takes some time, and the daemon would
have already finished thumbnailing in the process.
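A sketch of that browser-facing half: a URL the page's JavaScript polls
until the thumb exists. The paths, parameter names, and JSON shape are
all illustrative assumptions:

<?php
// Sketch of the daemon's poll endpoint for step 3 of the list above.
$name  = isset( $_GET['name'] ) ? basename( $_GET['name'] ) : '';
$width = isset( $_GET['width'] ) ? intval( $_GET['width'] ) : 0;
$thumb = "/var/thumbs/{$width}px-{$name}";

header( 'Content-Type: application/json' );
if ( $name !== '' && $width > 0 && file_exists( $thumb ) ) {
    // Step 4: the real thumb is ready; step 5 swaps it in client-side.
    echo json_encode( array(
        'ready' => true,
        'url'   => "/thumbs/{$width}px-{$name}",
    ) );
} else {
    // Steps 2-3: not done yet; the browser keeps its placeholder
    // and polls again shortly.
    echo json_encode( array( 'ready' => false ) );
}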

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Nikola Smolenski
Wu Zhe wrote:
> An asynchronous daemon doesn't make much sense if the page purge occurs
> on the server side, but what if we put off the page purge to the browser?
> It works like this:
>
> 1. the MW parser sends a request to the daemon
> 2. the daemon finds the work non-trivial and replies *immediately* with a
>    best fit or just a place holder
> 3. the browser renders the page, finds it's not final, and sends a
>    request to the daemon directly using AJAX
> 4. the daemon replies to the browser when the thumbnail is ready
> 5. the browser replaces the temporary best fit / place holder with the
>    new thumb using JavaScript
>
> The daemon now has to deal with two kinds of clients: MW servers and
> browsers.

To me this looks way too overcomplicated. I suggest a simpler approach:

1. MW copies a placeholder image to the appropriate filename: the
placeholder could be the original image, the best-match thumb, or a PNG
with the text "wait until the thumbnail renders";
2. MW sends a request to the daemon;
3. the daemon copies the resized image over the placeholder (a sketch of
these steps follows below).
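A minimal sketch of those three steps with invented helper names; the key
property is that the thumb URL is valid immediately, and the daemon later
overwrites the file in place, so no page purge is needed:

<?php
// Stub for step 2: spool a line the daemon watches (illustrative; a
// socket write to the daemon would work just as well).
function enqueueThumbJob( $src, $dst, $width ) {
    file_put_contents( '/var/spool/thumbs.queue',
        "$src\t$dst\t$width\n", FILE_APPEND );
}

function requestThumb( $srcPath, $thumbPath, $width ) {
    // 1. Put *something* at the final filename right away.
    copy( '/srv/mediawiki/wait-placeholder.png', $thumbPath );
    // 2. Queue the real work for the daemon.
    enqueueThumbJob( $srcPath, $thumbPath, $width );
    // 3. The daemon eventually copies the resized image over the
    //    placeholder; later requests for the URL get the real thumb.
    return $thumbPath;
}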

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Wu Zhe
Nikola Smolenski  writes:

> Wu Zhe wrote:
>> An asynchronous daemon doesn't make much sense if the page purge occurs
>> on the server side, but what if we put off the page purge to the browser?
>> It works like this:
>>
>> 1. the MW parser sends a request to the daemon
>> 2. the daemon finds the work non-trivial and replies *immediately* with a
>>    best fit or just a place holder
>> 3. the browser renders the page, finds it's not final, and sends a
>>    request to the daemon directly using AJAX
>> 4. the daemon replies to the browser when the thumbnail is ready
>> 5. the browser replaces the temporary best fit / place holder with the
>>    new thumb using JavaScript
>>
>> The daemon now has to deal with two kinds of clients: MW servers and
>> browsers.
>
> To me this looks way too overcomplicated. I suggest a simpler approach:
>
> 1. MW copies a placeholder image to the appropriate filename: the
> placeholder could be the original image, the best-match thumb, or a PNG
> with the text "wait until the thumbnail renders";
> 2. MW sends a request to the daemon;
> 3. the daemon copies the resized image over the placeholder.

This simpler approach differs in that it gets rid of the AJAX thing: now
users have to manually refresh the page. Whether the AJAX is worth the
effort is debatable.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Magnus Manske
I've created an initial proposal for a unified storage-handling database:

http://www.mediawiki.org/wiki/User:Magnus_Manske/File_handling

Feel free to edit and comment :-)

Cheers,
Magnus

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 12:31 AM, Wu Zhe  wrote:
> An asynchronous daemon doesn't make much sense if the page purge occurs
> on the server side, but what if we put off the page purge to the browser?
> It works like this:
>
> 1. the MW parser sends a request to the daemon
> 2. the daemon finds the work non-trivial and replies *immediately* with a
>    best fit or just a place holder
> 3. the browser renders the page, finds it's not final, and sends a
>    request to the daemon directly using AJAX
> 4. the daemon replies to the browser when the thumbnail is ready
> 5. the browser replaces the temporary best fit / place holder with the
>    new thumb using JavaScript
>
> The daemon now has to deal with two kinds of clients: MW servers and
> browsers.
>
> Letting the browser wait instead of the MW server has the benefit of
> reduced latency for users, who still have an acceptable page to read
> before the image replacing takes place and a perfect page after that. For
> most users, it's likely that the replacing occurs as soon as page loading
> ends, since transferring the page takes some time, and the daemon would
> have already finished thumbnailing in the process.

How long does it take to thumbnail a typical image, though?  Even a
parser cache hit (but Squid miss) will take hundreds of milliseconds
to serve and hundreds of more milliseconds for network latency.  If
we're talking about each image adding 10 ms to the latency, then it's
not worth it to add all this fancy asynchronous stuff.

Moreover, in MediaWiki's case specifically, *very* few requests should
actually require the thumbnailing.  Only the first request for a given
size of a given image should ever require thumbnailing: that can then
be cached more or less forever.  So it's not a good case to optimize
for.  If the architecture can be simplified significantly at the cost
of slight extra latency in 0.01% of requests, I think it's clear that
the simpler architecture is superior.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Roan Kattouw
2009/4/24 Aryeh Gregor :
> How long does it take to thumbnail a typical image, though?  Even a
> parser cache hit (but Squid miss) will take hundreds of milliseconds
> to serve and hundreds of more milliseconds for network latency.  If
> we're talking about each image adding 10 ms to the latency, then it's
> not worth it to add all this fancy asynchronous stuff.
>
The problem here seems to be that thumbnail generation times vary a
lot, based on format and size of the original image. It could be 10 ms
for one image and 10 s for another, who knows.

> Moreover, in MediaWiki's case specifically, *very* few requests should
> actually require the thumbnailing.  Only the first request for a given
> size of a given image should ever require thumbnailing: that can then
> be cached more or less forever.
That's true, we're already doing that.

> So it's not a good case to optimize
> for.
AFAICT this isn't about optimization, it's about not bogging down the
Apache that has the misfortune of getting the first request to thumb a
huge image (but having a dedicated server for that instead), and about
not letting the associated user wait for ages. Even worse, requests
that thumb very large images could hit the 30s execution limit and
fail, which means those thumbs will never be generated but every user
requesting it will have a request last for 30s and time out.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Chad
All true. The images should not be rethumb'd unless
resolution changes, a new version is uploaded, or the
cache is otherwise purged. However, on initial rendering,
the thumb generation can be a large part (especially if
rendering multiple images) of overall page execution time.
Being able to offload this elsewhere should decrease
that load greatly.

-Chad

On Apr 24, 2009 1:23 PM, "Roan Kattouw"  wrote:

> 2009/4/24 Aryeh Gregor:
>> How long does it take to thumbnail a typical image, though?  Even a
>> parser cache hit (but Squid ...
>
> The problem here seems to be that thumbnail generation times vary a
> lot, based on format and size of the original image. It could be 10 ms
> for one image and 10 s for another, who knows.
>
>> Moreover, in MediaWiki's case specifically, *very* few requests should
>> actually require the thu...
>
> That's true, we're already doing that.
>
>> So it's not a good case to optimize
>> for.
>
> AFAICT this isn't about optimization, it's about not bogging down the
> Apache that has the misfortune of getting the first request to thumb a
> huge image (but having a dedicated server for that instead), and about
> not letting the associated user wait for ages. Even worse, requests
> that thumb very large images could hit the 30s execution limit and
> fail, which means those thumbs will never be generated but every user
> requesting it will have a request last for 30s and time out.
>
> Roan Kattouw (Catrope)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Roan Kattouw
2009/4/24 Chad :
> All true. The images should not be rethumb'd unless
> resolution changes, a new version is uploaded, or the
> cache is otherwise purged.
Repeat: this is what we do already (not sure if that's what you're
trying to say, but "should" implies differently).

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Chad
I'm agreeing with you. By "should" I meant "this should
be happening already and issues with this are bugs."

-Chad

On Apr 24, 2009 1:32 PM, "Roan Kattouw"  wrote:

> 2009/4/24 Chad:
>> All true. The images should not be rethumb'd unless resolution changes,
>> a new version is uploade...
>
> Repeat: this is what we do already (not sure if that's what you're
> trying to say, but "should" implies differently).
>
> Roan Kattouw (Catrope)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brion Vibber
On 4/24/09 10:32 AM, Roan Kattouw wrote:
> 2009/4/24 Chad:
>> All true. The images should not be rethumb'd unless
>> resolution changes, a new version is uploaded, or the
>> cache is otherwise purged.
> Repeat: this is what we do already (not sure if that's what you're
> trying to say, but "should" implies differently).

Just to summarize the current state, here's the default MediaWiki 
configuration workflow:

* During page rendering, MediaWiki checks if a thumb of the proper size 
exists.
   * if not, we resize it synchronously on the same server (via GD or 
shell out to ImageMagick etc)
   * an <img> pointing to the file is added to the output
* The web browser loads up the already-rendered image file in the page.


Here's the behavior variant we have on Wikimedia sites:

* During page rendering, we make an <img> pointing to where the
thumbnail should be
* The web browser requests the thumbnail image file
   * If it doesn't exist, the upload web server proxies the request [1] 
back to MediaWiki, running on a subcluster which handles only thumbnail 
generation
 * MediaWiki resizes it synchronously via shell out to ImageMagick
   * The web server serves the now-completed file back to the client, 
and it's now on disk for the next request

[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/upload-scripts/
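For illustration, a toy sketch of the shape of that 404 handler, with
simplified names (the real script lives at [1]); this version redirects
to MediaWiki's thumb.php entry point rather than proxying:

<?php
// Sketch: map a missing-thumb URL back to a render request.
$path = isset( $_SERVER['REQUEST_URI'] ) ? $_SERVER['REQUEST_URI'] : '';
// e.g. /thumb/a/ab/Example.jpg/120px-Example.jpg
if ( preg_match( '!/thumb/./../([^/]+)/(\d+)px-!', $path, $m ) ) {
    $file  = rawurldecode( $m[1] );
    $width = intval( $m[2] );
    // thumb.php resizes synchronously and streams the result back;
    // the file is then on disk for every later request.
    header( 'Location: /w/thumb.php?f=' . rawurlencode( $file ) .
        '&width=' . $width );
} else {
    header( 'HTTP/1.0 404 Not Found' );
}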

This prevents slow or broken thumbnailing operations from bogging down 
the *main* web servers, but if it's not reasonably fast we still have 
difficulties:

* No placeholder image -- browser just shows a nice blank box
* Multiple requests will cause multiple attempts to resize at once, 
potentially eating up all CPU time/memory/tmp disk space on the 
thumbnailing cluster

So if we've got, say, a 50 megapixel PNG or TIFF high-res scan, or a 
giant animated GIF which is very expensive to resize, we don't have a 
good way of producing a thumbnail on a good schedule. It'll either time 
out a lot every time it changes, or just never actually complete.

If we have a way to defer things we know will take longer, and show a 
placeholder until it's completed, then we can use those things more 
reliably.


One suggestion that's been brought up for large images is to create a 
smaller version *once at upload time* which can then be used to quickly 
create inline thumbnails of various sizes on demand. But we still need 
some way to manage that asynchronous initial rendering, and have some 
kind of friendly behavior for what to show while it's working.

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Gerard Meijssen
Hoi,
At the moment we have an upper limit of 100 MB. The people who do
restorations have one file that is 680 MB. The corresponding JPG is also
quite big!
Thanks,
   GerardM

2009/4/24 Roan Kattouw 

> 2009/4/24 Aryeh Gregor:
> > How long does it take to thumbnail a typical image, though?  Even a
> > parser cache hit (but Squid miss) will take hundreds of milliseconds
> > to serve and hundreds of more milliseconds for network latency.  If
> > we're talking about each image adding 10 ms to the latency, then it's
> > not worth it to add all this fancy asynchronous stuff.
> >
> The problem here seems to be that thumbnail generation times vary a
> lot, based on format and size of the original image. It could be 10 ms
> for one image and 10 s for another, who knows.
>
> > Moreover, in MediaWiki's case specifically, *very* few requests should
> > actually require the thumbnailing.  Only the first request for a given
> > size of a given image should ever require thumbnailing: that can then
> > be cached more or less forever.
> That's true, we're already doing that.
>
> > So it's not a good case to optimize
> > for.
> AFAICT this isn't about optimization, it's about not bogging down the
> Apache that has the misfortune of getting the first request to thumb a
> huge image (but having a dedicated server for that instead), and about
> not letting the associated user wait for ages. Even worse, requests
> that thumb very large images could hit the 30s execution limit and
> fail, which means those thumbs will never be generated but every user
> requesting it will have a request last for 30s and time out.
>
> Roan Kattouw (Catrope)
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 1:22 PM, Roan Kattouw  wrote:
> The problem here seems to be that thumbnail generation times vary a
> lot, based on format and size of the original image. It could be 10 ms
> for one image and 10 s for another, who knows.

Is it really necessary for any image to take 10s to thumbnail?  I
guess this would only happen for very large images -- perhaps we could
make sure to cache an intermediate-sized thumbnail as soon as the
image is uploaded, and then scale that down synchronously on request,
which should be fast.  Similarly, if specific image features
(progressive JPEG or whatever) make images much slower to thumbnail,
an intermediate version can be automatically generated on upload
without those features.  Of course you'd see a little loss in quality
from the double operation, but it seems like a more robust solution
than trying to use JavaScript.

I'm not an expert on image formats, however, so maybe I'm
misunderstanding our options.

> AFAICT this isn't about optimization, it's about not bogging down the
> Apache that has the misfortune of getting the first request to thumb a
> huge image (but having a dedicated server for that instead), and about
> not letting the associated user wait for ages.

"Not letting the associated user wait for ages" is called "making it
faster", which I'd say qualifies as optimization.  :)

> Even worse, requests
> that thumb very large images could hit the 30s execution limit and
> fail, which means those thumbs will never be generated but every user
> requesting it will have a request last for 30s and time out.

max_execution_time applies only to the time that PHP actually spends
executing.  If it's sleeping on a network request, it will never be
killed for reaching the max execution time.  Try running this code:

ini_set( 'max_execution_time', 5 );
error_reporting( E_ALL | E_STRICT );
ini_set( 'display_errors', 1 );

file_get_contents( 'http://toolserver.org/~simetrical/tmp/delay.php?len=10' );

echo "Fetched long URL!";

while ( true );

It will fetch the URL (which takes ten seconds), then only die after
the while ( true ) runs for about five seconds.  The same goes for
long database queries, etc.  I imagine it uses the OS's reports on
user/system time used instead of real time.

Plus, the idea is apparently for this to not be done by the server at
all, but by the client, so there will be no latency for the overall
page request anyway.  The page will load immediately, only the images
will wait if there's any waiting to be done.

On Fri, Apr 24, 2009 at 1:46 PM, Brion Vibber  wrote:
> One suggestion that's been brought up for large images is to create a
> smaller version *once at upload time* which can then be used to quickly
> create inline thumbnails of various sizes on demand. But we still need
> some way to manage that asynchronous initial rendering, and have some
> kind of friendly behavior for what to show while it's working.

That's what occurred to me.  In that case, the only possible thing to
do seems to be to just have the image request wait until the image is
thumbnailed.  I guess you could show a placeholder image, but that's
probably *less* friendly to the user, as long as we've specified the
height and width in the HTML.  The browser should provide some kind of
placeholder already while the image is loading, after all, and if we
let the browser provide the placeholder, then at least the image will
appear automatically when it's done thumbnailing.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Michael Dale
Roan Kattouw wrote:
> The problem here seems to be that thumbnail generation times vary a
> lot, based on format and size of the original image. It could be 10 ms
> for one image and 10 s for another, who knows.
>
>   
Yea, again: if we only issue the big resize operation on initial upload,
with a memory-friendly in-place library like vips, I think we will be
okay. Since the user just waited 10-15 minutes to upload their huge
image, waiting an additional 10-30s at that point for a thumbnail and the
"instant gratification" of seeing your image on the upload page ... is
not such a big deal. Then in-page use derivatives could predictably
resize the 1024x768 ~or so~ image in realtime: again, instant
gratification on page preview or page save.

Operationally this could go out to a thumbnail server or be done on the
Apaches. If they are small operations, it may be easier to keep the
existing infrastructure than to intelligently handle the edge cases
outlined (many resize requests at once, placeholders, image proxy /
daemon setup).

> AFAICT this isn't about optimization, it's about not bogging down the
> Apache that has the misfortune of getting the first request to thumb a
> huge image (but having a dedicated server for that instead), and about
> not letting the associated user wait for ages. Even worse, requests
> that thumb very large images could hit the 30s execution limit and
> fail, which means those thumbs will never be generated but every user
> requesting it will have a request last for 30s and time out.
>
>   

Again, this may be related to the unpredictable memory usage of
ImageMagick when resizing large images, instead of a fast, memory-confined
resize engine, no?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread David Gerard
2009/4/24 Aryeh Gregor :

> That's what occurred to me.  In that case, the only possible thing to
> do seems to be to just have the image request wait until the image is
> thumbnailed.  I guess you could show a placeholder image, but that's
> probably *less* friendly to the user, as long as we've specified the
> height and width in the HTML.  The browser should provide some kind of
> placeholder already while the image is loading, after all, and if we
> let the browser provide the placeholder, then at least the image will
> appear automatically when it's done thumbnailing.


There was a spec in earlier versions of HTML to put a low-res
thumbnail up while the full image dribbled through your dialup -
<img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so little
used (I know of no cases) that I don't know if it's even supported in
browsers any more.

http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html


- d.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Gerard Meijssen
Hoi,
The Library of Alexandria uses it for the display of their awesome
Napoleonic lithographs. It would be awesome if we had that code. It is
actually open source.
Thanks,
 Gerard

2009/4/24 David Gerard 

> 2009/4/24 Aryeh Gregor:
>
> > That's what occurred to me.  In that case, the only possible thing to
> > do seems to be to just have the image request wait until the image is
> > thumbnailed.  I guess you could show a placeholder image, but that's
> > probably *less* friendly to the user, as long as we've specified the
> > height and width in the HTML.  The browser should provide some kind of
> > placeholder already while the image is loading, after all, and if we
> > let the browser provide the placeholder, then at least the image will
> > appear automatically when it's done thumbnailing.
>
>
> There was a spec in earlier versions of HTML to put a low-res
> thumbnail up while the full image dribbled through your dialup -
> <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so little
> used (I know of no cases) that I don't know if it's even supported in
> browsers any more.
>
> http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html
>
>
> - d.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brion Vibber
On 4/24/09 11:05 AM, Michael Dale wrote:
> Roan Kattouw wrote:
>> The problem here seems to be that thumbnail generation times vary a
>> lot, based on format and size of the original image. It could be 10 ms
>> for one image and 10 s for another, who knows.
>>
>>
> Yea, again: if we only issue the big resize operation on initial upload,
> with a memory-friendly in-place library like vips, I think we will be
> okay. Since the user just waited 10-15 minutes to upload their huge
> image, waiting an additional 10-30s at that point for a thumbnail and the
> "instant gratification" of seeing your image on the upload page ... is
> not such a big deal.

Well, what about the 5 million other users browsing Special:Newimages? 
We don't want 50 simultaneous attempts to build that first 
über-thumbnail. :)

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread lists
>> with a memory-friendly in-place library like vips, I think we will be
>> okay. Since the user just waited 10-15 minutes to upload their huge
>> image, waiting an additional 10-30s at that point for a thumbnail and the
>> "instant gratification" of seeing your image on the upload page ... is
>> not such a big deal.
>
> Well, what about the 5 million other users browsing Special:Newimages?
> We don't want 50 simultaneous attempts to build that first
> über-thumbnail. :)

Thumbnail generation could be cascaded, i.e. 120px thumbs could be
generated from the 800px previews instead of the original images.
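
For illustration, a minimal sketch of such cascading, assuming Python
with the PIL imaging library; the file names here are hypothetical,
not MediaWiki's actual thumbnail layout:

  from PIL import Image

  ORIGINAL = 'Foo.jpg'        # huge upload, expensive to decode
  PREVIEW = '800px-Foo.jpg'   # intermediate source, rendered once at upload
  THUMB = '120px-Foo.jpg'

  def make_preview(width=800):
      # The one expensive resize, straight from the original.
      im = Image.open(ORIGINAL)
      im.thumbnail((width, width))   # in place, preserves aspect ratio
      im.save(PREVIEW)

  def make_thumb(width=120):
      # Cascade: derive small thumbs from the 800px preview, never
      # touching the original again.
      im = Image.open(PREVIEW)
      im.thumbnail((width, width))
      im.save(THUMB)

Resampling twice costs a little quality, but for web-size thumbs the
saving in decode time should dominate.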


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 2:44 PM, Brion Vibber wrote:
> Well, what about the 5 million other users browsing Special:Newimages?
> We don't want 50 simultaneous attempts to build that first
> über-thumbnail. :)

You'd presumably use some kind of locking to stop multiple workers
from trying to render thumbnails of the same size in general
(über-thumbnails or not).
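
A minimal sketch of one such scheme, assuming POSIX advisory locks
keyed on the thumbnail path (the lock could just as well live in
memcached or the database; this is not the project's actual design):

  import fcntl
  import os

  def render_once(thumb_path, render):
      # One lock file per (image, size); whoever grabs it first renders.
      fd = os.open(thumb_path + '.lock', os.O_CREAT | os.O_RDWR)
      try:
          fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the winner is done
          if not os.path.exists(thumb_path):
              render(thumb_path)           # we won the race: do the work
          # else: someone else rendered it while we waited on the lock
      finally:
          fcntl.flock(fd, fcntl.LOCK_UN)
          os.close(fd)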

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brion Vibber
On 4/24/09 12:07 PM, Aryeh Gregor wrote:
> On Fri, Apr 24, 2009 at 2:44 PM, Brion Vibber wrote:
>> Well, what about the 5 million other users browsing Special:Newimages?
>> We don't want 50 simultaneous attempts to build that first
>> über-thumbnail. :)
>
> You'd presumably use some kind of locking to stop multiple workers
> from trying to render thumbnails of the same size in general
> (über-thumbnails or not).

Best to make it explicit rather than presume -- currently we have no 
such locking for slow resizing requests. :)

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 3:58 PM, Brion Vibber wrote:
> Best to make it explicit rather than presume -- currently we have no
> such locking for slow resizing requests. :)

Yes, definitely.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Platonides
Michael Dale wrote:
> yea, again: if we only issue the big resize operation on initial upload
> with a memory-friendly in-place library like VIPS, I think we will be
> okay. Since the user just waited some 10-15 minutes to upload their huge
> image, waiting an additional 10-30 s at that point for the thumbnail and
> the "instant gratification" of seeing their image on the upload page is
> not such a big deal.

It can be parallelized by starting to render the thumb while the file
is still being uploaded (most formats allow that). That would need
special software; the easiest approach would be to point the
Special:Upload action at a different domain for the resizing cluster.
These changes are always an annoyance, but they would ease many bugs:
10976, 16751, 18202, the upload progress bar, a non-NFS backend...
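
As a rough sketch of that streaming idea, assuming Python and
ImageMagick's convert reading the image from stdin (the chunk source
and paths are hypothetical, and this only helps for formats the
decoder can consume sequentially):

  import subprocess

  def save_and_preview(chunks, dest_path, preview_path, width=800):
      # Feed the upload to disk and to the resizer at the same time, so
      # thumbnailing overlaps the network transfer.
      resizer = subprocess.Popen(
          ['convert', '-', '-resize', '%dx%d' % (width, width), preview_path],
          stdin=subprocess.PIPE)
      with open(dest_path, 'wb') as out:
          for chunk in chunks:            # e.g. reads off the upload socket
              out.write(chunk)            # persist the original
              resizer.stdin.write(chunk)  # stream it to the resizer
      resizer.stdin.close()
      if resizer.wait() != 0:
          raise RuntimeError('resize failed')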


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Platonides
Also relevant: 17255 and 18201.
And as this would be a new upload system, it is also worth mentioning
18563 (the new-upload branch).


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brad Jorsch
On Fri, Apr 24, 2009 at 07:08:05PM +0100, David Gerard wrote:
> 
> There was a spec in earlier versions of HTML to put a low-res
> thumbnail up while the full image dribbled through your dialup -
> <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so
> little used (I know of no cases) that I don't know if it's even
> supported in browsers any more.
> 
> http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html

I tried it with Firefox 3.0.9 and IE 7.0.6001.18000; neither paid any
attention to it. IE 6.0.2800.1106 under Wine also ignored it. Too bad;
it would have been nice if it worked.

I don't know that we need fancy AJAX if we know at page rendering time
whether the image is available, though. We might be able to get away
with a simple script like this:
  var ImageCache = {};
  function loadImage(id, url) {
      var i = document.getElementById(id);
      if (i) {
          // Fetch the real thumbnail off-screen; keep a reference so
          // the Image object isn't garbage-collected before it loads.
          var img = new Image();
          ImageCache[id] = img;
          img.onload = function() {
              i.src = url;           // swap placeholder for real thumb
              ImageCache[id] = null;
          };
          img.src = url;
      }
  }
And then generate the <img> tag with the placeholder and some id, and
call that function onload for it. Of course, if there are a lot of these
images on one page then we might run into the browser's concurrent
connection limit, which an AJAX solution might be able to overcome.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-25 Thread Wu Zhe
Brion Vibber writes:
> Just to summarize the current state, here's the default MediaWiki 
> configuration workflow:
>
> * During page rendering, MediaWiki checks if a thumb of the proper size 
> exists.
>* if not, we resize it synchronously on the same server (via GD or 
> shell out to ImageMagick etc)
>    * an <img> pointing to the file is added to output
> * The web browser loads up the already-rendered image file in the page.
>
>
> Here's the behavior variant we have on Wikimedia sites:
>
> * During page rendering, we make an <img> pointing to where the
> thumbnail should be
> * The web browser requests the thumbnail image file
>* If it doesn't exist, the upload web server proxies the request [1] 
> back to MediaWiki, running on a subcluster which handles only thumbnail 
> generation
>  * MediaWiki resizes it synchronously via shell out to ImageMagick
>* The web server serves the now-completed file back to the client, 
> and it's now on disk for the next request

The simpler approach suggested by Nikola seems to address all the needs
here without changing the way MediaWiki currently works.

The daemon would reply to each request exactly once: it copies the
placeholder to the specified destination under the thumbnail's final
file name, then later silently overwrites that file with the real
thumbnail once generation finishes in the background; no further
notification reply is sent. This way we can get rid of all the
complexity an asynchronous reply would cause.

There are two places, AFAIU, where MediaWiki should contact this
daemon:

1. when the requested thumbnail doesn't exist

2. when a user uploads a large image (to generate an intermediate
source image for future resizing); in this case the request object can
carry a flag instructing the daemon to skip the placeholder copying
step (see the sketch below).
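
For concreteness, a minimal sketch of that reply-once flow on the
daemon side, in Python; the paths, the skip_placeholder flag, and the
rendering details are illustrative assumptions, not the project's
actual protocol:

  import os
  import shutil
  import threading

  from PIL import Image

  PLACEHOLDER = 'placeholder.png'  # hypothetical stock "still rendering" image

  def render(source, dest, width):
      # The slow part, done in the background after the reply went out.
      im = Image.open(source)
      im.thumbnail((width, width))
      tmp = dest + '.tmp'
      im.save(tmp, format='JPEG')
      os.rename(tmp, dest)   # atomic swap: placeholder becomes the thumb

  def handle_request(source, dest, width, skip_placeholder=False):
      # Reply-once protocol: put *something* at dest, then return.
      if not skip_placeholder:
          # The placeholder sits at the thumbnail's final name, so the
          # web server can serve it immediately.
          shutil.copyfile(PLACEHOLDER, dest)
      threading.Thread(target=render, args=(source, dest, width)).start()
      return 'OK'            # the one and only reply the daemon sends

The atomic rename matters here: a reader sees either the placeholder
or the finished thumbnail, never a half-written file.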


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-27 Thread Michael Dale
Brion Vibber wrote:
>> yea, again: if we only issue the big resize operation on initial upload
>> with a memory-friendly in-place library like VIPS, I think we will be
>> okay. Since the user just waited some 10-15 minutes to upload their huge
>> image, waiting an additional 10-30 s at that point for the thumbnail and
>> the "instant gratification" of seeing their image on the upload page is
>> not such a big deal.
>> 
>
> Well, what about the 5 million other users browsing Special:Newimages? 
> We don't want 50 simultaneous attempts to build that first 
> über-thumbnail. :)
>   

Right... I am just saying the simplest path is to integrate it into the
upload flow, i.e. the image won't be a known asset until that first
uber-thumbnail is generated. Once that happens, it's available for
inclusion and listed on Newimages. The user won't notice that extra
10-15 second delay because it will be part of the upload flow.

In other words, the user is already waiting for their file to be
uploaded; a few extra seconds of server-side processing folded into
that wait won't be noticed much, and this will be easier to integrate
with the existing system (instead of introducing a new concept of
"resource is being processed, please wait").

We do eventually need a "this resource is being processed" concept, but
I don't know if it's a good project for the Summer of Code student to
target.

--michael

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l