Re: [Toolserver-l] Needs of important ressources

2011-03-04 Thread Aryeh Gregor
On Fri, Mar 4, 2011 at 2:37 PM, Seb35  wrote:
> Thanks for all these responses, we will ask the next time before renting a
> server for such a purpose.

Account approval can sometimes take a while, often weeks.  If you're
thinking you'll likely use the toolserver in the future, you might
want to apply for an account now.  It doesn't sound like the resources
you need would be any problem at all -- we might offer considerably
better hardware than whoever you were renting from, too.

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


Re: [Toolserver-l] Needs of important ressources

2011-03-04 Thread Seb35
Fri, 04 Mar 2011 20:17:19 +0100, Platonides  wrote:
> Seb35 wrote:
>> Krinkle wrote:
>>> How much is "too much memory" ?
>>
>> We needed to transform and crop TIFF images, read an XML file associated
>> with a book containing the OCRed text of the digitized book, and create
>> a DjVu with the images and the text layer.
>>
>> For that we rented a server. I cannot remember exactly the hardware we
>> chose, but it was probably a 4-core (or 8-core) machine with 4GB (or 8GB)
>> of RAM and 200-300GB of disk (and server bandwidth, useful for downloading
>> the files from the BnF's FTP: about 500 files per book (1 XML/page +
>> multipage TIFF + some others) x 1416 books = 2-3 days of download on the
>> server because of the many small files).
>>
>> From what I remember, "too much memory" means that my laptop (2-core
>> 2.8GHz, 3GB of RAM), on which I developed the (Python) program, had
>> difficulty loading the whole XML file (with DOM). Then I tried SAX and the
>> work was done in a few seconds without much memory (I hadn't used SAX
>> before, but I ♥ SAX now :-)
>>
>> We wrote a technical report about that but have not published it yet
>> (perhaps one day, I hope); you can see
>> 
>> for an "outreach" document and
>>  for the Python
>> program.
>>
>> Seb35
>
> It is important to use the right tools. As you mention, XML files that big
> need to be processed on the fly, not by loading them into memory.
> You mention a server with 4 or 8 cores. Was your program multithreaded
> (or otherwise running several instances)? Is that 24 hours single-threaded?
>
> Also, those cases each happened once and are quite different, so it's
> probably better to ask about the needed resources once you know what you
> need next.
> What you mention doesn't seem too much for the toolserver. You should be
> able to use enough disk space, and the task could be run in the
> background, so CPU use wouldn't need to affect other users (especially given
> that there are no fixed time constraints). Memory could be a problem,
> though, depending on the amount used and for how long. SGE can probably
> show some memory usage graphs from which to deduce the amount available
> for this kind of project.

Thanks for all these responses, we will ask the next time before renting a  
server for such a purpose.

We used multiple threads (easy with Python; 4 threads according to the
program on FishEye, so it was probably a 4-core server), but most of the
time was spent on disk access, so the equivalent single-threaded time
should be about 2 to 2.5 times our 24 hours.
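
As a rough illustration of the setup described above, here is a minimal sketch of a 4-thread worker pool in Python. It is not the actual program: `build_djvu` is a hypothetical stand-in for the per-book work, and `concurrent.futures` is simply one convenient standard-library way to run a fixed number of threads. The second function only restates the estimate above (2 to 2.5 times the 24 hours, i.e. 48 to 60 hours).

```python
from concurrent.futures import ThreadPoolExecutor


def build_djvu(book_id):
    """Hypothetical stand-in for the per-book work (crop TIFFs, parse XML, write DjVu)."""
    return f"book-{book_id}.djvu"


def process_books(book_ids, workers=4):
    """Run build_djvu over all books with a fixed pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input ids.
        return list(pool.map(build_djvu, book_ids))


def single_threaded_estimate(wall_hours=24, factors=(2.0, 2.5)):
    """Scale the observed wall-clock time by the estimated slowdown factors."""
    return tuple(wall_hours * factor for factor in factors)


print(process_books(range(3)))
print(single_threaded_estimate())
```

Threads help here mainly because the job is dominated by disk access, which is also why the speed-up from 4 threads is estimated at well under 4x.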

Seb35


Re: [Toolserver-l] Needs of important ressources

2011-03-04 Thread Platonides
Seb35 wrote:
> Krinkle wrote:
>> How much is "too much memory" ?
> 
> We needed to transform and crop TIFF images, read an XML file associated
> with a book containing the OCRed text of the digitized book, and create a
> DjVu with the images and the text layer.
> 
> For that we rented a server. I cannot remember exactly the hardware we
> chose, but it was probably a 4-core (or 8-core) machine with 4GB (or 8GB)
> of RAM and 200-300GB of disk (and server bandwidth, useful for downloading
> the files from the BnF's FTP: about 500 files per book (1 XML/page +
> multipage TIFF + some others) x 1416 books = 2-3 days of download on the
> server because of the many small files).
> 
> From what I remember, "too much memory" means that my laptop (2-core
> 2.8GHz, 3GB of RAM), on which I developed the (Python) program, had
> difficulty loading the whole XML file (with DOM). Then I tried SAX and the
> work was done in a few seconds without much memory (I hadn't used SAX
> before, but I ♥ SAX now :-)
> 
> We wrote a technical report about that but have not published it yet
> (perhaps one day, I hope); you can see
>   
> for an "outreach" document and  
>  for the Python  
> program.
> 
> Seb35

It is important to use the right tools. As you mention, XML files that big
need to be processed on the fly, not by loading them into memory.
You mention a server with 4 or 8 cores. Was your program multithreaded
(or otherwise running several instances)? Is that 24 hours single-threaded?

Also, those cases each happened once and are quite different, so it's
probably better to ask about the needed resources once you know what you
need next.
What you mention doesn't seem too much for the toolserver. You should be
able to use enough disk space, and the task could be run in the
background, so CPU use wouldn't need to affect other users (especially given
that there are no fixed time constraints). Memory could be a problem,
though, depending on the amount used and for how long. SGE can probably
show some memory usage graphs from which to deduce the amount available
for this kind of project.


[Toolserver-l] user-store maintenance, tonight

2011-03-04 Thread River Tarnell
Hi,

Later tonight (around 12AM UTC[0]) I will restart the NFS server on 
hemlock, which serves user-store.  This will cause an interruption to 
service that should last for under 1 minute.

- river.

[0] http://time.tcx.org.uk/utc/2011-03-05/00:00



Re: [Toolserver-l] Needs of important ressources

2011-03-04 Thread Seb35
Fri, 04 Mar 2011 14:53:50 +0100, Krinkle  wrote:
> On March 4 2011, Seb35 wrote:
>> Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride  wrote:
>>> Seb35 wrote:
>>>> I'm from the French chapter and we need sometimes a lot of CPU power
>>>> and/or a lot of memory for some projects. For now it happened two
>>>> times:
>>>
>>> It's difficult to know what "a lot" of CPU power or memory is from your
>>> post. Toolserver accounts have account limits (), so if you're staying
>>> within those limits, there's generally no problem. If you want to exceed
>>> those limits, you should talk to the Toolserver roots first (). There are
>>> places like /mnt/user-store that can be used for large media storage as
>>> well.
>>>
>>> As always, the Toolserver resources that you use need to relate to
>>> Wikimedia in some way, but it sounds like both of your projects do. :-)
>>>
>>> MZMcBride
>> Ok, thank you, I didn't find this page.
>>
>> For the BnF project we in fact needed about one day of computation (most
>> of the time was spent on disk access), but we thought it would be more
>> (we also optimized by using SAX instead of DOM to read big XML files;
>> DOM also used too much memory).
>> For the video encoding to OGV (I am not the one who did that), it took 4-5
>> hours for a single video, though some of that time was spent swapping (and
>> there are 100 videos corresponding to the conferences).
>>
>> Thank you for the response.
>> Seb35
>
> Hi Seb35,
>
> "One day" or "4-5 hours" still don't mean a lot in terms of technical
> requirements.
> One day of computing with what equipment? With 24 hours of runtime, a small
> difference in equipment can make a big difference. What kind of
> server/setup did this run on?
>
> How much is "too much memory" ?

We needed to transform and crop TIFF images, read an XML file associated
with a book containing the OCRed text of the digitized book, and create a
DjVu with the images and the text layer.

For that we rented a server. I cannot remember exactly the hardware we
chose, but it was probably a 4-core (or 8-core) machine with 4GB (or 8GB)
of RAM and 200-300GB of disk (and server bandwidth, useful for downloading
the files from the BnF's FTP: about 500 files per book (1 XML/page +
multipage TIFF + some others) x 1416 books = 2-3 days of download on the
server because of the many small files).
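
To put those figures in perspective, the short calculation below works out the total number of files and the average download rate they imply. It uses only the approximate numbers quoted above (about 500 files per book, 1416 books, 2 to 3 days), so the results are rough orders of magnitude, not measurements.

```python
FILES_PER_BOOK = 500      # approximate figure quoted above
BOOKS = 1416
SECONDS_PER_DAY = 24 * 60 * 60

total_files = FILES_PER_BOOK * BOOKS


def files_per_second(days):
    """Average number of files fetched per second over the given duration."""
    return total_files / (days * SECONDS_PER_DAY)


print(total_files)
for days in (2, 3):
    print(days, "days:", round(files_per_second(days), 1), "files/s")
```

That is roughly 708,000 files at an average of about 3 to 4 files per second, which is consistent with per-file overhead dominating the transfer, not raw bandwidth.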

From what I remember, "too much memory" means that my laptop (2-core
2.8GHz, 3GB of RAM), on which I developed the (Python) program, had
difficulty loading the whole XML file (with DOM). Then I tried SAX and the
work was done in a few seconds without much memory (I hadn't used SAX
before, but I ♥ SAX now :-)
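
For readers unfamiliar with the difference: DOM builds the whole document tree in memory, while SAX streams through the file and calls a handler for each element. The following is a minimal sketch using Python's standard `xml.sax` module, not the actual program; the `<word>` element name and the sample document are assumptions made up for illustration, since the real BnF XML schema is not described here.

```python
import io
import xml.sax


class WordCollector(xml.sax.ContentHandler):
    """Collect the text of every <word> element while the parser streams by."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "word":  # hypothetical element name
            self._in_word = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._in_word:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "word":
            self.words.append("".join(self._buffer))
            self._in_word = False


def extract_words(source):
    """Parse a filename or binary stream, keeping only the collected words."""
    handler = WordCollector()
    xml.sax.parse(source, handler)
    return handler.words


sample = io.BytesIO(b"<page><line><word>Bonjour</word> <word>monde</word></line></page>")
print(extract_words(sample))
```

Only the handler's small state is held in memory at any time, so memory use stays roughly constant regardless of the size of the file.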

We wrote a technical report about that but have not published it yet
(perhaps one day, I hope); you can see
  
for an "outreach" document and  
 for the Python  
program.

Seb35


Re: [Toolserver-l] Needs of important ressources

2011-03-04 Thread Krinkle
On March 4 2011, Seb35 wrote:
> Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride  wrote:
>> Seb35 wrote:
>>> I'm from the French chapter and we need sometimes a lot of CPU power
>>> and/or a lot of memory for some projects. For now it happened two  
>>> times:
>>
>> It's difficult to know what "a lot" of CPU power or memory is from your
>> post. Toolserver accounts have account limits (), so if you're staying
>> within those limits, there's generally no problem. If you want to exceed
>> those limits, you should talk to the Toolserver roots first (). There are
>> places like /mnt/user-store that can be used for large media storage as
>> well.
>>
>> As always, the Toolserver resources that you use need to relate to
>> Wikimedia in some way, but it sounds like both of your projects do. :-)
>>
>> MZMcBride
> Ok, thank you, I didn't find this page.
>
> For the BnF project we in fact needed about one day of computation (most
> of the time was spent on disk access), but we thought it would be more
> (we also optimized by using SAX instead of DOM to read big XML files;
> DOM also used too much memory).
> For the video encoding to OGV (I am not the one who did that), it took 4-5
> hours for a single video, though some of that time was spent swapping (and
> there are 100 videos corresponding to the conferences).
>
> Thank you for the response.
> Seb35

Hi Seb35,

"One day" or "4-5 hours" still don't mean a lot in terms of technical
requirements.
One day of computing with what equipment? With 24 hours of runtime, a small
difference in equipment can make a big difference. What kind of
server/setup did this run on?

How much is "too much memory" ?

--
Krinkle
