String concatenation - which is the fastest way ?

2011-08-10 Thread przemolicc
Hello,

I'd like to write a Python (2.6/2.7) script which connects to a database, fetches
hundreds of thousands of rows, concatenates them (basically: creates XML)
and then puts the result into another table. Do I have any choice
regarding string concatenation in Python from a performance point of view ?
Since the number of rows is big, I'd like to use the fastest possible library
(if there is any choice). Can you recommend something ?

Regards
Przemyslaw Bak (przemol)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: String concatenation - which is the fastest way ?

2011-08-10 Thread przemolicc
On Wed, Aug 10, 2011 at 01:32:06PM +0100, Chris Angelico wrote:
> On Wed, Aug 10, 2011 at 12:17 PM,   wrote:
> > Hello,
> >
> > I'd like to write a python (2.6/2.7) script which connects to database, 
> > fetches
> > hundreds of thousands of rows, concat them (basically: create XML)
> > and then put the result into another table. Do I have any choice
> > regarding string concatenation in Python from the performance point of view 
> > ?
> > Since the number of rows is big I'd like to use the fastest possible library
> > (if there is any choice). Can you recommend me something ?
> 
> First off, I have no idea why you would want to create an XML dump of
> hundreds of thousands of rows, only to store it in another table.
> However, if that is your intention, list joining is about as efficient
> as you're going to get in Python:
> 
> lst=["asdf","qwer","zxcv"] # feel free to add 399,997 more list entries
> xml="<foo>"+"</foo><foo>".join(lst)+"</foo>"
> 
> This sets xml to '<foo>asdf</foo><foo>qwer</foo><foo>zxcv</foo>' which
> may or may not be what you're after.

Chris,

since this process (XML building) is currently running inside the database (using
native SQL commands) and is a single-threaded task, it is quite slow. What I
wanted to do is spawn several Python subprocesses in parallel, each of which
will concatenate a subset of the whole table (and then merge all of them at the end).
Basically:
- fetch all rows from the database (up to 1 million): what is the recommended
  data type ?
- spawn X Python processes, each one:
  - concatenating its own subset
- merge the results from all the subprocesses

This task is running on a server which has many, but slow, cores and I am
trying to divide this task into many subtasks.
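The per-subset concatenate-then-merge plan above maps naturally onto ''.join(). A minimal sketch; the tag names and toy data are my own assumptions, not anything from the thread:

```python
def rows_to_xml(rows):
    """Join a list of row strings into one XML fragment in O(n) time."""
    return "".join("<row>%s</row>" % r for r in rows)

def merge(fragments):
    """Merge per-subprocess fragments, preserving their original order."""
    return "<table>%s</table>" % "".join(fragments)

chunks = [["a", "b"], ["c", "d"]]             # two subsets of the full result set
fragments = [rows_to_xml(c) for c in chunks]  # in reality: one per subprocess
print(merge(fragments))
# <table><row>a</row><row>b</row><row>c</row><row>d</row></table>
```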

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-10 Thread przemolicc
On Wed, Aug 10, 2011 at 06:20:10PM +0200, Stefan Behnel wrote:
> [email protected], 10.08.2011 15:31:
>> On Wed, Aug 10, 2011 at 01:32:06PM +0100, Chris Angelico wrote:
>>> On Wed, Aug 10, 2011 at 12:17 PM,  wrote:
>>>> I'd like to write a python (2.6/2.7) script which connects to database,
>>>> fetches hundreds of thousands of rows, concat them (basically: create XML)
>>>> and then put the result into another table. Do I have any choice
>>>> regarding string concatenation in Python from the performance point of view ?
>>>> Since the number of rows is big I'd like to use the fastest possible library
>>>> (if there is any choice). Can you recommend me something ?
>>>
>>> First off, I have no idea why you would want to create an XML dump of
>>> hundreds of thousands of rows, only to store it in another table.
>>> However, if that is your intention, list joining is about as efficient
>>> as you're going to get in Python:
>>>
>>> lst=["asdf","qwer","zxcv"] # feel free to add 399,997 more list entries
>>> xml="<foo>"+"</foo><foo>".join(lst)+"</foo>"
>>>
>>> This sets xml to '<foo>asdf</foo><foo>qwer</foo><foo>zxcv</foo>' which
>>> may or may not be what you're after.
>>
>> since this process (XML building) is running now inside database (using 
>> native SQL commands)
>> and is one-thread task it is quite slow. What I wanted to do is to spawn 
>> several python subprocesses in parallel which
>> will concat subset of the whole table (and then merge all of them at the 
>> end).
>> Basically:
>> - fetch all rows from the database (up to 1 million): what is recommended 
>> data type ?
>> - spawn X python processes each one:
>>  - concat its own subset
>> - merge the result from all the subprocesses
>>
>> This task is running on a server which has many but slow cores and I am 
>> trying to divide this task
>> into many subtasks.
>
> Makes sense to me. Note that the really good DBMSes (namely, PostgreSQL)  
> come with built-in Python support.

The data are in Oracle so I have to use cx_oracle.

> You still didn't provide enough information to make me understand why you 
> need XML in between one database and another (or the same?), but if you 
> go that route, you can just read data through multiple connections in 
> multiple threads (or processes), have each build up one (or more) XML 
> entries, and then push those into a queue. Then another thread (or more 
> than one) can read from that queue and write the XML items into a file 
> (or another database) as they come in.

I am not a database developer so I don't want to change the whole process
of data flow between applications in my company. Another process reads this
XML from a particular Oracle table, so I have to put the final XML there.

> If your data has a considerable size, I wouldn't use string concatenation 
> or joining at all (note that it requires 2x the memory during  
> concatenation), but rather write it into a file, or even just process the 
> data on the fly, i.e. write it back into the target table right away.  
> Reading a file back in after the fact is much more resource friendly than 
> keeping huge amounts of data in memory. And disk speed is usually not a  
> problem when streaming data from disk into a database.

This server has 256 GB of RAM so memory is not a problem.
Also, the select which fetches the data is sorted. That is why I have to
carefully divide it into subtasks and then merge them in the correct order.
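Note that multiprocessing.Pool.map returns its results in the same order as its inputs, so a sorted result set stays sorted after a parallel per-chunk concatenation. A sketch, assuming the rows have already been split into ordered chunks (toy data here):

```python
from multiprocessing import Pool

def concat_chunk(rows):
    """Concatenate one ordered chunk of row strings."""
    return "".join(rows)

if __name__ == "__main__":
    chunks = [["1", "2"], ["3", "4"], ["5", "6"]]  # toy stand-ins for row subsets
    pool = Pool(processes=3)
    parts = pool.map(concat_chunk, chunks)  # results come back in chunk order
    pool.close()
    pool.join()
    print("".join(parts))  # 123456
```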

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-11 Thread przemolicc
On Wed, Aug 10, 2011 at 03:38:42PM +0100, Chris Angelico wrote:
> On Wed, Aug 10, 2011 at 3:38 PM, Chris Angelico  wrote:
> > Which SQL library are you suing?
> 
> And this is why I should proof-read BEFORE, not AFTER, sending.
> 
> Which SQL library are you *using*?

cx_oracle

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-11 Thread przemolicc
On Thu, Aug 11, 2011 at 11:59:31AM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 7:40 AM,   wrote:
> > I am not a database developer so I don't want to change the whole process
> > of data flow between applications in my company. Another process is
> > reading this XML from particular Oracle table so I have to put the final 
> > XML there.
> 
> I think you may be looking at a submission to
> http://www.thedailywtf.com/ soon. You seem to be working in a rather
> weird dataflow. :( Under the circumstances, you're probably going to
> want to go with the original ''.join() option.
> 
> > This server has 256 GB of RAM so memory is not a problem.
> > Also the select which fetches the data is sorted. That is why I have to
> > carefully divide into subtasks and then merge it in correct order.
> 
> There's no guarantee that all of that 256GB is available to you, of course.

I am the admin of this server - the memory is available for us :-)

> What may be the easiest way is to do the select in a single process,
> then partition it and use the Python multiprocessing module to split
> the job into several parts. Then you need only concatenate the handful
> of strings.

This is the way I am going to use.

> You'll need to do some serious profiling, though, to ascertain where
> the bottleneck really is. Is it actually slow doing the concatenation,
> or is it taking more time reading/writing the disk? Is it actually all
> just taking time due to RAM usage? Proper string concatenation doesn't
> need a huge amount of CPU.

I did my homework :-) - the CPU working on concatenation is a bottleneck.
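The kind of measurement referred to here can be reproduced with timeit; the data sizes below are arbitrary assumptions:

```python
import timeit

pieces = ["row%d" % i for i in range(10000)]  # arbitrary test data

def concat_plus(items):
    """Naive repeated concatenation (historically O(n^2) in CPython)."""
    s = ""
    for p in items:
        s += p
    return s

def concat_join(items):
    """The usual fast idiom."""
    return "".join(items)

t_plus = timeit.timeit(lambda: concat_plus(pieces), number=20)
t_join = timeit.timeit(lambda: concat_join(pieces), number=20)
print("+=: %.4fs  join: %.4fs" % (t_plus, t_join))
```

CPython sometimes optimizes `+=` on strings with only one reference, so the gap varies between builds; the join timing is the stable one.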


Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-11 Thread przemolicc
On Thu, Aug 11, 2011 at 11:59:31AM +0100, Chris Angelico wrote:
> 
> What may be the easiest way is to do the select in a single process,
> then partition it and use the Python multiprocessing module to split
> the job into several parts. Then you need only concatenate the handful
> of strings.

This is the way I am going to use.
But what is the best data type to hold so many rows and then operate on them ?

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-12 Thread przemolicc
On Thu, Aug 11, 2011 at 02:38:32PM -0700, SigmundV wrote:
> When I saw the headline I thought "oh no, not string concatenation
> again... we have had scores of these thread before...", but this is a
> rather interesting problem. The OP says he's not a database
> developer,  but why is he then fiddling with internal database
> operations? Wouldn't it be better to go back to the database
> developers and have them look into parallel processing. I'm sure that
> Oracle databases can do parallel processing by now...

:-)
Good question, but let me explain what motivates me to do it.
The first reason (I think the most important :-) ) is that I want to learn
something new - I am new to Python (I am a unix/storage sysadmin, but with a
programming background, so Python was a natural choice for more complicated
sysadmin tasks).
Another reason is that our server (and I am responsible for it) has
many, many, but slow, cores (as I had written before). It means that
parallelization of operations is the obvious move - the developer is not keen
to spend much time on it (she is busy) - and for me this is something new
(among some boring daily tasks ... ;-) ) and fresh :-)
Another intention is to get some more knowledge about parallelization:
how to divide a task into subtasks, what the optimal way to do it is, etc.
And the last reason is that I love performance tuning :-)

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-12 Thread przemolicc
On Thu, Aug 11, 2011 at 02:48:43PM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 2:46 PM,   wrote:
> > This is the way I am going to use.
> > But what is the best data type to hold so many rows and then operate on 
> > them ?
> >
> 
> List of strings. Take it straight from your Oracle interface and work
> with it directly.

Can I use this list in the following way ?
subprocess_1 - run on list elements between 1 and 10000
subprocess_2 - run on list elements between 10001 and 20000
subprocess_3 - run on list elements between 20001 and 30000
etc
...
Sort of indexing ?
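Plain list slicing does exactly this kind of range "indexing". A sketch, with placeholder row contents:

```python
rows = ["r%d" % i for i in range(30000)]  # stand-in for the fetched rows

chunk_size = 10000
chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

# chunks[0] holds rows 0..9999, chunks[1] rows 10000..19999, and so on;
# each chunk can then be handed to one subprocess.
print(len(chunks), len(chunks[0]))  # 3 10000
```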

Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-12 Thread przemolicc
On Thu, Aug 11, 2011 at 02:48:43PM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 2:46 PM,   wrote:
> > This is the way I am going to use.
> > But what is the best data type to hold so many rows and then operate on 
> > them ?
> >
> 
> List of strings. [...]

Let's assume I have the whole list in the memory:
Can I use this list in the following way ?
subprocess_1 - run on list elements between 1 and 10000
subprocess_2 - run on list elements between 10001 and 20000
subprocess_3 - run on list elements between 20001 and 30000
etc
...
Can I use sort of indexing on this list ?


Regards
Przemyslaw Bak (przemol)



Re: String concatenation - which is the fastest way ?

2011-08-16 Thread przemolicc
On Fri, Aug 12, 2011 at 11:25:06AM +0200, Stefan Behnel wrote:
> [email protected], 11.08.2011 16:39:
>> On Thu, Aug 11, 2011 at 02:48:43PM +0100, Chris Angelico wrote:
>>> On Thu, Aug 11, 2011 at 2:46 PM,  wrote:
 This is the way I am going to use.
 But what is the best data type to hold so many rows and then operate on 
 them ?

>>>
>>> List of strings. Take it straight from your Oracle interface and work
>>> with it directly.
>>
>> Can I use this list in the following way ?
>> subprocess_1 - run on list elements between 1 and 10000
>> subprocess_2 - run on list elements between 10001 and 20000
>> subprocess_3 - run on list elements between 20001 and 30000
>> etc
>> ...
>
> Sure. Just read the data as it comes in from the database and fill up a  
> chunk, then hand that on to a process. You can also distribute it in  
> smaller packets, just check what size gives the best throughput.

Since performance is critical I wanted to use the multiprocessing module.
But when I get all the source rows into one list of strings, can I easily
share it between X processes ?
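A plain list is not actually shared between processes: multiprocessing pickles each chunk and sends a copy to the worker, which is usually acceptable for read-only input. A sketch (the function names and chunk size are my assumptions):

```python
from multiprocessing import Pool

def concat_chunk(rows):
    """Concatenate one chunk; runs in a worker process on a *copy* of the slice."""
    return "".join(rows)

def parallel_concat(all_rows, nproc=4, chunk_size=10000):
    """Slice the big list, ship one copy per worker, merge results in order."""
    chunks = [all_rows[i:i + chunk_size]
              for i in range(0, len(all_rows), chunk_size)]
    pool = Pool(processes=nproc)
    try:
        return "".join(pool.map(concat_chunk, chunks))
    finally:
        pool.close()
        pool.join()
```

The pickling cost grows with chunk size, so it is worth trying a few chunk sizes to find the best throughput.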


Regards
Przemyslaw Bak (przemol)



locale.format without trailing zeros

2011-08-22 Thread przemolicc
Hello,

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "pl_PL")
'pl_PL'
>>> i=0.20
>>> j=0.25
>>> locale.format('%f', i)
'0,200000'
>>> locale.format('%f', j)
'0,250000'

I need to print the numbers in the following format:
'0,2'   (i)
'0,25'  (j)
i.e. without the trailing zeros.

Regards
Przemyslaw Bak (przemol)



Re: locale.format without trailing zeros

2011-08-22 Thread przemolicc
On Mon, Aug 22, 2011 at 11:48:46AM +0200, Peter Otten wrote:
> [email protected] wrote:
> 
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, "pl_PL")
> > 'pl_PL'
> >>> i=0.20
> >>> j=0.25
> >>> locale.format('%f', i)
> > '0,200000'
> >>> locale.format('%f', j)
> > '0,250000'
> > 
> > I need to print the numbers in the following format:
> > '0,2'   (i)
> > '0,25'  (j)
> > So the last trailing zeros are not printed.
> 
> >>> print locale.format("%g", 1.23)
> 1,23
> >>> print locale.format("%g", 1.2345678)
> 1,23457
> >>> print locale.format("%.10g", 1.2345678)
> 1,2345678
> >>> print locale.format("%.15g", 0.1)
> 0,1
> >>> print locale.format("%.17g", 0.1)
> 0,10000000000000001

Thank you very much :-)

Regards
Przemyslaw Bak (przemol)



Re: locale.format without trailing zeros

2011-08-22 Thread przemolicc
On Mon, Aug 22, 2011 at 11:48:46AM +0200, Peter Otten wrote:
> [email protected] wrote:
> 
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, "pl_PL")
> > 'pl_PL'
> >>> i=0.20
> >>> j=0.25
> >>> locale.format('%f', i)
> > '0,200000'
> >>> locale.format('%f', j)
> > '0,250000'
> > 
> > I need to print the numbers in the following format:
> > '0,2'   (i)
> > '0,25'  (j)
> > So the last trailing zeros are not printed.
> 
> >>> print locale.format("%g", 1.23)
> 1,23
> >>> print locale.format("%g", 1.2345678)
> 1,23457
> >>> print locale.format("%.10g", 1.2345678)
> 1,2345678
> >>> print locale.format("%.15g", 0.1)
> 0,1
> >>> print locale.format("%.17g", 0.1)
> 0,10000000000000001

How about this format:
',1'
(the leading zero is also not printed)

(I know this is strange but I need compatibility with local requirements)
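One way to meet both requirements (no trailing zeros via %g, plus stripping the leading zero) is a small wrapper; a sketch, where the helper name is mine and the handling of negative numbers is an assumption:

```python
import locale

def local_no_zeros(x):
    """Format with %g (drops trailing zeros), then strip a leading zero
    before the decimal separator: "0,1" -> ",1" (or "0.1" -> ".1")."""
    s = locale.format_string("%g", x)  # uses the current LC_NUMERIC locale
    for prefix in ("0,", "0."):
        if s.startswith(prefix):
            return s[1:]
        if s.startswith("-" + prefix):
            return "-" + s[2:]
    return s

print(local_no_zeros(0.25))  # ',25' under pl_PL; '.25' under the C locale
```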

Regards
Przemyslaw Bak (przemol)



[python-list] - what do you think ?

2011-02-08 Thread przemolicc
Hello,

I have just subscribed to this python-list@ and this is my Nth list.
Many mailing lists put their name in square brackets in the subject
to identify themselves when you get e-mails from different forums.
Would you consider adding such a [] tag to this list also ?

Please compare both version below:
5350 Feb 07 Richard Holmes(  20) PIL Open Problem
5351 N   Feb 07 Corey Richardson  (  17) ??>
5352 N   Feb 08 Ben Finney(  28) ??>
5376 N   Feb 07 Collins, Kevin [BEELINE]  (  35) Re: [rhelv5-list] ext4 options
5385 N   Feb 07 Bob Friesenhahn   (  22) Re: [zfs-discuss] ZFS and TRIM 
- No need for TRIM
5386 N   Feb 07 Eric D. Mudama(  25) ??>
5387 N   Feb 07 Nevins Duret  ( 215) [Tutor] Converting From 
Unicode to ASCII!!
5388 N   Feb 08 David Hutto   ( 138) ??>
5389 N   Feb 08 David Hutto   (   7)   ??>
5390 N   Feb 08 David Hutto   (   9) ??>
5391 N   Feb 08 Mr. Bean  (   9) Somehing interesting
5392 N   Feb 08 Mr. Bean  (   9) ??>
5393 N   Feb 07 nguytom   (  11) [Veritas-bu]  How to use the 
DataDomain system cleaning process
5394 N   Feb 07 David Stanaway(  29) ??>Re: [Veritas-bu] How to use 
the DataDomain system cleaning process

and

5350 Feb 07 Richard Holmes(  20) [python-list] PIL Open Problem
5351 N   Feb 07 Corey Richardson  (  17) ??>
5352 N   Feb 08 Ben Finney(  28) ??>
5376 N   Feb 07 Collins, Kevin [BEELINE]  (  35) Re: [rhelv5-list] ext4 options
5385 N   Feb 07 Bob Friesenhahn   (  22) Re: [zfs-discuss] ZFS and TRIM 
- No need for TRIM
5386 N   Feb 07 Eric D. Mudama(  25) ??>
5387 N   Feb 07 Nevins Duret  ( 215) [Tutor] Converting From 
Unicode to ASCII!!
5388 N   Feb 08 David Hutto   ( 138) ??>
5389 N   Feb 08 David Hutto   (   7)   ??>
5390 N   Feb 08 David Hutto   (   9) ??>
5391 N   Feb 08 Mr. Bean  (   9) [python-list] Somehing 
interesting
5392 N   Feb 08 Mr. Bean  (   9) ??>
5393 N   Feb 07 nguytom   (  11) [Veritas-bu]  How to use the 
DataDomain system cleaning process
5394 N   Feb 07 David Stanaway(  29) ??>Re: [Veritas-bu] How to use 
the DataDomain system cleaning process


Kind regards
przemol 'Seamie'



Re: [python-list] - what do you think ?

2011-02-08 Thread przemolicc
On Tue, Feb 08, 2011 at 10:16:42PM +1100, Ben Finney wrote:
> [email protected] writes:
> 
> > I have just subscribed to this python-list@ and this is my N list.
> > Usually many mailing lists use square brackets to identify its name
> > when you have e-mails from different forums.
> > Would you consider adding [] to this list also ?
> 
> No thank you.
> 
> Either your mail client already knows how to filter messages
> appropriately depending on which mailing list they came from; or, you
> should use a better mail client.

mutt is quite good ;-)

> Either way, please don't ask for the subject lines to be munged.

Any technical reason why not ?

Regards
Przemek



Re: - what do you think ?

2011-02-09 Thread przemolicc
On Tue, Feb 08, 2011 at 01:20:48PM -0500, Terry Reedy wrote:
> On 2/8/2011 7:18 AM, [email protected] wrote:
>> On Tue, Feb 08, 2011 at 10:16:42PM +1100, Ben Finney wrote:
>
>>> Either way, please don't ask for the subject lines to be munged.
>>
>> Any technical reason why not ?
>
> For one reason, python-list exchanges messages with both  
> comp.lang.python and gmane.comp.python.general (the latter is how I read  
> it), and newsreaders already separate messages by group. I also read  
> pydev and a couple of sig lists via gmane, so extra headers would be 
> noise.

That is an important technical reason.
Thank you. :-)

Regards
Przemek



Graphical library - charts

2009-06-22 Thread przemolicc
Hello,

I have thousands of files with logs from a monitoring system. Each file
has some important data (numbers). I'd like to create charts using those
numbers. Could you please suggest a library which will allow creating
such charts ? The preferred chart type is a line chart.

Besides, is there any library which allows me to zoom in/out of such a chart ?
Sometimes I need to create a chart using long-term data (a few months) but
then observe minutes - it would be good not to have to create another
short-term chart but just zoom in.

Those files are on one unix server and the charts will be displayed on
another unix server, so the X-Window protocol is going to be used.

Any suggestions ?
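One common option (my suggestion, separate from the Google Charts / Gnuplot replies below) is matplotlib: its interactive window provides pan/zoom, works over X11, and the same figure can be saved to a file. A minimal line-chart sketch with made-up data:

```python
import matplotlib
matplotlib.use("Agg")             # headless backend; omit this for an X11 window
import matplotlib.pyplot as plt

timestamps = list(range(10))      # stand-ins for numbers parsed from the logs
values = [t * t for t in timestamps]

fig, ax = plt.subplots()
ax.plot(timestamps, values, "-")  # line chart
ax.set_xlabel("time")
ax.set_ylabel("metric")
fig.savefig("chart.png")
```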

Best regards
przemol



Re: Graphical library - charts

2009-06-22 Thread przemolicc
BJörn Lindqvist wrote:

> 2009/6/22  :
>> Hello,
>>
>> I have thousends of files with logs from monitoring system. Each file
>> has some important data (numbers). I'd like to create charts using those
>> numbers. Could you please suggest library which will allow creating
>> such charts ? The preferred chart is line chart.
>>
>> Besides is there any library which allow me to zoom in/out of such chart
>> ? Sometimes I need to create chart using long-term data (a few months)
>> but then observe a minutes - it would be good to not create another
>> short-term chart but just zoom-in.
>>
>> Those files are on one unix server and the charts will be displayed on
>> another unix server so the X-Window protocol is going to be used.
> 
> Try Google Charts. It is quite excellent for easily creating simple
> charts. There is also Gnuplot which is more advanced and complicated.
> Both tools have python bindings.

Which option is better:
pygooglechart   http://pygooglechart.slowchop.com/
google-chartwrapper http://code.google.com/p/google-chartwrapper/

Regards
Przemek



Re: Graphical library - charts

2009-06-22 Thread przemolicc
BJörn Lindqvist wrote:

> 2009/6/22  :
>> Hello,
>>
>> I have thousends of files with logs from monitoring system. Each file
>> has some important data (numbers). I'd like to create charts using those
>> numbers. Could you please suggest library which will allow creating
>> such charts ? The preferred chart is line chart.
>>
>> Besides is there any library which allow me to zoom in/out of such chart
>> ? Sometimes I need to create chart using long-term data (a few months)
>> but then observe a minutes - it would be good to not create another
>> short-term chart but just zoom-in.
>>
>> Those files are on one unix server and the charts will be displayed on
>> another unix server so the X-Window protocol is going to be used.
> 
> Try Google Charts. It is quite excellent for easily creating simple
> charts. There is also Gnuplot which is more advanced and complicated.
> Both tools have python bindings.

By the way: do I need internet access while using this library ?


Regards
przemol
