String concatenation - which is the fastest way ?
Hello,

I'd like to write a Python (2.6/2.7) script which connects to a database, fetches hundreds of thousands of rows, concatenates them (basically: creates XML) and then puts the result into another table. Do I have any choice regarding string concatenation in Python from the performance point of view? Since the number of rows is big, I'd like to use the fastest possible approach (if there is any choice). Can you recommend something?

Regards
Przemyslaw Bak (przemol)

--
http://mail.python.org/mailman/listinfo/python-list
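The usual candidates for building one big string out of many small ones are repeated `+=`, `str.join`, and an in-memory buffer; `''.join` on a list is normally the fastest of the three. A minimal sketch (row data invented for illustration, shown in modern Python):

```python
import io

rows = ["<row>%d</row>" % i for i in range(1000)]

def concat_plus(items):
    # repeated += may copy the growing string on every step (worst case quadratic)
    s = ""
    for item in items:
        s += item
    return s

def concat_join(items):
    # builds the result in a single pass over the list
    return "".join(items)

def concat_buffer(items):
    # an in-memory buffer; the Python 2 equivalent is cStringIO.StringIO
    buf = io.StringIO()
    for item in items:
        buf.write(item)
    return buf.getvalue()

# all three build the same string
assert concat_plus(rows) == concat_join(rows) == concat_buffer(rows)
```

On CPython, `+=` on strings is sometimes optimized in place, but `''.join` is the only form whose good performance is guaranteed across implementations.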
Re: String concatenation - which is the fastest way ?
On Wed, Aug 10, 2011 at 01:32:06PM +0100, Chris Angelico wrote:
> On Wed, Aug 10, 2011 at 12:17 PM, wrote:
> > I'd like to write a python (2.6/2.7) script which connects to database,
> > fetches hundreds of thousands of rows, concat them (basically: create XML)
> > and then put the result into another table. [...]
>
> First off, I have no idea why you would want to create an XML dump of
> hundreds of thousands of rows, only to store it in another table.
> However, if that is your intention, list joining is about as efficient
> as you're going to get in Python:
>
> lst = ["asdf", "qwer", "zxcv"]  # feel free to add 399,997 more list entries
> xml = "<data>" + "".join(lst) + "</data>"
>
> This sets xml to '<data>asdfqwerzxcv</data>', which may or may not be
> what you're after.

Chris,

since this process (XML building) currently runs inside the database (using native SQL commands) and is a single-threaded task, it is quite slow. What I wanted to do is to spawn several python subprocesses in parallel, each of which will concatenate a subset of the whole table (and then merge all of them at the end). Basically:
- fetch all rows from the database (up to 1 million): what is the recommended data type?
- spawn X python processes, each of which concatenates its own subset
- merge the results from all the subprocesses

This task is running on a server which has many but slow cores, and I am trying to divide the task into many subtasks.

Regards
Przemyslaw Bak (przemol)
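The fetch/split/merge plan above can be prototyped end to end without Oracle; in this sketch sqlite3 stands in for cx_Oracle (both follow the DB-API), and the table, column, and tag names are all invented:

```python
import sqlite3

def fetch_rows(conn):
    # cx_Oracle cursors support the same DB-API fetchall()/fetchmany() calls
    cur = conn.execute("SELECT payload FROM src ORDER BY id")
    return [r[0] for r in cur.fetchall()]

def build_xml(rows, chunk_size=2):
    # concatenate each subset independently, then merge in order
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    parts = ["".join(c) for c in chunks]          # per-subset concatenation
    return "<data>" + "".join(parts) + "</data>"  # ordered merge

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO src (id, payload) VALUES (?, ?)",
                 [(i, "<row>%d</row>" % i) for i in range(5)])
rows = fetch_rows(conn)
xml = build_xml(rows)
```

Because the chunks are joined in list order, the merged result preserves the sort order of the query.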
Re: String concatenation - which is the fastest way ?
On Wed, Aug 10, 2011 at 06:20:10PM +0200, Stefan Behnel wrote:
> [email protected], 10.08.2011 15:31:
>> [...]
>> This task is running on a server which has many but slow cores and I am
>> trying to divide this task into many subtasks.
>
> Makes sense to me. Note that the really good DBMSes (namely, PostgreSQL)
> come with built-in Python support.

The data are in Oracle, so I have to use cx_Oracle.
> You still didn't provide enough information to make me understand why you
> need XML in between one database and another (or the same?), but if you
> go that route, you can just read the data through multiple connections in
> multiple threads (or processes), have each build up one (or more) XML
> entries, and then push those into a queue. Then another thread (or more
> than one) can read from that queue and write the XML items into a file
> (or another database) as they come in.

I am not a database developer, so I don't want to change the whole process
of data flow between applications in my company. Another process reads
this XML from a particular Oracle table, so I have to put the final XML
there.

> If your data has a considerable size, I wouldn't use string concatenation
> or joining at all (note that it requires 2x the memory during
> concatenation), but rather write it into a file, or even just process the
> data on the fly, i.e. write it back into the target table right away.
> Reading a file back in after the fact is much more resource friendly than
> keeping huge amounts of data in memory. And disk speed is usually not a
> problem when streaming data from disk into a database.

This server has 256 GB of RAM, so memory is not a problem. Also, the
select which fetches the data is sorted. That is why I have to carefully
divide the work into subtasks and then merge the results in the correct
order.

Regards
Przemyslaw Bak (przemol)
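Stefan's streaming suggestion looks roughly like this: write each fragment out as it is produced instead of accumulating everything in memory (the file name and tag names here are invented):

```python
import os
import tempfile

def stream_xml(row_iter, path):
    # write each fragment as it arrives; memory use stays flat
    # regardless of how many rows the query returns
    with open(path, "w") as f:
        f.write("<data>")
        for row in row_iter:
            f.write(row)
        f.write("</data>")

path = os.path.join(tempfile.mkdtemp(), "out.xml")
stream_xml(("<row>%d</row>" % i for i in range(3)), path)
```

A generator feeding `stream_xml` never needs the whole row set in memory at once, which sidesteps the 2x-memory cost of a final concatenation.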
Re: String concatenation - which is the fastest way ?
On Wed, Aug 10, 2011 at 03:38:42PM +0100, Chris Angelico wrote:
> On Wed, Aug 10, 2011 at 3:38 PM, Chris Angelico wrote:
> > Which SQL library are you suing?
>
> And this is why I should proof-read BEFORE, not AFTER, sending.
>
> Which SQL library are you *using*?

cx_Oracle

Regards
Przemyslaw Bak (przemol)
Re: String concatenation - which is the fastest way ?
On Thu, Aug 11, 2011 at 11:59:31AM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 7:40 AM, wrote:
> > I am not a database developer so I don't want to change the whole process
> > of data flow between applications in my company. Another process is
> > reading this XML from particular Oracle table so I have to put the final
> > XML there.
>
> I think you may be looking at a submission to
> http://www.thedailywtf.com/ soon. You seem to be working in a rather
> weird dataflow. :( Under the circumstances, you're probably going to
> want to go with the original ''.join() option.
>
> > This server has 256 GB of RAM so memory is not a problem.
> > Also the select which fetches the data is sorted. [...]
>
> There's no guarantee that all of that 256GB is available to you, of course.

I am the admin of this server - the memory is available for us :-)

> What may be the easiest way is to do the select in a single process,
> then partition it and use the Python multiprocessing module to split
> the job into several parts. Then you need only concatenate the handful
> of strings.

This is the way I am going to go.

> You'll need to do some serious profiling, though, to ascertain where
> the bottleneck really is. Is it actually slow doing the concatenation,
> or is it taking more time reading/writing the disk? Is it actually all
> just taking time due to RAM usage? Proper string concatenation doesn't
> need a huge amount of CPU.

I did my homework :-) - the CPU working on the concatenation is the
bottleneck.

Regards
Przemyslaw Bak (przemol)
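One way to confirm that the concatenation itself is CPU-bound, as Chris asks, is to time the join step in isolation, with the input already in memory so no database or disk I/O is involved (sizes here are invented):

```python
import timeit

rows = ["<row>%d</row>" % i for i in range(50000)]

# time only the join; with the list already built, the remaining cost is pure CPU
join_seconds = timeit.timeit(lambda: "".join(rows), number=20)
print("joining %d rows, 20 repetitions: %.4f s" % (len(rows), join_seconds))
```

Timing the fetch loop and the write-back the same way, separately, shows which of the three stages actually dominates.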
Re: String concatenation - which is the fastest way ?
On Thu, Aug 11, 2011 at 11:59:31AM +0100, Chris Angelico wrote:
> What may be the easiest way is to do the select in a single process,
> then partition it and use the Python multiprocessing module to split
> the job into several parts. Then you need only concatenate the handful
> of strings.

This is the way I am going to go. But what is the best data type to hold
so many rows and then operate on them?

Regards
Przemyslaw Bak (przemol)
Re: String concatenation - which is the fastest way ?
On Thu, Aug 11, 2011 at 02:38:32PM -0700, SigmundV wrote:
> When I saw the headline I thought "oh no, not string concatenation
> again... we have had scores of these threads before...", but this is a
> rather interesting problem. The OP says he's not a database developer,
> but why is he then fiddling with internal database operations? Wouldn't
> it be better to go back to the database developers and have them look
> into parallel processing? I'm sure that Oracle databases can do parallel
> processing by now... :-)

Good question, and I'll try to explain what motivates me to do it.

The first reason (I think the most important one :-) ) is that I want to
learn something new - I am new to Python (I am a unix/storage sysadmin,
but with a programming background, so Python was a natural choice for
more complicated sysadmin tasks).

Another reason is that our server (and I am responsible for it) has
many, but slow, cores (as I had written before). It means that
parallelization of the operation is the obvious approach - the developer
is not keen to spend much time on it (she is busy) - and for me this is
something new (among some boring daily tasks... ;-) ) and fresh :-)

Another intention is to get more knowledge about parallelization: how to
divide a task into subtasks, what the most optimal way of doing it is,
and so on.

And the last reason is that I love performance tuning :-)

Regards
Przemyslaw Bak (przemol)
Re: String concatenation - which is the fastest way ?
On Thu, Aug 11, 2011 at 02:48:43PM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 2:46 PM, wrote:
> > This is the way I am going to go.
> > But what is the best data type to hold so many rows and then operate on
> > them?
>
> List of strings. Take it straight from your Oracle interface and work
> with it directly.

Can I use this list in the following way?
subprocess_1 - run on list elements 1 to 10000
subprocess_2 - run on list elements 10001 to 20000
subprocess_3 - run on list elements 20001 to 30000
etc.
Sort of indexing?

Regards
Przemyslaw Bak (przemol)
Re: String concatenation - which is the fastest way ?
On Thu, Aug 11, 2011 at 02:48:43PM +0100, Chris Angelico wrote:
> On Thu, Aug 11, 2011 at 2:46 PM, wrote:
> > This is the way I am going to go.
> > But what is the best data type to hold so many rows and then operate on
> > them?
>
> List of strings. [...]

Let's assume I have the whole list in memory. Can I use this list in the
following way?
subprocess_1 - run on list elements 1 to 10000
subprocess_2 - run on list elements 10001 to 20000
subprocess_3 - run on list elements 20001 to 30000
etc.
Can I use a sort of indexing on this list?

Regards
Przemyslaw Bak (przemol)
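Python lists support exactly this kind of range addressing through slicing; the boundaries below mirror the ones in the question (slices are 0-based and half-open, so the 1-based range 1..10000 becomes `[0:10000]`):

```python
rows = ["row-%d" % i for i in range(30000)]

# rows[0:10000] covers elements 1..10000 in the question's 1-based numbering
subsets = [rows[0:10000], rows[10000:20000], rows[20000:30000]]

assert [len(s) for s in subsets] == [10000, 10000, 10000]
assert subsets[1][0] == rows[10000]  # slicing copies references, not the strings
```

Each subset is an independent list, so it can be handed to a separate worker and the results merged back in the same order.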
Re: String concatenation - which is the fastest way ?
On Fri, Aug 12, 2011 at 11:25:06AM +0200, Stefan Behnel wrote:
> [email protected], 11.08.2011 16:39:
>> [...]
>> Can I use this list in the following way?
>> subprocess_1 - run on list elements 1 to 10000
>> subprocess_2 - run on list elements 10001 to 20000
>> subprocess_3 - run on list elements 20001 to 30000
>> etc.
>
> Sure. Just read the data as it comes in from the database and fill up a
> chunk, then hand that on to a process. You can also distribute it in
> smaller packets, just check what size gives the best throughput.

Since performance is critical, I wanted to use the multiprocessing
module. But when I get all the source rows into one list of strings, can
I easily share it between X processes?

Regards
Przemyslaw Bak (przemol)
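With the multiprocessing module, an ordinary list is not shared transparently: each chunk passed to `Pool.map` is pickled and copied into a worker process, and the results come back in submission order, which keeps the final merge trivial. A sketch (chunk size and process count are arbitrary):

```python
import multiprocessing

def join_chunk(chunk):
    # runs in a worker process, on its own pickled copy of the chunk
    return "".join(chunk)

def parallel_concat(rows, nproc=2, chunk_size=1000):
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    pool = multiprocessing.Pool(nproc)
    try:
        # map() returns results in the same order as the input chunks
        parts = pool.map(join_chunk, chunks)
    finally:
        pool.close()
        pool.join()
    return "".join(parts)

if __name__ == "__main__":
    rows = ["<row>%d</row>" % i for i in range(5000)]
    assert parallel_concat(rows) == "".join(rows)
```

The pickling cost means very small chunks can be slower than a single process; measuring a few chunk sizes, as Stefan suggests, is worthwhile.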
locale.format without trailing zeros
Hello,
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "pl_PL")
'pl_PL'
>>> i=0.20
>>> j=0.25
>>> locale.format('%f', i)
'0,200000'
>>> locale.format('%f', j)
'0,250000'
I need to print the numbers in the following format:
'0,2' (i)
'0,25' (j)
So the last trailing zeros are not printed.
Regards
Przemyslaw Bak (przemol)
Re: locale.format without trailing zeros
On Mon, Aug 22, 2011 at 11:48:46AM +0200, Peter Otten wrote:
> [email protected] wrote:
> > [...]
> > I need to print the numbers in the following format:
> > '0,2' (i)
> > '0,25' (j)
> > So the last trailing zeros are not printed.
>
> >>> print locale.format("%g", 1.23)
> 1,23
> >>> print locale.format("%g", 1.2345678)
> 1,23457
> >>> print locale.format("%.10g", 1.2345678)
> 1,2345678
> >>> print locale.format("%.15g", 0.1)
> 0,1
> >>> print locale.format("%.17g", 0.1)
> 0,10000000000000001

Thank you very much :-)

Regards
Przemyslaw Bak (przemol)
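Peter's `%g` answer works because `%g` drops trailing zeros; the locale only supplies the comma as the decimal separator. A sketch that falls back to plain `%` formatting when the `pl_PL` locale is not installed (locale names vary by system; note also that `locale.format` was removed in Python 3.12 in favour of `locale.format_string`):

```python
import locale

def fmt(value, spec="%g"):
    try:
        locale.setlocale(locale.LC_NUMERIC, "pl_PL.UTF-8")
        return locale.format_string(spec, value)
    except locale.Error:
        # pl_PL not installed: emulate the comma decimal separator by hand
        return (spec % value).replace(".", ",")

assert fmt(0.20) == "0,2"           # trailing zero dropped by %g
assert fmt(0.25) == "0,25"
assert fmt(1.2345678) == "1,23457"  # %g keeps six significant digits
```

For more significant digits, pass an explicit precision such as `"%.10g"`, exactly as in the quoted session.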
Re: locale.format without trailing zeros
On Mon, Aug 22, 2011 at 11:48:46AM +0200, Peter Otten wrote:
> [...]
> >>> print locale.format("%g", 1.23)
> 1,23
> >>> print locale.format("%.17g", 0.1)
> 0,10000000000000001

How about this format: ',1' (the leading zero is also not printed)?
(I know this is strange, but I need compatibility with local
requirements.)

Regards
Przemyslaw Bak (przemol)
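No printf-style format drops the integer zero, so the ',1' form needs a small post-processing step. A sketch on top of plain `%g` formatting (the locale handling from the earlier answer would slot in the same way):

```python
def fmt_no_leading_zero(x):
    s = ("%g" % x).replace(".", ",")  # '0,1' for 0.1
    if s.startswith("0,"):
        s = s[1:]                     # '0,1'  -> ',1'
    elif s.startswith("-0,"):
        s = "-" + s[2:]               # '-0,5' -> '-,5'
    return s

assert fmt_no_leading_zero(0.1) == ",1"
assert fmt_no_leading_zero(0.25) == ",25"
assert fmt_no_leading_zero(1.5) == "1,5"   # non-zero integer part is kept
```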
[python-list] - what do you think ?
Hello,

I have just subscribed to python-list@ and it is far from my first list.
Many mailing lists use square brackets around the list name in the
Subject header, so that e-mails from different forums are easy to tell
apart. Would you consider adding [] to this list as well? Please compare
both versions below:

5350   Feb 07 Richard Holmes           (  20) PIL Open Problem
5351 N Feb 07 Corey Richardson         (  17) ??>
5352 N Feb 08 Ben Finney               (  28) ??>
5376 N Feb 07 Collins, Kevin [BEELINE] (  35) Re: [rhelv5-list] ext4 options
5385 N Feb 07 Bob Friesenhahn          (  22) Re: [zfs-discuss] ZFS and TRIM - No need for TRIM
5386 N Feb 07 Eric D. Mudama           (  25) ??>
5387 N Feb 07 Nevins Duret             ( 215) [Tutor] Converting From Unicode to ASCII!!
5388 N Feb 08 David Hutto              ( 138) ??>
5389 N Feb 08 David Hutto              (   7) ??>
5390 N Feb 08 David Hutto              (   9) ??>
5391 N Feb 08 Mr. Bean                 (   9) Somehing interesting
5392 N Feb 08 Mr. Bean                 (   9) ??>
5393 N Feb 07 nguytom                  (  11) [Veritas-bu] How to use the DataDomain system cleaning process
5394 N Feb 07 David Stanaway           (  29) ??>Re: [Veritas-bu] How to use the DataDomain system cleaning process

and

5350   Feb 07 Richard Holmes           (  20) [python-list] PIL Open Problem
5351 N Feb 07 Corey Richardson         (  17) ??>
5352 N Feb 08 Ben Finney               (  28) ??>
5376 N Feb 07 Collins, Kevin [BEELINE] (  35) Re: [rhelv5-list] ext4 options
5385 N Feb 07 Bob Friesenhahn          (  22) Re: [zfs-discuss] ZFS and TRIM - No need for TRIM
5386 N Feb 07 Eric D. Mudama           (  25) ??>
5387 N Feb 07 Nevins Duret             ( 215) [Tutor] Converting From Unicode to ASCII!!
5388 N Feb 08 David Hutto              ( 138) ??>
5389 N Feb 08 David Hutto              (   7) ??>
5390 N Feb 08 David Hutto              (   9) ??>
5391 N Feb 08 Mr. Bean                 (   9) [python-list] Somehing interesting
5392 N Feb 08 Mr. Bean                 (   9) ??>
5393 N Feb 07 nguytom                  (  11) [Veritas-bu] How to use the DataDomain system cleaning process
5394 N Feb 07 David Stanaway           (  29) ??>Re: [Veritas-bu] How to use the DataDomain system cleaning process

Kind regards
przemol
Re: [python-list] - what do you think ?
On Tue, Feb 08, 2011 at 10:16:42PM +1100, Ben Finney wrote:
> [email protected] writes:
> > I have just subscribed to python-list@ [...]
> > Many mailing lists use square brackets around the list name in the
> > Subject header, so that e-mails from different forums are easy to
> > tell apart. Would you consider adding [] to this list as well?
>
> No thank you.
>
> Either your mail client already knows how to filter messages
> appropriately depending on which mailing list they came from; or, you
> should use a better mail client.

mutt is quite good ;-)

> Either way, please don't ask for the subject lines to be munged.

Is there any technical reason why not?

Regards
Przemek
Re: - what do you think ?
On Tue, Feb 08, 2011 at 01:20:48PM -0500, Terry Reedy wrote:
> On 2/8/2011 7:18 AM, [email protected] wrote:
>> On Tue, Feb 08, 2011 at 10:16:42PM +1100, Ben Finney wrote:
>>> Either way, please don't ask for the subject lines to be munged.
>>
>> Is there any technical reason why not?
>
> For one reason, python-list exchanges messages with both
> comp.lang.python and gmane.comp.python.general (the latter is how I read
> it), and newsreaders already separate messages by group. I also read
> pydev and a couple of SIG lists via gmane, so extra headers would be
> noise.

That is an important technical reason. Thank you. :-)

Regards
Przemek
Graphical library - charts
Hello,

I have thousands of files with logs from a monitoring system. Each file
contains some important data (numbers), and I'd like to create charts
from those numbers. Could you please suggest a library for creating such
charts? The preferred chart type is a line chart.

Besides that, is there any library which lets me zoom in and out of such
a chart? Sometimes I need to create a chart from long-term data (a few
months) but then look at minutes - it would be good not to have to create
another short-term chart, but just zoom in.

The files are on one unix server and the charts will be displayed on
another unix server, so the X-Window protocol is going to be used.

Any suggestions?

Best regards
przemol
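For this kind of job, matplotlib is a common choice: it draws line charts, its interactive window has pan/zoom built in, and with the Agg backend it can render straight to files with no X server at all. A sketch with an invented one-number-per-line log format - real monitoring logs will need their own parser:

```python
def parse_log(lines):
    # assumed format: "<timestamp> <value>" per line; real logs will differ
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            try:
                points.append((fields[0], float(fields[1])))
            except ValueError:
                continue  # skip malformed lines
    return points

def plot_points(points, path):
    # import here so the parser works even where matplotlib is not installed
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no X server needed
    import matplotlib.pyplot as plt
    xs = range(len(points))
    ys = [v for _, v in points]
    plt.plot(xs, ys)
    plt.savefig(path)

pts = parse_log(["2009-06-22T10:00 1.5", "2009-06-22T10:01 2.0", "garbage"])
```

When run interactively (`plt.show()` instead of `savefig`), the toolbar's rectangle-zoom covers the months-down-to-minutes use case without regenerating the chart.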
Re: Graphical library - charts
BJörn Lindqvist wrote:
> 2009/6/22:
>> Hello,
>>
>> I have thousands of files with logs from a monitoring system. Each file
>> contains some important data (numbers), and I'd like to create charts
>> from those numbers. [...]
>
> Try Google Charts. It is quite excellent for easily creating simple
> charts. There is also Gnuplot, which is more advanced and complicated.
> Both tools have Python bindings.

Which option is better:
pygooglechart       http://pygooglechart.slowchop.com/
google-chartwrapper http://code.google.com/p/google-chartwrapper/

Regards
Przemek
Re: Graphical library - charts
BJörn Lindqvist wrote:
> 2009/6/22:
>> [...]
>
> Try Google Charts. It is quite excellent for easily creating simple
> charts. There is also Gnuplot, which is more advanced and complicated.
> Both tools have Python bindings.

By the way: do I need internet access while using this library?

Regards
przemol
