Re: Is Python string immutable?
Frank Potter wrote:
> Thank you very much, Steve Holden. I have posted my source code on my blog here:
> http://hiparrot.wordpress.com/2005/12/08/implementing-a-simple-net-spider/
> I hope you can read it and give me some suggestions. Any comments will be appreciated.

Before any intensive scrutiny of the code, perhaps you can answer a few questions.

1. Are you absolutely sure you have published the code you are running? The code you reference has indentation problems that give syntax errors.
2. Why do you continually create new threads when you could have a fixed set of threads sharing a list of URLs to scan?
3. What steps have you taken to ensure that the program does indeed perform according to its specifications?

It seems to me that this code is a Java program transliterated into Python. A more natural way to express the algorithm would be for a number of worker threads to share a Queue.Queue of URLs to be visited. The only other data structure that would then need locking would be a dictionary of URLs that had already been considered.

Overall you appear to be spending large amounts of time locking and unlocking things, and creating threads unnecessarily, but you don't claim that the algorithm is particularly refined, and it shouldn't affect memory usage. However, I would like to be sure that the excessive memory usage *is* indeed something to do with your program, as I contend, and not some buggy aspect of threading, so I await your answers with interest.

regards
 Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
--
http://mail.python.org/mailman/listinfo/python-list
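To illustrate the arrangement Steve suggests above (a fixed set of worker threads sharing a queue of URLs, plus one locked record of URLs already considered), here is a minimal sketch. It is not the original poster's code. It assumes Python 3, where the Queue module is named queue, and it uses a set where Steve mentions a dictionary. The names fake_web, fetch_links, crawl and NUM_WORKERS are hypothetical, and fetch_links reads a small in-memory table instead of touching the network.

```python
import queue
import threading

NUM_WORKERS = 4  # assumed pool size; the thread does not specify one


def fetch_links(url):
    """Hypothetical stand-in for downloading a page and extracting its links."""
    fake_web = {
        "a": ["b", "c"],
        "b": ["c", "d"],
        "c": ["a"],
        "d": [],
    }
    return fake_web.get(url, [])


def crawl(start_url, fetch=fetch_links, num_workers=NUM_WORKERS):
    todo = queue.Queue()           # thread-safe; needs no explicit lock
    seen = {start_url}             # shared between workers, so guarded by seen_lock
    seen_lock = threading.Lock()
    todo.put(start_url)

    def worker():
        while True:
            url = todo.get()
            if url is None:        # sentinel: no more work, shut down
                todo.task_done()
                return
            try:
                for link in fetch(url):
                    with seen_lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    todo.put(link)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    todo.join()                    # returns once every queued URL has been processed
    for _ in threads:
        todo.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

Calling crawl("a") visits each URL in the table once, however many workers are running. New links are queued before the parent URL is marked done, so todo.join() cannot return while work is still pending.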
Re: Is Python string immutable?
Thank you very much, Steve Holden. I have posted my source code on my blog here:
http://hiparrot.wordpress.com/2005/12/08/implementing-a-simple-net-spider/
I hope you can read it and give me some suggestions. Any comments will be appreciated.

On 12/2/05, Steve Holden wrote:
> could ildg wrote:
>> In Java and C#, String is immutable: str = str + "some more" returns a new string and leaves some garbage, so in Java and C# StringBuilder/StringBuffer is recommended when there are frequent string operations. Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?
>
> If you create garbage in a Python program it will normally be collected and returned to free memory by the garbage collector, which should be run when memory is exhausted in preference to allocating more memory. Additional memory should therefore only be claimed when garbage collection fails to return sufficient free space.
>
> If cyclic data structures are created (structures in which components refer to each other even though no external references exist) this could cause problems in older versions of Python, but nowadays the garbage collector also takes pains to collect unreferenced cyclic structures.
>
>> If strings in Python are immutable, what class should I use to avoid creating too much garbage when processing strings frequently?
>
> The fact that your process uses 300MB implies that you are retaining references to a large amount of data. Without seeing the code, however, it's difficult to suggest how you might improve the situation. Are you, for example, holding the HTML for every spidered page?
>
> As a side note, both C# and Java also use garbage collection, so if your algorithm exhibits the same problem in all three languages this merely confirms that the problem really is your algorithm, and not the language in which it is implemented.
>
> regards
>  Steve
Re: Is Python string immutable?
Steve Holden wrote:
> could ildg wrote:
>> Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?
>
> If you create garbage in a Python program it will normally be collected and returned to free memory

So far so good -- true for all Pythons.

> by the garbage collector, which should be run when memory is exhausted in preference to allocating more memory.

True for Jython, probably true for IronPython (I don't know IronPython details), definitely not true for CPython. CPython recycles simply held (non-cyclic) memory as soon as the last reference goes away. In CPython, the garbage collector is needed only to recycle memory held in (or by) cyclic structures.

> Additional memory should therefore only be claimed when garbage collection fails to return sufficient free space. If cyclic data structures are created (structures in which components refer to each other even though no external references exist) this could cause problems in older versions of Python, but nowadays the garbage collector also takes pains to collect unreferenced cyclic structures.

By "older" he means substantially older. If you have Python >= 2.0 you have no worries: the collector does cycles. It needs to get to 2.2 or so before weakrefs are handled correctly.

* Python 2.4.2 (September 28, 2005)
* Python 2.4 (November 30, 2004)
* Python 2.3.5 (February 8, 2005)
* Python 2.2.3 (May 30, 2003)
* Python 2.1.3 (April 8, 2002)
* Python 2.0.1 (June 2001)
* Python 1.6.1 (September 2000)
* Python 1.5.2 (April 1999)

--Scott David Daniels
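The distinction Scott draws can be observed directly. The following is a minimal sketch that assumes a standard CPython build, since other implementations may reclaim objects at different times. Node, freed_without_collector and cycle_needs_collector are hypothetical names made up for this illustration. A weak reference serves as a probe, because it reports None once its target has been reclaimed.

```python
import gc
import weakref


class Node:
    """Small object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def freed_without_collector():
    """True if a non-cyclic object is reclaimed with the cycle collector off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        node = Node()
        probe = weakref.ref(node)
        del node                   # last reference goes away
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


def cycle_needs_collector():
    """Return (alive before gc.collect(), alive after) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a    # a and b now refer to each other
        probe = weakref.ref(a)
        del a, b                   # no external references remain
        alive_before = probe() is not None
        gc.collect()               # an explicit collection finds the cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

On CPython the first function should report that the object was reclaimed by reference counting alone. The second should show the cycle surviving until gc.collect() is called.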
Re: Is Python string immutable?
Chris Mellon wrote:
> On 11/30/05, could ildg wrote:
>> In Java and C#, String is immutable: str = str + "some more" returns a new string and leaves some garbage, so in Java and C# StringBuilder/StringBuffer is recommended when there are frequent string operations. Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?
>>
>> If strings in Python are immutable, what class should I use to avoid creating too much garbage when processing strings frequently?
>
> Python strings are immutable. The StringIO class provides a buffer that you can manipulate like a file and then convert to a string, and is probably most suitable for your purposes.

Another recipe is to build up a list with .append and then, when you need the string, convert it with ''.join(alist).
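Both recipes mentioned above can be sketched in a few lines. This assumes Python 3, where the class lives at io.StringIO. In 2005 it would have come from the StringIO or cStringIO module. The function names and the sample pieces are invented for illustration.

```python
import io


def build_with_join(pieces):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def build_with_stringio(pieces):
    """Write the pieces to an in-memory, file-like buffer."""
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


pieces = ["<html>", "<body>", "hi", "</body>", "</html>"]
joined = build_with_join(pieces)
buffered = build_with_stringio(pieces)
```

Both approaches produce the same string. The join version is usually the shorter one to write, while StringIO is convenient when the code producing the text already expects a file-like object.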
Re: Is Python string immutable?
could ildg wrote:
> In Java and C#, String is immutable: str = str + "some more" returns a new string and leaves some garbage, so in Java and C# StringBuilder/StringBuffer is recommended when there are frequent string operations. Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?

If you create garbage in a Python program it will normally be collected and returned to free memory by the garbage collector, which should be run when memory is exhausted in preference to allocating more memory. Additional memory should therefore only be claimed when garbage collection fails to return sufficient free space.

If cyclic data structures are created (structures in which components refer to each other even though no external references exist) this could cause problems in older versions of Python, but nowadays the garbage collector also takes pains to collect unreferenced cyclic structures.

> If strings in Python are immutable, what class should I use to avoid creating too much garbage when processing strings frequently?

The fact that your process uses 300MB implies that you are retaining references to a large amount of data. Without seeing the code, however, it's difficult to suggest how you might improve the situation. Are you, for example, holding the HTML for every spidered page?

As a side note, both C# and Java also use garbage collection, so if your algorithm exhibits the same problem in all three languages this merely confirms that the problem really is your algorithm, and not the language in which it is implemented.

regards
 Steve
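Steve's point about retained references can be made concrete with a small measurement. The sketch below is an illustration only and does not diagnose the original spider. It assumes CPython and the standard tracemalloc module. fake_pages, keep_every_page, keep_only_sizes and retained_bytes are hypothetical names, and the pages are synthetic strings standing in for downloaded HTML.

```python
import tracemalloc


def fake_pages(count, size):
    """Hypothetical stand-in for a crawl: yields (url, html) pairs."""
    for i in range(count):
        yield f"http://example.invalid/{i}", "x" * size


def keep_every_page(pages):
    # Every HTML body stays referenced from the dict, so none can be freed.
    return {url: html for url, html in pages}


def keep_only_sizes(pages):
    # Only a small summary is kept; each body becomes garbage as soon as
    # the loop moves on to the next page.
    return {url: len(html) for url, html in pages}


def retained_bytes(strategy, count=100, size=10_000):
    """Bytes still allocated while the strategy's result is alive."""
    tracemalloc.start()
    try:
        result = strategy(fake_pages(count, size))
        current, _peak = tracemalloc.get_traced_memory()
        del result
        return current
    finally:
        tracemalloc.stop()
```

Comparing retained_bytes(keep_every_page) with retained_bytes(keep_only_sizes) should show the first growing with the total amount of HTML seen while the second stays small. That is the pattern to look for when a long-running spider keeps gaining memory.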
Is Python string immutable?
In Java and C#, String is immutable: str = str + "some more" returns a new string and leaves some garbage, so in Java and C# StringBuilder/StringBuffer is recommended when there are frequent string operations. Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?

If strings in Python are immutable, what class should I use to avoid creating too much garbage when processing strings frequently?
Re: Is Python string immutable?
On 11/30/05, could ildg wrote:
> In Java and C#, String is immutable: str = str + "some more" returns a new string and leaves some garbage, so in Java and C# StringBuilder/StringBuffer is recommended when there are frequent string operations. Will string operations in Python also leave some garbage? I implemented a net spider in Python that does a lot of HTML string processing. After running for some time, the Python executable eats up over 300 MB of memory. Is this because of the string garbage?
>
> If strings in Python are immutable, what class should I use to avoid creating too much garbage when processing strings frequently?

Python strings are immutable. The StringIO class provides a buffer that you can manipulate like a file and then convert to a string, and is probably most suitable for your purposes.
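For readers who want to see the immutability itself, here is a short sketch. The variable names and the try_item_assignment helper are invented for illustration. It shows that += rebinds a name to a new string and leaves the old object unchanged, and that in-place item assignment is rejected.

```python
original = "spider"
alias = original          # second name for the same string object
original += "-bot"        # builds a new string and rebinds 'original'


def try_item_assignment(text):
    """Return True if text[0] could be replaced in place, else False."""
    try:
        text[0] = "S"
    except TypeError:     # str does not support item assignment
        return False
    return True


can_mutate = try_item_assignment("spider")
```

After this runs, alias still refers to the unchanged "spider" while original refers to the new, longer string. The attempted item assignment fails with a TypeError.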