Re: Is Python string immutable?

2005-12-08 Thread Steve Holden
Frank Potter wrote:
 Thank you very much.
 Steve Holden, I have posted my source code on my blog here:
 http://hiparrot.wordpress.com/2005/12/08/implementing-a-simple-net-spider/
 I hope you can read it and give me some suggestions.
 Any comments will be appreciated.
 
Before any intensive scrutiny of the code perhaps you can answer a few 
questions.

1. Are you absolutely sure you have published the code you are running? 
The code you reference has indentation problems that give syntax errors.

2. Why do you continually create new threads when you could have a fixed 
set of threads sharing a list of URLs to scan?

3. What steps have you taken to ensure that the program does indeed 
perform according to its specifications?

It seems to me that this code is a Java program transliterated into 
Python. A more natural way to express the algorithm would be for a 
number of worker threads to share a Queue.Queue of URLs to be visited. 
The only other data structure that would then need locking would be a 
dictionary of URLs that had already been considered.
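
A minimal sketch of the worker-pool design described above, not the
poster's spider. It assumes Python 3, where the module is spelled queue
rather than Queue. FAKE_WEB, fetch_links and crawl are hypothetical names
invented for the illustration, and the "web" is an in-memory dictionary so
that no network access is needed.

```python
import queue
import threading

# Hypothetical in-memory "web": page URL -> links found on that page.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])


def crawl(start_url, num_workers=4):
    todo = queue.Queue()          # thread-safe, so it needs no extra locking
    seen = {start_url: True}      # URLs that have already been considered
    seen_lock = threading.Lock()  # the only explicit lock in the program

    def worker():
        while True:
            url = todo.get()
            if url is None:       # sentinel: shut this worker down
                todo.task_done()
                return
            try:
                for link in fetch_links(url):
                    with seen_lock:
                        if link in seen:
                            continue
                        seen[link] = True
                    todo.put(link)
            finally:
                todo.task_done()

    # A fixed set of threads is created once and reused for every URL.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    todo.put(start_url)
    todo.join()                   # returns once every queued URL is processed
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()
    return sorted(seen)


visited = crawl("a")
```

New links are queued before task_done() is called for the page that
produced them, so todo.join() cannot return while work is still pending.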

Overall you appear to be spending large amounts of time locking and 
unlocking things, and creating threads unnecessarily, but you don't 
claim that the algorithm is particularly refined, and it shouldn't 
affect memory usage.

However, I would like to be sure that the excessive memory usage *is* 
indeed something to do with your program, as I contend, and not some 
buggy aspect of threading, so I await your answers with interest.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is Python string immutable?

2005-12-07 Thread Frank Potter
Thank you very much.
Steve Holden, I have posted my source code on my blog here:
http://hiparrot.wordpress.com/2005/12/08/implementing-a-simple-net-spider/
I hope you can read it and give me some suggestions.
Any comments will be appreciated.

On 12/2/05, Steve Holden [EMAIL PROTECTED] wrote:
 could ildg wrote:
 In Java and C#, String is immutable: str = str + "some more" will
 return a new string and leave some garbage.
 So in Java and C#, if there are frequent string operations,
 StringBuilder/StringBuffer is recommended.
 
 Will string operations in Python also leave some garbage? I implemented
 a net-spider in Python which does a lot of HTML string processing.
 After it has been running for some time, the python exe eats up over
 300M of memory. Is this because of the string garbage?
 
 If you create garbage in a Python program it will normally be collected
 and returned to free memory by the garbage collector, which should be
 run when memory is exhausted in preference to allocating more memory.
 Additional memory should therefore only be claimed when garbage
 collection fails to return sufficient free space.
 
 If cyclic data structures are created (structures in which components
 refer to each other even though no external references exist) this could
 cause problems in older versions of Python, but nowadays the garbage
 collector also takes pains to collect unreferenced cyclic structures.
 
 If String in Python is immutable, what class should I use to avoid too
 much garbage when processing strings frequently?
 
 The fact that your process uses 300MB implies that you are retaining
 references to a large amount of data. Without seeing the code, however,
 it's difficult to suggest how you might improve the situation. Are you,
 for example, holding the HTML for every spidered page?
 
 As a side note, both C# and Java also use garbage collection, so if your
 algorithm exhibits the same problem in all three languages this merely
 confirms that the problem really is your algorithm, and not the language
 in which it is implemented.
 
 regards
  Steve
 --
 Steve Holden   +44 150 684 7255  +1 800 494 3119
 Holden Web LLC www.holdenweb.com
 PyCon TX 2006  www.python.org/pycon/

Re: Is Python string immutable?

2005-12-03 Thread Scott David Daniels
Steve Holden wrote:
 could ildg wrote:
 Will string operations in Python also leave some garbage? I implemented 
 a net-spider in Python which does a lot of HTML string processing. 
 After it has been running for some time, the python exe eats up over 
 300M of memory. Is this because of the string garbage?
  
 If you create garbage in a Python program it will normally be collected 
 and returned to free memory

So far so good -- true for all Pythons.

 by the garbage collector, which should be run when memory is exhausted
  in preference to allocating more memory.

True for Jython, probably true for IronPython (I don't know IronPython
details), definitely not true for CPython.  CPython recycles simply
held memory as the last reference goes away.  In CPython, only when
memory is held in (or by) cyclic structures is the garbage collector
needed to come in and do recycling work.
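
A small sketch of the distinction drawn above, assuming CPython and
Python 3 syntax (other implementations may reclaim the first object later).
Node is a hypothetical class used only so that a weak reference can report
when an object has been reclaimed.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to observe when memory is reclaimed."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the cyclic collector from running on its own in the demo

# Case 1: no cycle. Reference counting reclaims the object as soon as the
# last reference goes away.
n = Node()
probe = weakref.ref(n)
del n
freed_without_gc = probe() is None

# Case 2: two objects that refer to each other. Their reference counts never
# drop to zero, so only the cyclic garbage collector can reclaim them.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_probe = weakref.ref(a)
del a, b
alive_before_collect = cycle_probe() is not None
gc.collect()
freed_after_collect = cycle_probe() is None

gc.enable()
```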

 Additional memory should therefore only be claimed when garbage 
 collection fails to return sufficient free space.
 
 If cyclic data structures are created (structures in which components 
 refer to each other even though no external references exist) this could 
 cause problems in older versions of Python, but nowadays the garbage 
 collector also takes pains to collect unreferenced cyclic structures.

By older he means substantially older.  If you have Python >= 2.0 you
have no worries: the collector does cycles.  It needs to get to 2.2 or
so before weakrefs are handled correctly.

 * Python 2.4.2 (September 28, 2005)
 * Python 2.4 (November 30, 2004)
 * Python 2.3.5 (February 8, 2005)
 * Python 2.2.3 (May 30, 2003)
 * Python 2.1.3 (April 8, 2002)
 * Python 2.0.1 (June 2001)
 * Python 1.6.1 (September 2000)
 * Python 1.5.2 (April 1999)

--Scott David Daniels
[EMAIL PROTECTED]


Re: Is Python string immutable?

2005-12-02 Thread Martin Franklin
Chris Mellon wrote:
 On 11/30/05, could ildg [EMAIL PROTECTED] wrote:
 
In Java and C#, String is immutable: str = str + "some more" will return a
new string and leave some garbage.
So in Java and C#, if there are frequent string operations,
StringBuilder/StringBuffer is recommended.

Will string operations in Python also leave some garbage? I implemented a
net-spider in Python which does a lot of HTML string processing. After it
has been running for some time, the python exe eats up over 300M of memory.
Is this because of the string garbage?

If String in Python is immutable, what class should I use to avoid too much
garbage when processing strings frequently?
 
 
 Python strings are immutable. The StringIO class provides a buffer
 that you can manipulate like a file, and then convert to a string and
 is probably most suitable for your purposes.
 

Another recipe is to build up a list with .append and then, when you need
the string, convert it with "".join(alist)
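
A short sketch of that recipe. build_page and the sample fragments are
hypothetical, chosen only to show the pattern.

```python
def build_page(fragments):
    """Collect pieces in a list and join them once at the end."""
    parts = []
    for fragment in fragments:
        parts.append(fragment.strip())
    # A single join builds the final string in one pass instead of
    # creating a new intermediate string for every concatenation.
    return "".join(parts)


html = build_page(["<p> ", "hello", " </p>"])
```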







Re: Is Python string immutable?

2005-12-02 Thread Steve Holden
could ildg wrote:
 In Java and C#, String is immutable: str = str + "some more" will return 
 a new string and leave some garbage.
 So in Java and C#, if there are frequent string operations, 
 StringBuilder/StringBuffer is recommended.
  
 Will string operations in Python also leave some garbage? I implemented a 
 net-spider in Python which does a lot of HTML string processing. After 
 it has been running for some time, the python exe eats up over 300M of 
 memory. Is this because of the string garbage?
  
If you create garbage in a Python program it will normally be collected 
and returned to free memory by the garbage collector, which should be 
run when memory is exhausted in preference to allocating more memory. 
Additional memory should therefore only be claimed when garbage 
collection fails to return sufficient free space.

If cyclic data structures are created (structures in which components 
refer to each other even though no external references exist) this could 
cause problems in older versions of Python, but nowadays the garbage 
collector also takes pains to collect unreferenced cyclic structures.

 If String in Python is immutable, what class should I use to avoid too 
 much garbage when processing strings frequently?
 
The fact that your process uses 300MB implies that you are retaining 
references to a large amount of data. Without seeing the code, however, 
it's difficult to suggest how you might improve the situation. Are you, 
for example, holding the HTML for every spidered page?
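
To illustrate the question, here is a hedged sketch of the difference
between keeping every page body and keeping only the URLs. It is not taken
from the poster's spider: fake_pages, spider_keep_everything and
spider_keep_urls are hypothetical names, and the pages are generated in
memory.

```python
def fake_pages(count, size=10000):
    """Hypothetical page source: yields (url, html) pairs."""
    for i in range(count):
        yield "http://example.invalid/%d" % i, "x" * size


def spider_keep_everything(pages):
    cache = {}
    for url, html in pages:
        cache[url] = html   # every page body stays referenced
    return cache


def spider_keep_urls(pages):
    seen = set()
    for url, html in pages:
        seen.add(url)       # html is unreferenced once the loop moves on
    return seen


kept = spider_keep_everything(fake_pages(50))
retained_chars = sum(len(html) for html in kept.values())
seen = spider_keep_urls(fake_pages(50))
```

In the first function memory grows with the number of pages visited,
because the dictionary keeps a reference to each body. In the second only
the short URL strings are retained.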

As a side note, both C# and Java also use garbage collection, so if your 
algorithm exhibits the same problem in all three languages this merely 
confirms that the problem really is your algorithm, and not the language 
in which it is implemented.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/



Is Python string immutable?

2005-11-30 Thread could ildg
In Java and C#, String is immutable: str = str + "some more" will return a new string and leave some garbage.
So in Java and C#, if there are frequent string operations, StringBuilder/StringBuffer is recommended.

Will string operations in Python also leave some garbage? I implemented a net-spider in Python which does a lot of HTML string processing. After it has been running for some time, the python exe eats up over 300M of memory. Is this because of the string garbage?

If String in Python is immutable, what class should I use to avoid too much garbage when processing strings frequently?

Re: Is Python string immutable?

2005-11-30 Thread Chris Mellon
On 11/30/05, could ildg [EMAIL PROTECTED] wrote:
 In Java and C#, String is immutable: str = str + "some more" will return a
 new string and leave some garbage.
 So in Java and C#, if there are frequent string operations,
 StringBuilder/StringBuffer is recommended.

 Will string operations in Python also leave some garbage? I implemented a
 net-spider in Python which does a lot of HTML string processing. After it
 has been running for some time, the python exe eats up over 300M of memory.
 Is this because of the string garbage?

 If String in Python is immutable, what class should I use to avoid too much
 garbage when processing strings frequently?

Python strings are immutable. The StringIO class provides a buffer
that you can manipulate like a file, and then convert to a string and
is probably most suitable for your purposes.
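
A minimal sketch of the StringIO approach, assuming Python 3, where the
class lives in the io module (at the time of this thread it was
StringIO.StringIO or cStringIO.StringIO). The HTML fragments are made up
for the example.

```python
import io

buffer = io.StringIO()
buffer.write("<html>")
for word in ("spam", "eggs"):
    # Each write appends to the in-memory buffer, as with a file.
    buffer.write("<p>%s</p>" % word)
buffer.write("</html>")

page = buffer.getvalue()  # convert the accumulated text to an ordinary str
buffer.close()
```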
