[issue210647] Segmentation violation on very long lists (PR#334)

2022-04-10 Thread admin


Change by admin :


--
github: None -> 32718

___
Python tracker 




Are there performance concerns with popping from front of long lists vs. the end of long lists?

2014-06-22 Thread python
Should I have any performance concerns with the index position used to
pop() values off of large lists?



In other words, should pop(0) and pop() be time equivalent operations
with long lists?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Are there performance concerns with popping from front of long lists vs. the end of long lists?

2014-06-22 Thread MRAB

On 2014-06-22 19:03, pyt...@bdurham.com wrote:

Should I have any performance concerns with the index position used
to pop() values off of large lists?

In other words, should pop(0) and pop() be time equivalent operations
 with long lists?


When an item is popped from a list, all of the later items (the list
actually stores references to the items) are shifted down to fill the
gap. Therefore, popping the last item is fast, but popping the first
item is slow.

If you want to pop efficiently from both ends, then a deque is the
correct choice of container.
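
For example, a minimal sketch of the deque interface (the sizes and
values here are arbitrary):

    from collections import deque

    d = deque(range(1000000))
    d.popleft()      # remove from the front, O(1) - unlike list.pop(0)
    d.pop()          # remove from the end, O(1) - same as list.pop()
    d.appendleft(0)  # adding at either end is O(1) as well
    d.append(42)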
--
https://mail.python.org/mailman/listinfo/python-list


Re: Are there performance concerns with popping from front of long lists vs. the end of long lists?

2014-06-22 Thread Terry Reedy

On 6/22/2014 2:03 PM, pyt...@bdurham.com wrote:

Should I have any performance concerns with the index position used to
pop() values off of large lists?


Yes. While performance is generally not part of the language
specification, in CPython seq.pop(i) is O(len(seq) - i).



In other words, should pop(0) and pop() be time equivalent operations
with long lists?


No. If you want this, use collections.deque.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Are there performance concerns with popping from front of long lists vs. the end of long lists?

2014-06-22 Thread Ethan Furman

On 06/22/2014 11:03 AM, pyt...@bdurham.com wrote:


Should I have any performance concerns with the index position used
to pop() values off of large lists? In other words, should pop(0) and
 pop() be time equivalent operations with long lists?


I believe lists are optimized for adding and removing items from the end, so anywhere else will have an impact.  You'll 
have to do measurements to see if the impact is worth worrying about in your code.
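
A quick way to measure it with timeit (a rough sketch; the list size and
repeat count are arbitrary, and absolute numbers will vary by machine and
Python version):

    import timeit

    setup = "lst = list(range(100000))"
    print(timeit.timeit("lst.pop()", setup=setup, number=10000))   # end: fast
    print(timeit.timeit("lst.pop(0)", setup=setup, number=10000))  # front: much slower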


--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


Re: Are there performance concerns with popping from front of long lists vs. the end of long lists?

2014-06-22 Thread python
MRAB, Terry, Ethan, and others ...

Thank you - collections.deque is exactly what I was looking for.

Malcolm
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-23 Thread Jorgen Grahn
On Tue, 2011-02-22, Ben Finney wrote:
 Kelson Zawack zawack...@gis.a-star.edu.sg writes:

 I have a large (10gb) data file for which I want to parse each line
 into an object and then append this object to a list for sorting and
 further processing.

 What is the nature of the further processing?

 Does that further processing access the items sequentially? If so, they
 don't all need to be in memory at once, and you can produce them with a
 generator <URL:http://docs.python.org/glossary.html#term-generator>.

He mentioned sorting them -- you need all of them for that.

If that's the *only* such use, I'd experiment with writing them as
sortable text to file, and run GNU sort (the Unix utility) on the file.
It seems to have a clever file-backed sort algorithm.

/Jorgen

-- 
  // Jorgen Grahn grahn@  Oo  o.   .  .
\X/ snipabacken.se   O  o   .
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-23 Thread Tim Wintle
On Wed, 2011-02-23 at 13:57 +, Jorgen Grahn wrote:
 If that's the *only* such use, I'd experiment with writing them as
 sortable text to file, and run GNU sort (the Unix utility) on the file.
 It seems to have a clever file-backed sort algorithm.

+1 - and experiment with the different flags to sort (compression of
intermediate results, intermediate batch size, etc)

Tim


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Kelson Zawack
The answer, it turns out, is the garbage collector.  When I disable the
garbage collector before the loop that loads the data into the list
and then enable it after the loop, the program runs without issue.
This raises a question, though: can the logic of the garbage collector
be changed so that it is not triggered in instances like this, where
you really do want to put lots and lots of stuff in memory?  Turning
the garbage collector on and off is not a big deal, but it would
obviously be nicer not to have to.
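
For reference, the pattern looks roughly like this (a sketch; data_file
and parse_line stand in for the real input and parsing code):

    import gc

    records = []
    gc.disable()          # no cyclic-GC passes while the list is being built
    try:
        for line in data_file:                 # data_file is a placeholder
            records.append(parse_line(line))   # parse_line is a placeholder
    finally:
        gc.enable()       # restore normal collection afterwards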
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Ben Finney
Kelson Zawack zawack...@gis.a-star.edu.sg writes:

 This raises a question, though: can the logic of the garbage collector
 be changed so that it is not triggered in instances like this, where
 you really do want to put lots and lots of stuff in memory?

Have you considered using a more specialised data type for such large
data sets, such as ‘array.array’ or the NumPy array types?
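
For the purely numeric case, a minimal array.array sketch (element type
'd' is a C double; this does not help with arbitrary objects):

    from array import array

    values = array('d')                    # compact storage, one C double per item
    values.append(3.14)
    values.extend([2.71, 1.41])
    values = array('d', sorted(values))    # sorted() returns a list, so rebuild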

-- 
 \ “True greatness is measured by how much freedom you give to |
  `\  others, not by how much you can coerce others to do what you |
_o__)   want.” —Larry Wall |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Peter Otten
Kelson Zawack wrote:

 The answer, it turns out, is the garbage collector.  When I disable the
 garbage collector before the loop that loads the data into the list
 and then enable it after the loop, the program runs without issue.
 This raises a question, though: can the logic of the garbage collector
 be changed so that it is not triggered in instances like this, where
 you really do want to put lots and lots of stuff in memory?  Turning
 the garbage collector on and off is not a big deal, but it would
 obviously be nicer not to have to.

What Python version are you using? The garbage collection heuristic has been 
tweaked in 2.7, see

http://svn.python.org/view/python/trunk/Modules/gcmodule.c?r1=67832&r2=68462
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Kelson Zawack
I am using Python 2.6.2, so in 2.7 it may no longer be a problem.

I am open to using another data type, but the way I read the
documentation, array.array only supports numeric types, not arbitrary
objects.  I also tried playing around with numpy arrays, albeit for
only a short time, and it seems that although they do support
arbitrary objects, they are geared toward numbers as well, and I
found it cumbersome to manipulate objects with them.  It could be,
though, that if I understood them better they would work fine.  Also,
do numpy arrays support sorting arbitrary objects?  I only saw a
method that sorts numbers.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-22 Thread Terry Reedy

On 2/22/2011 4:40 AM, Kelson Zawack wrote:

The answer, it turns out, is the garbage collector.  When I disable the
garbage collector before the loop that loads the data into the list
and then enable it after the loop, the program runs without issue.
This raises a question, though: can the logic of the garbage collector
be changed so that it is not triggered in instances like this, where
you really do want to put lots and lots of stuff in memory?  Turning
the garbage collector on and off is not a big deal, but it would
obviously be nicer not to have to.


Heuristics, by their very nature, are not correct in all situations.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Creating Long Lists

2011-02-21 Thread Kelson Zawack
I have a large (10gb) data file for which I want to parse each line into
an object and then append this object to a list for sorting and further
processing.  I have noticed however that as the length of the list
increases the rate at which objects are added to it decreases
dramatically.  My first thought was that I was nearing the memory
capacity of the machine and the decrease in performance was due to the
OS swapping things in and out of memory.  When I looked at the memory
usage this was not the case.  My process was the only job running and
was consuming 40gb of the total 130gb and no swapping processes were
running.  To make sure there was not some problem with the rest of my
code, or the server's file system, I ran my program again as it was but
without the line that was appending items to the list, and it completed
without problem, indicating that the decrease in performance is the
result of some part of the process of appending to the list.  Since
other people have observed this problem as well
(http://tek-tips.com/viewthread.cfm?qid=1096178&page=13,
http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i)
I did not bother to further analyze or benchmark it.  Since the answers
in the above forums do not seem very definitive, I thought I would
inquire here about what the reason for this decrease in performance is,
and if there is a way, or another data structure, that would avoid this
problem.


--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-21 Thread alex23
On Feb 22, 12:57 pm, Kelson Zawack zawack...@gis.a-star.edu.sg
wrote:
 I did not bother to further analyze or benchmark it.  Since the answers
 in the above forums do not seem very definitive, I thought I would
 inquire here about what the reason for this decrease in performance is,
 and if there is a way, or another data structure, that would avoid this
 problem.

The first link is 6 years old and refers to Python 2.4. Unless you're
using 2.4 you should probably ignore it.

The first answer on the stackoverflow link was accepted by the poster
as resolving his issue. Try disabling garbage collection.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-21 Thread John Bokma
alex23 wuwe...@gmail.com writes:

 On Feb 22, 12:57 pm, Kelson Zawack zawack...@gis.a-star.edu.sg
 wrote:
 I did not bother to further analyze or benchmark it.  Since the answers
 in the above forums do not seem very definitive, I thought I would
 inquire here about what the reason for this decrease in performance is,
 and if there is a way, or another data structure, that would avoid this
 problem.

 The first link is 6 years old and refers to Python 2.4. Unless you're
 using 2.4 you should probably ignore it.

 The first answer on the stackoverflow link was accepted by the poster
 as resolving his issue. Try disabling garbage collection.

I just read http://bugs.python.org/issue4074 which discusses a patch
that was included 2 years ago. So a recent Python 2.x shouldn't have
this issue either?

-- 
John Bokma   j3b

Blog: http://johnbokma.com/   Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-21 Thread Dan Stromberg
On Mon, Feb 21, 2011 at 6:57 PM, Kelson Zawack
zawack...@gis.a-star.edu.sgwrote:

 I have a large (10gb) data file for which I want to parse each line into an
 object and then append this object to a list for sorting and further
 processing.  I have noticed however that as the length of the list increases
 the rate at which objects are added to it decreases dramatically.  My first
 thought was that I was nearing the memory capacity of the machine and the
 decrease in performance was due to the OS swapping things in and out of
 memory.  When I looked at the memory usage this was not the case.  My
 process was the only job running and was consuming 40gb of the total
 130gb and no swapping processes were running.  To make sure there was not
 some problem with the rest of my code, or the server's file system, I ran my
 program again as it was but without the line that was appending items to the
 list, and it completed without problem, indicating that the decrease in
 performance is the result of some part of the process of appending to the
 list.  Since other people have observed this problem as well (
 http://tek-tips.com/viewthread.cfm?qid=1096178&page=13,
 http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i)
 I did not bother to further analyze or benchmark it.  Since the answers in
 the above forums do not seem very definitive, I thought I would inquire
 here about what the reason for this decrease in performance is, and if there
 is a way, or another data structure, that would avoid this problem.


Do you have 130G of physical RAM, or 130G of virtual memory?  That makes a
big difference.  (Yeah, I know, 130G of physical RAM is probably pretty rare
today)

Disabling garbage collection is a good idea, but if you don't have well over
10G of physical RAM, you'd probably better also use a (partially) disk-based
sort.  To do otherwise would pretty much beg for swapping and a large
slowdown.

Merge sort works very well for very large datasets.
http://en.wikipedia.org/wiki/Merge_sort  Just make your sublists be disk
files, not in-memory lists - until you get down to a small enough sublist
that you can sort it in memory, without thrashing.  Timsort (list_.sort())
is excellent for in memory sorting.

Actually, GNU sort is very good at sorting huge datasets - you could
probably just open a subprocess to it, as long as you can make your data fit
the line-oriented model GNU sort expects, and you have enough temporary disk
space.
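
A rough sketch of that approach (the file names, scratch directory and
-S buffer size below are only illustrative, and it assumes the records
have already been written one per line to records.txt):

    import subprocess

    # GNU sort spills to temporary files under -T as needed; -S caps its
    # in-memory buffer.
    with open("sorted.txt", "w") as out:
        subprocess.check_call(
            ["sort", "-S", "2G", "-T", "/scratch/tmp", "records.txt"],
            stdout=out)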
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-21 Thread Ben Finney
Kelson Zawack zawack...@gis.a-star.edu.sg writes:

 I have a large (10gb) data file for which I want to parse each line
 into an object and then append this object to a list for sorting and
 further processing.

What is the nature of the further processing?

Does that further processing access the items sequentially? If so, they
don't all need to be in memory at once, and you can produce them with a
generator <URL:http://docs.python.org/glossary.html#term-generator>.

Note that, if you just want lines of text from a file, the file object
itself is a generator for the lines of text within it.
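
A minimal sketch of that approach (parse_line and process are
placeholders for the real per-record code):

    def parsed_records(path):
        with open(path) as f:
            for line in f:              # lines are read lazily, one at a time
                yield parse_line(line)  # parse_line is a placeholder

    for record in parsed_records("data.txt"):
        process(record)                 # process is also a placeholder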

If, on the other hand, you need arbitrary access all over that large
data set, you probably want a data type better suited. The standard
library has the ‘array’ module for this purpose; the third-party NumPy
library provides even more power.

-- 
 \   “Remember: every member of your ‘target audience’ also owns a |
  `\   broadcasting station. These ‘targets’ can shoot back.” —Michael |
_o__)   Rathbun to advertisers, news.admin.net-abuse.email |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating Long Lists

2011-02-21 Thread Dan Stromberg
On Mon, Feb 21, 2011 at 7:24 PM, Dan Stromberg drsali...@gmail.com wrote:


 On Mon, Feb 21, 2011 at 6:57 PM, Kelson Zawack 
 zawack...@gis.a-star.edu.sg wrote:

 I have a large (10gb) data file for which I want to parse each line into
 an object and then append this object to a list for sorting and further
 processing.  I have noticed however that as the length of the list increases
 the rate at which objects are added to it decreases dramatically.  My first
 thought was that I was nearing the memory capacity of the machine and the
 decrease in performance was due to the OS swapping things in and out of
 memory.  When I looked at the memory usage this was not the case.  My
 process was the only job running and was consuming 40gb of the total
 130gb and no swapping processes were running.  To make sure there was not
 some problem with the rest of my code, or the server's file system, I ran my
 program again as it was but without the line that was appending items to the
 list, and it completed without problem, indicating that the decrease in
 performance is the result of some part of the process of appending to the
 list.  Since other people have observed this problem as well (
 http://tek-tips.com/viewthread.cfm?qid=1096178&page=13,
 http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i)
 I did not bother to further analyze or benchmark it.  Since the answers in
 the above forums do not seem very definitive, I thought I would inquire
 here about what the reason for this decrease in performance is, and if there
 is a way, or another data structure, that would avoid this problem.


 Do you have 130G of physical RAM, or 130G of virtual memory?  That makes a
 big difference.  (Yeah, I know, 130G of physical RAM is probably pretty rare
 today)

 Disabling garbage collection is a good idea, but if you don't have well
 over 10G of physical RAM, you'd probably better also use a (partially)
 disk-based sort.  To do otherwise would pretty much beg for swapping and a
 large slowdown.

 Merge sort works very well for very large datasets.
 http://en.wikipedia.org/wiki/Merge_sort  Just make your sublists be disk
 files, not in-memory lists - until you get down to a small enough sublist
 that you can sort it in memory, without thrashing.  Timsort (list_.sort())
 is excellent for in memory sorting.

 Actually, GNU sort is very good at sorting huge datasets - you could
 probably just open a subprocess to it, as long as you can make your data fit
 the line-oriented model GNU sort expects, and you have enough temporary disk
 space.


Depending on what you're doing after the sort, you might also look at
bsddb.btopen
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-08 Thread Merrigan
On May 7, 10:21 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
 On Mon, 07 May 2007 09:14:34 -0300, Merrigan [EMAIL PROTECTED]
 wrote:

  The script is available at this URL:
  http://www.lewendewoord.co.za/theScript.py

 I understand this as a learning exercise, since there are a lot of utilities
 for remote syncing.

 Some comments:
 - use os.path.join to build file paths, instead of concatenating strings.
 - instead of reassigning sys.stdout before the call to retrlines, use the
 callback:

  saveinfo = sys.stdout
  fsock = open(tempDir + "remotelist.txt", "a")
  sys.stdout = fsock
  ftpconn.cwd(remotedir)  # This changes to the remote directory
  ftpconn.retrlines("LIST")  # This gets a complete list of everything in the directory
  sys.stdout = saveinfo
  fsock.close()

 becomes:

  fsock = open(os.path.join(tempDir, "remotelist.txt"), "a")
  ftpconn.cwd(remotedir)  # This changes to the remote directory
  ftpconn.retrlines("LIST", fsock.write)  # This gets a complete list of everything in the directory
  fsock.close()

 (Why mode="a"? Shouldn't it be "w"? Isn't the listing for a single directory?)

 - Saving both file lists may be useful, but why do you read them again? If
 you already have a list of local filenames and remote filenames, why read
 them from the saved copy?
 - It's very confusing having filenames ending with \n - strip that as
 you read it. You can use fname = fname.rstrip()
 - If you are interested in filenames with a certain extension, only
 process those files. That is, filter them *before* the processing begins.

 - The time-consuming part appears to be this:

 def comp_are():
     global toup
     temptoup = []
     for file1 in remotefiles:
         a = file1
         for file2 in localfiles:
             b = file2
             if str(a) == str(b):
                 pass
             if str(b) != str(a):
                 temptoup.append(str(str(b)))
     toup = list(sets.Set(temptoup))
     for filename in remotefiles:
         fn2up = filename
         for item in toup:
             if fn2up == item:
                 toup.remove(item)
             else:
                 pass
     toup.sort()

 (It's mostly nonsense... what do you expect from str(str(b)) different
  from str(b)? and the next line is just a waste of time, can you see why?)
 I think you want to compare two lists of filenames, and keep the elements
 that are in one list (localfiles) but not in the other. As you appear to
 know about sets: it's the set difference between localfiles and
 remotefiles. Keeping the same globalish thing:

 def comp_are():
  global toup
  toup = list(sets.Set(localfiles) - sets.Set(remotefiles))
  toup.sort()

 Since Python 2.4, set is a builtin type, and you have sorted(), so you
 could write:

 def comp_are():
  global toup
  toup = sorted(set(localfiles) - set(remotefiles))

 - Functions may have parameters and return useful things :)
 That is, you may write, by example:

remotefiles = getRemoteFiles(host, remotedir)
localfiles = getLocalFiles(localdir)
newfiles = findNewFiles(localfiles, remotefiles)
uploadFiles(host, newfiles)

 --
 Gabriel Genellina

Hmmm, thanks a lot. This has really been helpful. I have tried putting
it in the set, and whoops, it works. Now, I think I need to start
learning some more.

now the script is running a lot slower...
Now to get the rest of it up and running...

Thanx for the help!

-- 
http://mail.python.org/mailman/listinfo/python-list


long lists

2007-05-07 Thread Merrigan
Hi All,

Firstly - thank you Sean for the help and the guideline to get the
size comparison, I will definitely look into this.

At the moment I actually have 2 bigger issues that needs sorting...

1. I have the script popping all the files that need to be checked
into a list, and have it parsing the list for everything... Now the
problem is this: the server needs to check (at the moment) 375 files
and eliminate those that don't need reuploading. This number will
obviously get bigger and bigger as more files get uploaded. Now, the
problem that I'm having is that the script is taking forever to parse
the list and give the final result. How can I speed this up?

2. This issue is actually because of the first one. While the script
is parsing the lists and files, the connection to the ftp server times
out, and I honestly must say that it is quite annoying. I know I can
set the function to reconnect if it cannot find a connection, but
wouldn't it be easier just to keep the connection alive? Any idea
how I can keep the connection alive?

Thanks for all the help folks, I really appreciate it!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-07 Thread Steven D'Aprano
On Mon, 07 May 2007 00:28:14 -0700, Merrigan wrote:

 1. I have the script popping all the files that need to be checked into
 a list, and have it parsing the list for everything... Now the problem is
 this: the server needs to check (at the moment) 375 files and eliminate
 those that don't need reuploading. This number will obviously get bigger
 and bigger as more files get uploaded. Now, the problem that I'm having
 is that the script is taking forever to parse the list and give the
 final result. How can I speed this up?

By writing faster code???

It's really hard to answer this without more information. In particular:

- what's the format of the list and how do you parse it?

- how does the script decide what files need uploading?



-- 
Steven.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-07 Thread Merrigan
On May 7, 10:18 am, Steven D'Aprano
[EMAIL PROTECTED] wrote:
 On Mon, 07 May 2007 00:28:14 -0700, Merrigan wrote:
  1. I have the script popping all the files that need to be checked into
  a list, and have it parsing the list for everything... Now the problem is
  this: the server needs to check (at the moment) 375 files and eliminate
  those that don't need reuploading. This number will obviously get bigger
  and bigger as more files get uploaded. Now, the problem that I'm having
  is that the script is taking forever to parse the list and give the
  final result. How can I speed this up?

 By writing faster code???

 It's really hard to answer this without more information. In particular:

 - what's the format of the list and how do you parse it?

 - how does the script decide what files need uploading?

 --
 Steven.

Hi, Thanx for the reply,

The script is available at this URL: http://www.lewendewoord.co.za/theScript.py

P.S. I know it looks like crap, but I'm a n00b, and not yet through
the OOP part of the tutorial.

Thanx in advance!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-07 Thread Marc 'BlackJack' Rintsch
In [EMAIL PROTECTED], Merrigan wrote:

 The script is available at this URL:
 http://www.lewendewoord.co.za/theScript.py
 
 P.S. I know it looks like crap, but I'm a n00b, and not yet through
 the OOP part of the tutorial.

One spot of really horrible runtime is the `comp_are()` function; it has
quadratic runtime.  Why the funny spelling, BTW?

Why are you binding the objects to new names all the time and calling
`str()` repeatedly on string objects?  The names `a`, `b` and `fn2up` are
unnecessary, you can use `file1`, `file2` and `filename` instead.  And
``str(str(b))`` on a string object is a no-operation.  It's the same as
simply writing ``b``.

Those two nested ``for``-loops can be replaced by converting both lists
into `set()` objects, calculating the difference and convert back to a
sorted list:

def compare(remote, local):
return sorted(set(local) - set(remote))

Ciao,
Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-07 Thread half . italian
On May 7, 5:14 am, Merrigan [EMAIL PROTECTED] wrote:
 On May 7, 10:18 am, Steven D'Aprano



 [EMAIL PROTECTED] wrote:
  On Mon, 07 May 2007 00:28:14 -0700, Merrigan wrote:
   1. I have the script popping all the files that need to be checked into
   a list, and have it parsing the list for everything... Now the problem is
   this: the server needs to check (at the moment) 375 files and eliminate
   those that don't need reuploading. This number will obviously get bigger
   and bigger as more files get uploaded. Now, the problem that I'm having
   is that the script is taking forever to parse the list and give the
   final result. How can I speed this up?

  By writing faster code???

  It's really hard to answer this without more information. In particular:

  - what's the format of the list and how do you parse it?

  - how does the script decide what files need uploading?

  --
  Steven.

 Hi, Thanx for the reply,

 The script is available at this URL:
 http://www.lewendewoord.co.za/theScript.py

 P.S. I know it looks like crap, but I'm a n00b, and not yet through
 the OOP part of the tutorial.

 Thanx in advance!

Do you have access to the machine via ssh?  I would try to get away
from FTP and use rsync for this kind of thing if possible.

~Sean

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: long lists

2007-05-07 Thread Gabriel Genellina
On Mon, 07 May 2007 09:14:34 -0300, Merrigan [EMAIL PROTECTED]
wrote:

 The script is available at this URL:
 http://www.lewendewoord.co.za/theScript.py

I understand this as a learning exercise, since there are a lot of utilities
for remote syncing.

Some comments:
- use os.path.join to build file paths, instead of concatenating strings.
- instead of reassigning sys.stdout before the call to retrlines, use the  
callback:

 saveinfo = sys.stdout
 fsock = open(tempDir + "remotelist.txt", "a")
 sys.stdout = fsock
 ftpconn.cwd(remotedir)  # This changes to the remote directory
 ftpconn.retrlines("LIST")  # This gets a complete list of everything in the directory
 sys.stdout = saveinfo
 fsock.close()

becomes:

 fsock = open(os.path.join(tempDir, "remotelist.txt"), "a")
 ftpconn.cwd(remotedir)  # This changes to the remote directory
 ftpconn.retrlines("LIST", fsock.write)  # This gets a complete list of everything in the directory
 fsock.close()

 (Why mode="a"? Shouldn't it be "w"? Isn't the listing for a single directory?)

- Saving both file lists may be useful, but why do you read them again? If  
you already have a list of local filenames and remote filenames, why read  
them from the saved copy?
- It's very confusing having filenames ending with \n - strip that as  
you read it. You can use fname = fname.rstrip()
- If you are interested in filenames with a certain extension, only
process those files. That is, filter them *before* the processing begins.

- The time-consuming part appears to be this:

def comp_are():
    global toup
    temptoup = []
    for file1 in remotefiles:
        a = file1
        for file2 in localfiles:
            b = file2
            if str(a) == str(b):
                pass
            if str(b) != str(a):
                temptoup.append(str(str(b)))
    toup = list(sets.Set(temptoup))
    for filename in remotefiles:
        fn2up = filename
        for item in toup:
            if fn2up == item:
                toup.remove(item)
            else:
                pass
    toup.sort()

(It's mostly nonsense... what do you expect from str(str(b)) different  
 from str(b)? and the next line is just a waste of time, can you see why?)
I think you want to compare two lists of filenames, and keep the elements
that are in one list (localfiles) but not in the other. As you appear to
know about sets: it's the set difference between localfiles and  
remotefiles. Keeping the same globalish thing:

def comp_are():
 global toup
 toup = list(sets.Set(localfiles) - sets.Set(remotefiles))
 toup.sort()

Since Python 2.4, set is a builtin type, and you have sorted(), so you  
could write:

def comp_are():
 global toup
 toup = sorted(set(localfiles) - set(remotefiles))

- Functions may have parameters and return useful things :)
That is, you may write, by example:

   remotefiles = getRemoteFiles(host, remotedir)
   localfiles = getLocalFiles(localdir)
   newfiles = findNewFiles(localfiles, remotefiles)
   uploadFiles(host, newfiles)

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list