On 11/07/2012 08:32 PM, Roy Smith wrote:
> In article <509ab0fa$0$6636$9b4e6...@newsspool2.arcor-online.net>,
>  Alexander Blinne <n...@blinne.net> wrote:

>> I don't know the best way to find the current size; I only have a
>> general remark.
>> This solution is not so good if you have to impose a hard limit on the
>> resulting file size. You could end up with a tar file of size "limit +
>> size of biggest file - 1 + overhead" in the worst case, if the tar is at
>> limit - 1 and the next file is the biggest file. Of course that may be
>> acceptable in many cases, or it may be acceptable to do something about
>> it by adjusting the limit.
> If you truly have a hard limit, one possible solution would be to use
> tell() to checkpoint the growing archive after each addition.  If adding
> a new file unexpectedly causes you to exceed your hard limit, you can
> seek() back to the previous spot and truncate the file there.
>
> Whether this is worth the effort is an exercise left for the reader.

So I'm not sure if it's a hard limit or not, but I'll check tomorrow.
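
If it does turn out to be a hard limit, this is roughly how I read your checkpoint suggestion (untested sketch; it assumes a plain uncompressed tar written to a regular file object, since a "w:gz" stream can't be rewound, and LIMIT / the file names are just placeholders):

import tarfile

LIMIT = 100 * 1024 * 1024              # hard limit in bytes (made-up value)
paths = ["file1.dat", "file2.dat"]     # placeholder file names
leftover = []                          # files that didn't fit in this chunk

fobj = open("chunk1.tar", "wb")
tar = tarfile.open(fileobj=fobj, mode="w")   # plain tar, no compression
for i, path in enumerate(paths):
    checkpoint = fobj.tell()     # position before this member is written
    tar.add(path)
    if fobj.tell() > LIMIT:      # this member pushed us over the limit
        fobj.seek(checkpoint)    # rewind to the checkpoint...
        fobj.truncate()          # ...and drop the member we just wrote
        leftover = paths[i:]     # everything from here goes in the next chunk
        break
tar.close()                      # writes the end-of-archive blocks at the current position
fobj.close()

The same loop would then be repeated for the leftover list with chunk2.tar, and so on.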
But in general, for the size I could also take the sizes of the individual files and simply estimate the total, pushing as many files as should fit into each tarfile.
With compression I might end up with much smaller files, but it would be much easier...
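
Something along these lines is what I have in mind for the estimate (plan_chunks is just a made-up helper name, and the extra 512 bytes per file is only a rough allowance for the tar header; compression would make the real chunks smaller):

import os

def plan_chunks(paths, limit):
    """Group paths into chunks whose estimated (uncompressed) size stays under limit."""
    chunks, current, current_size = [], [], 0
    for path in paths:
        size = os.path.getsize(path) + 512   # rough allowance for the tar header
        if current and current_size + size > limit:
            chunks.append(current)           # close the current chunk...
            current, current_size = [], 0    # ...and start a new one
        current.append(path)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

Each chunk then becomes its own tarfile; a single file bigger than the limit still ends up alone in an oversized chunk, which is the worst case Alexander mentioned.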

But the other problem is that at the moment the people who get our chunks reassemble the file with a simple:

cat file1.tar.gz file2.tar.gz > file.tar.gz

which I suppose is not going to work if I create 2 different tar files, since it would recreate the header in each of them, right? So either I also provide a script to reassemble everything, or I have to split in a more "brutal" way...

Maybe doing the final split wasn't such a bad idea after all; I'll first check whether it's actually more expensive for the filesystem (which is very, very slow)
or not a big deal...
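
For reference, the "brutal" split would be something like this (made-up names and chunk size); since the pieces are just consecutive byte ranges of the one compressed archive, the plain cat reassembly on the receiving side keeps working, which is basically what split(1) does anyway:

CHUNK = 100 * 1024 * 1024      # chunk size in bytes (made-up value)

with open("file.tar.gz", "rb") as src:
    index = 0
    while True:
        data = src.read(CHUNK)
        if not data:
            break                                      # end of the archive
        with open("file.tar.gz.%03d" % index, "wb") as out:
            out.write(data)                            # raw byte chunk, no extra headers
        index += 1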
