Re: +CONTENTS files

Garrett Cooper Mon, 02 Jul 2007 08:11:05 -0700

Alexander Leidinger wrote:

Quoting Garrett Cooper <[EMAIL PROTECTED]> (from Mon, 02 Jul 2007 00:55:25 -0700):

[LoN]Kamikaze wrote:

Garrett Cooper wrote:

Pardon me for being naive, but wouldn't it be wiser for all of the data

in the +CONTENTS file to be aggregated into sections instead of having
line by line info?

Example (net/samba_3.0.25a):

@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
man/man1/log2pcap.1.gz
[~100 lines of repetitive data...]
@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
man/man8/vfs_notify_fam.8.gz

  Could be aggregated into:

@MD5
9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
[etc..]
@end MD5

  or something similar to XML.

  This would reduce the filesize from n bytes to n - (9 + 4 -1) *
i_entries + 8. In larger package files this would reduce the amount of
data parsing by a long shot. Also, more powerful scripting languages
like Perl, Python, or smart parsers in C could make short work of this
data and just extract the MD5 elements for comparison.
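
  A minimal sketch of the aggregation idea in Python, assuming (as in the
example above) that each "@comment MD5:" line is immediately followed by the
path it describes. The names aggregate_md5 and size_in_bytes are made up for
illustration and are not part of any package tool:

```python
MD5_PREFIX = "@comment MD5:"


def aggregate_md5(lines):
    """Collect '@comment MD5:<sum>' + path pairs into one @MD5 ... @end MD5 section.

    Assumption: every '@comment MD5:' line is directly followed by its path.
    All other lines are passed through unchanged, ahead of the section.
    """
    passthrough = []
    entries = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(MD5_PREFIX) and i + 1 < len(lines):
            # Pair the checksum with the path on the next line.
            entries.append((line[len(MD5_PREFIX):], lines[i + 1]))
            i += 2
        else:
            passthrough.append(line)
            i += 1
    section = ["@MD5"]
    section += [f"{digest} {path}" for digest, path in entries]
    section.append("@end MD5")
    return passthrough + section


def size_in_bytes(lines):
    """Size of the lines when written out with one newline each."""
    return sum(len(line) + 1 for line in lines)


sample = [
    "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb",
    "man/man1/log2pcap.1.gz",
    "@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc",
    "man/man8/vfs_notify_fam.8.gz",
]
aggregated = aggregate_md5(sample)
print(size_in_bytes(sample), "->", size_in_bytes(aggregated))
```

  Running it on a real +CONTENTS file would give an actual before/after byte
count. Note that the sketch moves every MD5 entry to the end of the file, so
it does not preserve the original line order.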

  Also, by doing a little extra work when creating packages by
organizing all the sections together, I think that the file size could
be reduced by a large degree.

  Similar fields to @comment MD5 could be reduced I believe, but with
less benefit maybe, other than just the @unexec rmdir, etc lines.

  Either that, or the data should be organized into separate files I

think (increases number of files, but reduces overall processing time IMO).

In some cases the order of the stored data is important, and thus it cannot be separated into sections. Also, this layout allows for very simple parsing with the usual UNIX tools (sed, cut, awk, perl, simply everything). Unlike XML, which is
rather complex and thus does not belong in base, in my opinion.
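
To illustrate how little the current line-per-record layout asks of a parser, here is a rough Python sketch that reads entries in file order. The function name parse_contents and the "file" label for plain path lines are made up for this example; the two sample lines are taken from the samba excerpt quoted above.

```python
def parse_contents(lines):
    """Yield (directive, argument) pairs in the order they appear.

    Lines starting with '@' are split at the first space into a directive
    and its argument; any other non-empty line is treated as a file path
    and labelled 'file' (a label invented for this sketch).
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            directive, _, argument = line[1:].partition(" ")
            yield directive, argument
        else:
            yield "file", line


sample = [
    "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb\n",
    "man/man1/log2pcap.1.gz\n",
]
records = list(parse_contents(sample))
for directive, argument in records:
    print(directive, argument)
```

Because each line is one record, the ordering between entries stays intact without any extra bookkeeping.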

We have libbsdxml in the base already (an old version of the one in the ports).

Ok.

   I didn't say XML exactly. I say XML-like, with implied end and begin
tags, but keeping with the Makefile like syntax of @MD5 ... @end MD5,
or something similar.
The problem is that a change would break existing installations, as they cannot cope with such a new format. Feel free to propose improvements, but you need to keep in mind that any supported FreeBSD release has to be able to install packages with only the package tools available in the base system.

The point is, though, that there's a lot of unnecessary bloat, which leads to larger text files and thus slows down even smarter parsers written in C, Perl, or Python.

   My point being is that the +CONTENTS file is bloated a lot by
useless lines, and it would help speed up package processing if it was
clipped or reduced somehow I would think.
You need to provide numbers. Without them this is pure speculation.
And you have to explain why the current parsing routines cannot be sped up for the current format; maybe the implementation is just a little bit outdated compared to today's parsing knowledge...
Bye,
Alexander.

Ok. I take your challenge and will have preliminary results in 2-3 days. Are Excel formatted spreadsheets ok (thinking graphs)?

Thanks,
-Garrett
