Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Wed, 03 Dec 2008 02:05:31 +0100
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

> metadata.xml already contains data that eix and other software should
> be able to search in (like longdescriptions), and having each package
> in kde-base report http://www.kde.org/ as its homepage is kinda
> pointless if you think about search, since that's not data, it's
> noise.

So you're saying if I'm interested in a URL to look for information
about kalarm, I should search for it in metadata.xml of random kde
packages? Sorry, but that doesn't make any sense to me.

While I'm not necessarily against your primary goal here, your
argumentation is very subjective to say the least (e.g. just because
you find xml easier to read/parse than ebuilds doesn't mean the same
holds true for everyone else, ignoring the whole cache issue).
It feels a bit like you're looking for problems to justify your
solution rather than the other way round.

Marius
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Mon, 01 Dec 2008 10:00:33 +0100
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

> "Alec Warner" <[EMAIL PROTECTED]> writes:
>
> > That being said I still don't see the usefulness here.
> >
> > You seem to think that using the existing APIs for this data is
> > wrong, and I think the opposite, so I guess we will agree to
> > disagree on this matter.
>
> Yeah I still think that there is no point in requiring using of a
> specific API when the same data can easily be available in a format
> that is more or less parsable with ease in any modern (and non)
> programming language.
>
> Beside, I find expanding the HOMEPAGE syntax to allow more than one
> link a bit ... overkill, if the same thing can be achieved in
> metadata.xml...

I find moving HOMEPAGE out of ebuilds to be a bit overkill.

--
gcc-porting, by design, by neglect
treecleaner, for a fact or just for effect
wxwidgets @ gentoo EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
James Cloos <[EMAIL PROTECTED]> writes:

> Searching is an important reason for every package to specify its
> homepage.

And? metadata.xml already contains data that eix and other software
should be able to search in (like longdescriptions), and having each
package in kde-base report http://www.kde.org/ as its homepage is kinda
pointless if you think about search, since that's not data, it's noise.

Which only adds to my point.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
> "Diego" == Diego 'Flameeyes' Pettenò <[EMAIL PROTECTED]> writes:

>> But also the need to replicate http://www.kde.org/ to metadata.xml of
>> all KDE split ebuilds -- right now, this is set by an eclass.

Diego> The usefulness of this is IMHO debatable; why not just writing it one
Diego> package (say kde-base/kde or kde-meta) and just there? Having each
Diego> mini-package express itself as having that as its homepage is not very
Diego> useful to me, but I guess it's debatable.

Searching is an important reason for every package to specify its
homepage.

-JimC
--
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
> "Jan" == Jan Kundrát <[EMAIL PROTECTED]> writes:

>> - less data in metadata cache;

Jan> Isn't it in the cache for some reason? Really, I'm just asking.

If for nothing else, so that update-eix can get it to allow searching
on homepage.

And, yes, that is an important feature.

And, no, opening every metadata.xml file during update-eix is in no way
acceptable.

For eix above, of course, read your favourite query tool.

-JimC
--
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
"Alec Warner" <[EMAIL PROTECTED]> writes:

> That being said I still don't see the usefulness here.
>
> You seem to think that using the existing APIs for this data is wrong,
> and I think the opposite, so I guess we will agree to disagree on this
> matter.

Yeah I still think that there is no point in requiring the use of a
specific API when the same data can easily be available in a format
that is more or less parsable with ease in any modern (and non)
programming language.

Besides, I find expanding the HOMEPAGE syntax to allow more than one
link a bit ... overkill, if the same thing can be achieved in
metadata.xml...

>> Beside, if you really want to go down that road you should be counting
>> that beside ReiserFS with tail, I don't remember any other Linux FS that
>> has block smaller than 512bytes, which means that each file in metadata
>> cache is taking up much more than just its size in characters.
>>
>> All your math is thus wrong.
>
> As was pointed out on IRC, UTF8 characters are not a fixed size,
> making my math even more wrong ;)

If we consider HOMEPAGE, the assumption that characters are a fixed
size of 1 byte is good enough; URLs are usually encoded in pure ASCII
for compatibility; while IDN would break that assumption, we can't even
assume that IDN is always available and so on.

For DESCRIPTION maybe it's different because there is room there for
UTF-8 characters, but that's going to take us even further from the
point.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Mon, Dec 1, 2008 at 12:24 AM, Diego 'Flameeyes' Pettenò
<[EMAIL PROTECTED]> wrote:

> "Alec Warner" <[EMAIL PROTECTED]> writes:
>
>> - Space savings. Certainly your scheme may be smaller, but the XML
>> tag overhead may eat into the savings. You should do some estimates
>> to show the community how much smaller the tree will be from this
>> proposal.
>
> Sorry but you lost me on any point you might have brought across since
> after this I feel like you were trying to put words in my mouth.

Sorry for that, I never meant to imply that you said space savings.

That being said I still don't see the usefulness here.

You seem to think that using the existing APIs for this data is wrong,
and I think the opposite, so I guess we will agree to disagree on this
matter.

> Beside, if you really want to go down that road you should be counting
> that beside ReiserFS with tail, I don't remember any other Linux FS that
> has block smaller than 512bytes, which means that each file in metadata
> cache is taking up much more than just its size in characters.
>
> All your math is thus wrong.

As was pointed out on IRC, UTF8 characters are not a fixed size,
making my math even more wrong ;)

> --
> Diego "Flameeyes" Pettenò
> http://blog.flameeyes.eu/
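Both corrections raised in this exchange (files occupy whole filesystem blocks, and UTF-8 characters are not always one byte) are easy to check. The following is only a rough Python sketch: the 4096-byte default block size and the IDN-style sample URL are assumptions for illustration, and the block rounding ignores tail packing, inline data and inode overhead.

```python
def on_disk_size(size_bytes, block_size=4096):
    """Round a file size up to a whole number of blocks.

    block_size=4096 is an assumed, illustrative default; real
    filesystems may differ.
    """
    if size_bytes == 0:
        return 0
    blocks = -(-size_bytes // block_size)  # ceiling division
    return blocks * block_size


def utf8_size(text):
    """Return (number of characters, number of bytes as UTF-8)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # A tiny cache entry still occupies one full block.
    print(on_disk_size(300, block_size=512))
    # Pure-ASCII URL: characters and bytes agree.
    print(utf8_size("http://www.kde.org/"))
    # Made-up IDN-style URL: the "ü" takes two bytes in UTF-8.
    print(utf8_size("http://bücher.example/"))
```

So a per-character count is a lower bound on bytes, and a per-byte count is a lower bound on disk usage; neither changes the order of magnitude of the estimate.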
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Jan Kundrát <[EMAIL PROTECTED]> writes:

> But also the need to replicate http://www.kde.org/ to metadata.xml of
> all KDE split ebuilds -- right now, this is set by an eclass.

The usefulness of this is IMHO debatable; why not just write it in one
package (say kde-base/kde or kde-meta) and only there? Having each
mini-package express itself as having that as its homepage is not very
useful to me, but I guess it's debatable.

>> - allows proper handling of packages lacking a HOMEPAGE;
>
> Could you elaborate a bit about how different is handling of an
> empty/uninitialized shell variable from an empty XML element?

That you can provide _other_ links besides a homepage, like
"unmaintained", "gentoo:userguide" and stuff like that, so that users
don't just get no homepage at all, and they are not misdirected by the
homepage being http://www.gentoo.org/ or something.

>> - users can check the metadata much more easily by just opening the xml
>> file or interfacing to that rather than having to skim through the
>> ebuild, the xml files are probably more user readable then ebuilds
>> using multiple eclasses;
>
> Haven't we already agreed that accessing ebuilds/... directly is
> broken by design?

For software, sure, but as a user I am automatically drawn to just look
at the files if I'm looking for the homepage of a package I know, and
seeing a metadata.xml file I'm more likely to look at that rather than
the metadata cache in /var/db/... .

And an XML file is certainly more user-readable than HOMEPAGE with
depend-like syntax for labels and conditionals and whatever else it
seems Alec is proposing for EAPI=3.

>> - webapps like packages.gentoo.org would be able to display basic
>> information without having to parse the ebuilds or the metadata cache.
>
> Except for the ebuilds which still use the old format (that is 100% of
> the tree right now)

This of course is meant as "whenever this is fully implemented".

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
"Alec Warner" <[EMAIL PROTECTED]> writes:

> - Space savings. Certainly your scheme may be smaller, but the XML
> tag overhead may eat into the savings. You should do some estimates
> to show the community how much smaller the tree will be from this
> proposal.

Sorry but you lost me on any point you might have brought across since
after this I feel like you were trying to put words in my mouth.

Beside, if you really want to go down that road you should be counting
that beside ReiserFS with tail, I don't remember any other Linux FS that
has block smaller than 512bytes, which means that each file in metadata
cache is taking up much more than just its size in characters.

All your math is thus wrong.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Sun, Nov 30, 2008 at 3:12 PM, Diego 'Flameeyes' Pettenò
<[EMAIL PROTECTED]> wrote:

> "Alec Warner" <[EMAIL PROTECTED]> writes:
>
>> Diego, What are the concrete benefits of your proposal?
>
> As I said:
>
> - no need to replicate homepage data between versions; even though forks
>   can change homepage, I would expect that to at worst split a package
>   in two, or have to be different by slot, like Java;
> - allows proper handling of packages lacking a HOMEPAGE;
> - less data in metadata cache;
> - users can check the metadata much more easily by just opening the xml
>   file or interfacing to that rather than having to skim through the
>   ebuild, the xml files are probably more user readable than ebuilds
>   using multiple eclasses;
> - displaying info about the package does not require parsing the full
>   ebuild file, with its eclasses;
> - extensible to provide more links than just the homepage (forums,
>   trackers, gentoo-specific documentation, ...);
> - if we also move DESCRIPTION, search software can ignore everything
>   about ebuild parsing, and just use the metadata.xml files; considering
>   how many people actually use or used eix, it would make sense to allow
>   third-party applications to be able to search through the tree;
> - webapps like packages.gentoo.org would be able to display basic
>   information without having to parse the ebuilds or the metadata cache.
> - as much as people might think metadata is easier to parse than
>   anything, XML has one huge advantage: there are plenty of parsers for
>   any language without having to actually write one, even as easy as it
>   can be, and it's easily interfaced with anything; I wrote a simple XSL
>   file that outputs the basic metadata details for packages without
>   having any parser or executable code but xsltproc (or any other XSLT
>   software), correlating data with herds.xml too;
> - it really is metadata, and it makes very little sense to need parsing
>   of eclasses and EAPI handling to get some data from a package that is
>   non-functional in nature and free form (just like DESCRIPTION, and
>   unlike LICENSE like Alec said), and that changes at worst once each
>   slot (unlike LICENSE that can change at any given version).
>
> Disadvantages:
>
> - it requires user-interface software to parse metadata.xml to show
>   data for a package; which is already needed to show per-package USE
>   flag meaning;
>
> General points:
>
> - it does not solve unrelated problems like code replication;
>
> Can someone come up with any other point beside "I don't like XML"
> (which I already said is a puny answer) and "it can theoretically be 10
> different homepages for 10 different versions" (which I have sincerely
> some beef with myself since if you fork a software you might as well
> change its name)?
>
> As I said, moving out the HOMEPAGE field from a package manager
> perspective is non functional; if you're showing to the user some data
> about a package you might as well show as much as you can, like long
> descriptions, other links, and USE flags. And the fact that you can ask
> the package manager for something is for me not a valid reason to avoid
> moving something in a more approachable place for other software.

Ciaran covered most of my points already.

Third party programs should not parse ebuilds and eclasses by hand.
I'd expect half of them to get it wrong if they tried. Ebuild parsing
is hard, that is why we have three complex software packages that for
the most part do it properly.

Why is 'ask the package manager' an invalid reason for not making
something more accessible? How accessible must this data be? Writing
an XML parser is not accessible enough (for me), we should just put it
in plain text on the hard drive, perhaps in
"/var/cache/edb/dep/${PORTDIR}/$C/$PV"

Oh wait, we do that already[1].

So really this is where I'm confused. If third parties are using the
package manager APIs to get at this data, the only rationale to move
it out of ebuilds is:

- Space savings. Certainly your scheme may be smaller, but the XML
  tag overhead may eat into the savings. You should do some estimates
  to show the community how much smaller the tree will be from this
  proposal.

Randomly looking:

    cd /var/cache/edb/dep/usr/portage
    grep -hR HOMEPAGE | wc -m

yields 1.1 million characters. Each character is 1 byte (is that so in
UTF8?)

So at best you could save the 1.2GB tree 2.2 million bytes (about 2
megs) if your scheme was (more than) 100% efficient. The extra 1.1
million characters comes from the space freed in the cache (since we
don't cache metadata.xml). 2 megs into 1200 megs is.. ".16%" of the
tree. As I thought, not very compelling.

Looking at DESCRIPTION:

    grep -hR DESCRIPTION | wc -m

yields ~1.5 million characters. Nice! So if we purge that from the
cache and replace it with a (more than) 100% efficient metadata.xml
solution we could save: 3 megs

3 megs saved + 2 megs saved = 5 megs saved.
5 / 1200 = .41% of the tree.

Still again n
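The percentages in the estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the megabyte figures are the rounded ones quoted in the message, not fresh measurements, and the function name is made up for illustration.

```python
TREE_MB = 1200       # the "1.2GB tree" from the message
HOMEPAGE_MB = 2      # ~1.1M chars in the cache plus ~1.1M in ebuilds
DESCRIPTION_MB = 3   # ~1.5M chars, counted twice in the same way


def percent_of_tree(saved_mb, tree_mb=TREE_MB):
    """Express a saving in megabytes as a percentage of the tree size."""
    return 100.0 * saved_mb / tree_mb


if __name__ == "__main__":
    print(f"HOMEPAGE only: {percent_of_tree(HOMEPAGE_MB):.2f}%")
    print(f"HOMEPAGE + DESCRIPTION: "
          f"{percent_of_tree(HOMEPAGE_MB + DESCRIPTION_MB):.2f}%")
```

Rounded to two decimals this gives roughly 0.17% and 0.42%, which the message truncates to .16% and .41%.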
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Mon, 01 Dec 2008 00:12:23 +0100
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

> - no need to replicate homepage data between versions; even though
> forks can change homepage, I would expect that to at worst split a
> package in two, or have to be different by slot, like Java;

You mean "no way of handling generated homepages, use conditional
homepages, per version homepages or common homepages".

> - allows proper handling of packages lacking a HOMEPAGE;

Uh, we can do that using in-ebuild HOMEPAGE too. Just need to decide on
a convention.

> - less data in metadata cache;

Entirely a non-issue. Heck, we want more in there, not less.

> - users can check the metadata much more easily by just opening the
> xml file or interfacing to that rather than having to skim through the
> ebuild, the xml files are probably more user readable than ebuilds
> using multiple eclasses;

...or they can just use a decent tool. Try 'paludis --query' for an
example.

> - displaying info about the package does not require parsing the full
> ebuild file, with its eclasses;

Uhm. It doesn't anyway, because of the metadata cache.

> - extensible to provide more links than just the homepage (forums,
> trackers, gentoo-specific documentation, ...);

So's HOMEPAGE. You could extend the syntax to allow annotations:

    HOMEPAGE="
        http://example.com/
        http://forums.example.com/ [[ role = forums ]]
        http://www.gentoo.org/example [[ role = [ Gentoo specific docs ] ]]
        gtk+? ( http://gui.example.com/ [[ role = [ Optional GUI docs ] ]] )
    "

> - if we also move DESCRIPTION, search software can ignore everything
> about ebuild parsing, and just use the metadata.xml files;
> considering how many people actually use or used eix, it would make
> sense to allow third-party applications to be able to search through
> the tree;

Except that any decent search client needs to be aware of masks,
visibility and so on anyway.

> - webapps like packages.gentoo.org would be able to display basic
> information without having to parse the ebuilds or the metadata
> cache.

But they already display complex information.

> - as much as people might think metadata is easier to parse than
> anything, XML has one huge advantage: there are plenty of parsers
> for any language without having to actually write one, even as easy
> as it can be, and it's easily interfaced with anything; I wrote a
> simple XSL file that outputs the basic metadata details for packages
> without having any parser or executable code but xsltproc (or any
> other XSLT software), correlating data with herds.xml too;

...or you could use a proper ebuild-aware tool that displays metadata
details, including things like visibility. Again, paludis --query.

> - it really is metadata, and it makes very little sense to need
> parsing of eclasses and EAPI handling to get some data from a package
> that is non-functional in nature and free form (just like
> DESCRIPTION, and unlike LICENSE like Alec said), and that changes at
> worst once each slot (unlike LICENSE that can change at any given
> version).

It isn't non-functional.

> And the fact that you can ask the package manager for something is
> for me not a valid reason to avoid moving something in a more
> approachable place for other software.

"More approachable" is a decent package manager API. If you had that
you wouldn't need to mess around with XML APIs.

--
Ciaran McCreesh
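To make the annotated HOMEPAGE idea above concrete, here is a minimal Python sketch of how a tool might read such a value. The annotation syntax itself is only a suggestion made in this thread, not an existing EAPI feature, and the function name, the tuple layout and the handling of a single non-nested USE conditional are all assumptions for illustration.

```python
def parse_annotated_homepage(value):
    """Parse the hypothetical annotated HOMEPAGE syntax from the thread.

    Returns a list of (url, use_flag, annotations) tuples. Tokens are
    assumed to be whitespace-separated, and USE conditionals are
    assumed not to nest.
    """
    tokens = value.split()
    entries = []
    use_flag = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "[[":
            # Annotation block: [[ key = value ]] or [[ key = [ words ] ]]
            if not entries:
                raise ValueError("annotation without a preceding URL")
            j = tokens.index("]]", i)
            body = tokens[i + 1:j]
            if len(body) < 3 or body[1] != "=":
                raise ValueError("malformed annotation")
            words = [t for t in body[2:] if t not in ("[", "]")]
            entries[-1][2][body[0]] = " ".join(words)
            i = j + 1
        elif tok == "(":
            i += 1
        elif tok == ")":
            use_flag = None  # leave the conditional group
            i += 1
        elif tok.endswith("?"):
            use_flag = tok[:-1]  # e.g. "gtk+?" -> "gtk+"
            i += 1
        else:
            entries.append((tok, use_flag, {}))
            i += 1
    return entries


EXAMPLE = """
    http://example.com/
    http://forums.example.com/ [[ role = forums ]]
    http://www.gentoo.org/example [[ role = [ Gentoo specific docs ] ]]
    gtk+? ( http://gui.example.com/ [[ role = [ Optional GUI docs ] ]] )
"""

if __name__ == "__main__":
    for url, flag, notes in parse_annotated_homepage(EXAMPLE):
        print(url, flag, notes)
```

Even this toy version needs a small state machine, which gives a feel for the trade-off being argued: the syntax is expressive, but every consumer either goes through the package manager or reimplements the parsing.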
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Diego 'Flameeyes' Pettenò wrote:

> - no need to replicate homepage data between versions; even though forks
> can change homepage, I would expect that to at worse split in two a
> package, or have to be different by slot, like Java;

But also the need to replicate http://www.kde.org/ to metadata.xml of
all KDE split ebuilds -- right now, this is set by an eclass.

> - allows proper handling of packages lacking a HOMEPAGE;

Could you elaborate a bit about how different is handling of an
empty/uninitialized shell variable from an empty XML element?

> - less data in metadata cache;

Isn't it in the cache for some reason? Really, I'm just asking.

> - users can check the metadata much more easily by just opening the xml
> file or interfacing to that rather than having to skim through the
> ebuild, the xml files are probably more user readable then ebuilds
> using multiple eclasses;

Haven't we already agreed that accessing ebuilds/... directly is broken
by design?

> - webapps like packages.gentoo.org would be able to display basic
> information without having to parse the ebuilds or the metadata cache.

Except for the ebuilds which still use the old format (that is 100% of
the tree right now)

Cheers,
-jkt

--
cd /local/pub && more beer > /dev/mouth
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
"Alec Warner" <[EMAIL PROTECTED]> writes:

> Diego, What are the concrete benefits of your proposal?

As I said:

- no need to replicate homepage data between versions; even though forks
  can change homepage, I would expect that to at worst split a package
  in two, or have to be different by slot, like Java;
- allows proper handling of packages lacking a HOMEPAGE;
- less data in metadata cache;
- users can check the metadata much more easily by just opening the xml
  file or interfacing to that rather than having to skim through the
  ebuild; the xml files are probably more user readable than ebuilds
  using multiple eclasses;
- displaying info about the package does not require parsing the full
  ebuild file, with its eclasses;
- extensible to provide more links than just the homepage (forums,
  trackers, gentoo-specific documentation, ...);
- if we also move DESCRIPTION, search software can ignore everything
  about ebuild parsing, and just use the metadata.xml files; considering
  how many people actually use or used eix, it would make sense to allow
  third-party applications to be able to search through the tree;
- webapps like packages.gentoo.org would be able to display basic
  information without having to parse the ebuilds or the metadata cache;
- as much as people might think metadata is easier to parse than
  anything, XML has one huge advantage: there are plenty of parsers for
  any language without having to actually write one, even as easy as it
  can be, and it's easily interfaced with anything; I wrote a simple XSL
  file that outputs the basic metadata details for packages without
  having any parser or executable code but xsltproc (or any other XSLT
  software), correlating data with herds.xml too;
- it really is metadata, and it makes very little sense to need parsing
  of eclasses and EAPI handling to get some data from a package that is
  non-functional in nature and free form (just like DESCRIPTION, and
  unlike LICENSE like Alec said), and that changes at worst once each
  slot (unlike LICENSE that can change at any given version).

Disadvantages:

- it requires user-interface software to parse metadata.xml to show
  data for a package; which is already needed to show per-package USE
  flag meaning;

General points:

- it does not solve unrelated problems like code replication;

Can someone come up with any other point besides "I don't like XML"
(which I already said is a puny answer) and "it can theoretically be 10
different homepages for 10 different versions" (which I sincerely have
some beef with myself, since if you fork a piece of software you might
as well change its name)?

As I said, moving out the HOMEPAGE field is, from a package manager
perspective, non-functional; if you're showing the user some data about
a package you might as well show as much as you can, like long
descriptions, other links, and USE flags. And the fact that you can ask
the package manager for something is for me not a valid reason to avoid
moving something to a more approachable place for other software.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
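The "plenty of parsers" argument can be illustrated with nothing but the Python standard library. This is a minimal sketch under stated assumptions: the `<homepage>` element does not exist in metadata.xml and is only what this proposal might look like, and the sample document and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content; <homepage> is an assumed element
# sketching the proposal, not part of the real metadata format.
SAMPLE_METADATA = """<?xml version="1.0" encoding="UTF-8"?>
<pkgmetadata>
  <herd>kde</herd>
  <homepage>http://www.kde.org/</homepage>
  <longdescription>Example long description.</longdescription>
</pkgmetadata>
"""


def read_homepages(xml_text):
    """Return the text of every <homepage> element, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(elem.text or "").strip() for elem in root.iter("homepage")]


if __name__ == "__main__":
    print(read_homepages(SAMPLE_METADATA))
```

No ebuild, eclass or EAPI knowledge is involved, which is the point being made; the counter-argument elsewhere in the thread is that a tool reading only this file knows nothing about masks or visibility.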
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Tobias Scherbaum <[EMAIL PROTECTED]> writes:

> "dev-java/sun-jdk" unnecessarily. Reducing this to restrict="1.4" isn't
> easily readable as you'd need to know that restrict would specify a
> slot. If your plan is to make it easier to find useful information about
> a package (without using a fancy frontend, just reading the metadata.xml
> with $EDITOR) slot="1.4" (or a version attribute) might be a tad more
> human readable.

Well if we go to these things we should just apply the same to the other
attributes using restrict, since we want to have something coherent,
don't we? ;)

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Diego 'Flameeyes' Pettenò wrote:

> Tobias Scherbaum <[EMAIL PROTECTED]> writes:
>
> > But what about additional slot or version attributes like
> > http://java.sun.com/j2se/1.4.2/
> > (or a version attribute)? If slot and version aren't specified they'd be
> > interpreted as wildcards.
>
> The restrict attribute exists already and it's better to reuse the same
> code, isn't it?

In general yes, but in that case you're duplicating info like
"dev-java/sun-jdk" unnecessarily. Reducing this to restrict="1.4" isn't
easily readable as you'd need to know that restrict would specify a
slot. If your plan is to make it easier to find useful information about
a package (without using a fancy frontend, just reading the metadata.xml
with $EDITOR) slot="1.4" (or a version attribute) might be a tad more
human readable.

Tobias
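The slot-attribute idea, with an element lacking the attribute acting as a wildcard, could look roughly like the Python sketch below. The markup is an assumption: the archive stripped the XML from the original example, so the `<homepage slot="...">` form, the fallback URL and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a slot-specific homepage plus a wildcard entry.
# The second URL is a made-up placeholder.
SAMPLE = """<pkgmetadata>
  <homepage slot="1.4">http://java.sun.com/j2se/1.4.2/</homepage>
  <homepage>http://example.com/jdk/</homepage>
</pkgmetadata>"""


def homepage_for_slot(xml_text, slot):
    """Pick the homepage for a slot.

    An element whose slot attribute matches wins; otherwise the first
    element without a slot attribute is used as a wildcard. Returns
    None if nothing applies.
    """
    root = ET.fromstring(xml_text)
    fallback = None
    for elem in root.iter("homepage"):
        url = (elem.text or "").strip()
        elem_slot = elem.get("slot")
        if elem_slot == slot:
            return url
        if elem_slot is None and fallback is None:
            fallback = url
    return fallback


if __name__ == "__main__":
    print(homepage_for_slot(SAMPLE, "1.4"))
    print(homepage_for_slot(SAMPLE, "1.5"))
```

A restrict-based variant would instead match the slot against a dependency atom, which reuses existing code but requires the reader to know what the atom restricts, as discussed above.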
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Tobias Scherbaum <[EMAIL PROTECTED]> writes:

> But what about additional slot or version attributes like
> http://java.sun.com/j2se/1.4.2/
> (or a version attribute)? If slot and version aren't specified they'd be
> interpreted as wildcards.

The restrict attribute exists already and it's better to reuse the same
code, isn't it?

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) writes:

> I have a very quick proposal: why don't we move the packages' homepage
> in metadata.xml (since it's usually unique for all the versions) and we
> get rid of the variable for the next EAPI version?

I forgot to say that this also addresses, for the future EAPI, the
problem of what to do with a missing HOMEPAGE. We still have to find a
solution for that for EAPI 0, 1 and 2 though, since it's a bit of a big
problem when we point to domain squatters.

If it were feasible to just make a missing HOMEPAGE a softfail for the
other three it would be even better.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Jan Kundrát <[EMAIL PROTECTED]> writes:

> I believe the reason was that HOMEPAGE might change with new versions
> and that metadata.xml didn't (doesn't?) support version-specific data.

At least the maintainer and (iirc, at least that's how we proposed it
the first time around) flag tags support a restrict attribute.

But I really expect that as long as the package is the same, homepage is
unlikely to change with version; maybe with slot I guess, but even that
is debatable and somewhat rare I think.

--
Diego "Flameeyes" Pettenò
http://blog.flameeyes.eu/