Re: [gentoo-dev] pid 1 design

2012-08-09 Thread Wyatt Epp
On Thu, Aug 9, 2012 at 11:25 AM, Rich Freeman  wrote:
> ...have an init as PID=1 that does
> nothing but launch systemd and keep it propped up until it gets a
> signal from systemd.  However, that could have issues I'm just not
> thinking of.

I'm not the maintainer, but this method does seem to work pretty well
for OpenRC and our old friend baselayout-1 (so, the last decade or so,
as I understand it).



Re: [gentoo-dev] rfc: stabilization policies

2013-08-20 Thread Wyatt Epp
On Tue, Aug 20, 2013 at 2:19 PM, William Hubbs  wrote:
>
> During the last release of OpenRC, I learned that people *do* run
> production servers on ~arch. I asked about it and was told that the
> reason for this is bitrot in the stable tree.
>
This right here seems strange to me.  What things in stable are
undergoing bitrot?  What manner of bitrot?  On what architectures?
Specifics or examples seem like they'll be important here because I
think this is largely a matter of perception.  Having to endure CantOS
at work, I have a hard time faulting the stable situation that I see
with my amd64 machine at home.

Regards,
Wyatt



Re: [gentoo-dev] rfc: stabilization policies

2013-08-20 Thread Wyatt Epp
On Tue, Aug 20, 2013 at 5:05 PM, Tom Wijsman  wrote:
>
> At least the numbers for the year sound like something we will want to
> deal with; from there, we could try to keep half a year low. And after
> a while, we might end up ensuring stabilization within 3 months.
>
> That's still three times more than our intended stabilization delay...
>
I think it'd be more interesting to know how many of these stable
requests DON'T have blockers or other issues attached that prevent(ed)
them from getting stabled.

Also, at the older ages, how many of them are obsoleted by a newer
version in the tree and should be closed (or even how many have been
treecleaned)?

-Wyatt



Re: [gentoo-dev] rfc: stabilization policies

2013-08-21 Thread Wyatt Epp
On Wed, Aug 21, 2013 at 5:50 AM, Sergey Popov  wrote:
>
> As i said earlier, we should recruit more people -> then problem will go
> away.

This is a point most of the people in this thread seem to be dancing
around that's sort of problematic.  You can talk about recruiting
until you're blue in the face, but the simple fact is Gentoo DOESN'T
have adequate manpower.  And has it ever, really?  Can you honestly
say we've ever had a solid surplus of devs with time [0]?  We've
gotten where we are, by and large, because Gentoo works smarter.

Fundamentally, I see this as a problem of tooling.

Let's turn the question around; try thinking about it like this: What
tools have historically allowed relatively few active developers to
handle stabilisation and integration of upstream patch flow IN SPITE of
not having a lot of recruits?  What tools could be added to assist
with, if not outright _remove_ steps of the process?

I'd like to point out something that jumped out to me as a red flag
earlier (not to pick on you specifically, Tom; this is just the
cleanest example I saw), and turn it into an example:

On Wed, Aug 21, 2013 at 4:51 AM, Tom Wijsman  wrote:
>
> Well, they are listed there; but it's quite some work to actually go
> through that list, that is, manually check the bugs of ~2000 packages
> as well as file a STABLEREQ bug, takes quite a while...
>
This right here is a real problem.  Any time you're talking about
doing anything on this scale "manually", you've already lost the
battle.  You need a tool to minimise the overhead of time and
cognitive load.  What would that tool look like?  Think about the
steps involved and how you can reduce them to only the parts that
absolutely require decisions on your part.

>> At least in the areas I usually work, I have found a combination of
>> the automatic stabilisation requests and imlate have definitely cut
>> back on the bitrot.
>
> A single unimportant bug can prevent the automatic STABLEREQ bug from
> getting filed; as for imlate, not everyone seems to know that tool, not
> everyone seems to run it. Attention for some stabilizations is lost...
>
First off, why do developers not know about the tools?  How can this
be addressed?  For a start, I'd suggest making sure the tools are at
least mentioned in the docs.  Somewhere other than the amd64 Arch
Testing guide from 2006. [1] That's the only concrete (i.e. actually
in the DOCS rather than the ML archives) documentation I've found so
far, and it only references the file, rather than the tool.  Maybe in
the devmanual Tools Reference? [2]

But, imlate is a good example of a tool that could ease the time cost
of grindy crap.  You showed before that it can get an ordinary count
bounded on n days.  That's handy, but only a little.  Build out:
- How many of those stablereq bugs reference versions no longer in the
tree?  Those can probably get closed.
- How many have newer STABLE versions in the tree in the same slot?
Probably fine to close those, too.
- Of the remaining, how many have patches or ebuilds attached?  Those
may be solved problems waiting for closure; shortlist them.
- How many are packages with newer versions that have been in the tree
for >30 days?  Consider changing the target version, then.
- How many have open blockers, and what are those blockers (w/link and
summary)?  Scan for low-hanging fruit jumping out in that list.
- Get views by category; are there categories where updates are more
important?  Things in @system, and things with security concerns
(stuff in net-*) should probably get higher priority; games...
probably less so.
- Are there bugs with certain keywords in the body that should raise
priority?  Things like "security" or "overflow" might be good
candidates.
- Are there bugs with certain keywords in the body that indicate it'll
be really easy to decide? e.g. "trivial" or "minor" might turn up some
of those super-small version bumps that you pretty much know aren't
going to affect stability.
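
The checks above could be strung together roughly as follows.  This is
only a sketch: the StableReq record, its field names, and the keyword
lists are assumptions made up for illustration, and nothing here talks
to Bugzilla or imlate.

```python
from dataclasses import dataclass, field

# Hypothetical record for one open stabilisation request.  Every field
# name is an assumption for illustration, not something imlate or
# Bugzilla actually provides in this shape.
@dataclass
class StableReq:
    package: str                 # e.g. "net-misc/foo"
    version: str                 # version the bug asks to stabilise
    in_tree: bool                # is that version still in the tree?
    newer_stable_in_slot: bool   # newer stable version in the same slot?
    has_attachment: bool         # patch or ebuild attached to the bug
    open_blockers: list = field(default_factory=list)
    body: str = ""

HIGH_PRIORITY_WORDS = ("security", "overflow")
EASY_WORDS = ("trivial", "minor")

def triage(req):
    """Return a coarse bucket so a human only sees the real decisions."""
    # Obsolete requests can probably just be closed.
    if not req.in_tree or req.newer_stable_in_slot:
        return "close"
    # Anything with open blockers needs those looked at first.
    if req.open_blockers:
        return "blocked"
    text = req.body.lower()
    if any(word in text for word in HIGH_PRIORITY_WORDS):
        return "urgent"
    # Attached fixes and tiny bumps are likely quick wins.
    if req.has_attachment or any(word in text for word in EASY_WORDS):
        return "shortlist"
    return "review"
```

Only the "review" bucket, and a skim of "blocked", would then need
developer attention up front.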

These are just examples off the top of my head, and by no means
bulletproof, but these are in the class of improvements that have ROI
because they reduce a task that previously took developer time to one
that takes CPU time.  CPU time is essentially free compared to the
value of dev time.

And I'm not saying more recruiting shouldn't happen, but relying on it
is no better than hoping at $deity for bugs to close themselves. ;)

Cheers,
Wyatt

[0] Okay, maybe in the "glory days" when we were higher up on
Distrowatch and things were really kicking. (I know, I know, "DW isn't
representative", but really? Sabayon is doing better than we are,
now?)
[1] http://www.gentoo.org/proj/en/base/amd64/tests/index.xml?part=1&chap=2
[2] http://devmanual.gentoo.org/tools-reference/index.html



Re: [gentoo-dev] A new glep: Ebuild format and metadata handling

2009-06-03 Thread Wyatt Epp
On Sun, May 31, 2009 at 6:09 PM, Richard Freeman  wrote:

>
> glep55: See GLEP55. To summarize: The eapi is put into the file name so
> that the package manager knows the EAPI (and thus how to handle this file
> format). While it simplifies the eapi discovery this comes at a high price
> as there is no reliable way to find and validate all ebuilds.  It also
> enforces some minor limitations, for example EAPI needs to be unique and
> cannot be overridden by eclasses. Some people also see it as bad design as
> it exposes file internals in the filename.


Okay, this has been bothering me...sorry if this is a sort of silly
question, but why not just use the (already extant) metadata.xml for
the...err, metadata about a package?

In any case, I'm strongly opposed to the idea of encoding any more metadata
into the filename than is strictly necessary to uniquely identify the file.
As both a software developer and a user, please do not do this.

Regards,
Wyatt


Re: [gentoo-dev] Init systems portage category

2009-10-12 Thread Wyatt Epp
2009/10/12 Jesús Guerrero 

>  But there's one... That what the "system" set is about in first place. We
> could argue if creating a new category would be any good or not, that's a
> different issue. But there's already a list of packages that's considered
> critical for a Gentoo system. That's what "system" is, and you will get a
> big red warning when trying to uninstall one package belonging to this
> category.
>
>
Seeing as we understand @system to be "critical for a functional Gentoo
system", the phrase "critical packages" may have been poorly chosen for
communicating the concept of "things that, should I be cavalier in playing
with them, may leave me with a system that is incapable of playing again
without intervention from one of those lovely LiveCD things".

Nevertheless, there is a class of "packages that I need to watch out for,
because they'll make my life miserable in ways X can only dream about and
THEN stab me in the kidneys with a rusty javelin if I'm not careful" under
discussion that could probably use some action.  It's unfortunate that
there's no good way of encoding arbitrary semantic metadata about a small
set of packages such that it can be leveraged by various sources to achieve
this end...

Regards,
Wyatt


Re: [gentoo-dev] FHS or not (WAS: [gentoo-project] Call for agenda items - Council meeting 2014-03-11)

2014-02-28 Thread Wyatt Epp
On Fri, Feb 28, 2014 at 7:47 PM, William Hubbs  wrote:
>
> Patrick thinks that all configuration files belong in /etc, and what has
> happened is, some packages are placing default configuration
> files in /lib or /usr/lib and allowing them to be overridden by files
> with the exact same names and paths in /etc. His argument is that only
> libraries belong in /lib or /usr/lib.
>
I didn't get that vibe from what was quoted in OP.  Maybe there's
something missing.  But let's be real here: if I install something and
want to configure its system-wide bits, the first place I go is ALWAYS
/etc.  When I don't find it there, with the rest of the system config
files, my day gets a little worse and I lose a bit of time trying to
interrogate a search engine for the answer.  And that's annoying.
That sucks.

I don't particularly care about the history, or the politics, or what
upstreams think they have the right to decide for me.  Sure, it might
be "only" convention, but even then it's still valuable by merit of
allowing you to make (often correct) predictions about where to
configure your shiny new daemon and by reducing cognitive load (no
need to remember that "Okay, so bonehead has its config in
/usr/lib/bone/head/ and sillyd has its config in /var/silly/comedy/,
and...where was riced.conf, again?").

> I disagree with this based on understanding how the config system in
> these packages works. Also, I don't think a distro should do this type of
> patching if the patches are not accepted upstream.
>
I somehow get the sense that you're talking about specific packages,
but more generally: If there's some legitimate reason the config can't
go where configs...go (like the package hardcoding the path to the
config without any overrides possible (which sounds absolutely
moronic, IMO.  What if you want to temporarily test a new config?))
then sure, let it live where it lives.  But for stuff where they're
already able to be overridden by a version in /etc anyway?  I don't
think "if users are supposed to be able to modify it, the config
should be /etc" is an unreasonable position to take.

Reducing user pain isn't an all-or-nothing exercise.

Cheers,
Wyatt



Re: [gentoo-dev] Re: FHS or not (WAS: [gentoo-project] Call for agenda items - Council meeting 2014-03-11)

2014-03-03 Thread Wyatt Epp
On Sat, Mar 1, 2014 at 11:06 AM, William Hubbs  wrote:
>
> No sir, I was not telling a half-truth.
>
> If the default configuration is stored in /lib/udev/rules.d for example,
> and you can override that default by dropping files of the same name in
> /etc/udev/rules.d, I don't see what the concern is.
>
Oh, that's easy.  The concern is that, as a sysadmin, I have no idea
what the current configuration even is, let alone any idea that the
override is even possible or how the override file is formatted.  This
problem is magnified for every thing that works this way multiplied
again by every instance that the configuration needs to be checked or
changed (because it likely needs to be looked up again because it's in
a non-standard place and we humans don't remember things well if
they're not a constant presence in our lives).

In short: Making life easier for users is why distros even exist in
the first place.  This method lacks transparency and makes life harder
for users.
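
To make the lookup problem concrete, here is a minimal sketch of the
resolution a sysadmin currently has to do by hand.  It assumes only the
simple rule described above (a file in the override directory replaces
the file of the same name in the default directory); the function name
is made up, and real tools may apply more rules than this.

```python
import os

def effective_config(default_dir, override_dir):
    """Map each config file name to the path that actually takes effect.

    Assumes one rule only: a file in override_dir replaces the file of
    the same name in default_dir.  Missing directories are skipped.
    """
    effective = {}
    # Process the defaults first so the overrides win on a name clash.
    for directory in (default_dir, override_dir):
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                effective[name] = path
    return effective
```

Calling effective_config("/lib/udev/rules.d", "/etc/udev/rules.d")
would give the list of files worth reading, which is the information
that is hard to find when the defaults live outside /etc.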

On Sat, Mar 1, 2014 at 1:31 PM, Alec Warner  wrote:
>
> it is easy for some users, using existing tools (vim, less, etc.), to
> view what the configuration state is.
>
This point is incredibly important:  It should really never require a
search engine to even determine what the current config looks like.  I
don't care if it involves moving the canonical config, or putting a
stub config in /etc with a comment to the effect of:
# This file is for overrides; please see /lib/foo/bar for the default
# system configuration.

...or throwing a bunch of code at it to invent a better config
tracking tool (again), or whatever.

Or say "screw it" and this thread dies with no tangible action like so
many others; enjoy your papercuts, users.

> When the default configs are in /lib/udev/.../ and the over-rides are in
> /etc/udev/.../ that is perhaps less clear. Many applications already provide
> app specific tools for this. You can run apt-config dump to dump your entire
> apt configuration (on debian / ubuntu) for example. I'm unsure if polkit or
> dbus have a tool that will read in the configuration and dump what the
> daemon thinks the state would be (if it loaded it.) (puppet has
>
Oh PLEASE don't let this become a trend.  I can't fathom any
legitimate reason to reinvent cat repeatedly.

> gconf, dconf, polkit, dbus, all do stuff like this. I actually find the
> solution somewhat elegant from my side as a sysadmin.
>
I'm curious: how many people have you encountered who even know those
can be configured?  (Never mind things like "how does this work?" or
"what does this even do?"; you've made a very nice list of things
hardly anyone understands. :/ )

Cheers,
Wyatt



Re: [gentoo-dev] Re: FHS or not (WAS: [gentoo-project] Call for agenda items - Council meeting 2014-03-11)

2014-03-03 Thread Wyatt Epp
On Mon, Mar 3, 2014 at 11:10 AM, Alec Warner  wrote:
>
> Many of the config files are large, and splitting them into segments makes
> it easier to read.
>
Ah, no, impedance mismatch. Split configs are easy-- /etc/env.d/ took
something like two minutes to grasp years ago.

To clarify, I was more dismayed by the whole "If you want to know the
configuration run this purpose-built utility that ends up spitting out
a bunch of text anyway rather than just looking at the files" part.

If the _only_ way to get the config for something is ever to run a
specific command specifically tailored for that purpose, then it's
evidence of a truly shocking and advanced sadism (not to mention a
complete and utter failure of software engineering as a discipline).

Cheers,
Wyatt



Re: [gentoo-dev] RFC GLEP 1005: Package Tags

2014-03-23 Thread Wyatt Epp
On Sat, Mar 22, 2014 at 6:33 PM, Alec Warner  wrote:
> https://wiki.gentoo.org/wiki/Package_Tags
>
Ack, this had to happen on a weekend when I wasn't paying attention!
And you beat me to it, too-- I was working on something in this vein,
but wasn't quite satisfied with the design yet.  Oh well.  You're sort
of on the right track, but there are some very important aspects
missing that will make the whole thing collapse with their absence.
(This thread has been in various places, but I frankly don't feel like
finding the relevant snippets, so you get a text dump.  Sorry about
that.)

The first thing missing is aliasing (most proposals for this sort of
system miss this at first; don't feel too bad).  There are many, many,
many cases where you want more than one single tag query to resolve to
the same canonical tag.  The ability to define aliases that take care
of this automatically is critical.  In my notes on this, I had a
global alias file, and users can have an /etc/portage/tag.alias.  It's
just text -- nothing special -- that defines antecedent = consequent
relationships.  This means the antecedent is _replaced_ by the
consequent.  As a quick example: cpp = c++.  This also allows for simple
changes to the canonical name.
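
A minimal sketch of how such an alias file might be read and applied.
The function names are made up, and treating "#" as a comment marker
and following aliases transitively are assumptions on my part rather
than anything the proposal fixes.

```python
def parse_aliases(text):
    """Parse 'antecedent = consequent' lines into a dict.

    Assumption: '#' starts a comment and blank lines are ignored.
    """
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        antecedent, sep, consequent = line.partition("=")
        if not sep:
            continue  # not an alias line; skip it
        aliases[antecedent.strip()] = consequent.strip()
    return aliases

def canonical(tag, aliases):
    """Replace a tag by its consequent until no alias applies."""
    seen = set()
    while tag in aliases and tag not in seen:
        seen.add(tag)  # guard against accidental alias loops
        tag = aliases[tag]
    return tag
```

With the single line "cpp = c++" loaded, a query for cpp resolves to
c++, while a tag with no alias is returned unchanged.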

Second, implication is important for decreasing maintenance burden.
An implication is an antecedent -> consequent relationship where the
consequent is automatically added if the antecedent is present.
Unlike aliasing, the consequent doesn't _replace_ the antecedent.  An
example of this is acpi -> power_management, because acpi is a
distinct aspect of power management, and has value on its own.  Over
time, this significantly lowers the maintenance burden of an expanding
vocabulary and tree.
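
Implication can be sketched as a simple closure over the tag set.  The
function name and the dict-of-sets representation are assumptions, and
the "hardware" tag in the usage note is invented purely to show
chaining.

```python
def expand_implications(tags, implications):
    """Return tags plus every consequent reachable through implications.

    implications maps an antecedent tag to a set of consequent tags.
    Unlike aliasing, the antecedent is kept alongside what it implies.
    """
    result = set(tags)
    pending = list(tags)
    while pending:
        tag = pending.pop()
        for implied in implications.get(tag, ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result
```

Given {"acpi": {"power_management"}}, a package tagged only acpi ends
up with both acpi and power_management; adding a further rule such as
power_management -> hardware would pull that tag in as well without
anyone touching the package.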

With that in place, I want to make something clear: consistency in the
vocabulary is absolutely critical.  I cannot overemphasise how
important this is.  Adding tags without any sort of discipline leads
to an unmaintainable vocabulary, which makes the whole thing as
worthless as some people think.  So there needs to be some sort of basic
canonical list of tags with their descriptions, and yes people should
be expected to be rigorous in how they approach this.  I've attached
a rough draft of descriptions and aliases that I pulled together a
while ago (analogous to /usr/portage/profiles/use.desc).

This is where aliasing becomes essential, because it allows us to
guarantee some amount of consistency.  We're only human and can't be
expected to cover every situation, but there's plenty of low-hanging
fruit in this area.  e.g.:
app = application                        # Alias abbreviation to full tag
editors = editor                         # Make plural -> singular aliases
                                         # standard where sensible.
                                         # Rule of thumb 1: "This is a(n)..."
admin = administration                   # Rule of thumb 2: "This is a(n)...
                                         # ...tool"
backup = back-up                         # Can use hyphenated forms
benchmark = benchmarking                 # As with admin, only gerund form.
cdr = disk_authoring                     # Spaces replaced with underscores
                                         # at word boundaries
i18n = internationalisation              # Will need to come to a consensus
                                         # on the s/z spelling and make some
                                         # aliases.
cpp = c++                                # Valid tags should be restricted to
                                         # basic ASCII minus spaces (replaced
                                         # with underscores) for our own
                                         # sanity
.net = dotnet                            # This could go either way, but the
                                         # leading period makes my Unix blood
                                         # distrust it.
gamedev = game_development               # "games" becomes ambiguous with
                                         # "game" so prefer a more-clear form.
lang = language = programming_language   # Not to be confused with the i18n
                                         # language support. Avoid confusion
                                         # with clear naming
version_control = source_control = vcs   # Well known abbreviations can be
                                         # used in place of their expansions
mail = email                             # No sense not being clear
mail_server = mail_transfer_agent = mta  # Multiple aliases to the same thing
                                         # are acceptable
nntp = {{newsreader usenet}}             # The braced notation denotes an
                                         # intersection of two tags.  Need to
                                         # decide if this sort of alias is
                                         # legal.  I'm thinking no, honestly.
sys = system                             # BUT it's in conflict with @system!
                                         # Don't do that.
www = web                                # These are all things that deal
                                         # with the web specifically.
apache = apache_module                   # classes of packages that have
                                         # their own categories is exactly
                                         # why this is a good idea.

The above is just an excerpt copied directly from my notes on
aliasing.  Some other stuff:
- Query syntax and semantics can be addressed in greater detail later.
 There's some nice sugar to be had here.
- Likewise, tools.  Something along the lines of quse and equery would
be handy in support of this.
- Aliases for reasonable search terms are not a bad idea.
- I've stated at various points in the past, but categories are
already tags after a fashion.  They're not ve

Re: [gentoo-dev] RFC GLEP 1005: Package Tags

2014-03-24 Thread Wyatt Epp
On Mon, Mar 24, 2014 at 12:28 PM, Ciaran McCreesh
 wrote:
> On Mon, 24 Mar 2014 10:55:38 -0400
> Damien Levac  wrote:
>> A lot of people already replied to this question: package search.
>
> Sure, but can you point to prior examples of this kind of stuff
> actually working?
>
eix -C allows you to search for categories.  It's horrendously
under-powered, but almost a useful prototype of what could be.

Pandora uses this general concept with superb granularity for graphing
similarities in music.  That the MGP data is only used for a streaming
service is depressing.

Alternativeto.net is software oriented and has a good bit of this.
Results?  http://alternativeto.net/tag/tiling/ Bam.  Tiling window
managers.  (These are almost certainly all user-sourced; notice the
innocent misuse in that list.)

The various Danbooru-style sites will generally show off impressive
community-sourced rigour as well as proving the efficacy of
alias/implication at scale.  I have a lot of respect for their
collective pep. Most are NSFW, but this one probably won't be (much):
http://safebooru.org/

The Library of Congress? (The modern library is practically built on
this sort of metadata.)

Regards,
Wyatt



Re: [gentoo-dev] RFC GLEP 1005: Package Tags

2014-03-28 Thread Wyatt Epp
On Fri, Mar 28, 2014 at 1:14 PM, Ciaran McCreesh
 wrote:
> On Thu, 27 Mar 2014 03:53:47 +0100
> yac  wrote:
>> What I was describing is the difference between fundamental properties
>> of categories and tags.
>
> You are trying to redefine categories in terms of a concept that they
> didn't originally represent.

No one's redefining anything.  You seem awfully fixated on the history
that forced categories to exist, which doesn't really matter in this
context.  Regardless of any of that, people can and _do_ attempt to
use categories as a rudimentary method of attempting to search for
packages.

As you and several others have so eloquently pointed out, that's not
their "purpose".  Concurrently, from the other direction, myself and
several others have noted that they're thoroughly inadequate for that
anyway.  That's why this topic keeps coming up and why this
(work-in-progress) GLEP exists in the first place.

> From a package mangler perspective,
> categories aren't just "a label" for a package. They're fundamentally
> part of a package's name.
>
From that standpoint, they're even less adequate for lookup; encoding
metadata in names has never turned out well for anyone.

Cheers,
Wyatt



Re: [gentoo-dev] RFC GLEP 1005: Package Tags

2014-03-28 Thread Wyatt Epp
On Fri, Mar 28, 2014 at 4:39 PM, Kent Fredric  wrote:
>
> This example for me suggests we'll need to have some kind of process of
> defining what tags should be used for what things, similar to how we have a
> process for global USE, mostly, because inconsistency is a bad thing here.
>
Yes, you want a controlled, well-defined vocabulary.  That's
important.  On the other hand, don't get too bent out of shape about
it.  These things fall over when you start adding dumb arbitrary
restrictions like "there needs to be consensus" or "there need to be
at least n packages beforehand".

> Because looking at this example and the results of `eix -cS terminal`, I see
> lots of things that may also be ambiguously tagged "terminal" due to being a
> terminal based application.
>
> Thus, either "terminal-emulator" or "terminal-app" or similar tags seem
> necessary.
>
terminal: terminal emulators.  Make it an alias to terminal_emulator.
cli: things that have a normal, line-based terminal interface. See also: curses.

It's not hard to choose good, unambiguous tags when you can use
aliasing to shorthand and unify.  That's why it's more important than
implication, because controlling your vocabulary is seriously
important.

> And now that we're starting to flesh out mock tags that may make sense, it
> quickly seems we'll eventually want some kind of tag hierarchy.
>
No.  You really, really, reaally don't.  At least not in the sense
that you seem to be thinking.  It makes tags annoying to add and
annoying to use, so no one does either and the whole thing falls over.

> But as long as the tag is restricted to [A-Za-z-]+  or similar, we should
> have enough syntactical space to add a hierarchy in later if we find out we
> need it.
>
Don't worry, we won't.  With only the facilities I've outlined in my
first post, the system will scale well beyond a million packages and
tens of thousands of unique tags, so don't worry too much about
exhausting our semantic description space.

Cheers,
Wyatt



Re: [gentoo-dev] rfc: adding sys-apps/iproute2 to the @system set

2014-09-05 Thread Wyatt Epp
On Fri, Sep 5, 2014 at 2:35 PM, Alex Xu  wrote:
>
> no, because it's not necessary to bring up a working system. we don't
> have wpa_supplicant, and we shouldn't have net-tools now that openrc
> isn't in @system anymore.
>
Well, your definition of "working" seems quite a bit narrower than mine!

More saliently, I recall having needed to do network-related things
from within my stage 3 chroot before, and I'd very much like that to
continue being possible.

-Wyatt



Re: [gentoo-dev] rfc: openrc service script dependency checker

2014-12-04 Thread Wyatt Epp
On Thu, Dec 4, 2014 at 12:37 PM, Christopher Head  wrote:
>
> What if now, by some accident, iptables ends up in a loop (maybe not even a 
> loop including $insecure_service, but some other loop entirely), and it’s the 
> randomly chosen victim? Is it still good to boot as many services as 
> possible? I think not.

My understanding of the algorithm is that it explicitly does not break
on "need" boundaries and cycle breaking doesn't affect the rest of the
graph.  So in that scenario, if iptables isn't started, your
hypothetical insecure service won't be started either.  It's rather
conservative and sane, IMO.
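
The consequence described here can be sketched like this.  It is not
OpenRC's code and does not model the cycle breaking itself; the
function and the service names in the usage note are hypothetical, and
it only shows how "not started" would propagate along hard "need"
edges under my understanding above.

```python
def startable(services, needs, broken):
    """Return the set of services that can still be started.

    needs maps a service to the services it has a hard "need" on.
    broken is the set of services that will not be started.
    A service is dropped if anything it needs, directly or
    indirectly, has been dropped.
    """
    ok = set(services) - set(broken)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so removing from ok is safe.
        for svc in sorted(ok):
            if any(dep not in ok for dep in needs.get(svc, ())):
                ok.discard(svc)
                changed = True
    return ok
```

With needs = {"insecure_service": ["iptables"]} and iptables in the
broken set, insecure_service drops out as well, while services that do
not need iptables are unaffected.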

-Wyatt



Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-22 Thread Wyatt Epp
On Wed, Jun 22, 2011 at 14:19, Ciaran McCreesh
 wrote:
> On Wed, 22 Jun 2011 21:55:18 +1200
> Kent Fredric  wrote:
>> I'd love a tag solution, that'd be nice, is there a GLEP for it yet?
>> And if so, how long will it take to get this "tag" feature supported
>> by EAPI standards?
>
> The slow parts are coming up with a good design, getting the Council to
> approve it, and getting Portage to implement it. The fast part is
> getting the PMS bit done.
>
> The problem with tags is that all we've heard so far is "we should have
> tags!", with no description of what tags are, what they'll solve or how
> they're used.
>
> --
> Ciaran McCreesh
>

Tags are basically keywords you can use to describe packages, allowing
you to easily search and explore your options based on what the
package actually does (if we want to get technical, anything that
identifies a package is a sort of tag: name, version, license, set,
checksum, etc.).  It's just a vocabulary that eases the burden of
human lookup.  The categories we have now are essentially (pairs of)
tags tied to a treelike structure in an actual filesystem, and I'd
wager that's a decent place to start, too-- probably the most
prominent problem I can see with the current method comes from these
edge cases where one category is obviously not enough.  The obvious
solution is probably to just stick our semantic metadata into the
metadata.xml.  So for...say, media-video/kdenlive,
media-video[1] becomes more like this:


media
video
kde
editors


The canonical tag list needn't even expand beyond what we have already
(for the time being; attempting to keep your vocabulary entirely
static is a Bad Thing.  Humans are amazing at finding new things that
need tagging.  Getting ahead of myself, though).
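
As a rough sketch of how a tool might read such a list back out of
metadata.xml: the tags and tag element names below are purely
hypothetical stand-ins (they are not part of the real metadata.xml
schema), as is the read_tags helper.  Only the pkgmetadata root element
is taken from the existing format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <tags>/<tag> are assumed names for illustration.
SAMPLE = """<pkgmetadata>
  <tags>
    <tag>media</tag>
    <tag>video</tag>
    <tag>kde</tag>
    <tag>editors</tag>
  </tags>
</pkgmetadata>"""

def read_tags(xml_text):
    """Return the text of every (hypothetical) <tag> element, in order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("tag") if el.text]
```

A search tool could then build its index by running read_tags over
each package's metadata file.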

In the practical sense, we can probably just whip out a quick script
and get 98% coverage; package maintainers should be encouraged to add
relevant tags to the packages under their care as needed.

--Wyatt, hoping this text is plain as it says it is.  Sorry if it's not.

[1] Let's just assume for the sake of argument that kdenlive actually
has a  field in its metadata file.



Re: [gentoo-dev] Re: Tags (Was: RFC: split up media-sound/ category)

2011-06-22 Thread Wyatt Epp
On Wed, Jun 22, 2011 at 21:25, Duncan <1i5t5.dun...@cox.net> wrote:
> Umm... I believe Ciaran meant "no description" in the practical PM
> implementation rules sense, not in the general definition sense, which I
> suppose most folks here understand by now.
>
Most is not all. ;)  In general, I try not to assume everyone is on
the same page; one of the things academia got right.

> Until that happens, or at least is actually in process, it's all talk.
>
Shall we call it "in process" right now, then?  My impression was he
was calling for us to get down to brass tacks and hammer this out for
real.  Apologies in advance for the long post.

As far as what I've said already, a quick read of the PMS tells me
that "[metadata.xml's] exact format is strictly beyond the scope" of
it.  Would it be acceptable to add this to the ebuilds themselves?
Otherwise, at least the tags become mandatory and drag the xml into
this.  Given that encoding tags into directory paths is why we're
talking about this in the first place, that's out; the third obvious
solution is a separate file for each package, but...yeah, not where I
would personally go with it without thinking long and hard about the
other two first.

The directory paths themselves...well, one solution I noted from the
other thread was to populate tag directories with symlinks.  I've done
similar things, but always thought of it as a hack, so I'm reluctant
to advocate for building a deployable semantic system on top of it-- I
could potentially be convinced otherwise, though.  Given that tags and
categories have roughly the same purpose and end result, a flat ebuild
directory referenced only by its metadata should certainly be
possible.  If this is going to happen, and happen right, what this all
looks like in the filesystem is moot anyway.

I bring this up because there are several packages with the same name
and different qualification.  Obviously, they'll have different tags
because they're not the same thing, but neither should they share the
same directory.  So the simple solution is to just change the package
names so we avoid collision and preserve our flat ontology (I've
forgotten the objection to doing this; please forgive).  The next
simplest solution is to just name the directories as hashes in-tree
and cover it up with software magic (I'm pretty sure this ends up
pretty ugly, implementation-wise).

For the sake of migration, packages should probably have their current
category/directory added to the tags; deprecate and remove after
sufficient time (I think this is one of those two-year changes?).

Those are roughly my thoughts for the time being.  Let's do this thing!

Regards,
Wyatt



Re: [gentoo-dev] Re: Tags (Was: RFC: split up media-sound/ category)

2011-06-23 Thread Wyatt Epp
On Thu, Jun 23, 2011 at 02:14, Ciaran McCreesh
 wrote:
> First: how do tags relate to categories? Are they independent, a
> refinement or a replacement?
>
I would suggest they be a replacement because categories are just an
overly limited subset of a proper tagging scheme.

> Second: which of the following are tags suitable for? Searching.
> Browsing just using 'ls' etc. Browsing using tools (e.g. a website or a
> package manager command). Using as dependencies.
>
Yes, Maybe, Yes, Probably Not.

> Third: are tags a property of ebuilds, of packages, of a repository, or
> entirely external?
>
Likely packages.  Though I suppose the potential exists for a version
change to add or remove features and change tags; so, ebuilds?

> Fourth: do tags in any way identify a package?
>
Uniquely?  No.

Regards,
Wyatt



Re: [gentoo-dev] Re: Tags (Was: RFC: split up media-sound/ category)

2011-06-24 Thread Wyatt Epp
On Fri, Jun 24, 2011 at 02:51, Ciaran McCreesh
 wrote:
> If you abolish categories in favour of tags, but tags don't uniquely
> identify a package, how do you uniquely identify a package?
>
> Remember that your solution has to work with overlays and with tags
> being changed after a package is installed.
>
I seem to have misunderstood the thrust of your question?  media-sound
is a category (tag); each package still has its name, URI, files
associated with it and their checksums.  A combination of a tag or two
and package name is going to be plenty for such a small data set
excepting some pretty absurd circumstance where four projects all
choose the same name and do the same thing.  Alternatively, we could
just make names unique in the first place and nip that problem in the
bud forever.  Either way, tags changing isn't especially different
from categories changing.

On Fri, Jun 24, 2011 at 03:18, Zac Medico  wrote:
> On 06/23/2011 05:07 PM, Wyatt Epp wrote:
> Since categories and tags can easily coexist, you might want to rethink
> that.  It's relatively easy to implement a tagging mechanism, while
> (unnecessarily) ripping out the existing category framework is a big
> chore that may not have any practical value.

I was thinking more that, in the long term, keeping both is an annoying
duplication of effort.  I probably should have worded it as "tags deprecate
categories".  You're right that practically speaking, once we figure
out where to put it, it's roughly within reach.



Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-25 Thread Wyatt Epp
On Sat, Jun 25, 2011 at 02:49, Kent Fredric  wrote:
> I'm strongly of the mind that by making the tag system arbitrarily
> flat, you might be prematurely limiting yourself, as well as risking a
> future where the "tag index" is a sea of meaningless words.
>
> Tags in my mind, should be grouped by the sort of information they are
> trying to convey, as opposed to being arbitrary and completely
> un-grouped.
>
> The present category system only has one namespace, which is more or
> less "what-you-use-it-for", and if your tag system is likewise going
> to take that vector as the only approach, you will ultimately end up
> duplicating the category system, albeit without the present limitation
> that means one package can only exist in one place.
>
> This need not be the case, we can suggest alternative tag namespaces,
> such as : The sorts of files it supports working with, the sorts of
> things it can read, the sorts of things it can write.
>
> At present, things that migrate one type of media to another, such as
> pdf -> image , image -> pdf, image -> video , video -> images , etc
> have to be forced to a sort of useless categorisation system.
>
> However, if via tag data, we were able to annotate a) what can be
> written and b) what can be read, this system could be leveraged to
> epic proportions of win.
>
Okay, apologies in advance for my long-windedness.  I hope this all
makes sense to everyone.

I should probably clarify that clinging strictly to flatness is not
what I'm proposing.  Reality has borne out the need for implications
and aliases in sanitising an unruly dataset with a complex
user-generated index, while arbitrary democratised group building has
improved some aspects of discovery.  However, I would consider these
features to be a lower priority than having a system at all.

So to break it down:
Tags - a concise vocabulary used for search.  In their default state
they are untyped and non-hierarchical.  They identify traits of a
package.  Suggest using lower-case and simple, descriptive naming
conventions. Highest priority.
Example: alien {{converter nogui package_management reads_tgz
reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm
writes_pkg writes_slp writes_lsb}}

Alias - a relationship between two tags establishing equivalence.
Query of the left term returns results of the right.  This type of
relationship helps reduce dictionary clutter. Low priority.
Example: sound = audio.  Attempting to add "sound" to a package will
instead add "audio" and searches for sound will return the results for
audio.

Implication - a relationship between two tags where the presence of
the left term necessarily requires the right.  This relationship
reduces menial work.  Low priority.
Example: mpd -> audio.  Adding "mpd" to the package will also add "audio".
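
As a rough illustration of the three definitions above, here is a minimal
Python sketch.  It is not an existing Portage facility: the names ALIASES,
IMPLICATIONS, canonical, expand and search are all hypothetical, and the
tables only hold the two examples given above.

```python
# Hypothetical sketch of flat tags with aliases and implications.
# None of these names exist in Portage; they are illustration only.

ALIASES = {"sound": "audio"}        # left term resolves to the right term
IMPLICATIONS = {"mpd": {"audio"}}   # left term also requires the right terms


def canonical(tag):
    """Follow alias links until a canonical tag is reached."""
    seen = set()
    while tag in ALIASES and tag not in seen:
        seen.add(tag)
        tag = ALIASES[tag]
    return tag


def expand(tags):
    """Return the full tag set after applying aliases and implications."""
    result = set()
    pending = [canonical(t) for t in tags]
    while pending:
        tag = pending.pop()
        if tag in result:
            continue
        result.add(tag)
        pending.extend(canonical(t) for t in IMPLICATIONS.get(tag, ()))
    return result


def search(index, tag):
    """Return sorted names of packages whose expanded tags match the query."""
    wanted = canonical(tag)
    return sorted(name for name, tags in index.items()
                  if wanted in expand(tags))
```

With this, tagging a package "mpd" is enough for a search on "sound" to find
it: the query is aliased to "audio", and "mpd" implies "audio".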

Kent, your idea is pretty interesting and I rather like it.
Fortunately, it's completely possible within the context of the basic
flat layout, as I outlined with Alien above.  It probably looks ugly
to you-- this is no illusion; it's pretty ugly.  But it also grants us
the flexibility to get a basic system in place quickly and without a
lot of hassle.  We get 90% of the benefit up front, and can extend it
as necessary.

Unfortunately for "real" hierarchical methods, people still have
difficulty with even simple metadata systems.  Fetch some MP3s off the
internet and check their tags or look at search engine queries and
you'll find an entire class of people hampered by what is currently a
largely alien art.  In the end, this system needs to be usable by
people and by keeping it primarily flat, we ease the conceptual
overhead of its implementation and its use.  If it can't be
implemented on itch-scratching timescales, we have failed.  If people
can't use it with very little learning curve, we have failed.

A word on vocabulary:
As you've no doubt noticed, there seems to be a degree of combinatoric
explosion of tags in the method I propose.  In practical use, it's not
as bad as it looks.  For Gentoo, I'd recommend a basic "canonical"
list of general tags based on the current category system (subject to
discussion and addition/subtraction) and incorporate suggestions like
Kent's as they come up.  It's okay to control the vocabulary.  What
you find is that after the initial implementation, it grows fairly
slowly. (Even with reads_* and writes_* the number will probably be
south of 500 tags for a long time; the current categories dissolve
into about 175 tags from what I can see.)

Regards,
Wyatt



Re: [gentoo-dev] RFC: split up media-sound/ category

2011-06-25 Thread Wyatt Epp
On Sat, Jun 25, 2011 at 08:22, Kent Fredric  wrote:
> I think something else that may be important to consider if one is
> eliminating category directories is how we'll replace the utility
> currently provided by category/metadata.xml
>
> Some things are simply grossly impractical to maintain individual
> metadata.xml for reliably due to volume ( ie: dev-perl/* , last I
> looked, the metadata.xml in there presently is largely copy-pasted
> between dists )
>
Looking at the category/metadata.xml, it's a multilingual dictionary
entry and little else.  So make a dictionary of tags (categories).
And what does the latter half have to do with tagging things?  Where's
the maintenance?  There's the overhead of tagging it once, I'll grant.
And then?  Tags are unlikely to change all that frequently once
they've been added (they don't need to).

> Perhaps we need a new way to apply metadata to a whole host of packages?
>
> Trying to make useflags apply to all packages
> with a given tagset would be comparatively messy.
>
Why do you think that?  The directory-like notation doesn't even need
to be discarded:
perl_module/* ssl
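
A minimal sketch of how a tool might expand such a line, assuming a
hypothetical tag_index mapping each tag to the packages carrying it (the
function name and the data layout are inventions for illustration, not
anything Portage provides):

```python
def expand_tag_line(line, tag_index):
    """Expand a '<tag>/* flag ...' line into (package, flags) pairs.

    tag_index is a hypothetical dict mapping a tag to a set of package names.
    """
    target, *flags = line.split()
    tag, sep, pattern = target.partition("/")
    if sep != "/" or pattern != "*":
        raise ValueError(f"expected '<tag>/*', got {target!r}")
    # Every package carrying the tag receives the same flags.
    return [(pkg, flags) for pkg in sorted(tag_index.get(tag, ()))]
```

The point is only that the wildcard resolves against tag membership instead
of a directory listing; the line itself looks the same to the user.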

> categories also make it easy to do Naïve iteration of packages
> efficiently, ie: for the most part, if you want to iterate all
> perl-modules, you just need to iterate dev-perl and perl-core , and
> that is all, you're not bogged down by stepping into all the other
> categories, loading all their files and working out whether or not
> they're perl related. ( Yes, I am aware this has its own caveats, but
> if you know of these caveats and they're acceptable to your task, then
> it's fine )
>
Or just iterate over the perl_module tag.

> the 'virtuals' category also is a bundle of fun. I really do not want
> to see virtuals identified only by whatever their unique-identifier
> might be and the tag 'virtual'. Yuck.
>
In the first place, it's still no different: mysql (the virtual) pulls
in db-mysql (or "charles" or whatever name sounds good) or whatever else
is available.  Or, as I mentioned before, while unique identifiers are
really terribly simple, we are fully capable of working around the
lack of that feature.  What prevents virtual/mysql from pulling in
database/mysql?

Regards,
Wyatt



Re: [gentoo-dev] Re: RFC: split up media-sound/ category

2011-06-25 Thread Wyatt Epp
On Sat, Jun 25, 2011 at 21:47, Kent Fredric  wrote:
> Package names themselves can be thusly arbitrary , and could be a SHA
> sum or something obscure, as long as all internals and dependencies
> used the same arbitrary name, things would work as intended.
>
I mentioned this idea of internally referencing packages by a hash in
the other thread.  As long as we're clear that the most common
operation (emerge -av ${PN}) is still exposed to the user, it's
perfectly valid.  I want to be very sure we're clear in our
understanding that tags are for discovery in cases where the user is
not sure what is available (like categories).

As for the latter part, the size of a git repo becoming unmanageable
over time had not occurred to me, I'm afraid-- would it work to use
shallow clones?  Otherwise, the herd-wise division is probably
acceptable.  Need to think about that one more.

Regards,
Wyatt



Re: [gentoo-dev] Are tags just sets?

2011-06-26 Thread Wyatt Epp
On Sun, Jun 26, 2011 at 03:02, Ciaran McCreesh
 wrote:
> Here's a completely different way of doing tags:
>
You know, that's not a bad way of going about it.  Truth be told, I
had sort of forgotten sets exist because they're a bit cumbersome at
the moment.  But it's cheap and dead simple and gets us our 90%
immediately.  Actually, it gets 100%, even, if you can include a set
as part of another set (implication) and symlinks function as aliases.
Very clever; I like it.
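
To make the implication and alias mechanics concrete, here is a small Python
sketch.  It assumes one plain-text file per set, one entry per line, where an
entry of the form "@name" pulls in another set; the function read_set and the
directory layout are hypothetical, not a description of how any package
manager actually reads sets.

```python
import os


def read_set(sets_dir, name, _seen=None):
    """Return all packages in a set, following nested sets and symlinks.

    Assumed (hypothetical) layout: one file per set in sets_dir, one entry
    per line; '@other' includes the set 'other'; '#' starts a comment.
    """
    seen = set() if _seen is None else _seen
    # A symlink resolves to its target file, so it behaves as an alias.
    path = os.path.realpath(os.path.join(sets_dir, name))
    if path in seen:  # already visited: guards against inclusion cycles
        return set()
    seen.add(path)
    packages = set()
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            entry = raw.split("#", 1)[0].strip()
            if not entry:
                continue
            if entry.startswith("@"):
                # Nested set: everything in it is also in this set.
                packages |= read_set(sets_dir, entry[1:], seen)
            else:
                packages.add(entry)
    return packages
```

Under these assumptions, "mpd implies audio" is expressed by listing "@mpd"
inside the "audio" set, and "sound = audio" is a symlink named "sound"
pointing at "audio".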

> where eapi has to be on the first line.
>
Looks fine but just to be clear, why is having the eapi necessary?

> Second, make a bunch of sets named kde-tag, editors-tag, xml-tag,
> monkeys-tag etc.
>
Don't even need the "-tag" part, really.  But yes, a couple hundred
sets are in order.  And some tool-glue.

> Disadvantages: doesn't use some horribly convoluted system of XML,
> wikis and web 2.0.
>
That's not a disadvantage at all.  Thank you for noticing the third path.

While I still don't really believe categories to be necessary, this
will be a fine intermediate step.

Cheers,
Wyatt



Re: [gentoo-dev] Re: RFC: split up media-sound/ category

2011-06-27 Thread Wyatt Epp
2011/6/27 Jesús J. Guerrero Botella :
> That still doesn't answer my question anyway: both features (symlinks
> and +65k files on a single dir) are incompatible with fat32. And
> someone said fat32 compatibility is a feature we want (still can't
> guess why, but well, be consistent...). Obviously, we want fat32
> compatibility when it comes to arguing against symlinks, which have
> always been with us by the way, but that's not important when we talk
> about other things that are not compatible with fat32.

I'm not sure where you're getting 65k files.  Unless I misinterpreted
everything everyone else was saying, every package would still have
its own directory.  There are fewer than 20k even with a bunch of
overlays installed.  Regardless, you might check the other (other)
thread; I think we're probably going to go quick and
not-necessarily-dirty with sets to get 99% of what we're looking for
almost trivially.

2011/6/27 Jesús J. Guerrero Botella :
> C) "ls $PORTDIR/whatever-category" is a command that's way simpler
> than the one you posted.
>
It's also fundamentally broken because one package can only be in one
category, and adding new categories has not historically been speedy.  Tags
are a non-exclusive many-to-many relationship.  So a package can have
as many tags as it needs, and users will be able to leverage tags
alone or in combinations to find things they want or need.
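
A minimal sketch of what "alone or in combinations" could look like in a
tool, assuming a hypothetical mapping of package names to tag sets (the
function names and sample data are made up for illustration):

```python
from collections import defaultdict


def build_index(package_tags):
    """Invert a {package: tags} mapping into {tag: packages}."""
    index = defaultdict(set)
    for name, tags in package_tags.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find(index, *tags):
    """Return the packages that carry every one of the given tags."""
    if not tags:
        return set()
    # Intersect the per-tag package sets; an unknown tag matches nothing.
    return set.intersection(*(index.get(tag, set()) for tag in tags))
```

A query such as find(index, "converter", "audio") then narrows the results in
a way that a single category directory cannot.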

> I don't even use tags for my music collections

That's very curious, and I wouldn't mind talking about why that is
off-list (not quite joking; that's really interesting).

So to sum it up, we're fixing package navigation and discovery because
Gentoo is about choice.  Even 15,000 packages is too many to have to
play "guess the category", and it's cruel to expect users (including
ourselves) to know everything in the tree at all times.  It's in all
our best interest to make it easy to know what choices are available
so we can get back to more important things.  Tags help further this
ideal.



Re: [gentoo-dev] Are tags just sets?

2011-06-27 Thread Wyatt Epp
On Mon, Jun 27, 2011 at 16:23, Rich Freeman  wrote:
> I too feel that tags should be distinct from sets, for a bunch of reasons.
>
> Sets should really be something carefully controlled by the
> repository.  While I'm fine with having tags in the repository also,
> there is talk about giving users ways of supplying them as well.
>
Too late; /etc/portage/sets/

> Sets are generally used to tell the package manager to do something
> with a lot of packages at once.  I'm not sure there is much of a need
> to do this with tags, at least not in most of the use cases that have
> been suggested.
>
At the moment, yes, that's very true.  But that's a matter of lacking
tools, more than a necessarily orthogonal concept.  If you look at
sets (or categories), you find they describe attributes of packages.
For example, @world is "everything the user has merged".  The kde
overlay provides things like @kde-live, "kde packages built from
subversion" (it's more specific than ${PN} in this case, but generally
won't need to be).  I don't think anyone here believes this feature
exists without some tool support to glue it together.

> Maybe if we define multiple namespaces for tags we could move to using
> tags as dependencies or whatever, and those tags would be distinct and
> much more carefully defined and controlled.  However, I think this is
> more far-out and not the immediate goal.
>
I'd say that's rather unnecessary.  We should be wary of conflating
all metadata together in our heads: Tags are not a replacement for the
structured key-value metadata that we already have.  When we talk of tags,
we're talking about general purpose semantic descriptors that are only
loosely structured and benefit from emergent community standards.  We
already have the things that benefit from rigid definition.

> Sets might work, but they seem a bit like a hack...
>
Oh, absolutely.  But nearly anything is better than the current state
of affairs; if it falls apart, we find a different way.



Re: [gentoo-dev] Are tags just sets?

2011-06-27 Thread Wyatt Epp
On Mon, Jun 27, 2011 at 17:23, Rich Freeman  wrote:
> That wasn't what I was thinking of.  Package masking is also something
> we carefully control in the repository but users can override it FOR
> THEIR OWN SYSTEMS.  With tags I think that there were concepts
> floating around of letting anybody influence how packages are tagged.
>
Ah yes, I think it was Jorge that mentioned that.  And it's a good
idea, too: it streamlines the patch process for tag updates and it
gives us an idea of what sorts of things users find important for
discovering packages.  But because the proposed solution is so simple,
we can handle that later pretty easily as long as we agree on what
we're doing up front to actually enable this.

Cheers,
Wyatt



Re: [gentoo-dev] Are tags just sets?

2011-06-28 Thread Wyatt Epp
On Tue, Jun 28, 2011 at 07:53, Peter Volkov  wrote:
> On Mon, 27/06/2011 at 20:26 -0700, Brian Harring wrote:
>> > Second, make a bunch of sets named kde-tag, editors-tag, xml-tag,
>> > monkeys-tag etc.
>
> I'd like to avoid editing multiple files. Much better will be keep tags
> with package.
>

> Also I like metadata.xml proposal since it'll be easy to use
> per-category metadata.xml for all ebuilds to inherit.
>
You weren't planning on doing this all _manually_, were you? ;)  With
proper tool support (or even just a couple quick scripts), the
single/multiple file argument becomes pretty much moot.

Another thing I think should be reiterated is that leveraging sets in
this manner gets us implication and aliasing for free (as long as sets
can be included inside of other sets).  I'm not sure it would be so
easy to accomplish otherwise.
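
As a sketch of the "couple quick scripts" idea: if tags were recorded once
per package, the per-tag set files could be generated rather than edited by
hand.  The function below is hypothetical and only renders file contents in
memory; where the per-package tags live and where the files get written are
left open.

```python
def render_set_files(package_tags):
    """Turn {package: tags} into {tag: set-file contents}.

    Each rendered file lists the packages carrying that tag, one per line,
    sorted so that regenerating the files gives stable output.
    """
    by_tag = {}
    for package, tags in package_tags.items():
        for tag in tags:
            by_tag.setdefault(tag, []).append(package)
    return {tag: "".join(f"{pkg}\n" for pkg in sorted(pkgs))
            for tag, pkgs in sorted(by_tag.items())}
```

The maintainer edits one place (the package's own tag list) and the tool
fans that out to as many set files as there are tags.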

Regards,
Wyatt