Hi Andy! (and all)

Thanks for the comments. See below for my response. It will be a rather 
stream-of-consciousness thing, but I hope something can come of it.

On Saturday 05 April 2008, Andy Lester wrote:
> Every so often, an idea's like Shlomi's comes up, where we talk about
> adding reviews to CPAN, or reorganizing the categories, or any number
> of relatively easy-to-implement tasks.  It's a good idea, but it's
> focused too tightly.  What we're really trying to do is not provide
> reviews, but help with the selection process.
>
> ** We want to help the user find and select the best tool for the job.
> **
>
> It might involve showing the user the bug queue; or a list of reviews;
> or an average star rating.  But ultimately, the goal is to let any
> person with a given problem find and select a solution.
>
> "I want to parse XML, what should I use?"  XML::Parser? XML::Simple?
> XML::Twig?  If "parse XML" really means "find a single tag out of a
> big order file my boss gave me", the answer might well be a regex, no?
>
> In my day job, I work for Follett Library Resources and Book
> Wholesalers, Inc.  We are basically the Amazon.com for the school &
> public library markets, respectively.  The key feature to the website
> is not ordering, but in helping librarians decide what books  they
> should buy for their libraries.  Imagine you have an elementary school
> library, and $10,000 in book budget for the year.  What books do you
> buy?  Our website is geared to making that happen.
>
> Part of this is technical solutions.  We have effective keyword
> searching, so you can search for "horses" and get books about horses.
> Part of it is filtering, like "I want books for this grade level, and
> that have been positively reviewed in at least two journals," in
> addition to plain ol' keyword searching.  Part of it is showing book
> covers, and reprinting reviews from journals. (If anyone's interested
> in specifics, let me know and I can probably get you some screenshots
> and/or guest access.)
>
> BWI takes it even farther.  There's an entire department called
> Collection Development where librarians select books, CDs & DVDs to
> recommend to the librarians.  The recommendations could be based on
> choices made by the CollDev staff directly.  They could be compiled
> from awards lists (Caldecott, Newbery) or state lists (the Texas
> Bluebonnet Awards, for example).  Whatever the source, they help solve
> the customer's problem of "I need to buy some books, what's good?"
>
> This is no small part of the business.  The websites for the two
> companies are key differentiators in the marketplace.  Specifically,
> they raise the company's level of service from simply providing an
> item to purchase to actually helping the customer do her/his job.
>
> Ultimately, I think this is where all "how do we make CPAN easier to
> use" discussions are leading.  The focus needs to change from the
> tactical ("Let's have reviews") to the strategic ("How do we get the
> proper modules/solutions in the hands of the users that want them.")
>

That sounds like a very cool technology and system you have there. Let's try 
to do something similar, but obviously not identical, for CPAN. 

First of all, let's define which parameters will make one use a module. I 
recall an essay on High Quality Software that I started writing, then 
neglected and had second thoughts about. I'm attaching its summary here - 
make of it what you will. (The essay itself was supposed to be more detailed 
and philosophical, but is probably redundant.) What we seek is a way for 
J. Random Perler to find a module that:

1. Is of reasonably high quality.

2. Does what they want. (I.e.: an XML parser would be useless for date 
processing, and they probably wouldn't want to find it, or even need to use 
it.)

#2 can probably be adequately solved using a smart keyword search (issues of 
good knowledge of English aside), possibly combined with some PageRank-like 
algorithm, or by leveraging Google/search.yahoo.com/etc. When I tried to 
search for modules on CPAN, I usually did not have problems finding modules 
that do what I want. 

My problem - and, judging by IRC, other people's problem too - was finding 
which of them were good (= of a high enough quality).

What we need is a heuristic that measures quality and displays higher-quality 
results first (see 
http://www.joelonsoftware.com/articles/fog0000000041.html ). This way:

1. The search should be fully automated, without human intervention.

2. The system could make use of quality indicators set by humans, but should 
not treat them as compulsory, 
"if-everyone-did-X-the-whole-world-would-be-better" schemes.

3. It should make a judicious choice based on as many different relevant 
factors as possible.

4. It should be the first place people look.

5. Other parameters that I'm forgetting.

-------------------

For example, the CPANTS Kwalitee metric, when available, is usually a very 
good indicator of a module's quality. That's because I assume that if a 
module has full or close-to-full Kwalitee, then the author has:

1. Updated it recently.

2. Probably intends to support it into the future.

I admit that in the CPANTS game, I often increased my Kwalitee by relatively 
superficial means. But at least that means I'm still alive and kicking, and 
care enough about the module to release a new version.

Now the number of open bugs is a poor indicator. If a module has no open 
bugs, either the author is very diligent and handles all of them, or perhaps 
no one uses the module. On the other hand, if it has a lot of open bugs, it 
obviously means quite a few people are using it, but it may also mean that, 
despite its popularity, it is incredibly broken.[1]

[1] - For example, XML::Simple is very popular, but I find its concept and 
implementation broken-by-design, and I have a policy against helping people 
with problems with it. 

So we need to be a bit smart. I think adding some (optional) metadata to 
META.yml for categories, tags, and some flags or a status indicator saying 
things like "deprecated", "don't use", "in deep maintenance mode", "under 
development", "API is not frozen", etc. would be a step in the right 
direction.

I can try filing an RFC about it for the META.yml spec and implementing it 
in Module::Build, etc. And there could be a Kwalitee metric for whether the 
author has set these flags.
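To make this concrete, here is a sketch of what such optional metadata might 
look like. The field names ("keywords", "status") and values here are purely 
illustrative assumptions on my part, not part of any existing META.yml spec:

```yaml
# Hypothetical additions to a distribution's META.yml -
# field names are illustrative only, not part of the existing spec.
name: My-Module
version: 0.01
keywords:
  - xml
  - parsing
status: deep-maintenance
# other possible status values: deprecated, dont-use,
# under-development, api-not-frozen
```

A search engine could then, for instance, rank "deprecated" distributions 
below actively maintained ones without any further human input.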

What other parameters can we think of?

1. Last update date - an old one doesn't necessarily mean the module is not 
good (see http://perl.plover.com/yak/12views/samples/notes.html#sl-9 ), but 
it's usually a good indicator.

2. Average activity over time. (How many releases and how frequently).

3. Kwalitee metric.

4. CPAN Testers (?).

5. Number of incoming references/dependencies (WWW, CPAN, etc.) - 
PageRank-like.

6. CPAN Ratings/Reviews.

7. Test coverage.

8. Author tagging/categorising/marking.

9. User-contributed tagging/categorising/marking.

I'm not sure there's one metric we can rely on exclusively. The ranking 
probably needs to combine many factors (just as Google relies on many factors 
and heuristics). I don't think there's a single "Aha!" silver bullet; we just 
need to be very smart.
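To make the "combine many factors" idea concrete, here is a minimal sketch 
(in Python, purely for illustration) of how such a combined score could be 
computed. The factor names, weights, and normalisation are entirely made up:

```python
# Minimal sketch of a combined "CPAN-Rank" score. All factor names,
# weights, and normalisation choices here are hypothetical.
# Every factor is assumed pre-normalised to the 0.0 - 1.0 range.

WEIGHTS = {
    "kwalitee": 0.20,         # CPANTS Kwalitee, scaled to 0..1
    "release_recency": 0.15,  # decays with time since last release
    "release_activity": 0.10, # releases over time
    "cpan_testers_pass": 0.15,
    "incoming_deps": 0.20,    # PageRank-like: who depends on this?
    "ratings": 0.10,          # CPAN Ratings / reviews
    "test_coverage": 0.10,
}

def rank_score(factors):
    """Weighted average over whichever factors are available.

    Missing factors are skipped and the weights renormalised, so
    optional, human-set indicators help the score but are never
    compulsory (point #2 above).
    """
    available = {k: w for k, w in WEIGHTS.items() if k in factors}
    total_weight = sum(available.values())
    if total_weight == 0:
        return 0.0
    return sum(factors[k] * w for k, w in available.items()) / total_weight

# Example: a module for which only three factors are known.
score = rank_score({"kwalitee": 0.9,
                    "release_recency": 0.8,
                    "incoming_deps": 0.7})
```

Renormalising over the available factors is just one possible design choice; 
one could instead penalise missing data, which would reward authors for 
supplying the optional metadata.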

Now, Freshmeat.net has a similar problem, but I'm not sure they have solved 
it particularly well in their search.

Nevertheless, I think it's encouraging that CPAN has become so full of good 
modules (even though, as Andy Lester noted, it's also full of junk) that we 
now worry about how to find them. Many languages lack such good APIs and 
abstractions in the first place, or have them scattered across many places, 
without the comprehensiveness of CPAN.

I think CPAN faces the same problem the web faced a few years ago, before 
Google: how to find good, relevant information among a lot of low-quality 
material. We have a lot of good and bad "information", where that information 
is code.

Anyway, I'll probably set up a wiki page about it, so we can brainstorm more 
easily.

Hopefully we can make this CPAN-Rank happen.

Regards,

        Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish      [EMAIL PROTECTED]
Homepage:        http://www.shlomifish.org/

I'm not an actor - I just play one on T.V.
* Abstract

* Introduction:
    - The fc-solve-discuss thing.
    - "industrial strength" Freecell solvers
    - I'm not going to discuss what makes software "industrial strength",
    "enterprise-grade", etc., but rather will discuss what makes it
    high quality.
        - What's high-quality?
            - hard to define.
            - Good software.
            - that most people like or advocate.
            - that "just works".
            - often attracts a lot of FUD or is written badly, but at the end
            of the day, it's what most people use or are even told to use.
    - Software that is of exceptional quality:
        - Vim
        - Subversion
        - gcc
        - perl
            - Note about python
            - Note about Ruby
                - bad online documentation
                - no internal UTF-8 support.
                - slow
            - Note about PHP
        - KDE
            - Note about GNOME

* Parameters for Quality:
    + - The software is available for download or buying.
    + - The software has a version number in the archive, or package.
    + - The software has a homepage.
    +     - An SF.net /project/mysoft/ page does not count.
    + - The software has packages for most common distributions.
    + - The software is easy to compile, deploy and install.
    +     - Not too many dependencies.
    + - Source code is freely available.
    + - Licence is as permissive as possible.
    +     - Free software.
    +     - GPL compatible.
    +     - LGPL or better.
    +     - BSD-style.
    + - "Just works".
    +     - Builds out of the box, with minimal hassles.
    +         - ./configure --prefix= ; make ; make install
    +     - There are no show-stopping bugs.
    +     - The software has all the features people want.
    +     - Good usability
    + - Good and extensive documentation.
    +     - Documentation is often an indication of usability problems.
    + - Portability
    +     - Runs on all UNIXes.
    +     - Runs on Windows.
    +     - Probably not important to run on more obscure systems. (depends on
    +     the context though)

    + - Good ways to receive support.
    - Speed and good performance.
        - Not the only or the most important parameter.
    + - Security.
        + - doesn't bite you in unexpected places.
        + - again - not everything.
    + - Backwards compatibility
    - Aesthetics.

* Ways to achieve Quality:
    - Note:
        - not quality in themselves - but rather ways to facilitate 
        achieving it.
    - Modular code.
    - automated tests.
    - beta testers.
    - frequent or constant releases or work
    - refactoring.
    - good software management
        - CatB
        - JoS, etc.
    - tactfulness, humour, etc.
    - see Producing OSS by Karl Fogel for more.
    - lack of politics.
        - Politics is like subjectivity
            - probably no way to completely avoid it, but one should still
            strive to eliminate as much as possible.
    - good knowledge of English
    - hype
        - preferably as small as possible.
        - if there's a lot of FUD against you, it usually means you're
        doing fine.
    - a good name
        - CVSNT
    - NOTE!!!
        - eventually the software may be good enough to only be maintained.
        - MJD's http://perl.plover.com/yak/12views/samples/notes.html#sl-9

* Why FCS is high quality, while its competition is not quite as much so.
