On Thu, Feb 11, 2016 at 11:35 AM, sebb <seb...@gmail.com> wrote:
> On 11 February 2016 at 12:03, Shane Curcuru <a...@shanecurcuru.org> wrote:
>> I need to annotate our structured data set of Apache projects to track
>> which project names are registered trademarks.  This is needed to be
>> able to properly generate a.o/foundation/marks/list (which is currently
>> sadly outdated since it's manually built now).  This is a serious need
>> for Brand Management, since we regularly have third parties say "but you
>> didn't SAY it was your trademark, so I can do it anyway..."
>>
>> My thought is to annotate the PMC DOAP files with a registered marker,
>> then use the existing projects.a.o building of the organized data.  Then
>> use either JS or some cron static generation to display the actual
>> marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
> [2], though they can also be stored elsewhere.
> The locations of the files are held in committees.xml [3]
> [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
> are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
>> Is annotating the project data sources the best idea, or should I simply
>> create a new stable URL data source that's just a list of registered
>> names, and join the tables?
>
> I doubt that either of the above file types is suitable.
> The location of the index XML files [3], [4] has already been changed
> once (when projects-new was established).
>
> DOAP files are located all over the place and are often moved within
> the SCM without updating the index file.
> If they are located in the source tree there are often multiple copies
> in different branches.
>
> PMC RDF files may not be updateable except by the project (if located
> in their SCM), and again may move without warning if they are not in
> [2].
>
> It would potentially be possible to recover the PMC RDF files from
> their external locations and insist that they only be stored in the
> comdev area.
> But a single PMC may have multiple marks. Potentially also a project
> may move from a PMC to become its own PMC.
>
> Therefore I think a separate file is needed.
> That would also allow write access to be limited if necessary.

There are indeed multiple ways to solve this, and each way involves a tradeoff.

I would suggest separating this question into three parts.

---

First, where does the data ultimately live?  The best way to answer
that is to decide first who will be updating it.  Will it be each
project, those on the branding mailing list, or only VP Brand?  The
answer to that question makes a big difference.

My suggestion would be to start simple with a single file in the same
directory as committee-info.txt.  I'd suggest YAML as the format, since
it strikes a good balance between human editability and programmatic
parseability.

---

Next is access.  What you need is something that takes the data from
the private repository, sanitizes it, and publishes the result for
public consumption.  Whimsy already has a bunch of cron jobs that place
similar data in its public directory (https://whimsy.apache.org/public/).
A script that parses a YAML file out of SVN, keeps only the parts meant
to be public, and publishes the result in JSON format is very doable.
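
A minimal sketch of the sanitize-and-publish step, in Python.  It
assumes the YAML has already been checked out of SVN and loaded into a
list of dictionaries; the field names (name, registered, homepage,
counsel_notes) and the projects Foo and Bar are hypothetical
placeholders, not an existing schema.  Fetching from SVN, scheduling
under cron, and where the output file lands are left out.

```python
import json

# Hypothetical records standing in for what a YAML loader (for example
# PyYAML's yaml.safe_load) might return from the private file.
# Names, URLs, and fields are placeholders, not real trademark data.
PRIVATE_RECORDS = [
    {"name": "Foo", "registered": True,
     "homepage": "https://foo.example.org/",
     "counsel_notes": "internal only, must not be published"},
    {"name": "Bar", "registered": False,
     "homepage": "https://bar.example.org/"},
]

# Whitelist of fields considered safe for public consumption.
PUBLIC_FIELDS = ("name", "registered", "homepage")


def sanitize(records):
    """Return copies of the records containing only whitelisted fields."""
    return [{k: r[k] for k in PUBLIC_FIELDS if k in r} for r in records]


def publish(records):
    """Serialize the sanitized records as JSON text for a public location."""
    return json.dumps(sanitize(records), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(publish(PRIVATE_RECORDS))
```

Whitelisting the public fields, rather than blacklisting private ones,
means a field newly added to the private file stays private by default.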

---

Finally, there is publishing.  While that could be a cron job that
produces static HTML, web browsers can consume JSON and format the
results themselves.  That's probably the best approach here.
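
The formatting step itself is small.  In the browser it would be
JavaScript; the sketch below shows the same logic in Python, following
the page layout Shane outlines below ("&reg;" for registered marks,
"&trade;" otherwise).  The JSON schema (name, homepage, registered,
optional shortdesc) is a hypothetical assumption, not an existing feed.

```python
import html
import json


def render_marks(json_text):
    """Render the public trademark JSON as an HTML fragment.

    Assumes a hypothetical schema: a list of objects with "name",
    "homepage", a boolean "registered", and an optional "shortdesc".
    """
    lines = ["<h2>The ASF claims these trademarks</h2>"]
    for entry in json.loads(json_text):
        # Registered marks get the (R) symbol; all others get (TM).
        symbol = "&reg;" if entry.get("registered") else "&trade;"
        item = '<a href="{}">Apache <b>{}</b></a>{}'.format(
            html.escape(entry["homepage"], quote=True),
            html.escape(entry["name"]),
            symbol,
        )
        # Optional one-line description under the name.
        if entry.get("shortdesc"):
            item += "<br/>\n  " + html.escape(entry["shortdesc"])
        lines.append(item)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = json.dumps([
        {"name": "Foo", "registered": True,
         "homepage": "https://foo.example.org/"},
        {"name": "Bar", "registered": False,
         "homepage": "https://bar.example.org/",
         "shortdesc": "A placeholder project"},
    ])
    print(render_marks(sample))
```

Escaping the names and URLs matters because the JSON is consumed by
tools the data owners don't control.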

---

The Apache Phone book is an example of an application that uses the
above design:

https://home.apache.org/phonebook.html

In fact, if the data is made available in this manner, the trademark
information could be included directly in the results of the page it
produces.  That's one of the nice things about publishing a public JSON
version of the data: multiple tools can consume it.
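
To illustrate how another tool might fold the published trademark data
into its own results, here is a minimal sketch of a left join keyed on
project name, in Python.  It assumes both data sets use identical
project names and the hypothetical name/registered fields; it is not
how the phonebook actually works.

```python
def annotate(projects, marks):
    """Attach a trademark symbol to each project via a left join on name.

    Projects with no entry in `marks` are passed through without a "mark"
    field. Input dictionaries are copied, never modified.
    """
    registered = {m["name"]: m.get("registered", False) for m in marks}
    result = []
    for project in projects:
        annotated = dict(project)  # shallow copy of the caller's record
        if project["name"] in registered:
            # U+00AE is the registered sign, U+2122 the trade mark sign.
            annotated["mark"] = "\u00ae" if registered[project["name"]] else "\u2122"
        result.append(annotated)
    return result


if __name__ == "__main__":
    projects = [{"name": "Foo", "pmc": "foo"}, {"name": "Baz", "pmc": "baz"}]
    marks = [{"name": "Foo", "registered": True}]
    print(annotate(projects, marks))
```

Keeping the join on the consumer side means the trademark file never
has to know about PMC or DOAP file locations.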

- Sam Ruby

>> The end result needs to be webcontent listing projects like:
>>
>> <h2>The ASF claims these trademarks</h2>
>> ...list all active TLPs
>> <a href="{$homepage}">Apache <b>{$projectname}</b></a>
>> {$if registered then "&reg;" else "&trade;"}
>>
>> <br/>
>>   {$shortdesc}
>> ...
>> <h2>The following projects are retired</h2>
>> ...list all Attic projects
>>
>> <h2>The following projects are in incubation; all trademarks here may be
>> property of respective owners</h2>
>> ...list all Incubation projects
>>
>> Separately, we should list the name of each software *product* here,
>> since if we offer something with a clear name as an independently
>> downloadable software product, it can be our trademark.  So I'd like to
>> list "Apache Directory Studio", since that's a notable name and a major
>> product.  But I don't want to list "Apache Commons Foo Bar Baz and
>> Kitchensink", since those are effectively just minor components that
>> aren't really worth claiming.
>>
>> Comments/suggestions please?  I'm including the Whimsical project since
>> they are also major consumers of this data.
>>
>> - Shane
>
> [1] https://projects.apache.org/pmc_rdf.html
>
> [2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
>
> [3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
>
> [4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml
