Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread John Vandenberg
On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen wrote:
>..
> Does anyone know anything about the French discussions on the introduction of
> the 'Reference' namespace? Should we just implement the French system on the
> English Wikipedia and be done with it?

This was discussed on en.wp in late 2007...

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_14#Is_there_a_centralized_bibliographic_database_for_wikipedia.3F_Is_there_a_way_to_make_citations_just_by_giving_an_universal_ID_instead_of_copying_a_full_citation_template.3F

The proposal on fr.wp in early 2006:

http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%C3%A9f%C3%A9rence

--
John Vandenberg

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 5:47 PM, David Goodman wrote:

> Sure, but first, is this capable of being done at all?  I have never
> seen a method of bibliographic control that can cope with the complete
> range of publications, even just print publications. Perhaps we need
> to proceed within narrow domains.
>

I assume that by range you mean the number of publications in a domain, and
that by domain you mean the type of publication, be it a book, webpage or
map.

The generic nature of a markup such as wiki template syntax allows us to
easily adapt the same application to new domains. The challenge of the range
within a domain is largely one of resolving ambiguities, which can be
settled with policies that carefully adjudicate troublesome cases.


> Second, is this capable of being done by crowd-sourcing, or does it
> require enforceable standards? The work of Open Library is not a
> promising model, being an uncontrolled mix, done to many different
> standards.  Actually, within the domain of scientific journal articles
> from the last 10 years in Western languages, the best current method
> seems to be a mechanical algorithm, the one used by Google Scholar.
> True, it does not aggregate perfectly--but it does aggregate better
> than any other existing database. And it does not get them all--nor
> could it no matter how much improved, for many of the versions that
> are actually available are off limits to its crawlers.


In my conception the enforceable standards are to emerge in the meta pages
of this project based on the actual issues that the community encounters.

Googlebot has many deep web accounts to journals online. When you search
Google Scholar the relevance algorithm is actually comparing your query to
the content of pdf pages which you do not have permission to access. Of
course, Google can't access them all, but many publishers have found it in
their interest to give them a complimentary account since it drives
subscription rates.

We can rely on individuals, particularly academics, who have access to the
deep web to help us curate the bibliography. And we can rely on the massive
number of personal bibliographies already out there to help us get good
coverage.

Cleaning up the mass of bibliographic content that I anticipate would be
uploaded by users would require the writing of bots in coordination with the
creation of policy pages.

Getting rid of copyrighted material would be handled in the same manner, I
presume. After major content publishers see what we are doing, I am sure
they will let us know their opinion about what we can and cannot do. It
seems likely that they will overreach their bounds, and as I have seen on
Wikipedia, the community members will happily ignore them. Or, if they think
the requests are actually in compliance with the law, they will comply.

Brian


Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 2:36 PM, Jakob wrote:

> Hi,
>
> Talking about identifiers for bibliographic records I just want to
> stress one crucial point:
>
> > This gives us the following key, guaranteed to be unique:
> > KangHsuKrajbich20091011b
>
> There is absolutely no such thing as a "guaranteed unique identifier"
> that can be derived from existing metadata. You will *always* have
> false positives (different publications get the same identifier [1])
> and false negatives (same publication has different identifiers [2]).
> Fuzzy identifiers even occur if they are created by the publisher or
> author himself (for instance duplicate ISBNs for definitely different
> editions or even totally different books). If you argue about
> identifiers please keep in mind that you *always* talk about
> heuristics but not about something "unique per se". Existing
> identifiers only differ in the ratio of false positives and false
> negatives.
>
> The only way you may get unique identifiers is to assign your own
> identifiers that are *not* derived from the content - such as
> auto-incremented record ids in a database. Even then they are not
> unique if you change the content because the identity of the object
> may change. An MD5 or SHA-sum on the full content [3] or the version id
> in a versioning database (like MediaWiki) is unique but not practical
> if you want to change content. A solution to this problem is to let
> people decide in every single case what an identifier looks like
> and when it should change (example: Wikipedia article titles). But
> then the identifiers are not permanent (records may split and join and
> be renamed).
>
> That's the way it is. You have to decide which problem to solve with
> an identifier and then be aware of its limitations.  As Brooks [3]
> wrote there is no silver bullet - so there is no silver identifier.
>
> Cheers
> Jakob
>
> [1] For instance if you have a common name and a general title or if
> you want to distinguish the printed version and the presentation
> slides of the same publication etc.
>
> [2] For instance different ways to abbreviate and/or write the name of
> an author and/or title, different years (year of preprint vs year of
> printed version) etc.
>
> [3] See http://en.wikipedia.org/wiki/No_Silver_Bullet which cites an
> article that has been published in 1986 and 1987, and probably
> reprinted in another year - so what's the identifier? ;-)
>
>
Hi Jakob,

I would like to counter this point with the following rule: There is always
a way to adjudicate ambiguity. It is easy to create a rule that works in 90%
of cases:

Author1Author2Author3EtAl10

It is easy to modify this rule to work in 99% of cases:

Author1Author2Author3EtAl20101011b

Modifying the rule to work in 100% of cases requires a community of users to
adjudicate the relatively small number of special cases.
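
As a rough sketch of the rule above, in Python: the function name make_key, the three-author cutoff before "EtAl", and the reading of the trailing digits as a YYYYMMDD date followed by a disambiguating letter are assumptions made for illustration, not a settled format.

```python
from datetime import date

def make_key(surnames, published, suffix=""):
    """Build a citation key such as KangHsuKrajbich20091011b (illustrative only)."""
    # Keep at most three surnames; mark any further authors with "EtAl".
    head = "".join(surnames[:3])
    if len(surnames) > 3:
        head += "EtAl"
    # Append the publication date as YYYYMMDD, then an optional letter that
    # separates works sharing the same authors and date.
    return f"{head}{published:%Y%m%d}{suffix}"

print(make_key(["Kang", "Hsu", "Krajbich"], date(2009, 10, 11), "b"))
# KangHsuKrajbich20091011b
```

The suffix letter is the part a program cannot pick reliably on its own: deciding whether two records are the same work is exactly the kind of special case left to the community.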

Brian


Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread Jack Park
"citation signals" will always work until a rock band takes that name
and gets a page in Wikipedia. Try "game theory".

Making "semantic" identifiers seems to be a hard problem. If you put
slashes in an identifier, you irritate the folks who want pure and
simple REST URLs. If you put underscores, MediaWiki interprets them as
spaces. Some other characters simply violate the rules of messages
sent over HTTP just as putting apostrophes in strings gives SQL fits.
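
To make the character problem concrete, here is a small Python sketch using only the standard library. The helper names are hypothetical, and mediawiki_like_title approximates just one MediaWiki title rule (underscores and spaces are interchangeable), not the full normalization.

```python
from urllib.parse import quote

def to_url_segment(identifier):
    # Percent-encode everything outside letters, digits and "_.-~",
    # including "/" so the identifier stays a single path segment.
    return quote(identifier, safe="")

def mediawiki_like_title(identifier):
    # Rough approximation: treat underscores and spaces as the same character.
    return identifier.replace("_", " ")

print(to_url_segment("Kang/Hsu 2009"))  # Kang%2FHsu%202009
print(to_url_segment("O'Brien2010"))    # O%27Brien2010
print(mediawiki_like_title("game_theory") == mediawiki_like_title("game theory"))  # True
```

An identifier built only from ASCII letters and digits passes through both functions unchanged, which may be the simplest way to sidestep the problem.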

I do hope someone comes up with a nice, clean solution.

Jack

On Wed, Jul 21, 2010 at 3:20 PM, Jodi Schneider wrote:
> On 21 Jul 2010, at 21:43, Reid Priedhorsky wrote:
>> A compromise could be that the ID is the first author's name plus an
>> auto-incremented ID per author. So for example, the first paper of
>> mine the system learns is priedhorsky1, the second priedhorsky2, etc. So
>> you get a system-generated ID for uniqueness but also something
>> comprehensible for people.
>
> Interesting. I'd really like IDs to be not only comprehensible but also to
> have a fair chance of being directly inputtable by humans.
>
> For instance, on Wikipedia, if I know that I am looking for the article on 
> "citation signals" I can type the URL directly, without searching.
>
> In my ideal citation-wiki-in-the-sky, you could get to the citation directly 
> in this way -- and sensible disambiguation pages would be automatically 
> generated.
>
> -Jodi


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread David Goodman
Sure, but first, is this capable of being done at all?  I have never
seen a method of bibliographic control that can cope with the complete
range of publications, even just print publications. Perhaps we need
to proceed within narrow domains.

Second, is this capable of being done by crowd-sourcing, or does it
require enforceable standards? The work of Open Library is not a
promising model, being an uncontrolled mix, done to many different
standards.  Actually, within the domain of scientific journal articles
from the last 10 years in Western languages, the best current method
seems to be a mechanical algorithm, the one used by Google Scholar.
True, it does not aggregate perfectly--but it does aggregate better
than any other existing database. And it does not get them all--nor
could it no matter how much improved, for many of the versions that
are actually available are off limits to its crawlers.


Re: [Wiki-research-l] Generate Wikipedia citations on Open Library

2010-07-21 Thread David Goodman
Why would anyone cite this particular edition? It's not the first ed.,
which is, I think,
http://openlibrary.org/books/OL23411638M/inland_voyage.

It's not even the first American edition. It's not a standard
scholarly edition. It's not an earlier collected edition.  It's not an
edition which is currently in print. What's more, it's a defective
record, because the date on the displayed cover does not match the
date of the edition on the catalog record--which is the date on the
title page of the actual copy scanned, which does not have the
original cover.  The cover was selected by an automatic algorithm,
which got it wrong.

If we're going to standardize citations, we should standardize a
correct record to an appropriate version, not any version that happens
along. Of course, that's considerably harder. But I do not see the
point of setting up an elaborate system based on bad data.

On Wed, Jul 21, 2010 at 3:44 PM, Edward Betts wrote:
> http://openlibrary.org/books/OL17963918M/An_inland_voyage



-- 
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 4:33 PM, Jodi Schneider wrote:

>
> On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>
>  Finn,
>
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes
> the key longer than necessary. Third, the first word or words from a title
> are not guaranteed to convey any meaning.
>
> Regarding a Reference: namespace, I can see how this has some utility and
> why projects have moved to it. However, I consider it a stopgap solution
> that projects have implemented when what they really want is a proper wiki
> for citations. Here are a few quick things that you can't do (or would have
> to go out of your way to do) with just a Reference namespace that you can do
> with a wiki dedicated to all the world's citations:
>
> - Custom reports that are boolean combinations of citation fields, ala SMW.
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever
> published.
>
>
> Not sure why a Reference namespace couldn't do this.
>
> - Conduct a search of the literature.
>
>
> Or this  (you can search just one namespace)
>
> - A new set of policies that are not necessarily NPOV, regarding the
> creation of articles that discuss collections of literature (lit review-like
> concept). The content of these policies will emerge over years with the help
> of a community. These articles could, for instance, help people who are
> navigating a new area of a literature avoid getting stuck in local minima.
> It could point out the true global context to them. It could point out
> experimenter biases in the literature; for example, a recent article was
> published where it was found that citation networks in academic literature
> can have a tendency to form based on the assumption of authority, when in
> fact that authority is false, bringing a whole thread of publications into
> doubt.
>
>
> I'm not sure that literature reviews belong in the same wiki as citations.
> That's definitely a different namespace. :)
>
>  - Create wiki articles about individual sources.
>
>
> This might or might not be the same wiki -- but that could be interesting.
>
> I could imagine a page for a journal being pulled in from several sources:
> the collection of citations in the wiki for that journal, RSS from the
> current contents (license permitting), a Wikipedia page about the journal
> (if it exists), a link to author guidelines/submission info, open access
> info from SHERPA/ROMEO, ... In this vision, very little of the content
> "lives" in this wiki itself. Rather, it's templated from numerous other
> places... Perhaps in the way "buy this book" links are handled in
> librarything -- there are numerous external links which can be activated
> with a checkbox, and some external content that is pulled in based on
> copyright review.
>
>
> While I am not dedicated to any of these things happening, I also do not
> wish to rule them out. The hope is that a new community will emerge around
> the project and guide it in the direction that is most useful. My hope in
> this thread is that we can identify some of the most likely cases and
> imagine what it will be like, so that we can convey this vision to the
> Foundation and they can get a sense of the potential importance of the
> project.
>
>
> Scoping is a big problem, I think -- because it would help to have a vision
> of which of several related tasks/endpoints is primary.
>
> I think an investigation of what fr.wikipedia is doing would be really
> useful -- does anybody edit there, or have an interest in digging into that?
> Questions might include: What is the reference namespace doing? What isn't
> it doing, that they wish it would? Did they consider alternatives to a
> namespace? How is maintenance going? Do they see the reference namespace as
> longstanding into the future, or as a stopgap?
>
> -Jodi
>

More broadly speaking, a reference namespace does not accomplish the goal of
having a free repository of all citations, complete with collections of
citations curated by the community, and documentation of those citations by
the community, in various forms to be determined by the community. While it
is possible to create specialized cases that suit the narrow needs of
individual projects, I and many of the people I have spoken to see a
justification for a broader vision. This broader vision is directly in line
with the WMF mission of giving free access to the world's knowledge. One of
the first steps must be making the Wikipedias aware of that knowledge, and
enabling them to build linked networks of information around it.

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread David Goodman
The model for this is WP:Book sources, though this relies upon the
user selecting the appropriate places to look, rather than guiding
him.

On Wed, Jul 21, 2010 at 6:33 PM, Jodi Schneider wrote:
>
> On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>
>  Finn,
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes
> the key longer than necessary. Third, the first word or words from a title
> are not guaranteed to convey any meaning.
> Regarding a Reference: namespace, I can see how this has some utility and
> why projects have moved to it. However, I consider it a stopgap solution
> that projects have implemented when what they really want is a proper wiki
> for citations. Here are a few quick things that you can't do (or would have
> to go out of your way to do) with just a Reference namespace that you can do
> with a wiki dedicated to all the world's citations:
> - Custom reports that are boolean combinations of citation fields, ala SMW.
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever
> published.
>
> Not sure why a Reference namespace couldn't do this.
>
> - Conduct a search of the literature.
>
> Or this  (you can search just one namespace)
>
> - A new set of policies that are not necessarily NPOV, regarding the
> creation of articles that discuss collections of literature (lit review-like
> concept). The content of these policies will emerge over years with the help
> of a community. These articles could, for instance, help people who are
> navigating a new area of a literature avoid getting stuck in local minima.
> It could point out the true global context to them. It could point out
> experimenter biases in the literature; for example, a recent article was
> published where it was found that citation networks in academic literature
> can have a tendency to form based on the assumption of authority, when in
> fact that authority is false, bringing a whole thread of publications into
> doubt.
>
> I'm not sure that literature reviews belong in the same wiki as citations.
> That's definitely a different namespace. :)
>
> - Create wiki articles about individual sources.
>
> This might or might not be the same wiki -- but that could be interesting.
> I could imagine a page for a journal being pulled in from several sources:
> the collection of citations in the wiki for that journal, RSS from the
> current contents (license permitting), a Wikipedia page about the journal
> (if it exists), a link to author guidelines/submission info, open access
> info from SHERPA/ROMEO, ... In this vision, very little of the content
> "lives" in this wiki itself. Rather, it's templated from numerous other
> places... Perhaps in the way "buy this book" links are handled in
> librarything -- there are numerous external links which can be activated
> with a checkbox, and some external content that is pulled in based on
> copyright review.
>
> While I am not dedicated to any of these things happening, I also do not
> wish to rule them out. The hope is that a new community will emerge around
> the project and guide it in the direction that is most useful. My hope in
> this thread is that we can identify some of the most likely cases and
> imagine what it will be like, so that we can convey this vision to the
> Foundation and they can get a sense of the potential importance of the
> project.
>
> Scoping is a big problem, I think -- because it would help to have a vision
> of which of several related tasks/endpoints is primary.
> I think an investigation of what fr.wikipedia is doing would be really
> useful -- does anybody edit there, or have an interest in digging into that?
> Questions might include: What is the reference namespace doing? What isn't
> it doing, that they wish it would? Did they consider alternatives to a
> namespace? How is maintenance going? Do they see the reference namespace as
> longstanding into the future, or as a stopgap?
> -Jodi



-- 
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG



Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread David Goodman
For items that have been assigned a DOI, isn't the DOI unique (in the
absence of errors--which I cannot recall ever having encountered)?  Of
course the same item in its various manifestations may have multiple
DOIs, or may have versions that do not have DOIs as well as versions
that do have them, and the versions may or may not be identical.
We also need to account for the presence of illegitimate as well as
legitimate copies--a person entering a WP reference may have gotten it
from a site that has an unauthorized copy--quite a few scientific
papers are present on the web in such versions.

There are really two problems: one is a pointer to the voucher
authorized version of a document, which may well be the printed
version, and the other problem is pointers to accessible legitimate
versions.  Crossref does a fairly nice job of this for online
articles, but it is organized to provide access to paid publishers'
versions preferentially, rather than to possible legitimate free
versions.
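
One way to picture the two pointers is as a single record per work, sketched below in Python. The class and field names (Work, version_of_record, accessible_copies) and the DOI in the usage lines are made up for illustration; the only outside fact relied on is that DOI names are case-insensitive.

```python
from dataclasses import dataclass, field

def normalize_doi(raw):
    # DOI names are case-insensitive, so compare them lowercased and
    # without a "doi:" label or resolver-URL prefix.
    s = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    return s

@dataclass
class Work:
    title: str
    # Pointer to the authorized version, which may well be print-only.
    version_of_record: str
    # Normalized DOIs of legitimate accessible versions of the same work.
    accessible_copies: set = field(default_factory=set)

    def add_doi(self, raw):
        self.accessible_copies.add(normalize_doi(raw))

w = Work("An example article", "Example Journal 1(2), 2010, printed edition")
w.add_doi("DOI:10.1234/ABC.5678")               # hypothetical DOI
w.add_doi("https://doi.org/10.1234/abc.5678")   # same DOI, different spelling
print(len(w.accessible_copies))  # 1
```

Normalizing only catches different spellings of one DOI. Two genuinely different DOIs for the same item would still need a person, or a matching heuristic, to attach them to the same record.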



On Wed, Jul 21, 2010 at 6:20 PM, Jodi Schneider wrote:
> On 21 Jul 2010, at 21:43, Reid Priedhorsky wrote:
>> A compromise could be that the ID is the first author's name plus an
>> auto-incremented ID per author. So for example, the first paper of
>> mine the system learns is priedhorsky1, the second priedhorsky2, etc. So
>> you get a system-generated ID for uniqueness but also something
>> comprehensible for people.
>
> Interesting. I'd really like IDs to be not only comprehensible but also to
> have a fair chance of being directly inputtable by humans.
>
> For instance, on Wikipedia, if I know that I am looking for the article on 
> "citation signals" I can type the URL directly, without searching.
>
> In my ideal citation-wiki-in-the-sky, you could get to the citation directly 
> in this way -- and sensible disambiguation pages would be automatically 
> generated.
>
> -Jodi



-- 
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Jodi Schneider

On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
>  Finn,
> 
> I'm not a fan of including a portion of the title for a couple of
> reasons. First, it's not required to make the key unique. Second, it makes 
> the key longer than necessary. Third, the first word or words from a title 
> are not guaranteed to convey any meaning.
> 
> Regarding a Reference: namespace, I can see how this has some utility and why 
> projects have moved to it. However, I consider it a stopgap solution that 
> projects have implemented when what they really want is a proper wiki for 
> citations. Here are a few quick things that you can't do (or would have to go 
> out of your way to do) with just a Reference namespace that you can do with a 
> wiki dedicated to all the world's citations:
> 
> - Custom reports that are boolean combinations of citation fields, ala SMW. 
> This requires substantive new technology as SMW doesn't scale.
> - User bibliographies which are a logical subset of all literature ever 
> published.

Not sure why a Reference namespace couldn't do this.

> - Conduct a search of the literature.

Or this  (you can search just one namespace)

> - A new set of policies that are not necessarily NPOV, regarding the creation 
> of articles that discuss collections of literature (lit review-like concept). 
> The content of these policies will emerge over years with the help of a 
> community. These articles could, for instance, help people who are navigating 
> a new area of a literature avoid getting stuck in local minima. It could 
> point out the true global context to them. It could point out experimenter 
> biases in the literature; for example, a recent article was published where 
> it was found that citation networks in academic literature can have a 
> tendency to form based on the assumption of authority, when in fact that 
> authority is false, bringing a whole thread of publications into doubt.

I'm not sure that literature reviews belong in the same wiki as citations. 
That's definitely a different namespace. :)

> - Create wiki articles about individual sources.

This might or might not be the same wiki -- but that could be interesting.

I could imagine a page for a journal being pulled in from several sources: the 
collection of citations in the wiki for that journal, RSS from the current 
contents (license permitting), a Wikipedia page about the journal (if it 
exists), a link to author guidelines/submission info, open access info from 
SHERPA/ROMEO, ... In this vision, very little of the content "lives" in this
wiki itself. Rather, it's templated from numerous other places... Perhaps in
the way "buy this book" links are handled in librarything -- there are numerous 
external links which can be activated with a checkbox, and some external 
content that is pulled in based on copyright review.

> 
> While I am not dedicated to any of these things happening, I also do not wish 
> to rule them out. The hope is that a new community will emerge around the 
> project and guide it in the direction that is most useful. My hope in this 
> thread is that we can identify some of the most likely cases and imagine what 
> it will be like, so that we can convey this vision to the Foundation and they 
> can get a sense of the potential importance of the project.

Scoping is a big problem, I think -- because it would help to have a vision of 
which of several related tasks/endpoints is primary.

I think an investigation of what fr.wikipedia is doing would be really useful 
-- does anybody edit there, or have an interest in digging into that? Questions 
might include: What is the reference namespace doing? What isn't it doing, that 
they wish it would? Did they consider alternatives to a namespace? How is 
maintenance going? Do they see the reference namespace as longstanding into the 
future, or as a stopgap?

-Jodi


Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread Jodi Schneider
On 21 Jul 2010, at 21:43, Reid Priedhorsky wrote:
> A compromise could be that the ID is the first author's name plus an
> auto-incremented ID per author. So for example, the first paper of
> mine the system learns is priedhorsky1, the second priedhorsky2, etc. So
> you get a system-generated ID for uniqueness but also something
> comprehensible for people.

Interesting. I'd really like IDs to be not only comprehensible but also to
have a fair chance of being directly inputtable by humans.

For instance, on Wikipedia, if I know that I am looking for the article on 
"citation signals" I can type the URL directly, without searching.

In my ideal citation-wiki-in-the-sky, you could get to the citation directly in 
this way -- and sensible disambiguation pages would be automatically generated.

-Jodi


Re: [Wiki-research-l] There is no silver identifier

2010-07-21 Thread Reid Priedhorsky
On 07/21/2010 03:36 PM, Jakob wrote:
> Hi,
> 
> Talking about identifiers for bibliographic records I just want to  
> stress one crucial point:
> 
>> This gives us the following key, guaranteed to be unique:
>> KangHsuKrajbich20091011b
> 
> There is absolutely no such thing as a "guaranteed unique identifier"  
> that can be derived from existing metadata. You will *always* have  
> false positives (different publications get the same identifier [1])  
> and false negatives (same publication has different identifiers [2]).  
> Fuzzy identifiers even occur if they are created by the publisher or  
> author himself (for instance duplicate ISBNs for definitely different  
> editions or even totally different books). If you argue about  
> identifiers please keep in mind that you *always* talk about  
> heuristics but not about something "unique per se". Existing  
> identifiers only differ in the ratio of false positives and false  
> negatives.
> 
> The only way you may get unique identifiers is to assign your own  
> identifiers that are *not* derived from the content - such as  
> auto-incremented record ids in a database. Even then they are not  
> unique if you change the content because the identity of the object  
> may change.

I haven't been following this thread, but the way I addressed this in my 
own bibliography manager (http://yabman.sourceforge.net/) is: the BibTeX 
key is the first author's name (lowercased) plus an auto-incremented ID. 
So for example, one of my papers is "priedhorsky229". The 229 is arbitrary, 
but there are only a few 3-digit numbers per author, so I don't get confused.

Now in a large system, that would obviously break down into the long, 
incomprehensible CiteULike-type IDs.

A compromise could be that the ID is the first author's name plus an 
auto-incremented ID per author. So for example, the first paper of 
mine the system learns is priedhorsky1, the second priedhorsky2, etc. So 
you get a system-generated ID for uniqueness but also something 
comprehensible for people.
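
A minimal sketch of that compromise, assuming an in-memory counter per author. The class name KeyAllocator and its allocate method are hypothetical, and a real system would need to persist the counters and decide how to fold non-ASCII surnames:

```python
import re
from collections import defaultdict


class KeyAllocator:
    """Hands out keys of the form <surname><n>, counting separately per author."""

    def __init__(self):
        # Last number issued for each normalized surname (in memory only).
        self._last = defaultdict(int)

    def allocate(self, first_author_surname: str) -> str:
        # Lowercase, then drop everything except ASCII letters and digits.
        base = re.sub(r"[^a-z0-9]", "", first_author_surname.lower())
        self._last[base] += 1
        return f"{base}{self._last[base]}"


alloc = KeyAllocator()
print(alloc.allocate("Priedhorsky"))  # priedhorsky1
print(alloc.allocate("Priedhorsky"))  # priedhorsky2
print(alloc.allocate("Schneider"))    # schneider1
```

The number carries no meaning beyond the order in which the system learned about the papers, so the key never has to change when another paper by the same author is added.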

HTH,

Reid



[Wiki-research-l] There is no silver identifier

2010-07-21 Thread Jakob
Hi,

Talking about identifiers for bibliographic records I just want to  
stress one crucial point:

> This gives us the following key, guaranteed to be unique:
> KangHsuKrajbich20091011b

There is absolutely no such thing as a "guaranteed unique identifier"  
that can be derived from existing metadata. You will *always* have  
false positives (different publications get the same identifier [1])  
and false negatives (same publication has different identifiers [2]).  
Fuzzy identifiers even occur if they are created by the publisher or  
author himself (for instance duplicate ISBNs for definitely different  
editions or even totally different books). If you argue about  
identifiers please keep in mind that you *always* talk about  
heuristics but not about something "unique per se". Existing  
identifiers only differ in the ratio of false positives and false  
negatives.

The only way you may get unique identifiers is to assign your own  
identifiers that are *not* derived from the content - such as  
auto-incremented record ids in a database. Even then they are not  
unique if you change the content because the identity of the object  
may change. An MD5 or SHA sum of the full content [3] or the version id  
in a versioning database (like MediaWiki) is unique but not practical  
if you want to change content. A solution to this problem is to let  
people decide in every single case what an identifier looks like  
and when it should change (example: Wikipedia article titles). But  
then the identifiers are not permanent (records may split and join and  
be renamed).
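
To illustrate why a content hash is impractical for records that get edited, here is a small sketch. The function name content_id and the two record strings are made up for the example; only the 1986/1987 ambiguity comes from footnote [3]:

```python
import hashlib


def content_id(record_text: str) -> str:
    """Identifier derived purely from the full content of a record."""
    return hashlib.sha1(record_text.encode("utf-8")).hexdigest()


# Two versions of the same record, differing only in the year.
original = "Brooks: No Silver Bullet (1986)"
corrected = "Brooks: No Silver Bullet (1987)"

print(content_id(original))
print(content_id(corrected))
print(content_id(original) == content_id(corrected))  # False
```

Any correction to the record yields a different identifier, so every existing reference to the old hash would have to be updated or redirected.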

That's the way it is. You have to decide which problem to solve with  
an identifier and then be aware of its limitations.  As Brooks [3]  
wrote there is no silver bullet - so there is no silver identifier.

Cheers
Jakob

[1] For instance if you have a common name and a general title or if  
you want to distinguish the printed version and the presentation  
slides of the same publication etc.

[2] For instance different ways to abbreviate and/or write the name of  
an author and/or title, different years (year of preprint vs year of  
printed version) etc.

[3] See http://en.wikipedia.org/wiki/No_Silver_Bullet which cites an  
article that has been published in 1986 and 1987, and probably  
reprinted in another year - so what's the identifier? ;-)





[Wiki-research-l] Generate Wikipedia citations on Open Library

2010-07-21 Thread Edward Betts
In reference to the discussion about citations, we've recently added a 
'Wikipedia citation' link to Open Library. For example:

http://openlibrary.org/books/OL17963918M/An_inland_voyage

At the bottom of the page on the right is this:

Download catalog record: RDF / JSON | Wikipedia citation

The Wikipedia citation link will give you a citation template to copy 
and paste to Wikipedia. We would welcome any comments about this 
citation template.

We would like to have a list on the page, "what cites this book", for 
citations in Wikipedia and elsewhere on the web.

-- 
Edward.



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
> Hey Daniel,
> 
> Bibsonomy seems to suffer from the same problem as CiteULike - urls
> which convey no meaning. An example url id from CiteULike is 2434335,
> and one from Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to
> continue to steer the conversation away from that direction. These IDs
> guarantee uniqueness, but I believe that we can create keys that both
> guarantee uniqueness and convey some meaning to humans. Consider that
> this key will be embedded in wiki articles any time a source is cited.
> It's important that it make some sense.

Oh, I didn't mean we should use hashes or IDs as keys or identifiers in the URL.
I mean we can employ the hashing technique to detect dupes, because you will
inadvertently get information about the same thing under two different keys,
due to issues with transliteration, etc.
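
A rough sketch of hashing for duplicate detection, under the assumption that records are normalized before hashing. The functions normalize and dedup_hash are hypothetical and are not how Bibsonomy does it; the title in the usage lines is a placeholder:

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Fold case, strip diacritics, keep only ASCII letters and digits."""
    folded = text.casefold()  # casefold() also maps "ß" to "ss"
    decomposed = unicodedata.normalize("NFKD", folded)
    # Combining marks and letters without an ASCII base are dropped here.
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum())


def dedup_hash(surname: str, year: int, title: str) -> str:
    """Hash of the normalized fields, used only to flag likely duplicates."""
    basis = "|".join([normalize(surname), str(year), normalize(title)])
    return hashlib.md5(basis.encode("utf-8")).hexdigest()


same = dedup_hash("Voß", 2010, "Some Title") == dedup_hash("Voss", 2010, "some title")
print(same)  # True
differs = dedup_hash("Möller", 2010, "Some Title") == dedup_hash("Moeller", 2010, "Some Title")
print(differs)  # False
```

This catches "Voß" versus "Voss" and "Möller" versus "Moller", but not "Moeller", so the hash is a heuristic for surfacing candidates and not a guarantee.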

-- daniel



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 2:42 AM, Daniel Kinzler wrote:

> >> 1) The first three author names separated by slashes
> > why not separate by pluses? they don't form part of names either, and
> > don't cause problems with wiki page titles.
>
> I like this... however, how would you represent this in a URL? Also note
> that using plusses in page names don't work with all server configurations,
> since plus has a special meaning in URLs.
>
> >> 3) Some or all of the date. For instance, if there is only one source by
> >> this set of authors that year, we can just use YYYY. However, once
> >> another source by those set of authors is added, the key should change to
> >> YYYYMMDD or similar.
> > I don't think it is a good idea to change one key as a function of
> > updates on another, except for a generic disambiguation tag.
>
> I agree. And if you *have* to use the full date, use YYYYMMDD, not the
> other way around, please.
>
> >> Since the slashes are somewhat cumbersome, perhaps we can not make them
> >> mandatory, but similarly use them only when they are necessary in order
> >> to "escape" a name. In the case that one of the authors does not have a
> >> slash in their name - the dominant case - we can stick to the easily
> >> legible and nicely compact CamelCase format.
> >>
> >> Example keys generated by this algorithm:
> >>
> >> KangHsuKrajbichEtAl2009
> > Kang+Hsu+Krajbich+2009+the+wick+in
> > or
> > Kang+Hsu+Krajbich+2009+twi
>
> Both seem good, though i would suggest to form a convention to ignore any
> leading "the" and "a", to a more distinctive 3 word suffix.
>
> > Of course, it does not have to be _exactly_ three authors, nor three
> > words from the title, and it does not solve the John Smith (or Zheng
> > Wang) problem.
>
> It also doesn't solve issues with transliteration: Merik Möller may become
> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
> etc. In case of chinese names, it's often not easy to decide which part is
> the last name.
>
> To avoid this kind of ambiguity, i suggest to automatically apply some type
> of normalization and/or hashing. There is quite a bit of research about this
> kind of normalisation out there, generally with the aim of detecting
> duplicates. Perhaps we can learn from bibsonomy.org, have a look how they do
> it:
> .
>
> Gotta love open source university research projects :)
>
> -- daniel


Hey Daniel,

Bibsonomy seems to suffer from the same problem as CiteULike - urls which
convey no meaning. An example url id from CiteULike is 2434335, and one from
Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer
the conversation away from that direction. These IDs guarantee uniqueness,
but I believe that we can create keys that both guarantee uniqueness and
convey some meaning to humans. Consider that this key will be embedded in
wiki articles any time a source is cited. It's important that it make some
sense.

Plus signs and slashes in the key appear to be cumbersome. Perhaps we can
avoid this by truncating last names that involve a slash to either the
portion before or after the slash.

Changing the key seems to be a bad idea, so we want a key system that is
unique from the start. That means we should use the full date, YYYYMMDD, as
suggested by Daniel.

In the event that multiple sources are published by the same set of authors
on the same day, we can use a, b, c disambiguation.

This gives us the following key, guaranteed to be unique:
KangHsuKrajbich20091011b
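
As a sketch only, the key above could be generated along these lines. The function make_key is hypothetical, the date 2009-10-11 is simply read off the example key, and the rule that the first source keeps the bare key while later ones get b, c, ... is an assumption, since the thread does not settle it:

```python
from datetime import date
from string import ascii_lowercase


def make_key(surnames, published, existing_keys):
    """Build <Authors><YYYYMMDD>, adding b, c, ... if the key is taken."""
    # CamelCase join of the first three surnames, as in the example key.
    authors = "".join(s[:1].upper() + s[1:] for s in surnames[:3])
    base = f"{authors}{published:%Y%m%d}"
    if base not in existing_keys:
        return base
    # Assumption: the first source keeps the bare key; later ones get b, c, ...
    for letter in ascii_lowercase[1:]:
        candidate = base + letter
        if candidate not in existing_keys:
            return candidate
    raise ValueError("more than 26 sources for the same authors and day")


taken = {"KangHsuKrajbich20091011"}
print(make_key(["Kang", "Hsu", "Krajbich"], date(2009, 10, 11), taken))
# KangHsuKrajbich20091011b
```

Uniqueness here holds only relative to the set of keys the system already knows, and only if the surnames and the date are recorded the same way every time.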

Brian


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Brian J Mingus
On Wed, Jul 21, 2010 at 5:49 AM, Finn Aarup Nielsen  wrote:

>
>
> On Wed, 21 Jul 2010, Jodi Schneider wrote:
>
>  On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>>
>>> Kang+Hsu+Krajbich+2009+the+wick+in

>>>
>> This seems best to me of what's proposed so far.
>>
>>> Both seem good, though i would suggest to form a convention to ignore any
>>> leading "the" and "a", to a more distinctive 3 word suffix.
>>>
>>
>> While that's a good idea, then we'd have to know all "indistinctive" words
>> in all languages. (Die, Der, La, L', ...)
>>
>> There are still going to be duplicates, alas...
>>
>>
>>>> Of course, it does not have to be _exactly_ three authors, nor three
>>>> words from the title, and it does not solve the John Smith (or Zheng
>>>> Wang) problem.

>>>
>>> It also doesn't solve issues with transliteration: Merik Möller may
>>> become
>>> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even
>>> "VoB",
>>> etc. In case of chinese names, it's often not easy to decide which part
>>> is the
>>> last name.
>>>
>>
> I have a large bibtex file where I (mostly) use Surname + one initial +
> year + first important word (
> http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)
>
> So for example: AaltoS2002Neuroanatomical
>
> There are lots of special cases
>
> "M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å)
>
> "Anissa Abi-Dargham" AbiDarghamA2000Measurement (discard dash).
>
> ACM computer classification system "ACM1998Computing" (an organization as
> an author: do you use 'association' or 'ACM'?)
>
> "A Content-Driven Reputation System for the {Wikipedia}" ->
> AdlerB2007ContentDriven (discarding slash in title and camelcasing)
>
> "$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" ->
> StrotherS1996Owater (here we have sharp parentheses that will be a problem
> in wiki text. I suppose that in chemistry it becomes even worse)
>
> "On the Distribution of the Quotient of two chance variables" becomes
> CurtissJ1941On (as 'On' here is not regarded as a stopword).
>
> Modelling the fMRI response using smooth FIR filters ->
> NielsenF2001ModelingfMRI (extra word because of collision with "Modeling of
> locations in the {BrainMap} database: Detection of outliers"
>
> With 3 author + year + title you sometimes run into collisions:
>
>  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
>  title ={Separating Processes within a Trial in Event-Related
>  Functional {MRI}. {II}. Analysis},
>
>  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
>  title ={Separating Processes within a Trial in Event-Related
>  Functional {MRI}. {I}. The Method},
>
>
> When dealing with scientific articles it is not always possible to use the
> full given name, since sometimes you just know the initial.
>
> I know one called Vibe Frøkjær. Presumable because she is afraid the PubMed
> and others will not be able to handle the Nordic letters she writes her name
> as Vibe G. Frokjaer in science contexts. Other authors may write her as Vibe
> G. Frøkjær.
>
>
> Articles usually one have one edition. Sometimes you find reprinted
> versions here and there. For books there might be different versions and you
> need to find out whether you want to have the key to the 'Work',
> 'Expression', 'Manifestation' or 'Item' to use the wording from
>
>
> http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records
>
> The French Wikipedia has a page for each book title ('work' regardless of
> language and editions). Editions are listed with multiple infoboxes on the
> page. In this way there is not a one-to-one correspondence between wiki page
> and, say, ISBN. It seems the best to me to have one page for a 'work' where
> you collect comments. However, in citations with page numbers you need the
> 'expression' because of page break differences between versions.
>
> I like the French way, except that each book has two pages: One under the
> 'Reference' namespace and another under the 'Template' namespace.
>
> The French tend to use "Title (authors)" as key in the Reference namespace.
> Mostly fullname:
>
> http://fr.wikipedia.org/wiki/Référence:Weaving_the_Web_(Tim_Berners-Lee)
>
> But sometimes diverge a bit:
>
> http://fr.wikipedia.org/wiki/Référence:Theory_of_numbers_(HardyWright)
>
> The associated template has somewhat unpredictable name, e.g.,
>
> http://fr.wikipedia.org/wiki/Modèle:HardyWright
>
> They link in the template instatiations, e.g., "auteurs=[[Tim
> Berners-Lee]], Mark Fischetti" which I still don't like and would instead
> suggest:
>
> author1=Tim Berners-Lee | author2=Mark Fischetti and templates
> [[{{{author1}}}]], [[{{{author1}}}]] or perhaps better for disamb

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Finn Aarup Nielsen



On Wed, 21 Jul 2010, Jodi Schneider wrote:

> On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>
>>> Kang+Hsu+Krajbich+2009+the+wick+in
>
> This seems best to me of what's proposed so far.
>
>> Both seem good, though i would suggest to form a convention to ignore any
>> leading "the" and "a", to a more distinctive 3 word suffix.
>
> While that's a good idea, then we'd have to know all "indistinctive" words in 
> all languages. (Die, Der, La, L', ...)
>
> There are still going to be duplicates, alas...
>
>>> Of course, it does not have to be _exactly_ three authors, nor three
>>> words from the title, and it does not solve the John Smith (or Zheng
>>> Wang) problem.
>>
>> It also doesn't solve issues with transliteration: Merik Möller may become
>> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
>> etc. In case of chinese names, it's often not easy to decide which part is the
>> last name.


I have a large bibtex file where I (mostly) use Surname + one initial + 
year + first important word 
(http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)


So for example: AaltoS2002Neuroanatomical

There are lots of special cases

"M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å)

"Anissa Abi-Dargham" AbiDarghamA2000Measurement (discard dash).

ACM computer classification system "ACM1998Computing" (an organization as 
an author: do you use 'association' or 'ACM'?)


"A Content-Driven Reputation System for the {Wikipedia}" ->
AdlerB2007ContentDriven (discarding the hyphen in the title and camel-casing)

"$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" -> 
StrotherS1996Owater (here we have square brackets that will be a problem 
in wiki text. I suppose that in chemistry it becomes even worse)


"On the Distribution of the Quotient of two chance variables" becomes 
CurtissJ1941On (as 'On' here is not regarded as a stopword).


"Modelling the fMRI response using smooth FIR filters" -> 
NielsenF2001ModelingfMRI (extra word because of collision with "Modeling 
of locations in the {BrainMap} database: Detection of outliers")
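
The basic rule (surname + one initial + year + first important word) can be sketched as below. This is not the script used for lyngby.bib; the function names and the stopword list are assumptions, and collision handling such as the extra word in NielsenF2001ModelingfMRI is left out:

```python
import unicodedata

# Assumed stopword list; "on" is deliberately absent, matching CurtissJ1941On.
STOPWORDS = {"a", "an", "the"}


def ascii_letters(text: str) -> str:
    """Transliterate by stripping diacritics; drop hyphens, braces, etc."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha())


def bib_key(surname: str, initial: str, year: int, title: str) -> str:
    # First title word that is not a stopword (empty if there is none).
    first = next((w for w in title.split() if w.lower() not in STOPWORDS), "")
    # Camel-case hyphenated words: "Content-Driven" -> "ContentDriven".
    word = "".join(ascii_letters(p[:1].upper() + p[1:]) for p in first.split("-"))
    return f"{ascii_letters(surname)}{initial}{year}{word}"


print(bib_key("Adler", "B", 2007,
              "A Content-Driven Reputation System for the {Wikipedia}"))
# AdlerB2007ContentDriven
print(bib_key("Curtiss", "J", 1941,
              "On the Distribution of the Quotient of two chance variables"))
# CurtissJ1941On
print(ascii_letters("Åberg"), ascii_letters("Abi-Dargham"))
# Aberg AbiDargham
```

Letters that have no decomposition, such as "ø" or "æ", are simply dropped by this sketch, which is one more place where a hand-maintained convention would still be needed.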


With 3 author + year + title you sometimes run into collisions:

  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
  title ={Separating Processes within a Trial in Event-Related
  Functional {MRI}. {II}. Analysis},

  author =   {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
  title ={Separating Processes within a Trial in Event-Related
  Functional {MRI}. {I}. The Method},


When dealing with scientific articles it is not always possible to use the 
full given name, since sometimes you just know the initial.


I know one called Vibe Frøkjær. Presumably because she is afraid that 
PubMed and others will not be able to handle the Nordic letters, she writes 
her name as Vibe G. Frokjaer in scientific contexts. Other authors may write 
her name as Vibe G. Frøkjær.



Articles usually have only one edition. Sometimes you find reprinted 
versions here and there. For books there might be different versions, and 
you need to decide whether the key should refer to the 'Work', 
'Expression', 'Manifestation' or 'Item', to use the wording from


http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records

The French Wikipedia has a page for each book title ('work' regardless of 
language and editions). Editions are listed with multiple infoboxes on the 
page. In this way there is not a one-to-one correspondence between wiki 
page and, say, ISBN. It seems best to me to have one page for a 'work' 
where you collect comments. However, in citations with page numbers you 
need the 'expression' because of page break differences between versions.


I like the French way, except that each book has two pages: One under the 
'Reference' namespace and another under the 'Template' namespace.


The French tend to use "Title (authors)" as key in the Reference 
namespace. Mostly fullname:


http://fr.wikipedia.org/wiki/Référence:Weaving_the_Web_(Tim_Berners-Lee)

But sometimes diverge a bit:

http://fr.wikipedia.org/wiki/Référence:Theory_of_numbers_(HardyWright)

The associated template has a somewhat unpredictable name, e.g.,

http://fr.wikipedia.org/wiki/Modèle:HardyWright

They link in the template instantiations, e.g., "auteurs=[[Tim 
Berners-Lee]], Mark Fischetti", which I still don't like and would instead 
suggest:


author1=Tim Berners-Lee | author2=Mark Fischetti and templates 
[[{{{author1}}}]], [[{{{author2}}}]] or perhaps better for disambig 
[[{{{authorlink1}}}|{{{author1}}}]], [[{{{authorlink2}}}|{{{author2}}}]]. This 
way you allow for easier extraction and you do not need SMW array 
processing to distinguish the names.


It seems to me that the French have come a long way. I am surprised that 
only John Vandenberg has pointed to the French efforts. I was not aware of 
it before.


Does anyone know anything about the French discussions on the introduction 
of the 'Reference' namespace? Should we just implement the French

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Mietchen
On Wed, Jul 21, 2010 at 10:42 AM, Daniel Kinzler  wrote:
>>> 1) The first three author names separated by slashes
>> why not separate by pluses? they don't form part of names either, and
>> don't cause problems with wiki page titles.
>
> I like this... however, how would you represent this in a URL?
%2B would seem to be the obvious choice to me.

> Also note that
> using plusses in page names don't work with all server configurations, since
> plus has a special meaning in URLs.

Don't know too much about the double escaping business to comment on that, but
if pluses are not acceptable, we still have equal signs (possibly with
similar problems, but still useful for direct web search) and underscores
(which would turn the whole key into one string for search engines).
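
For reference, a short sketch of what the %2B escaping looks like with Python's standard urllib.parse, and of why a raw plus is ambiguous. The key string is the one from this thread; nothing here is specific to MediaWiki:

```python
from urllib.parse import quote, unquote, unquote_plus

key = "Kang+Hsu+Krajbich+2009"

# Percent-encode every reserved character, including "+".
encoded = quote(key, safe="")
print(encoded)                  # Kang%2BHsu%2BKrajbich%2B2009
print(unquote(encoded) == key)  # True

# Form-style decoding treats a raw "+" as a space, which is the ambiguity
# behind the server configuration problems mentioned above.
print(unquote_plus(key))        # Kang Hsu Krajbich 2009
```

Whether a given server decodes the path with plus-as-space semantics depends on its configuration, so the escaped form is the safer one to emit in links.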

Daniel



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
Jodi Schneider schrieb:
> On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>>> Kang+Hsu+Krajbich+2009+the+wick+in
> 
> This seems best to me of what's proposed so far. 
>> Both seem good, though i would suggest to form a convention to ignore any
>> leading "the" and "a", to a more distinctive 3 word suffix.
> 
> While that's a good idea, then we'd have to know all "indistinctive" words in 
> all languages. (Die, Der, La, L', ...)

Stopword lists for major languages exist, and where they don't, they are easily
created, even automatically. Word frequency analysis on a few megabytes of text
is cheap these days :)
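
A toy sketch of deriving a stopword list by word frequency. The function frequent_words and the cutoff top_n are illustrative choices; a real list would be built from megabytes of text per language and probably reviewed by hand:

```python
import re
from collections import Counter


def frequent_words(text: str, top_n: int) -> set:
    """Return the top_n most frequent words as candidate stopwords."""
    # Runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {word for word, _ in Counter(words).most_common(top_n)}


sample = "the cat and the dog and the bird"
print(frequent_words(sample, 2) == {"the", "and"})  # True
```

On a realistic corpus the most frequent words are mostly articles, prepositions and conjunctions, which is what makes this cheap to automate.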

-- daniel




[Wiki-research-l] page numbers

2010-07-21 Thread Jodi Schneider
Jeff makes some good points about page numbers on public-lld (where I had 
forwarded part of this conversation). -Jodi

Begin forwarded message:

> Resent-From: public-...@w3.org
> From: "Young,Jeff (OR)" 
> Date: 20 July 2010 22:53:40 GMT+01:00
> To: "Tom Morris" 
> Cc: "Karen Coyle" , "Jodi Schneider" 
> , "public-lld" , "Code for 
> Libraries" , "Brian Mingus" 
> 
> Subject: RE: "universal citation index"
> 
> I suspect this discussion happened on code4lib before the thread got
> cross-posted to LLD XG where I first saw it.
> 
> There are undoubtedly a ton of diverse use cases, but that doesn't mean
> APIs are the best solution. Here are some spitball possibilities for
> "not just manifestations" and "we need page numbers".
> 
> http://example.org/frbr:serial/2/citation-apa.{bcp-47}.txt
> http://example.org/frbr:manifestation/1/citation-apa.{bcp-47}.txt?xyz:startPage=5&xyz:endPage=6
> 
> I'm imagining an xyz ontology with startPage and endPage, but we can
> surely create it if something doesn't already exist.
> 
> Jeff
> 
>> -Original Message-
>> From: Tom Morris [mailto:tfmor...@gmail.com]
>> Sent: Tuesday, July 20, 2010 5:37 PM
>> To: Young,Jeff (OR)
>> Cc: Karen Coyle; Jodi Schneider; public-lld; Code for Libraries; Brian
>> Mingus
>> Subject: Re: "universal citation index"
>> 
>> On Tue, Jul 20, 2010 at 1:40 PM, Young,Jeff (OR) 
>> wrote:
>>> In terms of Linked Data, it should make sense to treat citations as
>>> text/plain variant representations of a FRBR Manifestation.
>> 
>> As Karen mentioned, many types of citation need more information than
>> just the manifestation.  You also need page numbers, etc.
>> 
>> Tom
> 
> 
> 



Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Jodi Schneider

On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>> Kang+Hsu+Krajbich+2009+the+wick+in

This seems best to me of what's proposed so far. 
> Both seem good, though i would suggest to form a convention to ignore any
> leading "the" and "a", to a more distinctive 3 word suffix.

While that's a good idea, then we'd have to know all "indistinctive" words in 
all languages. (Die, Der, La, L', ...)

There are still going to be duplicates, alas...

> 
>> Of course, it does not have to be _exactly_ three authors, nor three
>> words from the title, and it does not solve the John Smith (or Zheng
>> Wang) problem.
> 
> It also doesn't solve issues with transliteration: Merik Möller may become
> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
> etc. In case of chinese names, it's often not easy to decide which part is the
> last name.
> 
> To avoid this kind of ambiguity, i suggest to automatically apply some type of
> normalization and/or hashing. There is quite a bit of research about this kind
> of normalisation out there, generally with the aim of detecting duplicates.
> Perhaps we can learn from bibsonomy.org, have a look how they do it:
> .

Good idea!

-Jodi


Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Kinzler
>> 1) The first three author names separated by slashes
> why not separate by pluses? they don't form part of names either, and
> don't cause problems with wiki page titles.

I like this... however, how would you represent this in a URL? Also note that
using plusses in page names doesn't work with all server configurations, since
plus has a special meaning in URLs.

>> 3) Some or all of the date. For instance, if there is only one source by
>> this set of authors that year, we can just use YYYY. However, once another
>> source by those set of authors is added, the key should change to YYYYMMDD
>> or similar.
> I don't think it is a good idea to change one key as a function of
> updates on another, except for a generic disambiguation tag.

I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way
around, please.

>> Since the slashes are somewhat cumbersome, perhaps we can not make them
>> mandatory, but similarly use them only when they are necessary in order to
>> "escape" a name. In the case that one of the authors does not have a slash
>> in their name - the dominant case - we can stick to the easily legible and
>> niecly compact CamelCase format.
>>
>> Example keys generated by this algorithm:
>>
>> KangHsuKrajbichEtAl2009
> Kang+Hsu+Krajbich+2009+the+wick+in
> or
> Kang+Hsu+Krajbich+2009+twi

Both seem good, though I would suggest forming a convention to ignore any
leading "the" and "a", to get a more distinctive 3 word suffix.

> Of course, it does not have to be _exactly_ three authors, nor three
> words from the title, and it does not solve the John Smith (or Zheng
> Wang) problem.

It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
etc. In case of chinese names, it's often not easy to decide which part is the
last name.

To avoid this kind of ambiguity, i suggest to automatically apply some type of
normalization and/or hashing. There is quite a bit of research about this kind
of normalisation out there, generally with the aim of detecting duplicates.
Perhaps we can learn from bibsonomy.org, have a look how they do it:
.

Gotta love open source university research projects :)

-- daniel





Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

2010-07-21 Thread Daniel Mietchen
On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus
 wrote:
> I like your suggestion that the abc disambiguator be chosen based on the
> first date of publication, and I also like the prospect of using slashes
> since they can't be contained in names. Using the full year is a good idea
> too. We can combine these to come up with a key that, in principle, is
> guaranteed to be unique. This key would contain:
>
> 1) The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and
don't cause problems with wiki page titles.

> 2) If there are more than three authors, an EtAl
don't think that's necessary if we get the abc part right.

> 3) Some or all of the date. For instance, if there is only one source by
> this set of authors that year, we can just use YYYY. However, once another
> source by those set of authors is added, the key should change to YYYYMMDD
> or similar.
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.

> If there are multiple publications on the same day, we can
> resort to abc. Redirects and disambiguation pages can be set up when a key
> changes.
As Jodi pointed out already, the exact date is often not clearly
identifiable, so I would go simply for the year.
Instead of an alphabetic abc, one could use some function of the
article title (e.g. the first three words thereof, or the initials of
the first three words), always in lower case.

An even less ambiguous abc would be starting page (for printed stuff)
or article number (for online only) but this brings us back to the
7523225 problem you mentioned above.

> Since the slashes are somewhat cumbersome, perhaps we can not make them
> mandatory, but similarly use them only when they are necessary in order to
> "escape" a name. In the case that one of the authors does not have a slash
> in their name - the dominant case - we can stick to the easily legible and
> niecly compact CamelCase format.
>
> Example keys generated by this algorithm:
>
> KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in
or
Kang+Hsu+Krajbich+2009+twi

Also note that the CamelCase key does not yield results in a Google
search, whereas the first plus-separated variant brings up the right work,
and the plus-separated one with the initialed title tends to bring up at
least something written by or citing these authors.

> Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article
or
Author1+Author2+Author-Three+2009+jat

Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.

Daniel

-- 
http://www.google.com/profiles/daniel.mietchen
