Re: [R] The Future of R | API to Public Databases

Spencer Graves Fri, 13 Jan 2012 18:51:25 -0800

A traditional way to exit a chaotic situation such as the one you describe is to try to establish a standards committee, invite participation from suppliers and users of whatever (data in this case), apply for registration with the International Organization for Standardization (ISO), organize meetings, draft and circulate a proposed standard, etc. A statistician who had published maybe 100 papers and 3 books told me that his work on ISO 9000 (I think) made a larger contribution to humanity than anything else he had done. Work on standards is one of the most boring, tedious activities I can imagine -- and can potentially be the most impactful thing one does in this life: If you have an ISO standard number for something, people who are starting something new may find it and follow it. People who are working to upgrade something may tell their management, "Let's follow this standard." Customers sometimes tell their suppliers, "If you follow the standard, you might get more customers."

I think you could get support for such a standards effort from the American Association for the Advancement of Science, the American Economic Association, the American Statistical Association, and many other organizations, including many on-line science journals that today pressure authors of papers to put the data behind their published paper in the public domain, downloadable from their web site, etc.



      IMHO.
      Spencer


On 1/13/2012 3:39 PM, Benjamin Weber wrote:

The whole issue is the mismatch between (1) the publisher of the
data and (2) the user at the rendezvous point.
Neither the publisher nor the user knows anything about the
rendezvous point. Both want to meet but never do in reality.
The user wastes time finding the rendezvous point defined by the publisher.
The publisher assumes an arbitrary rendezvous point. Given the number of
publishers, the variety of fields and the preferences of each expert,
we end up in today's data world. Everyone has to waste precious
time finding the rendezvous point. Only experts know in which
corner to focus their search, and even they need time to
find what they want.
However, each expert (of each profession) believes that his approach
is the best one in the world.
In the end we have a state of total confusion, where only experts can
handle the information and non-experts cannot even access the data
without diving fully into the flood of data and its specialities.
That's my point: Data is not accessible.

The discussion should follow a strategic approach:
- Is the classical csv file (in all its varieties) the simplest and best way?
- Isn't it the responsibility of the R community to recommend
standards for different kinds of data?
With the existence of this rendezvous point the publisher would know a
specific point which is favorable from the user's point of view. That
is missing.
Only a rendezvous point defined by the community can be a 'known'
rendezvous point for all stakeholders, globally.

I do believe that the publisher's greatest interest is data
accessibility. Where is the toolkit we provide them to enable them to
serve us the data exactly as we want it? No, we just try to build even
more packages to be lost in the noise of information.

I disagree with the proposed solution of a maintained package, or a
bunch of packages, that just combine connections to the existing
databases and keep them up to date. It is only a question of time until
the user is lost there. Such an approach is neither feasible nor
efficient.

We should just tell them where we would like to meet.

Benjamin

On 14 January 2012 04:58, Brian Diggs <dig...@ohsu.edu> wrote:

On 1/13/2012 2:26 PM, MacQueen, Don wrote:

It's a nice idea, but I wouldn't be optimistic about it happening:

Each of these public databases no doubt has its own more or less unique
API, and the people likely to know the API well enough to write R code to
access any particular database will be specialists in that field. They
likely won't know much if anything about other public databases. The
likelihood of a group forming to develop ** and maintain ** a single R
package to access the no-doubt huge variety of public databases strikes me
as small.


I agree. The more reasonable model is a collection of packages, each of
which can access a particular data source.

However, this looks like a great opportunity for a new CRAN Task View. The
task view would simply identify which packages connect to which public
databases. (sorry, I can't volunteer)


A CRAN Task View would be well suited for this. I have tagged these sorts of
packages on crantastic with the "onlineData" tag when I happen to notice
one, but I have not made a concerted effort to find all such packages. A Task
View would be even better.

http://crantastic.org/tags/onlineData

-Don

p.s.
I can mention openair as a package that has tools to access public
databases.


Tagged it.

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com
