Re: [CODE4LIB] irc back channel logs [hacks]

2011-02-15 Thread Michael B. Klein
And a word cloud:

http://www.wordle.net/show/wrdl/3157008/code4lib_2011_IRC_logs

On Sat, Feb 12, 2011 at 10:13 AM, Eric Lease Morgan  wrote:

> I have written a few hacks allowing me to do rudimentary text mining
> against the logs. [1] From readme.txt:
>
>  This directory contains a number of files and scripts allowing
>  one to do a bit of text mining against the Code4Lib conference
>  IRC log files for 2011. This is just a beginning, and the
>  directory includes:
>
>* irclog.txt - the raw log file downloaded from
>  http://irc.code4lib.org/c4l11/static/logs/irclog
>
>* log2db.pl - reads the raw log and outputs a tab-delimited
>  file with three columns (date, name, text)
>
>* irclog.db - the output of log2db.pl
>
>* count.pl - outputs the number of names (n), increases (i),
>  decreases (d), URLs (u), and commands (c) found in the log;
>  useful for seeing what is hot and what is not.
>
>* ngrams.pl - given an integer (n), outputs the most frequent
>  n-length phrases; useful to see what words and phrases are
>  used most frequently
>
>* concordance.pl - a KWIC (keyword-in-context) index; the simplest of search engines
>
>* readme.txt - this file
>
>  Using these tools one can see that:
>
>* Zoia had the most to say
>* mbklein's karma was increased the most
>* Zoia's karma was decreased the most
>* the most popular URL passed around regarded social activities
>* we tried to sing as many as 196 songs closely followed by anagrams
>* 28 of the songs weren't found
>* live streams were mentioned frequently
>
>
> I have to go shovel snow now...
>
> [1] initial hacks - http://bit.ly/gMO4op
>
> --
> Eric Lease Morgan
>
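
For anyone wanting to try the n-gram idea from the readme without the original scripts, here is a minimal sketch in Python (not Eric's Perl, and not his implementation). It assumes the log text is already available as a list of message strings; the function name top_ngrams and the sample lines are made up for illustration.

```python
import re
from collections import Counter

def top_ngrams(lines, n, limit=10):
    """Return the `limit` most frequent n-word phrases found in `lines`."""
    counts = Counter()
    for line in lines:
        # Lowercase and keep only word-like tokens; phrases never span two lines.
        words = re.findall(r"[a-z0-9']+", line.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(limit)

# Hypothetical sample messages, standing in for the text column of the log.
sample = [
    "the live stream is down",
    "is the live stream back",
    "live stream works for me",
]
print(top_ngrams(sample, 2, limit=1))
```

Counting per line keeps phrases from running across two different speakers' messages, which seemed the safer choice for chat logs.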


[CODE4LIB] SKOS-2-HIVE workshop, DC area

2011-02-15 Thread Ryan Scherle
Workshop announcement:

**SKOS-2-HIVE: CREATING SKOS VOCABULARIES TO HELP INTERDISCIPLINARY VOCABULARY 
ENGINEERING**

George Washington University (Mt. Vernon Campus), March 9, 2011

Location: Eckles Library Auditorium, Mt. Vernon Campus of George Washington 
University

Registration closes on March 1.

§  WORKSHOP DESCRIPTION

The SKOS-2-HIVE workshop focuses on using semantic web technologies for 
representing and describing collections using multiple controlled vocabularies. 
The workshop focuses on basic understanding and usage of W3C's Simple Knowledge 
Organization Systems (SKOS), linked data, and the HIVE library of open source 
applications.

There are two workshop components:

1. Foundational Concepts and HIVE Basics. This component addresses the 
conceptual design of structured vocabularies, including a range of semantic 
relationships; domain representation and issues central to identifying useful 
vocabularies; the application of basic SKOS tags; and basic techniques 
underlying the HIVE vocabulary server for enriching digital resource 
descriptions.

2. Implementing HIVE. This component covers more technical aspects including 
steps for implementing a HIVE server.

The workshop outline and learning outcomes are provided further below.

Workshop rationale: Semantic web technologies provide innovative means for 
organizing, describing, and managing digital resources in a range of formats. 
Successful implementation and use of semantic web technologies requires both 
information professionals and system developers to become knowledgeable about 
the underlying intellectual construct and roadmap toward forming a semantic 
web. The IMLS-funded Helping Interdisciplinary Vocabulary Engineering (HIVE) 
project has been addressing these needs by working with the W3C's Simple 
Knowledge Organization Systems (SKOS) in the linked data environment. HIVE has 
been implemented using semantic web enabling technologies and machine learning 
to provide a solution to the traditional controlled vocabulary problems of 
cost, interoperability, and usability. Current HIVE vocabulary partners include 
the Library of Congress, the Getty Research Institute, and the U.S. Geological 
Survey.

 

§  WORKSHOP OUTLINE AND LEARNING OUTCOMES

Morning Session: Foundational Concepts and HIVE Basics, 9:00 AM-12:00 PM

Overview

This session addresses traditional thesaural concepts and the extension of 
these concepts via SKOS/linked data, HIVE and the semantic web.

Audience

This workshop targets information professionals (librarians, archivists, museum 
professionals, web architects, and others); system developers; and students 
seeking knowledge about the basic framework and conceptual aspects of vocabulary 
design.

Prerequisites

Have a basic understanding of subject metadata creation or subject cataloging.

Learning Outcomes

- Evaluate controlled vocabulary, thesauri, and ontologies that would best fit 
your information environment's needs.

- Identify basic thesaural relationships including: relative, associative and 
hierarchical.

- Use basic SKOS tags to identify the above thesaural relationships.

- Become familiar with using the HIVE software and the HIVE processes.
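
As a rough illustration of what "basic SKOS tags" for these relationships can look like, here is a minimal Python sketch that writes one concept out as Turtle. It is not HIVE code and not workshop material. The ex: namespace, the concept names, and the helper concept_to_turtle are all hypothetical; only the skos: properties themselves (prefLabel, altLabel, broader, narrower, related) come from the W3C SKOS vocabulary. Mapping synonyms to skos:altLabel is an assumption about how the first relationship type would be expressed.

```python
def concept_to_turtle(concept_id, pref_label, alt_labels=(),
                      broader=(), narrower=(), related=()):
    """Serialize one concept as a Turtle snippet (labels are not escaped)."""
    props = [f'skos:prefLabel "{pref_label}"@en']
    # Synonyms / alternative terms (assumed mapping, see lead-in).
    props += [f'skos:altLabel "{label}"@en' for label in alt_labels]
    # Hierarchical relationships.
    props += [f"skos:broader ex:{c}" for c in broader]
    props += [f"skos:narrower ex:{c}" for c in narrower]
    # Associative relationships.
    props += [f"skos:related ex:{c}" for c in related]
    body = " ;\n".join("    " + p for p in props)
    return f"ex:{concept_id} a skos:Concept ;\n{body} ."

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"  # hypothetical namespace
)

turtle = concept_to_turtle(
    "cats", "Cats",
    alt_labels=["Felines"], broader=["mammals"], related=["catnip"],
)
print(PREFIXES + "\n" + turtle)
```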


Lunch on your own 12:00 PM-1:00 PM


Afternoon Session: Implementing HIVE 1:00 PM-4:00 PM

Overview

This session provides details on the HIVE system, underlying algorithms, source 
code, and the library of system features.

Audience

System developers, as well as technologists, librarians, and information 
scientists who are interested in the technological side of the semantic web, 
and who may be implementing, experimenting with, and/or extending HIVE 
technologies.

Prerequisites

Java programming and object-oriented design.

Learning Outcomes

- Understand the architecture of the HIVE vocabulary server.
- Become familiar with information retrieval techniques and how HIVE applies 
them to vocabulary terms.
- Gain experience indexing documents with HIVE and KEA (a machine learning 
application).
- Learn how to integrate HIVE vocabulary services into other tools.
- Learn how to use the SPARQL language for querying content in HIVE.
 

Registration fees and registration

$60.00 half day (single session)

$105.00 full day (both sessions)

Registration fee includes: Coffee and Danishes from 8:00 AM-9:00 AM; does not 
include lunch.

Participants are asked to bring their own laptops.

** Wiki link for workshop: 
https://www.nescent.org/sites/hive/GWU_Workshop_2011#George_Washington_University_.28Mt._Vernon_Campus.29.2C_March_9.2C_2011.

 

Workshop Leaders

Jane Greenberg is a professor at the School of Information and Library Science, 
University of North Carolina at Chapel Hill (SILS/UNC-CH), and director of the 
SILS Metadata Research Center.

Ryan Scherle is the lead data repository architect for Dryad at the National 
Evolutionary Synthesis Center (NESCent).

Hollie White is doctoral fellow at the SILS Me

[CODE4LIB] Bad numbers in my lightning talk (e.g. 45% of sessions have one action: search)

2011-02-15 Thread Bill Dueber
Basically, I failed to exclude a whole swath of activity I should have
ignored.

An explanation, the new data, and a link to an excellent corroborating paper
by our usability group are at:

http://robotlibrarian.billdueber.com/corrected-code4lib-slides-are-up/

My sincere apologies to everyone. I'm trying to do due diligence, but if anyone
passed a copy of my slides to anyone else, please make sure they get the better
numbers.

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Ranking factors for library resources: Who really uses what?

2011-02-15 Thread Jonathan Rochkind
A bunch of us are using Solr/Lucene for discovery over library 
bibliographic records, which is based on a basic tf*idf weighting 
algorithm with a bunch of tweaks. So all of us doing that, and 
finding it pretty successful, are probably surprised to hear that this 
approach won't work on library data. :)


Jonathan

On 2/15/2011 4:13 PM, Dave Caroline wrote:

I wrote my own search engine for my system and thought long and hard
about relevancy; in the end I went for none, and display results alphabetically.

Dave Caroline

On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler  wrote:

There has been a lively discussion about relevance ranking for library
resources in discovery interfaces in recent years. Articles, blog
posts and presentations on this topic discuss, again and again, possible
ranking factors beyond well-known term-statistics-based methods
like the vector space retrieval model with tf*idf weighting (often after
claiming that term-statistics-based approaches wouldn't work on library
data, of course without proving that).

Usually the following possible factors are mentioned:
- popularity (often after stressing Google's success with PageRank),
measured in several ways like holding quantities, circulation
statistics, clicks in catalogues, explicit user ratings, number of
citations, ...
- freshness: rank newer items higher (ok, we have that in many old
school Boolean OPACs as "sort by date", but not in combination with
other ranking factors like term statistics)
- availability
- contextual/individual factors, e.g. if (user.status=student)
boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
season=christmas boost(gingerbread recipes); ...
- ...

I tried to find examples where such factors beyond term statistics are
used to rank search results in libraryland. But I can hardly find any, only
lots of theoretical discussions about all the pros and cons of all
conceivable factors, going on since the 1980s. I mean, all of that is doable
with search engines like Solr today. But it seems it is hardly
implemented anywhere in real systems (beyond simple cases; for example,
we slightly boost hits in collections a user has immediate online access
to, but we never asked users whether they like it or notice it at all).
WorldCat does a little something, it seems. They, of course, boost
resources with local holdings in WorldCat Local. And they use language
preferences (the Accept-Language HTTP header) to boost titles in users'
preferred languages. And there might be more in WorldCat ranking. But
there is not much published on that, it seems.

So, if you have implemented something beyond term-statistics-based ranking,
speak up and show it. I am very interested in real-world implementations
and experiences (like user feedback, user studies etc.).

Thanks,
Till



Re: [CODE4LIB] Ranking factors for library resources: Who really uses what?

2011-02-15 Thread Dave Caroline
I wrote my own search engine for my system and thought long and hard
about relevancy; in the end I went for none, and display results alphabetically.

Dave Caroline

On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler  wrote:
> There has been a lively discussion about relevance ranking for library
> resources in discovery interfaces in recent years. Articles, blog
> posts and presentations on this topic discuss, again and again, possible
> ranking factors beyond well-known term-statistics-based methods
> like the vector space retrieval model with tf*idf weighting (often after
> claiming that term-statistics-based approaches wouldn't work on library
> data, of course without proving that).
>
> Usually the following possible factors are mentioned:
> - popularity (often after stressing Google's success with PageRank),
> measured in several ways like holding quantities, circulation
> statistics, clicks in catalogues, explicit user ratings, number of
> citations, ...
> - freshness: rank newer items higher (ok, we have that in many old
> school Boolean OPACs as "sort by date", but not in combination with
> other ranking factors like term statistics)
> - availability
> - contextual/individual factors, e.g. if (user.status=student)
> boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
> season=christmas boost(gingerbread recipes); ...
> - ...
>
> I tried to find examples where such factors beyond term statistics are
> used to rank search results in libraryland. But I can hardly find any, only
> lots of theoretical discussions about all the pros and cons of all
> conceivable factors, going on since the 1980s. I mean, all of that is doable
> with search engines like Solr today. But it seems it is hardly
> implemented anywhere in real systems (beyond simple cases; for example,
> we slightly boost hits in collections a user has immediate online access
> to, but we never asked users whether they like it or notice it at all).
> WorldCat does a little something, it seems. They, of course, boost
> resources with local holdings in WorldCat Local. And they use language
> preferences (the Accept-Language HTTP header) to boost titles in users'
> preferred languages. And there might be more in WorldCat ranking. But
> there is not much published on that, it seems.
>
> So, if you have implemented something beyond term-statistics-based ranking,
> speak up and show it. I am very interested in real-world implementations
> and experiences (like user feedback, user studies etc.).
>
> Thanks,
> Till
>


[CODE4LIB] Ranking factors for library resources: Who really uses what?

2011-02-15 Thread Till Kinstler
There has been a lively discussion about relevance ranking for library
resources in discovery interfaces in recent years. Articles, blog
posts and presentations on this topic discuss, again and again, possible
ranking factors beyond well-known term-statistics-based methods
like the vector space retrieval model with tf*idf weighting (often after
claiming that term-statistics-based approaches wouldn't work on library
data, of course without proving that).
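
For readers who have not seen tf*idf spelled out, here is a minimal Python sketch of the basic idea: a term counts for more the more often it occurs in a record (tf) and the fewer records contain it (idf). This is a textbook simplification under my own assumptions (whitespace tokenization, raw counts, log(N/df)); it is not the formula Solr/Lucene actually uses, which adds normalization and other tweaks. The function name and sample records are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against the query with plain tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:  # terms absent from the collection contribute nothing
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Hypothetical miniature "catalogue".
records = [
    "marx capital economics",
    "gingerbread recipes",
    "economics textbook economics",
]
print(tfidf_scores("economics", records))
```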

Usually the following possible factors are mentioned:
- popularity (often after stressing Google's success with PageRank),
measured in several ways like holding quantities, circulation
statistics, clicks in catalogues, explicit user ratings, number of
citations, ...
- freshness: rank newer items higher (ok, we have that in many old
school Boolean OPACs as "sort by date", but not in combination with
other ranking factors like term statistics)
- availability
- contextual/individual factors, e.g. if (user.status=student)
boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
season=christmas boost(gingerbread recipes); ...
- ...
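
The contextual rules in the last list item can be made concrete with a small Python sketch. It takes hits that already carry a term-statistics score and multiplies in a boost for each rule that applies. Everything here is an assumption for illustration: the User and Hit classes, the tag names, and the boost factors are invented, and no real system is known to work this way.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    status: str = ""
    faculty: str = ""

@dataclass
class Hit:
    title: str
    score: float                      # base score, e.g. from term statistics
    tags: set = field(default_factory=set)

def apply_boosts(hits, user, season=""):
    """Return hit titles re-ranked after applying contextual boost rules."""
    # (condition, tag to boost, multiplier); factors are arbitrary examples.
    rules = [
        (user.status == "student", "textbook", 1.5),
        (user.faculty == "economics", "karl-marx", 1.3),
        (season == "christmas", "gingerbread-recipes", 1.2),
    ]
    rescored = []
    for hit in hits:
        score = hit.score
        for applies, tag, factor in rules:
            if applies and tag in hit.tags:
                score *= factor
        rescored.append((score, hit.title))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in rescored]

hits = [
    Hit("Research Monograph", 1.0, {"monograph"}),
    Hit("Intro Textbook", 0.8, {"textbook"}),
]
print(apply_boosts(hits, User(status="student")))
```

Multiplicative boosts keep the term-statistics score as the baseline, so a contextual factor can reorder close results without letting an irrelevant item jump to the top.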

I tried to find examples where such factors beyond term statistics are
used to rank search results in libraryland. But I can hardly find any, only
lots of theoretical discussions about all the pros and cons of all
conceivable factors, going on since the 1980s. I mean, all of that is doable
with search engines like Solr today. But it seems it is hardly
implemented anywhere in real systems (beyond simple cases; for example,
we slightly boost hits in collections a user has immediate online access
to, but we never asked users whether they like it or notice it at all).
WorldCat does a little something, it seems. They, of course, boost
resources with local holdings in WorldCat Local. And they use language
preferences (the Accept-Language HTTP header) to boost titles in users'
preferred languages. And there might be more in WorldCat ranking. But
there is not much published on that, it seems.
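
Since little is published on how WorldCat does this, the following is only a guess at the general shape of a language boost, sketched in Python: parse the Accept-Language header into (language, q-value) pairs and turn the best match for a record's language into a multiplier. The function names, the max_boost value, and the matching on the primary language subtag are my assumptions, and the wildcard "*" is not handled.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not wanted"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

def language_boost(record_lang, header, max_boost=0.2):
    """Return a score multiplier for a record in `record_lang` (e.g. "de")."""
    for lang, q in parse_accept_language(header):
        # Compare on the primary subtag, so "de-DE" matches a "de" record.
        if lang.split("-")[0] == record_lang.lower():
            return 1.0 + max_boost * q
    return 1.0

header = "de-DE,de;q=0.9,en;q=0.5"
print(language_boost("de", header), language_boost("en", header))
```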

So, if you have implemented something beyond term-statistics-based ranking,
speak up and show it. I am very interested in real-world implementations
and experiences (like user feedback, user studies etc.).

Thanks,
Till


[CODE4LIB] Academic librarians holding PhDs - seeking participants for a study

2011-02-15 Thread Mitchell, Erik
Dear Academic Librarian:



We are writing to request your participation in a study of academic
librarians holding PhD degrees.





ABOUT THIS RESEARCH



We are conducting a qualitative study to understand how the
competencies and interests developed in a doctoral education
contribute to academic librarianship, and how they are supported or
inhibited by academic libraries.  The results will help respond to the
challenge of identifying novel ways to enhance the research and
teaching support at academic libraries.  More details at
https://sites.google.com/site/phdlibrarians/





YOU ARE ELIGIBLE TO PARTICIPATE IF YOU:

(1) are currently employed as an academic librarian,

(2) hold an ALA-accredited Master’s degree in Library and Information
Studies, AND

(3) have completed a PhD research degree (or equivalent research
degree) in any field.



If you know someone who meets these criteria, please forward
this message on.





WHAT WILL HAPPEN IN THIS STUDY?

You will complete a qualitative survey.  This means that you will
write responses to open-ended questions about your perceptions on the
value of a doctoral education in academic librarianship and how
academic libraries leverage your doctoral competencies and interests.





IF YOU WOULD LIKE TO PARTICIPATE

This is great.  Please visit the online survey at:

http://qtrial.qualtrics.com/SE/?SID=SV_0AKdGGF43y2b1is





PARTICIPANT RIGHTS

There is no obligation to participate in this study.  If you agree to
participate, your privacy will be protected at all times.  There are
no anticipated risks or discomfort involved, and you may decide to end
your participation at any time.  We have a consent form that explains
your involvement, our research, and our responsibilities in protecting
your privacy.



This study has been approved by the Institutional Review Board at Wake
Forest University in Winston-Salem, North Carolina, for compliance
with ethical principles and regulatory requirements (#IRB00015580).
If you have questions about this, you may contact the Office of
Research & Sponsored Programs at 336-758-5888.





Thank you for your consideration.  If you have any questions about our
study please contact any member of our research team.





Sincerely,



Jeffery Loo, MLIS MSc PhD

Librarian, University of California, Berkeley

j...@berkeley.edu



Erik Mitchell, MLIS PhD

Librarian, Wake Forest University

mitch...@wfu.edu



Susan Rathbun-Grubb, MAT MSLS PhD

Assistant Professor, School of Library and Information Science,
University of South Carolina

srath...@mailbox.sc.edu


[CODE4LIB] Fwd: [Air-L] Call For Papers: Videoconferencing in Practice: 21st Century Challenges (Electronic Journal of Communication)

2011-02-15 Thread Jodi Schneider
Possibly of interest, both for describing the conference and for
describing some innovative working arrangements some of you have, involving
videoconferencing or other offsite arrangements in place of some/all of your
commute.

-Jodi

-- Forwarded message --
From: Dr. Sean Rintel 
Date: Tue, Feb 15, 2011 at 1:55 AM
Subject: [Air-L] Call For Papers: Videoconferencing in Practice: 21st
Century Challenges (Electronic Journal of Communication)
To: ai...@listserv.aoir.org


CALL FOR PAPERS
Electronic Journal of Communication / La Revue Electronique de Communication
Special Issue: Videoconferencing in Practice: 21st Century Challenges
Editor: Sean Rintel (s.rintel@uq.edu.au)
Deadline: June 27, 2011. The issue is scheduled for publication in the
first half of 2012.

While not yet ubiquitous, videoconferencing can certainly be said to
have come of age at the end of the first decade of the 21st century.
The capabilities of videoconferencing systems have improved while
barriers have been significantly lowered to the point where
videoconferencing is no longer extraordinary, albeit still quite
novel. This special issue of the Electronic Journal of Communication
invites contributions exploring how videoconferencing has become a
practical method of interaction in personal, professional,
pedagogical, and institutional contexts.  Contributors should have a
central concern with whether and how users attend to the affordances
and constraints of videoconferencing as relevant to the business at
hand.

The issue will seek to cover a broad range of subjects to provide a
snapshot of 21st century videoconferencing research from a
Communication perspective. As such, all authors are asked to include
in their literature review and conclusions some sense of how their
study deals with questions of Communication research in general, and
Communication-focused videoconferencing research in particular.

Manuscripts are invited that cover one of the following areas, or
authors may propose their own area that demonstrably adds to the goal
of the special issue:

* Case studies of practical videoconferencing in context.
* Survey studies of practical videoconferencing in context.
* Experimental studies that show practical behaviours that may be
specific to contexts or cut across contexts.
* Literature reviews that focus on changes or critiques of
videoconferencing findings and/or theories.
* Representations of videoconferencing in the mass media or other
venues of the public imagination.
* Methodological studies about research practices or methods for
capturing/analysing videoconferencing in context.
* Trouble using videoconferencing in context (e.g. constraints of the
medium, communication trouble, operational problems such as network
trouble, work-arounds).
* Adoption/take-up studies of practical videoconferencing in context.

Authors are welcome to contact the editor to discuss ideas for
manuscripts. Given the topic and the electronic nature of the journal,
authors are also encouraged to supply video and/or audio clip examples
or supplemental materials. Submission details are as follows:
* Deadline for completed manuscripts is June 27, 2011. The issue is
scheduled for publication in the first half of 2012.
* Maximum 7500 words (limit includes references but excludes
transcribed examples, figures, and tables).
* Manuscript and citation format: APA 6th Edition style.
* Submission: Email manuscript files to Sean Rintel at s.rin...@uq.edu.au.
* File format: Word 97 (.doc), Word 2008 (.docx), or PDF. Authors
using carefully formatted transcripts (e.g. those of Conversation
Analysis) should also provide screenshot images of all excerpts
labeled with the appropriate example number. Video and audio clips
should be provided in Quicktime .mov format with appropriate
indication in the manuscript as to where the clip should appear. Do
not embed video or audio in Word or PDF files.

This call is also viewable at: http://www.cios.org/www/ejc/calls/vidprac.htm

--
Dr. Sean Rintel
Associate Lecturer in Communication
School of English, Media Studies, and Art History
Room 509, Michie Building (9)
The University of Queensland
Brisbane, QLD, Australia, 4072
Telephone: +61-7-3365-2147
Facsimile: +61-7-3365-2799
CRICOS Provider Number 00025B