Re: [CODE4LIB] irc back channel logs [hacks]
And a word cloud: http://www.wordle.net/show/wrdl/3157008/code4lib_2011_IRC_logs

On Sat, Feb 12, 2011 at 10:13 AM, Eric Lease Morgan wrote:
> I have written a few hacks allowing me to do rudimentary text mining
> against the logs. [1] From readme.txt:
>
>   This directory contains a number of files and scripts allowing
>   one to do a bit of text mining against the Code4Lib conference
>   IRC log files for 2011. This is just a beginning, and the
>   directory includes:
>
>   * irclog.txt - the raw log file downloaded from
>     http://irc.code4lib.org/c4l11/static/logs/irclog
>
>   * log2db.pl - reads the raw log and outputs a tab-delimited
>     file with three columns (date, name, text)
>
>   * irclog.db - the output of log2db.pl
>
>   * count.pl - outputs the number of names (n), increases (i),
>     decreases (d), URLs (u), and commands (c) found in the log;
>     useful for seeing what is hot and what is not.
>
>   * ngrams.pl - given an integer (n), outputs the most frequent
>     n-length phrases; useful to see what words and phrases are
>     used most frequently
>
>   * concordance.pl - a KWIC index; the simplest of search engines
>
>   * readme.txt - this file
>
> Using these tools one can see that:
>
>   * Zoia had the most to say
>   * mbklein's karma was increased the most
>   * Zoia's karma was decreased the most
>   * the most popular URL passed around regarded social activities
>   * we tried to sing as many as 196 songs closely followed by anagrams
>   * 28 of the songs weren't found
>   * live streams were mentioned frequently
>
> I have to go shovel snow now...
>
> [1] initial hacks - http://bit.ly/gMO4op
>
> --
> Eric Lease Morgan
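For readers who want to try the same kind of analysis without the Perl scripts, here is a minimal Python sketch of the two ideas described in the readme: counting the most frequent n-word phrases and building a simple keyword-in-context (KWIC) listing. It assumes input in the three-column, tab-delimited layout (date, name, text) that log2db.pl is described as producing. The function names `parse_rows`, `top_ngrams`, and `kwic` are made up for this illustration and are not taken from the original scripts, whose actual logic may differ.

```python
from collections import Counter


def parse_rows(lines):
    """Split tab-delimited lines into (date, name, text) tuples.

    Lines that do not have exactly three columns are skipped.
    """
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def top_ngrams(texts, n, k=10):
    """Return the k most frequent n-word phrases across all texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


def kwic(texts, keyword, width=30):
    """List every occurrence of keyword with `width` characters of context.

    Matching is case-insensitive and assumes lowercasing does not change
    string length (true for plain ASCII logs).
    """
    hits = []
    needle = keyword.lower()
    for text in texts:
        lower = text.lower()
        start = lower.find(needle)
        while start != -1:
            end = start + len(needle)
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            hits.append(f"{left:>{width}} [{text[start:end]}] {right}")
            start = lower.find(needle, start + 1)
    return hits
```

With the log loaded via something like `parse_rows(open("irclog.db", encoding="utf-8"))`, calling `top_ngrams([row[2] for row in rows], 2)` would give the most common two-word phrases, and `kwic(texts, "stream")` would show each mention of "stream" in context.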
[CODE4LIB] SKOS-2-HIVE workshop, DC area
Workshop announcement: **SKOS-2-HIVE: CREATING SKOS VOCABULARIES TO HELP INTERDISCIPLINARY VOCABULARY ENGINEERING** George Washington University (Mt. Vernon Campus), March 9, 2011 Location: Eckles Library Auditorium, Mt. Vernon Campus of George Washington University Click Here to Register (REGISTRATION CLOSES ON MARCH 1) § WORKSHOP DESCRIPTION The SKOS-2-HIVE workshop focuses on using semantic web technologies for representing and describing collections using multiple controlled vocabularies. The workshop focuses on basic understanding and usage of W3C's Simple Knowledge Organization Systems (SKOS), linked data, and the HIVE library of open source applications. There are two workshop components: 1. Foundational Concepts and HIVE Basics. This component addresses the conceptual design of structured vocabularies, including a range of semantic relationships; domain representation and issues central to identifying useful vocabularies; the application of basic SKOS tags; and basic techniques underlying the HIVE vocabulary server for enriching digital resource descriptions. 2. Implementing HIVE. This component covers more technical aspects including steps for implementing a HIVE server. Workshop outlines and learning outcomes provided further below. Workshop rationale: Semantic web technologies provide innovative means for organizing, describing, and managing digital resources in a range of formats. Successful implementation and use of semantic web technologies requires both information professionals and system developers to become knowledgeable about the underlying intellectual construct and roadmap toward forming a semantic web. The IMLS-funded Helping Interdisciplinary Vocabulary Engineering (HIVE) project has been addressing these needs by working with the W3C's Simple Knowledge Organization Systems (SKOS) in the linked data environment. 
HIVE has been implemented using semantic web enabling technologies and machine learning to provide a solution to the traditional controlled vocabulary problems of cost, interoperability, and usability. Current HIVE vocabulary partners include the Library of Congress, the Getty Research Institute, and the U.S. Geological Survey. § WORKSHOP OUTLINE AND LEARNING OUTCOMES Morning Session: Foundational Concepts and HIVE Basics, 9:00 AM-12:00 PM Overview This session addresses traditional thesaural concepts and the extension of these concepts via SKOS/linked data, HIVE and the semantic web. Audience This workshop targets information professionals (librarians, archivists, museum professionals, web architects, and others); system developers; and students seeking knowledge about the basic framework and conceptual aspects of vocabulary design. Prerequisites Have a basic understanding of subject metadata creation or subject cataloging. Learning Outcomes - Evaluate controlled vocabularies, thesauri, and ontologies that would best fit your information environment's needs. - Identify basic thesaural relationships including: relative, associative and hierarchical. - Use basic SKOS tags to identify the above thesaural relationships. - Become familiar with using the HIVE software and the HIVE processes. Lunch on your own 12:00 PM-1:00 PM Afternoon Session: Implementing HIVE 1:00 PM-4:00 PM Overview This session provides details on the HIVE system, underlying algorithms, source code, and the library of system features. Audience System developers, as well as technologists, librarians, and information scientists who are interested in the technological side of the semantic web, and who may be implementing, experimenting with, and/or extending HIVE technologies. Prerequisites Java programming, and object oriented design. Learning Outcomes - Understand the architecture of the HIVE vocabulary server. 
- Become familiar with information retrieval techniques and how HIVE applies them to vocabulary terms. - Gain experience indexing documents with HIVE and KEA (a machine learning application). - Learn how to integrate HIVE vocabulary services into other tools. - Learn how to use the SPARQL language for querying content in HIVE. Registration fees and registration $60.00 half day (single session) $105.00 full day (both sessions) Registration fee includes: Coffee and Danishes from 8:00 AM-9:00 AM; does not include lunch. Participants are asked to bring their own laptops. Wiki link for workshop: https://www.nescent.org/sites/hive/GWU_Workshop_2011#George_Washington_University_.28Mt._Vernon_Campus.29.2C_March_9.2C_2011. Workshop Leaders Jane Greenberg is a professor at the School of Information and Library Science, University of North Carolina at Chapel Hill (SILS/UNC-CH), and director of the SILS Metadata Research Center. Ryan Scherle is the lead data repository architect for Dryad at the National Evolutionary Synthesis Center (NESCent). Hollie White is a doctoral fellow at the SILS Me
[CODE4LIB] Bad numbers in my lightning talk (e.g. 45% of sessions have one action: search)
Basically, I failed to exclude a whole swath of activity I should have ignored. An explanation, the new data, and an excellent link to a corroborating paper by our usability group are at: http://robotlibrarian.billdueber.com/corrected-code4lib-slides-are-up/ My sincere apologies to everyone. I'm trying to do due diligence, so if anyone passed a copy of my slides to anyone else, please make sure they get the better numbers. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Ranking factors for library resources: Who really uses what?
A bunch of us are using Solr/lucene for discovery over library bibliographic records, which is based on the basic tf*idf weighting type algorithm, with a bunch of tweaks. So all of us doing that, and finding it pretty successful, are probably surprised to hear that this approach won't work on library data. :) Jonathan On 2/15/2011 4:13 PM, Dave Caroline wrote: I wrote my own search engine for my system and thought long and hard about relevancy, in the end went for none! and display alphabetical. Dave Caroline On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler wrote: There is a vivid discussion about relevance ranking for library resources in discovery interfaces in recent years. In articles, blog posts and presentations on this topic, again and again possible ranking factors are discussed beyond well known term statistic based methods like the vector space retrieval model with tf*idf weighting (often after claiming term statistics based approaches wouldn't work on library data, of course without proofing that). Usually the following possible factors are mentioned: - popularity (often after stressing Google's success with PageRank), measured in several ways like holding quantities, circulation statistics, clicks in catalogues, explicit user ratings, number of citations, ... - freshness: rank newer items higher (ok, we have that in many old school Boolean OPACs as "sort by date", but not in combination with other ranking factors like term statistics) - availability - contextual/individual factors, eg. if (user.status=student) boost(textbook); if (user.faculty=economics) boost(Karl Marx); if season=christmas boost(gingerbread recipes); ... - ... I tried to find examples where such factors beyond term statistics are used to rank search results in libraryland. But I hardly find them, only lots of theoretical discussions about all the pros and cons of all thinkable factors going on since the 1980s. I mean, all that is doable with search engines like Solr today. 
But it seems, it is hardly implemented somewhere in real systems (beyond simple cases, for example we slightly boost hits in collections a user has immediate online access to, but we never asked users, if they like it or notice at all). WorldCat does a little bit something, it seems. They, of course, boost resources with local holdings in WorldCat local. And they use language preferences (Accept-Language HTTP header) for boosting titles in users' preferred languages. And there might be more in WorldCat ranking. But there is not much published on that, it seems? So, if you implemented something beyond term statistics based ranking, speak up and show. I am very interested in real world implementations and experiences (like user feedback, user studies etc.). Thanks, Till
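As a point of reference for the tf*idf weighting discussed in this thread, here is a minimal Python sketch of the basic idea: a term counts for more the more often it occurs in a record (tf) and the fewer records contain it (idf). This is a textbook-style simplification, not the scoring formula Solr/Lucene actually uses, which adds length normalization and other factors. The function name `tf_idf_scores`, the whitespace tokenization, and the `log(N/df) + 1` form of idf are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document against the query with a plain tf*idf sum.

    Illustrative only: tokenization is whitespace splitting, and idf is
    log(N / df) + 1, one of several common variants.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:  # ignore query terms found in no document
                idf = math.log(n_docs / df[term]) + 1.0
                score += tf[term] * idf
        scores.append(score)
    return scores
```

For example, scoring the query "marx capital" against three short titles ranks the record containing both terms first, the record containing only the more common term second, and the unrelated record last with a score of zero.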
Re: [CODE4LIB] Ranking factors for library resources: Who really uses what?
I wrote my own search engine for my system and thought long and hard about relevancy, in the end went for none! and display alphabetical. Dave Caroline On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler wrote: > There is a vivid discussion about relevance ranking for library > resources in discovery interfaces in recent years. In articles, blog > posts and presentations on this topic, again and again possible ranking > factors are discussed beyond well known term statistic based methods > like the vector space retrieval model with tf*idf weighting (often after > claiming term statistics based approaches wouldn't work on library data, > of course without proofing that). > > Usually the following possible factors are mentioned: > - popularity (often after stressing Google's success with PageRank), > measured in several ways like holding quantities, circulation > statistics, clicks in catalogues, explicit user ratings, number of > citations, ... > - freshness: rank newer items higher (ok, we have that in many old > school Boolean OPACs as "sort by date", but not in combination with > other ranking factors like term statistics) > - availability > - contextual/individual factors, eg. if (user.status=student) > boost(textbook); if (user.faculty=economics) boost(Karl Marx); if > season=christmas boost(gingerbread recipes); ... > - ... > > I tried to find examples where such factors beyond term statistics are > used to rank search results in libraryland. But I hardly find them, only > lots of theoretical discussions about all the pros and cons of all > thinkable factors going on since the 1980s. I mean, all that is doable > with search engines like Solr today. But it seems, it is hardly > implemented somewhere in real systems (beyond simple cases, for example > we slightly boost hits in collections a user has immediate online access > to, but we never asked users, if they like it or notice at all). > WorldCat does a little bit something, it seems. 
They, of course, boost > resources with local holdings in WorldCat local. And they use language > preferences (Accept-Language HTTP header) for boosting titles in users' > preferred languages. And there might be more in WorldCat ranking. But > there is not much published on that, it seems? > > So, if you implemented something beyond term statistics based ranking, > speak up and show. I am very interested in real world implementations > and experiences (like user feedback, user studies etc.). > > Thanks, > Till >
[CODE4LIB] Ranking factors for library resources: Who really uses what?
There has been a lively discussion about relevance ranking for library resources in discovery interfaces in recent years. In articles, blog posts and presentations on this topic, possible ranking factors beyond well-known term statistics based methods like the vector space retrieval model with tf*idf weighting are discussed again and again (often after claiming term statistics based approaches wouldn't work on library data, of course without proving that). Usually the following possible factors are mentioned: - popularity (often after stressing Google's success with PageRank), measured in several ways like holding quantities, circulation statistics, clicks in catalogues, explicit user ratings, number of citations, ... - freshness: rank newer items higher (ok, we have that in many old school Boolean OPACs as "sort by date", but not in combination with other ranking factors like term statistics) - availability - contextual/individual factors, e.g. if (user.status=student) boost(textbook); if (user.faculty=economics) boost(Karl Marx); if season=christmas boost(gingerbread recipes); ... - ... I tried to find examples where such factors beyond term statistics are used to rank search results in libraryland. But I can hardly find any, only lots of theoretical discussions about the pros and cons of every conceivable factor, going on since the 1980s. I mean, all of that is doable with search engines like Solr today. But it seems it is hardly implemented anywhere in real systems (beyond simple cases; for example, we slightly boost hits in collections a user has immediate online access to, but we never asked users whether they like it or notice it at all). WorldCat does a little something, it seems. They, of course, boost resources with local holdings in WorldCat Local. And they use language preferences (Accept-Language HTTP header) for boosting titles in users' preferred languages. And there might be more in WorldCat ranking. But there is not much published on that, it seems? 
So, if you implemented something beyond term statistics based ranking, speak up and show. I am very interested in real world implementations and experiences (like user feedback, user studies etc.). Thanks, Till
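To make the factors listed above concrete, here is a small Python sketch of how a term-statistics score might be combined with boosts for freshness, availability, language preference, and popularity. Everything specific in it is an assumption for illustration: the `Record` fields, the function names `boosted_score` and `rank`, and all of the weights are invented, and it does not describe how WorldCat, Solr, or any real discovery system ranks results.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    term_score: float   # base score from term statistics, e.g. tf*idf
    year: int           # publication year, used for freshness
    available: bool     # user has immediate access
    language: str       # language code of the title, e.g. "en"
    loans: int = 0      # circulation count, used as a popularity proxy


def boosted_score(rec, preferred_langs=(), current_year=2011):
    """Multiply the term score by a few hypothetical boost factors."""
    score = rec.term_score

    # Freshness: lose 2% per year of age, but never drop below half.
    age = max(0, current_year - rec.year)
    score *= max(0.5, 1.0 - 0.02 * age)

    # Availability: favor items the user can get right now.
    if rec.available:
        score *= 1.2

    # Language: favor titles in languages the user prefers
    # (for instance, taken from an Accept-Language header).
    if rec.language in preferred_langs:
        score *= 1.1

    # Popularity: logarithmic, so heavily used items do not swamp the rest.
    score *= 1.0 + 0.1 * math.log1p(rec.loans)
    return score


def rank(records, **kwargs):
    """Sort records from highest to lowest boosted score."""
    return sorted(records, key=lambda r: boosted_score(r, **kwargs), reverse=True)
```

Multiplicative boosts are used here so that a record with a term score of zero stays at zero however popular or fresh it is; additive boosts would be an equally plausible design and would behave differently for weak matches. Whether any of these weights help real users is exactly the open question raised above.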
[CODE4LIB] Academic librarians holding PhDs - seeking participants for a study
Dear Academic Librarian: We are writing to request your participation in a study of academic librarians holding PhD degrees. ABOUT THIS RESEARCH We are conducting a qualitative study to understand how the competencies and interests developed in a doctoral education contribute to academic librarianship, and how they are supported or inhibited by academic libraries. The results will help respond to the challenge of identifying novel ways to enhance the research and teaching support at academic libraries. More details at https://sites.google.com/site/phdlibrarians/ YOU ARE ELIGIBLE TO PARTICIPATE IF YOU ARE: (1) currently employed as an academic librarian, (2) hold an ALA-accredited Master's degree in Library and Information Studies, AND (3) have completed a PhD research degree (or equivalent research degree) in any field. If you know someone who meets these characteristics, please forward this message on. WHAT WILL HAPPEN IN THIS STUDY? You will complete a qualitative survey. This means that you will write responses to open-ended questions about your perceptions on the value of a doctoral education in academic librarianship and how academic libraries leverage your doctoral competencies and interests. IF YOU WOULD LIKE TO PARTICIPATE This is great. Please visit the online survey at: http://qtrial.qualtrics.com/SE/?SID=SV_0AKdGGF43y2b1is PARTICIPANT RIGHTS There is no obligation to participate in this study. If you agree to participate, your privacy will be protected at all times. There are no anticipated risks or discomfort involved, and you may decide to end your participation at any time. We have a consent form that explains your involvement, our research, and our responsibilities in protecting your privacy. This study has been approved by the Institutional Review Board at Wake Forest University in Winston-Salem, North Carolina, for compliance with ethical principles and regulatory requirements (#IRB00015580). 
If you have questions about this, you may contact the Office of Research & Sponsored Programs at 336-758-5888. Thank you for your consideration. If you have any questions about our study please contact any member of our research team. Sincerely, Jeffery Loo, MLIS MSc PhD Librarian, University of California, Berkeley j...@berkeley.edu Erik Mitchell, MLIS PhD Librarian, Wake Forest University mitch...@wfu.edu Susan Rathbun-Grubb, MAT MSLS PhD Assistant Professor, School of Library and Information Science, University of South Carolina srath...@mailbox.sc.edu
[CODE4LIB] Fwd: [Air-L] Call For Papers: Videoconferencing in Practice: 21st Century Challenges (Electronic Journal of Communication)
Possibly of interest, both for describing the conference and for describing some innovative working arrangements some of you have, involving videoconferencing or other offsite arrangements in place of some/all of your commute. -Jodi -- Forwarded message -- From: Dr. Sean Rintel Date: Tue, Feb 15, 2011 at 1:55 AM Subject: [Air-L] Call For Papers: Videoconferencing in Practice: 21st Century Challenges (Electronic Journal of Communication) To: ai...@listserv.aoir.org CALL FOR PAPERS Electronic Journal of Communication / La Revue Electronique de Communication Special Issue: Videoconferencing in Practice: 21st Century Challenges Editor: Sean Rintel (s.rintel@uq.edu.au) Deadline: June 27, 2011. The issue is scheduled for publication in the first half of 2012. While not yet ubiquitous, videoconferencing can certainly be said to have come of age at the end of the first decade of the 21st century. The capabilities of videoconferencing systems have improved while barriers have been significantly lowered to the point where videoconferencing is no longer extraordinary, albeit still quite novel. This special issue of the Electronic Journal of Communication invites contributions exploring how videoconferencing has become a practical method of interaction in personal, professional, pedagogical, and institutional contexts. Contributors should have a central concern with whether and how users attend to the affordances and constraints of videoconferencing as relevant to the business at hand. The issue will seek to cover a broad range of subjects to provide a snapshot of 21st century videoconferencing research from a Communication perspective. As such, all authors are asked to include in their literature review and conclusions some sense of how their study deals with questions of Communication research in general, and Communication-focused videoconferencing research in particular. 
Manuscripts are invited that cover one of the following areas, or authors may propose their own area that demonstrably adds to the goal of the special issue: * Case studies of practical videoconferencing in context. * Survey studies of practical videoconferencing in context. * Experimental studies that show practical behaviours that may be specific to contexts or cut across contexts. * Literature reviews that focus on changes or critiques of videoconferencing findings and/or theories. * Representations of videoconferencing in the mass media or other venues of the public imagination. * Methodological studies about research practices or methods for capturing/analysing videoconferencing in context. * Trouble using videoconferencing in context (e.g. constraints of the medium, communication trouble, operational problems such as network trouble, work-arounds). * Adoption/take-up studies of practical videoconferencing in context. Authors are welcome to contact the editor to discuss ideas for manuscripts. Given the topic and the electronic nature of the journal, authors are also encouraged to supply video and/or audio clip examples or supplemental materials. Submission details are as follows: * Deadline for completed manuscripts is June 27, 2011. The issue is scheduled for publication in the first half of 2012. * Maximum 7500 words (limit includes references but excludes transcribed examples, figures, and tables). * Manuscript and citation format: APA 6th Edition style. * Submission: Email manuscript files to Sean Rintel at s.rin...@uq.edu.au. * File format: Word 97 (.doc), Word 2008 (.docx), or PDF. Authors using carefully formatted transcripts (e.g. those of Conversation Analysis) should also provide screenshot images of all excerpts labeled with the appropriate example number. Video and audio clips should be provided in Quicktime .mov format with appropriate indication in the manuscript as to where the clip should appear. Do not embed video or audio in Word or PDF files. 
This call is also viewable at: http://www.cios.org/www/ejc/calls/vidprac.htm -- Dr. Sean Rintel Associate Lecturer in Communication School of English, Media Studies, and Art History Room 509, Michie Building (9) The University of Queensland Brisbane, QLD, Australia, 4072 Telephone: +61-7-3365-2147 Facsimile: +61-7-3365-2799 CRICOS Provider Number 00025B ___ The ai...@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org Join the Association of Internet Researchers: http://www.aoir.org/