Re: [CODE4LIB] preconference proposals

2009-11-12 Thread Gabriel Farrell
On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote:
> If you'd be up for it Erik, I'd envision a basic session in the morning.
> Some of us (like me) have never gotten Solr up and running.
> 
> Then the afternoon could break off for an advanced session.
> 
> Though I like Bess's idea, too! Would that be suitable for a conference
> breakout? Not sure I'd want to pit it against Solr advanced session!

The preconfs should be as inclusive as possible, but I'm wondering if
the Solr session might be more beneficial if we dive into the
particulars right off the bat in the morning.  There are only a few
steps to get Solr up and running -- it's in the configuration for our
custom needs that the advice of a certain Mr. Hatcher can really be
helpful.  

You're right, though, that the NGC thing sounds more like a BOF session.
I'd support that in order to attend a full preconf day of Solr.  


Gabriel


Re: [CODE4LIB] preconference proposals

2009-11-12 Thread Gabriel Farrell
On Tue, Nov 10, 2009 at 06:41:20AM -0800, Bess Sadler wrote:
> +1 from me on this, no surprise. :)
> 
> What if we did a next gen catalog day thing? We could spend the
> morning on solr, which many projects have in common,
> and then in the afternoon have sessions that build on top of solr
> (vufind, blacklight, kochief, etc.). We were going to submit a
> proposal for a blacklight pre-conference regardless, but it makes a
> lot of sense to do something more coordinated, and it particularly
> makes sense to ensure that as many people as possible can take
> advantage of Erik's presence and expertise.

Great idea, Bess.  Advanced Solr in the morning, including extended
dismax, query weighting, and solrmarc.  Then more general NGC stuff in
the afternoon, such as options for pulling data in and pushing it out,
how best to display various collections, etc.


Gabriel


Re: [CODE4LIB] preconference proposals

2009-11-12 Thread Gabriel Farrell
On Thu, Nov 12, 2009 at 09:02:09AM -0500, Erik Hatcher wrote:
> Or, use the new Lucid contributed "extended dismax" parser ;)
> 
>   https://issues.apache.org/jira/browse/SOLR-1553
> 
>   Erik

This looks sweet, Erik.  Many thanks for sharing.


Gabriel


[CODE4LIB] XForms EAD editor sandbox available

2009-11-12 Thread Ethan Gruber
Hello all,

Over the past few months I have been working on and off on a research
project to develop a web-based XForms editor for EAD finding aids that runs
within the Orbeon Tomcat application.  While still in a very early alpha
stage (I have probably put only 60-80 hours of work into it thus far), I
think that it's ready for a general demonstration to solicit opinions,
criticism, etc. from librarians and technical staff.

Background:
For those not familiar with XForms, it is a W3C standard for creating
next-generation forms.  It is powerful and can allow you to create XML in
the way that it is intended to be created, without limits to repeatability,
complex hierarchies, or mixed content.  Orbeon adds a level on top of that,
taking care of all the ajax calls, serialization, and CRUD operations, and
providing a variety of widgets that allow nice features like tabs and
autocomplete/autosuggest that can be bound to authority lists and controlled
access terms.  By default, Orbeon reads and writes data from and to an eXist
database that comes packaged with it, but you can have it serialize the XML
to disk or have it interact with any REST interface such as Fedora.

Goals:
Ultimately, I wish to create a system of forms that can open any EAD
2002-compliant XML file without any data loss or XML transformation
whatsoever.  I think that this is the shortcoming of systems such as Archon
and Archivists' Toolkit.  I want to bind authority lists to certain fields
with autosuggest (such as corporate names, people, and subjects).  If there
is demand, I can build a public interface
for viewing the entire EAD collection, complete with solr for faceted browse
and search, but this is secondary to producing a form that people with some
basic archiving knowledge and EAD background can use to easily and
effectively create finding aids.  A public interface is the easy part, in
any case.  It wouldn't take more than a week or two to build something
fairly nice and robust.

Here is the link:  http://beta.scholarslab.org:9080/cocoon/eaditor/

I should stress that the application is *not complete.*  I am using cocoon
for providing a list of EAD content in the system.  I will remove that
application eventually and utilize Orbeon's internal pipelining features to
achieve the same objective.  I haven't delved too deeply into Orbeon's
pipelines yet.

Here are some things to note:

1. If you click on a link to open the main part of the guide or any of its
components, you have to click the "Load" link on the top of the form.  Forms
aren't being loaded on page load yet.
2. Elements that accept mixed content per the EAD 2002 schema (e.g.
paragraphs) only accept PCDATA.  I haven't worked on mixed content yet; it
is by far the most challenging aspect of the project.
3. I only have a few C-level elements available to add.
4. Not all did elements are available yet.
5. A lot of the generic attributes, like type and label, are not available
for editing yet.  This may be the type of thing that is best customized per
institution relative to their own best practices.  I don't want more input
fields than necessary right now.
6. The only thing you can add into the archdesc right now is the .
Once I finish all of the c-level elements, I can just put some xi:includes
into the archdesc XForm file to show them in the archdesc level.
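
To make the mixed content limitation in item 2 concrete, here is a minimal
sketch of the difference between a PCDATA-only paragraph and one with mixed
content.  The sample paragraphs and the helper name are made up for
illustration; this is not code from the editor itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD paragraph: text interleaved with a child element.
mixed = ET.fromstring(
    "<p>Letters from <persname>Jane Doe</persname> to her family.</p>"
)
# PCDATA-only paragraph: character data with no child elements.
plain = ET.fromstring("<p>Letters from Jane Doe to her family.</p>")


def has_mixed_content(elem):
    """True if the element has child elements and non-whitespace text around them."""
    has_children = len(elem) > 0
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    return has_children and has_text


print(has_mixed_content(mixed), has_mixed_content(plain))
```

A form that only accepts PCDATA can round-trip the second paragraph but would
lose the persname markup in the first.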

I think those are the major issues for now.  As I stated earlier, this is
sort of a pre-alpha.  The project is open source and available (through svn)
at http://code.google.com/p/eaditor/ to anyone who wants it.  I have put
together an easy package to get the application up and running without
difficulty.  All you have to do is unzip the download, go into the apache
tomcat folder and execute the startup script.  This assumes you have nothing
running on port 8080 already.
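
One way to check the port 8080 assumption before running the startup script
is sketched below.  The function name and the default host are illustrative
choices, not part of the download.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8080 in use:", port_in_use(8080))
```

If this reports True, stop whatever is listening on 8080 (or change the
Tomcat port) before starting the packaged application.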

Download page: http://code.google.com/p/eaditor/downloads/list

Wiki instructions:
http://code.google.com/p/eaditor/wiki/QuickstartInstallation?ts=1257887453&updated=QuickstartInstallation

Comments, questions, criticism welcome.  The editor is a sandbox.  Feel free
to experiment.

Ethan Gruber
University of Virginia Library


Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt

2009-11-12 Thread Eric James
Thanks, Erik.  There is no specific reason for their removal; I think it was 
just that the StopFilterFactory is preconfigured in the analyzer chain for 
fieldType=text.  We will do some performance testing with this filter removed.

 

BTW, a useful tool in deciding appropriate stopwords is the schema browser, 
which can be found on the /solr/admin page.  Here you can see term frequencies 
for each of the fields sorted from highest frequency to help weed out the terms 
of little querying value.
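
For anyone who wants to try the same idea offline, here is a rough analogue
of ranking terms by frequency.  It does not talk to Solr; the field values
and the tokenizing regex are assumptions made up for illustration.

```python
from collections import Counter
import re


def term_frequencies(values):
    """Count lowercase word tokens across a list of field values."""
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"[a-z0-9]+", value.lower()))
    return counts


# Hypothetical field values, for illustration only.
titles = [
    "Papers of the Board of Trustees",
    "Letters to the President",
    "Minutes of the Faculty, no. 12",
]

freqs = term_frequencies(titles)
for term, count in freqs.most_common(5):
    print(term, count)
```

The terms at the top of such a list are stopword candidates, but a token like
"no" shows why each one needs a look before it goes into stopwords.txt.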

 

Eric  
 
> Date: Thu, 12 Nov 2009 09:06:46 -0500
> From: erikhatc...@mac.com
> Subject: Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
> To: CODE4LIB@LISTSERV.ND.EDU
> 
> I often recommend against stop word removal altogether. Is there any 
> reason you need to remove them?
> 
> The primary reason stop words get removed is to increase performance 
> of queries with very common terms. If you are encountering that, 
> using Solr's CommonGramsFilter(Factory) is a good solution to keep 
> your stop words and alleviate the performance degradation potential. 
> The HathiTrust folks have had success with the common grams capability.
> 
> Erik
> 
> 
> On Nov 11, 2009, at 3:41 PM, Eric James wrote:
> 
> > Has anyone already given some thought to refining the solr 
> > stopwords.txt for library collections, particularly finding aids? 
> > The words included in the out of the box stopwords.txt are of very 
> > questionable unimportance:
> >
> >  > t that the their then there these they this to was will with>
> >
> >
> >
> > We were indexing a field id with "no." as one of its tokens (for 
> > number), but wanted a query with "no" (where the person did not add 
> > the period) to find the doc, but in actuality the "no" would get 
> > stripped by the StopFilterFactory. And thus we stumbled upon this 
> > list, and were a bit surprised by some of the inclusions (ex: "will") 
> > and exclusions (ex: "a").
> >
> >
> >
> > Thanks,
> >
> > Eric James
> >
> > Yale University Libraries
> > 
  

Re: [CODE4LIB] deadline for preconference proposals

2009-11-12 Thread Kevin S. Clarke
On Thu, Nov 12, 2009 at 10:06 AM, Jonathan Rochkind  wrote:

> I would suggest that you should plan to have the pre-confs _set_ (not just
> proposals received) by the time registration opens. Realistically I usually
> register first, and _then_ get my approval to go, figuring if I don't get
> approval I can always give up my spot.
>
> Perhaps this is what you were already planning?

Yes, sorry if that wasn't clear.  The preconferences would be set by
registration time (also because you'd have to sign up for them at that
point).

Kevin


Re: [CODE4LIB] deadline for preconference proposals

2009-11-12 Thread Jonathan Rochkind
As a practical matter, I need to get my travel approvals and make my 
travel plans well in advance. I can only get approval for a pre-conf I 
know about, so if a pre-conf comes too late, it won't be something I can 
possibly attend.


I expect I am not alone here.

I would suggest that you should plan to have the pre-confs _set_ (not 
just proposals received) by the time registration opens. Realistically I 
usually register first, and _then_ get my approval to go, figuring if I 
don't get approval I can always give up my spot.


Perhaps this is what you were already planning? As long as the pre-confs 
are actually determined by registration open, I think it's reasonable.  
Ideally they'd be determined in advance of registration though.


Jonathan

Kevin S. Clarke wrote:

> Someone asked a good question about deadlines for preconference
> proposals.  I didn't set one because I don't expect we'll reach the
> max number of spaces.  We will have to have all the proposals a week
> or so before registration though.  If we start pushing up against the
> amount of space that we have, I'll call for any last proposals (so
> that if we need to vote we can... I'm not expecting us to get that
> many though).
>
> Kevin
  


Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010

2009-11-12 Thread Bridger Dyson-Smith
Hello -
an FYI: if you're planning on flying into Knoxville and driving to
Asheville via I-40, be advised that there has been a rather large rockslide
and I-40 is closed. Here's a link to the official update --
http://www.ncdot.org/traffictravel/.

Safe travels to all.
cheers,
Bridger

--
Bridger Dyson-Smith
Digital Library Initiatives
University of Tennessee Libraries

On Thu, Nov 12, 2009 at 8:29 AM, John Fereira  wrote:

> Ross Singer wrote:
>
>> Likewise, Knoxville is also ~1.5 hours from Asheville.  Between
>> Greenville, Charlotte and Knoxville you might be able to catch a
>> special deal.
>>
>>
> A bit closer is the Tri-Cities airport (Johnson City, Bristol, Kingsport).
>  I've flown in there a couple of times when my in-laws lived in Johnson
> City.  It's a real nice drive, about an hour and 20 minutes, from there to
> Asheville.
>
> --
> John Fereira
> Cornell University
> Twitter: @john_fereira
> Google Wave: fere...@googlewave.com
>


[CODE4LIB] deadline for preconference proposals

2009-11-12 Thread Kevin S. Clarke
Someone asked a good question about deadlines for preconference
proposals.  I didn't set one because I don't expect we'll reach the
max number of spaces.  We will have to have all the proposals a week
or so before registration though.  If we start pushing up against the
amount of space that we have, I'll call for any last proposals (so
that if we need to vote we can... I'm not expecting us to get that
many though).

Kevin


Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt

2009-11-12 Thread Erik Hatcher
I often recommend against stop word removal altogether.  Is there any  
reason you need to remove them?


The primary reason stop words get removed is to increase performance  
of queries with very common terms.  If you are encountering that,  
using Solr's CommonGramsFilter(Factory) is a good solution to keep  
your stop words and alleviate the performance degradation potential.   
The HathiTrust folks have had success with the common grams capability.
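
As a simplified illustration of the common grams idea (not the Lucene
implementation): every token is kept, and a bigram is added wherever one of
two neighboring tokens is a common word, so a phrase query can match the much
rarer bigram.  The underscore joiner and the word list here are assumptions
for the sketch.

```python
def common_grams(tokens, common_words):
    """Emit each token, plus a bigram wherever either neighbor is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + "_" + nxt)
    return out


# Hypothetical common-word list, for illustration only.
common = {"the", "of", "to"}
print(common_grams(["letters", "to", "the", "president"], common))
```

The stop words stay searchable on their own, while phrases containing them no
longer have to walk the very long postings lists for "to" and "the".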


Erik


On Nov 11, 2009, at 3:41 PM, Eric James wrote:

> Has anyone already given some thought to refining the solr
> stopwords.txt for library collections, particularly finding aids?
> The words included in the out of the box stopwords.txt are of very
> questionable unimportance:
>
> t that the their then there these they this to was will with>
>
> We were indexing a field id with "no." as one of its tokens (for
> number), but wanted a query with "no" (where the person did not add
> the period) to find the doc, but in actuality the "no" would get
> stripped by the StopFilterFactory. And thus we stumbled upon this
> list, and were a bit surprised by some of the inclusions (ex: "will")
> and exclusions (ex: "a").
>
> Thanks,
>
> Eric James
>
> Yale University Libraries



Re: [CODE4LIB] preconference proposals

2009-11-12 Thread Erik Hatcher

On Nov 11, 2009, at 6:46 PM, Naomi Dushay wrote:

> What do you think about the Solr part having some specific goodies
> like:

+1 to it all!

> lots on dismax magic
>
> how to do fielded searching (author/title/subject) with dismax
>
> how to do browsing (termsComponent query, then fielded query to get
> matching docs)
>
> how to do boolean  (use lucene QP, or fake it with dismax)

Or, use the new Lucid contributed "extended dismax" parser ;)

  https://issues.apache.org/jira/browse/SOLR-1553

Erik
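
For readers new to the topic, fielded searching with dismax usually comes
down to choosing a different set of query fields (qf) per search type.  A
minimal sketch of building such a request follows; the Solr URL, field names,
and boosts are assumptions for illustration, not anyone's actual schema.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed local Solr URL


def dismax_url(user_query, query_fields):
    """Build a select URL that runs user_query through dismax over weighted fields."""
    params = {
        "defType": "dismax",
        "q": user_query,
        # qf is a space-separated list of field^boost pairs.
        "qf": " ".join(f"{name}^{boost}" for name, boost in query_fields),
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Hypothetical field names and boosts for an "author" search.
AUTHOR_FIELDS = [("author_exact", 10), ("author_text", 2)]
url = dismax_url("twain", AUTHOR_FIELDS)
print(url)
```

A title or subject search would reuse the same function with a different
field list, which keeps the user's query string free of field syntax.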


Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010

2009-11-12 Thread John Fereira

Ross Singer wrote:

> Likewise, Knoxville is also ~1.5 hours from Asheville.  Between
> Greenville, Charlotte and Knoxville you might be able to catch a
> special deal.
  
A bit closer is the Tri-Cities airport (Johnson City, Bristol, 
Kingsport).  I've flown in there a couple of times when my in-laws lived 
in Johnson City.  It's a real nice drive, about an hour and 20 minutes, 
from there to Asheville.


--
John Fereira
Cornell University
Twitter: @john_fereira
Google Wave: fere...@googlewave.com


Re: [CODE4LIB] big thanks to ND folks

2009-11-12 Thread Jodi Schneider
Lovely! Thanks for this, Eric!

I've updated the Code4Lib homepage to link back to
http://dewey.library.nd.edu/mailing-lists/code4lib/
from Email.

:) -Jodi

On Tue, Nov 10, 2009 at 4:10 PM, Eric Lease Morgan  wrote:

> On Nov 10, 2009, at 10:54 AM, Dan Chudnov wrote:
>
> > A quick interjection to praise the wise folks at ND who host this list
> > and whose listserv overlords (or they themselves?) saw fit to upgrade
> > the listserv web front-end sometime in the past year or two.  Following
> > Eric's page into "subscribers' corner" led me into a screen where I
> > could actually set my settings the way I wanted for the first time
> > ever.
> >
> > Maintaining infrastructure for years is no mean feat, and modernizing
> > ancient infrastructure is no meaner.  Many thanks to our gracious
> > hosts for all their years of list support!
>
>
>
> Dan, you're welcome, and for everybody else, one of the best places to
> begin when it comes to the Code4Lib mailing list is the following:
>
>   http://dewey.library.nd.edu/mailing-lists/code4lib/
>
> Fortunately, Google has also well-indexed our community.
>
> --
> Eric Morgan
> University of Notre Dame
>


Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010

2009-11-12 Thread Jodi Schneider
"Nearby" searches will typically pick up Greenville and Tri Cities (How is
Tri Cities to get to?).

Kayak allows you to customize the 'nearby airport' search using airport
codes. Recommended 'from' search:
AVL, GSP, CLT, TYS
at
http://kayak.com/

-Jodi

On Thu, Nov 12, 2009 at 3:06 AM, Ross Singer  wrote:

> Likewise, Knoxville is also ~1.5 hours from Asheville.  Between
> Greenville, Charlotte and Knoxville you might be able to catch a
> special deal.
>
> -Ross.
>
> On Tue, Nov 10, 2009 at 12:09 PM, Jay Luker  wrote:
> > FYI, on my previous visits to Asheville I've flown into/out of
> > Greenville, SC. It's only about 1 hr driving time, so could be worth
> > comparing vs. Charlotte shuttles.
> >
> > --jay
> >
> > On Tue, Nov 10, 2009 at 11:58 AM, Kevin S. Clarke 
> wrote:
> >> I wonder how many folks would be interested in this?  It might be
> >> possible to use one of the university vans to make trips.  I'll look
> >> into this.
> >>
> >> Kevin
> >>
> >>
> >>
> >> On Tue, Nov 10, 2009 at 1:32 AM, Mark A. Matienzo 
> wrote:
> >>> Charlotte to Asheville shuttles are not cheap - one Asheville-based
> >>> company is advertising a $170 one-way rate. Obviously, if you were to
> >>> split that, it would be cheaper.
> >>>
> >>> It's my understanding that Charlotte to Asheville driving time is
> >>> about 2.5 hours - you may just want to get a group together to rent a
> >>> car.
> >>>
> >>> Mark A. Matienzo
> >>> Applications Developer, Strategic Planning
> >>> The New York Public Library
> >>>
> >>>
> >>>
> >>> On Tue, Nov 10, 2009 at 1:19 AM, Mark Jordan  wrote:
> >>>> Hi,
> >>>>
> >>>> Can anyone recommend transportation options to get from Charlotte
> >>>> International Airport to Asheville? From my neck of the woods airfare to
> >>>> Charlotte appears to be ~$200 cheaper than to Asheville.
> >>>>
> >>>> TIA,
> >>>>
> >>>> Mark
> >>>>
> >>>
> >>
> >
>