Re: [CODE4LIB] preconference proposals
On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote:
> If you'd be up for it Erik, I'd envision a basic session in the morning.
> Some of us (like me) have never gotten Solr up and running.
>
> Then the afternoon could break off for an advanced session.
>
> Though I like Bess's idea, too! Would that be suitable for a conference
> breakout? Not sure I'd want to pit it against Solr advanced session!

The preconfs should be as inclusive as possible, but I'm wondering if the Solr session might be more beneficial if we dive into the particulars right off the bat in the morning. There are only a few steps to get Solr up and running -- it's in the configuration for our custom needs that the advice of a certain Mr. Hatcher can really be helpful.

You're right, though, that the NGC thing sounds more like a BOF session. I'd support that in order to attend a full preconf day of Solr.

Gabriel
Re: [CODE4LIB] preconference proposals
On Tue, Nov 10, 2009 at 06:41:20AM -0800, Bess Sadler wrote:
> +1 from me on this, no surprise. :)
>
> What if we did a next gen catalog day thing? We could spend the
> morning on solr, which many projects have in common, in the morning,
> and then in the afternoon have sessions that build on top of solr
> (vufind, blacklight, kochief, etc.) We were going to submit a
> proposal for a blacklight pre-conference regardless, but it makes a
> lot of sense to do something more coordinated, and it particularly
> makes sense to ensure that as many people as possible can take
> advantage of Erik's presence and expertise.

Great idea, Bess. Advanced Solr in the morning, including extended dismax, query weighting, and solrmarc. Then more general NGC stuff in the afternoon, such as options for pulling data in and pushing it out, how best to display various collections, etc.

Gabriel
Re: [CODE4LIB] preconference proposals
On Thu, Nov 12, 2009 at 09:02:09AM -0500, Erik Hatcher wrote:
> Or, use the new Lucid contributed "extended dismax" parser ;)
>
> https://issues.apache.org/jira/browse/SOLR-1553
>
> Erik

This looks sweet, Erik. Many thanks for sharing.

Gabriel
[CODE4LIB] XForms EAD editor sandbox available
Hello all,

Over the past few months I have been working on and off on a research project to develop an XForms, web-based editor for EAD finding aids that runs within the Orbeon Tomcat application. While it is still in a very early alpha stage (I have probably put only 60-80 hours of work into it thus far), I think it's ready for a general demonstration to solicit opinions, criticism, etc. from librarians and technical staff.

Background: For those not familiar with XForms, it is a W3C standard for creating next-generation forms. It is powerful and lets you create XML in the way that it is intended to be created, without limits to repeatability, complex hierarchies, or mixed content. Orbeon adds a level on top of that, taking care of all the ajax calls, serialization, and CRUD operations, and providing a variety of widgets that allow nice features like tabs and autocomplete/autosuggest that can be bound to authority lists and controlled access terms. By default, Orbeon reads and writes data from and to an eXist database that comes packaged with it, but you can have it serialize the XML to disk or have it interact with any REST interface such as Fedora.

Goals: Ultimately, I wish to create a system of forms that can open any EAD 2002-compliant XML file without any data loss or XML transformation whatsoever. I think the inability to do this is the shortcoming of systems such as Archon and Archivists' Toolkit. I want to integrate authority lists into certain fields with autosuggest (such as corporate names, people, and subjects). If there is demand, I can build a public interface for viewing the entire EAD collection, complete with solr for faceted browse and search, but this is secondary to producing a form that people with some basic archiving knowledge and EAD background can use to easily and effectively create finding aids. A public interface is the easy part, in any case. It wouldn't take more than a week or two to build something fairly nice and robust.
Here is the link: http://beta.scholarslab.org:9080/cocoon/eaditor/

I should stress that the application is *not complete.* I am using cocoon to provide a list of the EAD content in the system. I will remove that application eventually and use Orbeon's internal pipelining features to achieve the same objective. I haven't delved too deeply into Orbeon's pipelines yet.

Here are some things to note:

1. If you click on a link to open the main part of the guide or any of its components, you have to click the "Load" link at the top of the form. Forms aren't being loaded on page load yet.
2. Elements that accept mixed content per the EAD 2002 schema (e.g. paragraphs) currently only accept PCDATA. I haven't worked on mixed content yet; it is by far the most challenging aspect of the project.
3. I only have a few c-level elements available to add.
4. Not all did elements are available yet.
5. A lot of the generic attributes, like type and label, are not available for editing yet. This may be the type of thing that is best customized per institution relative to their own best practices. I don't want more input fields than necessary right now.
6. The only thing you can add into the archdesc right now is the . Once I finish all of the c-level elements, I can just put some xi:includes into the archdesc XForm file to show them in the archdesc level.

I think those are the major issues for now. As I stated earlier, this is sort of a pre-alpha.

The project is open source and available (through svn) to anyone who wants it: http://code.google.com/p/eaditor/

I have put together an easy package to get the application up and running without difficulty. All you have to do is unzip the download, go into the apache tomcat folder, and execute the startup script. This assumes you have nothing running on port 8080 already.
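The quickstart above assumes that port 8080 is free. As a minimal sketch of how one might check that before running the startup script (the helper name `port_is_free` is hypothetical and is not part of the eaditor package or of Tomcat), a short Python probe could look like this:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of eaditor or Tomcat.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection
        # and a non-zero error code otherwise.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    if port_is_free(8080):
        print("Port 8080 looks free; the startup script should be able to bind it.")
    else:
        print("Something is already listening on port 8080; stop it first.")
```

This only checks the local loopback address, which is an assumption; a service bound to another interface would need a different `host` argument.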
Download page: http://code.google.com/p/eaditor/downloads/list
Wiki instructions: http://code.google.com/p/eaditor/wiki/QuickstartInstallation?ts=1257887453&updated=QuickstartInstallation

Comments, questions, criticism welcome. The editor is a sandbox. Feel free to experiment.

Ethan Gruber
University of Virginia Library
Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
Thanks, Erik. There is no specific reason for their removal; I think it was just that the StopFilterFactory is preconfigured in the analyzer chain for fieldType=text. We will do some performance testing with this filter removed.

BTW, a useful tool for deciding on appropriate stopwords is the schema browser, which can be found on the /solr/admin page. There you can see term frequencies for each field, sorted from highest frequency, to help weed out the terms of little querying value.

Eric

> Date: Thu, 12 Nov 2009 09:06:46 -0500
> From: erikhatc...@mac.com
> Subject: Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
> To: CODE4LIB@LISTSERV.ND.EDU
>
> I often recommend against stop word removal altogether. Is there any
> reason you need to remove them?
>
> The primary reason stop words get removed is to increase performance
> of queries with very common terms. If you are encountering that,
> using Solr's CommonGramsFilter(Factory) is a good solution to keep
> your stop words and alleviate the performance degradation potential.
> The HathiTrust folks have had success with the common grams capability.
>
> Erik
>
> On Nov 11, 2009, at 3:41 PM, Eric James wrote:
>
> > Has anyone already given some thought into refining the solr
> > stopwords.txt for library collections, particularly finding aids?
> > The words included in the out of the box stopwords.txt are of very
> > questionable unimportance:
> >
> > t that the their then there these they this to was will with>
> >
> > We were indexing a field id with "no." as one of its tokens (for
> > number), but wanted a query with "no" (where the person did not add
> > the period) to find the doc, but in actuality the "no" would get
> > stripped by the StopFilterFactory. And thus we stumbled upon this
> > list, and was a bit surprised by some of the inclusions (ex: "will"),
> > and exclusions (ex: "a").
> >
> > Thanks,
> >
> > Eric James
> >
> > Yale University Libraries
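The term-frequency view described above can be approximated offline. The following is a rough sketch, assuming plain-text records and a naive tokenizer; the sample records are invented for illustration, and the Solr schema browser reports frequencies from the analyzed index, so its numbers will differ from a raw-text count like this one:

```python
import re
from collections import Counter


def term_frequencies(docs):
    """Count how often each lowercased word occurs across all docs."""
    counts = Counter()
    for text in docs:
        # Naive tokenizer: runs of ASCII letters and digits only.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


# Hypothetical finding-aid snippets, invented for illustration only.
sample_docs = [
    "Papers of the family, box no. 12",
    "Letters of the society, folder no. 3",
    "Last will of the donor",
]

if __name__ == "__main__":
    # The most frequent terms are the first candidates to review
    # before deciding whether they belong in stopwords.txt.
    for term, count in term_frequencies(sample_docs).most_common(3):
        print(term, count)
```

A term such as "no" ranking high in a list like this is a hint that it carries meaning in the collection and may not be a safe stopword.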
Re: [CODE4LIB] deadline for preconference proposals
On Thu, Nov 12, 2009 at 10:06 AM, Jonathan Rochkind wrote:
> I would suggest that you should plan to have the pre-confs _set_ (not just
> proposals received) by the time registration opens. Realistically I usually
> register first, and _then_ get my approval to go, figuring if I don't get
> approval I can always give up my spot.
>
> Perhaps this is what you were already planning?

Yes, sorry if that wasn't clear. The preconferences would be set by registration time (also because you'd have to sign up for them at that point).

Kevin
Re: [CODE4LIB] deadline for preconference proposals
As a practical matter, I need to get my travel approvals and make my travel plans well in advance. I can only get approval for a pre-conf I know about, so if a pre-conf is announced too late, it won't be something I can possibly attend. I expect I am not alone here.

I would suggest that you should plan to have the pre-confs _set_ (not just proposals received) by the time registration opens. Realistically I usually register first, and _then_ get my approval to go, figuring if I don't get approval I can always give up my spot.

Perhaps this is what you were already planning? As long as the pre-confs are actually determined by the time registration opens, I think it's reasonable. Ideally they'd be determined in advance of registration though.

Jonathan

Kevin S. Clarke wrote:
> Someone asked a good question about deadlines for preconference
> proposals. I didn't set one because I don't expect we'll reach the
> max number of spaces. We will have to have all the proposals a week
> or so before registration though. If we start pushing up against the
> amount of space that we have, I'll call for any last proposals (so
> that if we need to vote we can... I'm not expecting us to get that
> many though).
>
> Kevin
Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010
Hello - an FYI: if you're planning on flying into Knoxville and making a drive to Asheville via I-40, be advised that there has been a rather large rockslide and 40 is closed. Here's a link to the official update -- http://www.ncdot.org/traffictravel/.

Safe travels to all.

cheers,
Bridger

--
Bridger Dyson-Smith
Digital Library Initiatives
University of Tennessee Libraries

On Thu, Nov 12, 2009 at 8:29 AM, John Fereira wrote:
> Ross Singer wrote:
>
>> Likewise, Knoxville is also ~1.5 hours from Asheville. Between
>> Greenville, Charlotte and Knoxville you might be able to catch a
>> special deal.
>
> A bit closer is the Tri-Cities airport (Johnson City, Bristol, Kingsport).
> I've flown in there a couple of times when my in-laws lived in Johnson
> City. It's a real nice drive, about an hour and 20 minutes, from there to
> Asheville.
>
> --
> John Fereira
> Cornell University
> Twitter: @john_fereira
> Google Wave: fere...@googlewave.com
[CODE4LIB] deadline for preconference proposals
Someone asked a good question about deadlines for preconference proposals. I didn't set one because I don't expect we'll reach the max number of spaces. We will have to have all the proposals a week or so before registration though. If we start pushing up against the amount of space that we have, I'll call for any last proposals (so that if we need to vote we can... I'm not expecting us to get that many though). Kevin
Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
I often recommend against stop word removal altogether. Is there any reason you need to remove them?

The primary reason stop words get removed is to increase performance of queries with very common terms. If you are encountering that, using Solr's CommonGramsFilter(Factory) is a good solution to keep your stop words and alleviate the performance degradation potential. The HathiTrust folks have had success with the common grams capability.

Erik

On Nov 11, 2009, at 3:41 PM, Eric James wrote:
> Has anyone already given some thought into refining the solr
> stopwords.txt for library collections, particularly finding aids?
> The words included in the out of the box stopwords.txt are of very
> questionable unimportance:
>
> t that the their then there these they this to was will with>
>
> We were indexing a field id with "no." as one of its tokens (for
> number), but wanted a query with "no" (where the person did not add
> the period) to find the doc, but in actuality the "no" would get
> stripped by the StopFilterFactory. And thus we stumbled upon this
> list, and was a bit surprised by some of the inclusions (ex: "will"),
> and exclusions (ex: "a").
>
> Thanks,
>
> Eric James
> Yale University Libraries
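To make the common grams idea concrete, here is a simplified Python illustration. It is not Solr's implementation: the function name `common_grams` is hypothetical, and the sketch ignores token positions and query-side handling. It only shows the core idea that single terms are kept while adjacent pairs involving a common word are also emitted as joined bigrams, so a phrase query can match the rarer bigram instead of scanning the very common single term:

```python
def common_grams(tokens, common_words, separator="_"):
    """Return the unigrams plus bigrams for pairs that include a common word.

    Simplified illustration only; positions and query-side logic are ignored.
    """
    output = []
    for i, token in enumerate(tokens):
        # Every original token is kept, so no words are dropped.
        output.append(token)
        if i + 1 < len(tokens):
            following = tokens[i + 1]
            # Emit a bigram when either side of the pair is a common word.
            if token in common_words or following in common_words:
                output.append(token + separator + following)
    return output


if __name__ == "__main__":
    tokens = "the last will of the donor".split()
    print(common_grams(tokens, {"the", "of", "will"}))
```

The word list passed in here plays the role that the stopword file played for the stop filter, with the difference that nothing is removed from the token stream.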
Re: [CODE4LIB] preconference proposals
On Nov 11, 2009, at 6:46 PM, Naomi Dushay wrote:
> What do you think about the Solr part having some specific goodies like:

+1 to it all!

> lots on dismax magic
> how to do fielded searching (author/title/subject) with dismax
> how to do browsing (termsComponent query, then fielded query to get matching docs)
> how to do boolean (use lucene QP, or fake it with dismax)

Or, use the new Lucid contributed "extended dismax" parser ;)

https://issues.apache.org/jira/browse/SOLR-1553

Erik
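Two of the topics above, fielded searching with dismax and browsing with the terms component, can be sketched as plain HTTP request URLs. This is a minimal sketch under stated assumptions: the base URL, the field names (`title_t`, `title_unstem_t`, `author_facet`), and the boost values are hypothetical, and a `/terms` request handler has to be configured in solrconfig.xml for the second request to work:

```python
from urllib.parse import urlencode

# Hypothetical local Solr instance; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def dismax_title_search(user_query):
    """Build a dismax request that searches only title fields."""
    params = {
        "defType": "dismax",
        "q": user_query,
        # qf lists the fields to search and their boosts (hypothetical names).
        "qf": "title_t^5 title_unstem_t^10",
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def browse_terms(field, prefix, limit=10):
    """Build a terms-component request listing indexed terms by prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
    }
    return f"{SOLR_BASE}/terms?{urlencode(params)}"


if __name__ == "__main__":
    print(dismax_title_search("moby dick"))
    print(browse_terms("author_facet", "mel"))
```

For browsing, the terms returned by the second request would each be followed by a fielded query on the same field to fetch the matching documents, as the list above describes.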
Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010
Ross Singer wrote:
> Likewise, Knoxville is also ~1.5 hours from Asheville. Between
> Greenville, Charlotte and Knoxville you might be able to catch a
> special deal.

A bit closer is the Tri-Cities airport (Johnson City, Bristol, Kingsport). I've flown in there a couple of times when my in-laws lived in Johnson City. It's a real nice drive, about an hour and 20 minutes, from there to Asheville.

--
John Fereira
Cornell University
Twitter: @john_fereira
Google Wave: fere...@googlewave.com
Re: [CODE4LIB] big thanks to ND folks
Lovely! Thanks for this, Eric! I've updated the Code4Lib homepage to link back to http://dewey.library.nd.edu/mailing-lists/code4lib/ from Email. :)

-Jodi

On Tue, Nov 10, 2009 at 4:10 PM, Eric Lease Morgan wrote:
> On Nov 10, 2009, at 10:54 AM, Dan Chudnov wrote:
>
> > A quick interjection to praise the wise folks at ND who host this list
> > and whose listserv overlords (or they themselves?) saw fit to upgrade
> > the listserv web front-end sometime past year or two. Following
> > Eric's page into "subscribers' corner" led me into a screen where I
> > could actually set my settings the way I wanted for the first time
> > ever.
> >
> > Maintaining infrastructure for years is no mean feat, and modernizing
> > ancient infrastructure is no meaner. Many thanks to our gracious
> > hosts for all their years of list support!
>
> Dan, you're welcome, and for everybody else, one of the best places to
> begin when it comes to the Code4Lib mailing list is the following:
>
> http://dewey.library.nd.edu/mailing-lists/code4lib/
>
> Fortunately, Google has also well-indexed our community.
>
> --
> Eric Morgan
> University of Notre Dame
Re: [CODE4LIB] Transport options from Charlotte to Asheville for c4l2010
"Nearby" searches will typically pick up Greenville and Tri Cities (How is Tri Cities to get to?). Kayak allows you to customize the 'nearby airport' search using airport codes.

Recommended 'from' search: AVL, GSP, CLT, TYS at http://kayak.com/

-Jodi

On Thu, Nov 12, 2009 at 3:06 AM, Ross Singer wrote:
> Likewise, Knoxville is also ~1.5 hours from Asheville. Between
> Greenville, Charlotte and Knoxville you might be able to catch a
> special deal.
>
> -Ross.
>
> On Tue, Nov 10, 2009 at 12:09 PM, Jay Luker wrote:
> > FYI, on my previous visits to Asheville I've flown into/out of
> > Greenville, SC. It's only about 1 hr driving time, so could be worth
> > comparing vs. Charlotte shuttles.
> >
> > --jay
> >
> > On Tue, Nov 10, 2009 at 11:58 AM, Kevin S. Clarke wrote:
> >> I wonder how many folks would be interested in this? It might be
> >> possible to use one of the university vans to make trips. I'll look
> >> into this.
> >>
> >> Kevin
> >>
> >> On Tue, Nov 10, 2009 at 1:32 AM, Mark A. Matienzo wrote:
> >>> Charlotte to Asheville shuttles are not cheap - one Asheville-based
> >>> company is advertising a $170 one-way rate. Obviously, if you were to
> >>> split that, it would be cheaper.
> >>>
> >>> It's my understanding that Charlotte to Asheville driving time is
> >>> about 2.5 hours - you may just want to get a group together to rent a
> >>> car.
> >>>
> >>> Mark A. Matienzo
> >>> Applications Developer, Strategic Planning
> >>> The New York Public Library
> >>>
> >>> On Tue, Nov 10, 2009 at 1:19 AM, Mark Jordan wrote:
> >>>> Hi,
> >>>>
> >>>> Can anyone recommend transportation options to get from Charlotte
> >>>> International Airport to Asheville? From my neck of the woods airfare to
> >>>> Charlotte appears to be a ~ $200 cheaper than to Asheville.
> >>>>
> >>>> TIA,
> >>>>
> >>>> Mark