Jonathan, Bill, 

Very interesting--thanks for the replies. While I'm not sure I understand what 
indexing arbitrary XML into solr might look like, this does prompt me to think 
it would be interesting to look at Trajecting up some EAD (may I use it as a 
verb?) into solr, for finding aid searchability. It is my impression that most 
of the effort in making finding aids searchable is in the indexing, and I'm not 
aware of a general purpose tool / approach for those of us using solr yet, 
though there have been plenty of successful approaches at individual sites. 
(Happy to have my ignorance rectified.)
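
For a rough sense of what indexing EAD in this style might look like, here is a 
minimal sketch in plain Ruby. It does not use traject's actual API. It only 
mimics the reader / indexing routine / writer shape Bill describes below, using 
REXML from the Ruby distribution. The extract_xpath helper stands in for the 
hypothetical XML macro Jonathan mentions; the field names, the ArrayWriter 
class, and the sample record are all made up for illustration, and the sample 
deliberately has no XML namespaces, which real EAD often does.

```ruby
require "rexml/document"

# Hypothetical stand-in for an XML "macro"; NOT part of traject's API.
# Returns a lambda that appends the text of every element matching
# `xpath` to the accumulator array. Assumes the XPath selects elements.
def extract_xpath(xpath)
  lambda do |record, accumulator|
    REXML::XPath.match(record, xpath).each do |node|
      text = node.text.to_s.strip
      accumulator << text unless text.empty?
    end
  end
end

# Reader: parses each XML string and yields one record per document.
def each_record(xml_strings)
  xml_strings.each { |xml| yield REXML::Document.new(xml) }
end

# Indexing rules: output field name => extraction routine.
# Field names are illustrative, not a recommended Solr schema.
RULES = {
  "title_t"   => extract_xpath("//titleproper"),
  "unitid_s"  => extract_xpath("//did/unitid"),
  "subject_t" => extract_xpath("//controlaccess/subject"),
}

# Indexer: runs every rule against one record and builds a hash of
# field name => array of values, skipping fields with no matches.
def map_record(record, rules)
  rules.each_with_object({}) do |(field, routine), output|
    values = []
    routine.call(record, values)
    output[field] = values unless values.empty?
  end
end

# Writer: collects output hashes in memory. A real writer would send
# them to Solr (or a file, a database, a queue) instead.
class ArrayWriter
  attr_reader :docs

  def initialize
    @docs = []
  end

  def put(doc)
    @docs << doc
  end
end

# Wires reader -> indexer -> writer together and returns the writer.
def index_all(xml_strings, rules, writer)
  each_record(xml_strings) { |record| writer.put(map_record(record, rules)) }
  writer
end

# Made-up, heavily trimmed EAD-like record (no namespaces).
SAMPLE_EAD = <<~XML
  <ead>
    <eadheader><filedesc><titlestmt>
      <titleproper>Example Family Papers</titleproper>
    </titlestmt></filedesc></eadheader>
    <archdesc level="collection">
      <did><unitid>MS-001</unitid></did>
      <controlaccess>
        <subject>Letters</subject>
        <subject>Diaries</subject>
      </controlaccess>
    </archdesc>
  </ead>
XML

writer = index_all([SAMPLE_EAD], RULES, ArrayWriter.new)
puts writer.docs.inspect
```

The point of the sketch is only that, once each rule is "field name plus a 
routine that pulls values out of a record", swapping MARC for XML is mostly a 
matter of swapping the reader and the extraction routines. The hard part for 
EAD (deciding what a "record" is in a deeply nested finding aid) is not 
addressed here.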

Mike Giarlo is organizing a DLF hackfest for ArchivesSpace / Hydra integration. 
I wonder if Traject for EAD might be touched on there? 

- Tom



On Oct 15, 2013, at 10:28 AM, Jonathan Rochkind wrote:

> Yep, what Bill said. I have had thoughts of extending it to other types of 
> input too; it was part of my original design goals.
> 
> In particular, I was thinking of extending it to arbitrary XML.
> 
> Unlike MARC, there are many other options for indexing XML into Solr 
> (assuming that's your end goal), so you may or may not find traject to be 
> better than those, although for myself there might be some benefit in using 
> the same tool across formats too.
> 
> There are a number of built-in 'macros' that are MARC-specific; you wouldn't 
> use those. And might need some others that are, say, XML-specific. (Probably 
> just a single one, extract_xpath, for XML).
> 
> Same could be done for MODS, sure -- or you could handle MODS with a 
> (hypothetical) generic XML setup.
> 
> But yeah, if you want to take input records, and transform them into 
> hash-like data structures -- I was thinking from the start of structuring 
> traject to support such use cases, yep. (If you want to go to something other 
> than a hash-like data structure, well, it might still be possible, but it's 
> straying from traject's target a bit more).
> 
> [Oh, and I just made up 'traject'. I was looking for a word (made up or real) 
> not already being used for any popular software, and thinking about 
> 'projections' in the sense of mathematical transformations; and about 
> 'trajectory' in the sense of things sent through outer space, with the 
> Solr/Solar connection. I actually had originally decided to call it 
> "transject", but then accidentally wrote "traject" when I created the github 
> project, and then figured that was easier to pronounce and write anyhow.]
> 
> On 10/15/13 1:02 PM, Bill Dueber wrote:
>> 'traject' means "to transmit" (e.g., "trajectory") -- or at least it did,
>> when people still used it, which they don't.
>> 
>> The traject workflow is incredibly general: *a reader* sends *a record* to
>> *an indexing routine* which stuffs...stuff...into a context object which is
>> then sent to *a writer*. We have a few different MARC readers, a few useful
>> writers (one of which, obviously, is the solr writer), and a bunch of
>> shipped routines (which we're calling "macros" but are just well-formed
>> ruby lambdas or blocks) for extracting and transforming common MARC data.
>> 
>> [see
>> http://robotlibrarian.billdueber.com/announcing-traject-indexing-software/
>> for more explanation and some examples]
>> 
>> But there's no reason why a reader couldn't produce a MODS record which
>> would then be worked on. I'm already imagining readers and writers that
>> target databases (RDBMS or NoSQL), or a queueing system like Hornet, etc.
>> 
>> If there are people at Stanford that want to talk about how (easy it is) to
>> extend traject, I'd be happy to have that conversation.
>> 
>> 
>> 
>> On Tue, Oct 15, 2013 at 12:28 PM, Tom Cramer <[email protected]> wrote:
>> 
>>> ++ Jonathan and Bill.
>>> 
>>> 1.) Do you have any thoughts on extending traject to index other types of
>>> data--say MODS--into solr, in the future?
>>> 
>>> 2.) What's the etymology of 'traject'?
>>> 
>>> - Tom
>>> 
>>> 
>>> On Oct 14, 2013, at 8:53 AM, Jonathan Rochkind wrote:
>>> 
>>>> Jonathan Rochkind (Johns Hopkins) and Bill Dueber (University of
>>> Michigan) are happy to announce a robust, feature-complete beta release of
>>> "traject," a tool for indexing MARC data to Solr.
>>>> 
>>>> traject, in the vein of solrmarc, allows you to define your indexing
>>> rules using simple macro and translation files. However, traject runs under
>>> JRuby and is "ruby all the way down," so you can easily provide additional
>>> logic by simply requiring ruby files.
>>>> 
>>>> There's a sample configuration file to give you a feel for traject[1].
>>>> 
>>>> You can view the code[2] on github, and easily install it as a (jruby)
>>> gem using "gem install traject".
>>>> 
>>>> traject is in a beta release hoping for feedback from more testers prior
>>> to a 1.0.0 release, but it is already being used in production to generate
>>> the HathiTrust (metadata-lookup) Catalog (http://www.hathitrust.org/).
>>> traject was developed using a test-driven approach and has undergone both
>>> continuous integration and an extensive benchmarking/profiling period to
>>> keep it fast. It is also well covered by high-quality documentation.
>>>> 
>>>> Feedback is very welcome on all aspects of traject including
>>> documentation, ease of getting started, features, any problems you have,
>>> etc.
>>>> 
>>>> What we think makes traject great:
>>>> 
>>>> * It's all just well-crafted and documented ruby code; easy to program,
>>> easy to read, easy to modify (the whole code base is only 6400 lines of
>>> code, more than a third of which is tests)
>>>> * Fast. Traject by default indexes using multiple threads, so you can
>>> use all your cores!
>>>> * Decoupled from specific readers/writers, so you can use ruby-marc or
>>> marc4j to read, and write to solr, a debug file, or anywhere else you'd
>>> like with little extra code.
>>>> * Designed so it's easy to test your own code and distribute it as a gem
>>>> 
>>>> We're hoping to build up an ecosystem around traject and encourage
>>> people to ask questions and contribute code (either directly to the project
>>> or via releasing plug-in gems).
>>>> 
>>>> [1]
>>> https://github.com/traject-project/traject/blob/master/test/test_support/demo_config.rb
>>>> [2] http://github.com/traject-project/traject
>>> 
>> 
>> 
>> 
