Hi all,

The following is a summary of the discussion from the OpenWayback call on 
December 14, 2016. Please feel free to comment here on anything discussed.

1. State of OpenWayback and webarchive-commons

John Erik has done work on URI classes for webarchive-commons (on his 
personal GitHub account). He is confident they will be useful to other Java 
web archiving projects. The new classes should more closely mimic how 
browsers parse and resolve URIs. The changes improve on canonicalization, 
which has primarily meant lowercasing the URI, and they are configurable. 
The code passes the Heritrix unit tests but is not yet compatible with 
Heritrix. He plans to make a PR soon.

We expect a webarchive-commons 1.2.0 release early in 2017. There are 
currently 9 open PRs, 5 of which are labeled for the 1.2.0 release. Kris 
has looked at all of them and thinks they are okay but hasn't tested them. 
John Erik will review the PRs further in January. #34 
<https://github.com/iipc/webarchive-commons/pull/34> requires a minor 
change in Heritrix if Heritrix updates its webarchive-commons dependency. 
Mohamed volunteered to review #63 
<https://github.com/iipc/webarchive-commons/pull/63>. Everyone else is 
invited to review and comment on any of the open PRs 
<https://github.com/iipc/webarchive-commons/pulls>.

There are also some webarchive-commons issues marked for a 2.0.0 release. 
At this point, the breaking change that requires a major release is a 
module name change. John Erik has begun splitting the project into 
separate POMs, and what is currently in webarchive-commons 1.7 is renamed 
to webarchive-commons-core. The API itself has not changed.

There has not been much work on OpenWayback or the Resource Resolver since 
John Erik posted to the OpenWayback Google Group on September 14 that an 
unpolished version of the Resource Resolver was ready for trial. There was 
some question about proceeding with this work, given the issues raised about 
the CDXJ format: short vs. long field names in the JSON block, and whether 
the record type should move inside the JSON block.

Sawood suggested the use of aliases with shorter names. It was decided this 
would add undesirable complexity.

John Erik and Kris favor not focusing on defining a CDXJ file format. 
Kris sees the files on disk as specific to whoever implements the 
indexing. He supports moving forward with what we have, arguing that this 
format is used only for indexing and should be kept separate from 
interchange.


2. Set an official CDX Server Protocol (carried on from above)
 
We should focus on a well-defined and documented Resource Resolver response 
format for interoperability. The response should use specified, documented 
field names, while the on-disk CDXJ format can be left to whoever 
implements the indexing.


3. Alternative configuration format for OpenWayback

There is no desire to re-architect the project to remove Spring 
configuration, and we would like to retain the fine-grained control Spring 
gives experienced users. Even so, an additional, simpler configuration 
format would be welcome. It would give users who just want a basic setup an 
easier way to do things like set the indexes and resource locations for 
their collections.

Kris explained how the National and University Library of Iceland achieves 
this with an overlay and a properties file, so the Spring configuration is 
not touched unless doing code work.

https://github.com/ato/wayback-easy is another possible route: it could 
allow configuration via YAML and reuse some of pywb's config format.

Please let us know if you want to work on the implementation for this.


4. AOB

Sawood asked about the possibility of OpenWayback having an embedded web 
server instead of having to be deployed in Tomcat. There was agreement 
that this is a trend that makes sense for OpenWayback to follow.


Thank you,
Lauren Ko
UNT Libraries

-- 
You received this message because you are subscribed to the Google Groups 
"openwayback-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
