Hi Alex,

Thanks, it's starting to make a bit more sense now.
I notice your implementation supports multiple range requests. Does openwayback send multi-range requests?

Cheers,
Ben

On Tuesday, September 13, 2016 at 12:20:36 PM UTC+12, Alex Osborne wrote:
>
> Hi Ben,
>
> There's an example in RemoteCollection.xml.
>
> https://github.com/iipc/openwayback/blob/master/wayback-webapp/src/main/webapp/WEB-INF/RemoteCollection.xml#L33
>
> Note that you can configure the resourceStore independently of the
> resourceIndex. So if you want to use a local CDX resourceIndex with a
> remote resourceStore, just put the appropriate stanzas from both example
> CDXCollection.xml and RemoteCollection.xml in the one WaybackCollection.
>
> Note also that the server for the resource store should support HTTP 1.1
> range requests. This is so that Wayback can retrieve just the record it's
> interested in and not the whole WARC file. Most regular web servers like
> Apache and nginx will do this out of the box, but if you implement your own
> servlet it's something you'll need to take care of. A common scenario is a
> servlet proxying to multiple backend servers that have the actual files. In
> that case just make sure to also proxy the request and response headers and
> status code. If your servlet is to serve the files directly off disk or via,
> say, calls to a preservation system API, you might need to take care of the
> range headers yourself.
>
> Here's the relevant RFC for range requests:
>
> https://tools.ietf.org/html/rfc7233
>
> My implementation, which currently looks up the path in a database and
> serves from disk, is here:
>
> https://github.com/nla/bamboo/blob/32d7f2e/ui/src/bamboo/crawl/WarcsController.java#L132
>
> Cheers,
>
> Alex
>
> On Monday, September 12, 2016 at 9:15:56 AM UTC+10, Ben O'Brien wrote:
>>
>> Hi Lauren,
>>
>> Thanks for your reply.
>>
>> Not exactly, I want to handle that 'path-index' functionality separately
>> from OW.
>> I was hoping I could write a servlet to act as the remote resource store
>> to OW, which will look up the warc location on the fly. I see your point
>> about serving the warcs via a webserver and using the path-index file with
>> URLs. But it seemed nicer (in my head) if I could just serve the warc
>> location via an external service, removing the path-index flat file step
>> altogether.
>>
>> The context is that we are trying to use OW as a viewer from our
>> preservation system, which has a growing web archive. For a growing
>> collection the remote resource store seemed more of a fit than using a
>> path-index file.
>>
>> Cheers,
>> Ben
>>
>> On Friday, September 9, 2016 at 8:24:32 AM UTC+12, Lauren Ko wrote:
>>>
>>> Hi Ben,
>>> If you are using a FlatFileResourceFileLocationDB as described here:
>>> https://github.com/iipc/openwayback/wiki/How-to-configure#telling-openwayback-where-to-find-your-arc-and-warc-files
>>> then in your path-index.txt file you would put the URL to where the ARC/WARC
>>> files are being served instead of just a local path. Then you can serve the
>>> WARC files via whatever web server, such as Apache, from wherever you want.
>>> Is that what you are wanting to do?
>>>
>>> Lauren Ko
>>> UNT Libraries
>>>
>>> On Mon, Sep 5, 2016 at 7:22 PM, Ben O'Brien <[email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I've found myself wanting to set up and test a remote resource store in
>>>> openwayback recently. Initially I was excited to see a link on the
>>>> Advanced-configuration wiki page 'Configuring a remote
>>>> ResourceStore'... only to find it was a placeholder :(
>>>>
>>>> So in the interest of generating some content for that page, does
>>>> anybody have an example of configuring a remote ResourceStore?
>>>>
>>>> Cheers,
>>>> Ben
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "openwayback-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
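
For anyone writing their own resource store servlet, the single-range handling Alex describes above could look roughly like the sketch below. It is only an illustration of the logic in RFC 7233, written in Python for brevity (a servlet would do the same thing in Java). The names parse_byte_range, build_response and RangeNotSatisfiable are made up for this example, and it assumes the file contents are already in memory, whereas a real store would seek within the WARC on disk. Multi-range headers are simply ignored here and answered with the whole file, which the RFC permits; nothing in this thread establishes whether openwayback actually sends them.

```python
import re
from typing import Dict, Optional, Tuple

# One byte range only: "bytes=first-last", "bytes=first-" or "bytes=-suffix".
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)", re.IGNORECASE)


class RangeNotSatisfiable(Exception):
    """The requested range starts beyond the end of the file (HTTP 416)."""


def parse_byte_range(header: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) pair, or None to serve the whole file."""
    if not header:
        return None
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None:
        return None  # malformed or multi-range: ignore the header
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form: the final `last` bytes of the file.
        length = int(last)
        if length == 0 or size == 0:
            raise RangeNotSatisfiable()
        return max(size - length, 0), size - 1
    start = int(first)
    if start >= size:
        raise RangeNotSatisfiable()
    # An open-ended or over-long range is clamped to the last byte.
    end = size - 1 if last == "" else min(int(last), size - 1)
    if end < start:
        return None  # last < first is invalid syntax: ignore the header
    return start, end


def build_response(data: bytes, range_header: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with an optional Range header."""
    size = len(data)
    try:
        span = parse_byte_range(range_header, size)
    except RangeNotSatisfiable:
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    if span is None:
        return 200, {"Accept-Ranges": "bytes", "Content-Length": str(size)}, data
    start, end = span
    body = data[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

A request with "Range: bytes=2-5" against a 10-byte file would get a 206 with "Content-Range: bytes 2-5/10" and four bytes of body; a request with no Range header gets the whole file with a 200.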
