Hi Ben,

Wayback only makes single, open-ended range requests (of the form bytes=<offset>-):

https://github.com/iipc/openwayback/blob/829693b6d43f40b1b045f08611a4fa5e27395e29/wayback-core/src/main/java/org/archive/wayback/resourcestore/resourcefile/TimeoutArchiveReaderFactory.java#L68

So you can skip implementing multiple ranges if you like.
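
As a rough illustration, handling only that one case could look something like the sketch below. The class and method names (OpenRange, parseStart, contentRange) are made up for this example and are not taken from OpenWayback or from my implementation. It accepts only the "bytes=<start>-" form and treats everything else as "no usable range".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: recognises only the open-ended form "bytes=<start>-". */
final class OpenRange {
    // Range unit names are case-insensitive, hence the flag.
    private static final Pattern OPEN_RANGE =
            Pattern.compile("bytes=(\\d+)-", Pattern.CASE_INSENSITIVE);

    /** Returns the start offset, or -1 if the header is absent or in any other form. */
    static long parseStart(String rangeHeader) {
        if (rangeHeader == null) {
            return -1;
        }
        Matcher m = OPEN_RANGE.matcher(rangeHeader.trim());
        if (!m.matches()) {
            return -1; // closed ranges, suffix ranges and multi-range lists land here
        }
        try {
            return Long.parseLong(m.group(1));
        } catch (NumberFormatException e) {
            return -1; // more digits than fit in a long
        }
    }

    /** Content-Range value for a 206 response running from start to the end of the file. */
    static String contentRange(long start, long fileLength) {
        return "bytes " + start + "-" + (fileLength - 1) + "/" + fileLength;
    }
}
```

When parseStart returns -1 the servlet can simply send the whole file with a 200, since RFC 7233 allows a server to ignore a Range header it does not want to honour.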

Cheers,

Alex

________________________________
From: [email protected] [[email protected]] on 
behalf of Ben O'Brien [[email protected]]
Sent: Thursday, 15 September 2016 6:28 PM
To: openwayback-dev
Subject: Re: [openwayback-dev] Configuring a remote ResourceStore

Hi Alex,

Thanks, it's starting to make a bit more sense now.

I notice your implementation supports multiple range requests. Does openwayback 
send multi-range requests?


Cheers,
Ben

On Tuesday, September 13, 2016 at 12:20:36 PM UTC+12, Alex Osborne wrote:
Hi Ben,

There's an example in RemoteCollection.xml.

https://github.com/iipc/openwayback/blob/master/wayback-webapp/src/main/webapp/WEB-INF/RemoteCollection.xml#L33

Note that you can configure the resourceStore independently of the 
resourceIndex. So if you want to use a local CDX resourceIndex with a remote 
resourceStore just put the appropriate stanzas from both example 
CDXCollection.xml and RemoteCollection.xml in the one WaybackCollection.

Note also that the server for the resource store should support HTTP 1.1 range 
requests. This is so that Wayback can retrieve just the record it's interested 
in and not the whole WARC file. Most regular web servers like Apache and nginx 
will do this out of the box but if you implement your own servlet it's 
something you'll need to take care of. A common scenario is a servlet proxying 
to multiple backend servers that have the actual files. In that case just make 
sure to also proxy the request and response headers and status code. If your 
servlet serves the files directly off disk or via, say, calls to a 
preservation system API, you might need to handle the Range headers 
yourself.
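
For the serve-off-disk case, the core of it is picking the right status code and copying only the requested bytes. Below is a minimal sketch under the assumption that the start and end offsets have already been parsed out of the Range header. RangeServing, statusFor and copyRange are hypothetical names for illustration only and this is not the code linked further down.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for a servlet that serves WARC bytes straight off disk. */
final class RangeServing {
    /** 206 for a satisfiable start offset, 416 when it is at or past the end of the file. */
    static int statusFor(long start, long fileLength) {
        return (start >= 0 && start < fileLength) ? 206 : 416;
    }

    /** Copies bytes start..end (inclusive) of the file to out; returns the bytes written. */
    static long copyRange(Path file, long start, long end, OutputStream out)
            throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // Not closed here, because closing the channel would close the caller's stream.
            WritableByteChannel target = Channels.newChannel(out);
            long position = start;
            long remaining = end - start + 1;
            while (remaining > 0) {
                long n = in.transferTo(position, remaining, target);
                if (n <= 0) {
                    break; // reached the end of the file early
                }
                position += n;
                remaining -= n;
            }
            return position - start;
        }
    }
}
```

In a servlet the 206 response would also carry a Content-Range header such as "bytes 3-9/10" and a Content-Length matching the number of bytes sent, while a 416 would carry "Content-Range: bytes */10" and no body.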

Here's the relevant RFC for range requests:

https://tools.ietf.org/html/rfc7233

My implementation, which currently looks up the path in a database and serves 
from disk, is here:

https://github.com/nla/bamboo/blob/32d7f2e/ui/src/bamboo/crawl/WarcsController.java#L132

Cheers,

Alex



On Monday, September 12, 2016 at 9:15:56 AM UTC+10, Ben O'Brien wrote:
Hi Lauren,

Thanks for your reply.

Not exactly, I want to handle that 'path-index' functionality separately from 
OW.
I was hoping I could write a servlet to act as the remote resource store to OW, 
which will look up the warc location on the fly. I see your point about serving 
the warcs via a webserver and using the path-index file with URLs. But it 
seemed nicer (in my head) if I could just serve the warc location via an 
external service, removing the path-index flat file step altogether.

The context is that we are trying to use OW as a viewer from our preservation 
system, which has a growing web archive. For a growing collection the remote 
resource store seemed more of a fit than using a path-index file.


Cheers,
Ben



On Friday, September 9, 2016 at 8:24:32 AM UTC+12, Lauren Ko wrote:
Hi Ben,
If you are using a FlatFileResourceFileLocationDB as described here 
https://github.com/iipc/openwayback/wiki/How-to-configure#telling-openwayback-where-to-find-your-arc-and-warc-files
 , in your path-index.txt file you would put the URL to where the ARC/WARC 
files are being served instead of just a local path. Then you can serve the 
WARC files via whatever web server, such as Apache, from wherever you want. Is 
that what you are wanting to do?

Lauren Ko
UNT Libraries

On Mon, Sep 5, 2016 at 7:22 PM, Ben O'Brien <[email protected]> wrote:
Hello all,


I've found myself wanting to set up and test a remote resource store in 
openwayback recently. Initially I was excited to see a link on the 
Advanced-configuration wiki page, 'Configuring a remote ResourceStore', only 
to find it was a placeholder :(

So in the interest of generating some content for that page - does anybody have 
an example of configuring a remote ResourceStore?


Cheers,
Ben

--
You received this message because you are subscribed to the Google Groups 
"openwayback-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.


