Hi Lauren,

Thanks for testing and finding bugs. I'm sure there are a lot more :-)

For the cdx-cli tool: I found the bug with the pipe symbol; it should work
now.

I think I have also fixed the bugs in the Resource Resolver. I also updated
the README to reflect that Java 8 is needed. I totally agree that the
Resource Resolver's error messages for incorrect input need better
handling. It's on my to-do list. As for the problem with invalid CDX files,
it will now log a warning and skip the invalid file instead of aborting.

Thanks,
John Erik

On Friday, 16 September 2016 at 21:18:58 UTC+2, Lauren Ko wrote:
>
> Hi John Erik,
>
> Thanks for all your work on the Resource Resolver and the cdx-cli. I tried
> them both successfully. I noticed a few things, but nothing major.
>
> For the Resource Resolver I basically just did what was documented in the
> README: queried both /resource and /resourcelist, used old-style CDX and
> CDXJ, tried the various parameters listed, sent request headers for the
> different Accept values. Here are the issues I encountered (all were
> easily overcome).
>
> When first trying to start up with
> openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr I got:
> Exception in thread "main" java.lang.UnsupportedClassVersionError: org/netpreserve/resource/resolver/Main : Unsupported major.minor version 52.0
>
> - I set $JAVA_HOME to Java 8 instead of the Java 7 that I had set by
>   default on my machine.
>
> Then trying to start again I got:
> java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not a recognized CDX format
>
> - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I
>   grabbed a SURT-formatted file.
>
> Tried to start again:
> 10:51:48.707 [main] INFO org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
> 10:51:48.712 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
> 10:51:48.713 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
> Exception in thread "main" java.lang.IllegalArgumentException: Negative position
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
>     at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
>     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
>     at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
>     at org.netpreserve.resource.resolver.Main.main(Main.java:33)
>
> - Turns out the first SURT-formatted CDX file I grabbed was 30GB and
>   seemed to be too big to handle. I fed the first 1,000,000 lines to a new
>   CDX file (359MB) and then it worked: Resource Resolver (v.
>   3.0.0-SNAPSHOT) started.
>
> I tried doing some searches, stopped the Resource Resolver, and upon
> trying to restart it I got:
> Exception in thread "main" java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is not a recognized CDX format
>
> - At some point my system had created a .out.cdx.swp (my cdx file was
>   called out.cdx). Not sure if Resource Resolver should ignore dot files
>   or if it should just be up to the user to handle this sort of issue.
>
> - For a date range query, Resource Resolver did not include the exact
>   start time match (README says start date is inclusive) when precision is
>   down to the second. For example:
>
>   http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013
>
> - Does not give me the entry in my CDX file with exact timestamp
>   2012-10-14T03:18:37.
>
> - Also relating to timestamp, but maybe not a problem with the application
>   itself, in the README, it says "The time stamp can be in either
>   WARC-format (e.g. 2016-02-05T45:42:00Z)..." In my initial testing of
>   things I copy and pasted that timestamp without thinking to my request
>   URL and got a 500 error before realizing that example is not a valid
>   time. My mistake, but perhaps the example timestamp formats should be
>   changed in the README. Also, should the invalid time be handled so it
>   doesn't throw a 500?
>
> I also tried out cdx-cli to get a CDXJ formatted index. I used both the
> reformat and extract commands.
> I very much appreciate the thorough usage instructions that will print at
> the command line. I did have one issue in trying to convert an existing
> CDX file:
>
> | (pipe character) in URLs (but not in the query string) in the CDX file I
> was trying to convert (status codes in CDX were 404s for these URLs) would
> error and the reformatting process would stop.
>
> $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
> Reformatting: out.cdx into: ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
> Illegal path: http://youtu.be/csorZustZbo|
>
> That is what I found in initial testing. Overall it worked well. Thanks
> again!
>
> Lauren Ko
> UNT Libraries
>
> On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <[email protected]> wrote:
>
>> Hi all,
>>
>> A very early version of the Resource Resolver (aka CDX server) is ready
>> for testing and feedback.
>> Have a look here for the details:
>> https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>>
>> Since the Resource Resolver also supports the current CDX file format,
>> you can test it right away, but if you want to use the new format, a tool
>> is available here:
>> https://github.com/iipc/cdx-cli
>>
>> Best,
>>
>> John Erik Halse
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "openwayback-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
