Thank you very much for your reply. I copied and pasted the line and I get the same result. This is the script:

    #!/bin/bash
    # Find all ARC/WARC files
    ARCHIVE_BASE_DIR=/opt/warc;
    TARGET_FILE=file;
    tempfile="$TARGET_FILE.tmp";
    unset a i
    while IFS= read -r -d $'\0' file; do
        archive=$(basename $file);
        echo -e "$archive\t$file" >> $tempfile;
    done < <(find $ARCHIVE_BASE_DIR -type f -regex ".*\.w?arc\.gz$" -print0)
    # Now sort the file
    export LC_ALL=C;
    sort $tempfile > $TARGET_FILE;
    rm $tempfile

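For comparison, here is a minimal sketch of the same path-index step written without bash-only features. It rests on assumptions that are not confirmed in this thread: process substitution (`< <(...)`) and `read -d` are bash features, not POSIX sh, so starting the script as `sh cdx.sh` may be enough to trigger the syntax error on some systems, even when the line is copied correctly. The function name `build_path_index` is hypothetical, and the `-name` tests are swapped in for the original `-regex` only to keep the sketch portable.

```shell
#!/bin/sh
# Sketch only: build_path_index is a hypothetical helper, not part of OpenWayback.
# It writes one "name<TAB>path" line per ARC/WARC file found under $1 into $2,
# sorted bytewise (LC_ALL=C), like the example script does.
build_path_index() {
  # Assumption: -name patterns stand in for the original -regex test.
  find "$1" -type f \( -name '*.warc.gz' -o -name '*.arc.gz' \) \
    -exec sh -c 'for f; do printf "%s\t%s\n" "${f##*/}" "$f"; done' sh {} + |
    LC_ALL=C sort > "$2"
}

# Example call, using the directory from this thread:
# build_path_index /opt/warc /opt/warc/path-index.txt
```

Because `find` passes the file names directly to the inner `sh -c` loop, no `while read` loop or temporary file is needed, and file names containing spaces are kept intact.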
Is there a program or script to generate the *.CDX files and path-index.txt?

Thanks,
--Carlos

On Wednesday, 26 October 2016 at 6:34:52 (UTC-3), Kristinn Sigurðsson wrote:
> Hi Carlos,
>
> Just to be clear, the script you are referring to (at the bottom of
> https://github.com/iipc/openwayback/wiki/How-to-configure) is only to
> create the ‘path-index’ file, i.e. the file that maps WARC and ARC
> filenames to actual URIs (either via HTTP or on the local filesystem).
>
> The script is also provided as an example only. It may not be suitable for
> everyone’s needs.
>
> In your case, you seem to have made a mistake in copying this line:
>
>     done < <(find $ARCHIVE_BASE_DIR -type f -regex ".*\.w?arc\.gz$" -print0)
>
> Note the space between the first and second ‘<’.
> That space is vitally important!
>
> Best,
> Kris
>
> Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
> Sími/Tel: +354 5255600 | www.landsbokasafn.is
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Carlos Córdova
> Sent: 26 October 2016 03:52
> To: openwayback-dev
> Subject: [openwayback-dev] CDX - INDEX
>
> Hello friends.
>
> I'm trying to configure CDX on a Red Hat server following the instructions at:
>
> https://github.com/iipc/openwayback/wiki/How-to-configure
>
> My WARC files are in /opt/warc:
>
>     [root@webarchive-testing WARC]# ls -l /opt/warc/
>     archivo1.warc.gz
>     archivo2.warc.gz
>     cdx-index/
>     cdx.sh
>     path-index.txt
>
> I copied the script from the installation page:
>
> https://github.com/iipc/openwayback/wiki/How-to-configure#telling-openwayback-where-to-find-your-arc-and-warc-files
>
> but it gives me the following error:
>
>     [root@webarchive-testing WARC]# sh -x cdx.sh
>     + ARCHIVE_BASE_DIR=/opt/warc
>     + TARGET_FILE=file
>     + tempfile=file.tmp
>     + unset a i
>     cdx.sh: line 14: syntax error near unexpected token `<'
>     cdx.sh: line 14: `done <<(find $ARCHIVE_BASE_DIR -type f -regex ".*\.w?arc\.gz$" -print0)'
>
> I have set the export in my /etc/profile:
>
>     [root@webarchive-testing WARC]# cat /etc/profile | grep "LC"
>     export LC_ALL=C
>
> Can you point me to why the script does not work?
>
> Greetings.
>
> --Carlos
>
> --
> You received this message because you are subscribed to the Google Groups
> "openwayback-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
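
The role of the space Kris mentions can be checked in isolation. In `done < <(...)`, the first `<` redirects standard input and `<(...)` is bash process substitution. Written without the space, `<<(` starts with the here-document operator, which bash should reject as a syntax error. The sketch below is illustrative only: `count_lines_procsub` is a hypothetical name, and it calls `bash` explicitly so the result does not depend on what `sh` points to on a given system.

```shell
#!/bin/sh
# Sketch only: count_lines_procsub is a hypothetical helper name.
# It runs the loop under bash explicitly, because process substitution
# is a bash feature and is not available in plain POSIX sh.
count_lines_procsub() {
  bash -c '
    n=0
    # "< <(...)": the first "<" redirects stdin, "<(...)" is process substitution.
    while IFS= read -r line; do
      n=$((n + 1))
    done < <(printf "one\ntwo\nthree\n")
    # The loop ran in the current shell (no pipe), so n is still set here.
    echo "$n"
  '
}

count_lines_procsub   # prints 3
```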
