Thank you very much.

2016-11-02 12:28 GMT+02:00 Ben Soares <ben.soa...@ed.ac.uk>:
> Hi Sergio,
>
> Try
>
> ^\d{4}-\d{3}[\dxX]$
>
> if you know that they will always be formatted with a hyphen in the
> middle, or
>
> ^\d{4}-?\d{3}[\dxX]$
>
> if you can't be sure of that.
>
> (and if you're interested in spotting ISSNs in the middle of a field use
> \b\d{4}-?\d{3}[\dxX]\b
> but beware this also finds year ranges [e.g. 1990-2000]!)
>
> Ben
>
>
> On Wednesday, 2 November 2016 12:06:15 GMT Sergio Letuche wrote:
> > Thank you dear Stefano,
> >
> > I am aware of this module, it works great.
> >
> > But my problem is which regex to use in order to identify whether a
> > subfield's content is an ISSN number. Say our mrc has ISSN numbers
> > thrown in any tag you could imagine...
> >
> > So my approach would be to search the whole mrc, but I do not know
> > which regex to use...
> >
> > 2016-11-02 11:52 GMT+02:00 Stefano Bargioni <bargi...@pusc.it>:
> > > Hi, Sergio:
> > > you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
> > > Its help is:
> > >
> > > MARCgrep.pl
> > >
> > >    Extracts MARC records that match a condition on fields. Count and
> > >    invert are available.
> > >
> > > SYNOPSIS
> > >
> > >    MARCgrep.pl [options] [-e condition] file.mrc
> > >
> > >    Options:
> > >      -h  print this help message and exit
> > >      -c  count only
> > >      -e  condition
> > >      -f  comma separated list of fields to print
> > >      -o  output format "marc" | "line" | "INLINE"
> > >      -s  separator string for condition, default ","
> > >      -v  invert match
> > >
> > >    Condition:
> > >      -e 'tag,indicator1,indicator2,subfield,value'
> > >
> > > OPTIONS
> > >
> > >    -h  Print this message and exit.
> > >
> > >    -c  Count and print number of matching records
> > >
> > >    -e  The condition to match in the record.
> > >
> > >        For data fields, the syntax is:
> > >            tag,indicator1,indicator2,subfield,value
> > >
> > >        where tag, indicator1, indicator2, subfield, and value are
> > >        regular expressions patterns.
> > >
> > >        Do not put spaces around the separators.
> > >
> > >        For control fields, the syntax is:
> > >            tag,pos1,pos2,value
> > >
> > >        where tag starts with '00' (use '000' or 'LDR' for leader),
> > >        pos1 is the starting position, pos2 is the ending position,
> > >        both 0-based. Value is a regular expression.
> > >
> > >        Default condition (-e not specified) matches any data field.
> > >
> > >        For control fields, only the tag is mandatory.
> > >
> > >        Examples: -e '100,,,a,^A' will match records that contain
> > >                  100$a starting with 'A'
> > >
> > >                  -e '008,35,37,(ita|eng)' will match records with
> > >                  language ita or eng in 008
> > >
> > >                  -e '(1|7)(0|1)(0|1),,2' will match
> > >                  100,110,111,700,710,711 with ind2=2
> > >
> > >    -f  Comma separated list of fields (tags) to print if output format
> > >        is "line" or "inline". Default is any field.
> > >
> > >        Note that if a tag is preceded by '#' sign (like in '#nnn'), a
> > >        count of occurrences will be printed instead.
> > >
> > >        Examples: -f '100,245' will print field 100 and 245
> > >
> > >                  -f '400,#400' will print all occurrences of 400
> > >                  field as well as the number of its occurrences
> > >
> > >    -o  Output format: "marc" for ISO2709, "line" for each subfield in
> > >        a line, "inline" (default) for each field in a line.
> > >
> > >    -s  Specify a string separator for condition. Default is ','.
> > >
> > >    -v  Invert the sense of matching, to select non-matching records.
> > >
> > >    -V  Print the version and exit.
> > >
> > >    file.mrc
> > >        The mandatory ISO2709 file to read. Can be STDIN, '-'.
> > >
> > > DESCRIPTION
> > >
> > >    Like grep, the famous Unix utility, MARCgrep.pl allows to filter
> > >    MARC bibliographic records based on conditions on tag, indicators,
> > >    and field value.
> > >
> > >    Conditions can be applied to data fields, control fields or the
> > >    leader.
> > >
> > >    In case of data fields, the condition can specify tag, indicators,
> > >    subfield and value using regular expressions. In case of control
> > >    fields, the condition must contain the tag name, the starting and
> > >    ending position (both 0-based), and a regular expressions for the
> > >    value.
> > >
> > >    Options -c and -v allow respectively to count matching records and
> > >    to invert the match.
> > >
> > >    If option -c is not specified, the output format can be "line" or
> > >    "inline" (both human readable), or "marc" for MARC binary
> > >    (ISO2709). For formats "line" or "inline", the -f option allows to
> > >    specify fields to print.
> > >
> > >    You can chain more conditions using
> > >
> > >    ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e condition2 -
> > >
> > > KNOWN ISSUES
> > >
> > >    Performance.
> > >
> > >    Accepts and returns only UTF-8.
> > >
> > >    Checks are case sensitive.
> > >
> > > AUTHOR
> > >
> > >    Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
> > >
> > >    Stefano Bargioni <bargi...@pusc.it>
> > >
> > > SEE ALSO
> > >
> > >    marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep>
> > >    for filtering large data sets
> > >
> > > > On 02 nov 2016, at 09:57, Sergio Letuche <code4libus...@gmail.com> wrote:
> > > >
> > > > Hello community,
> > > >
> > > > how would you treat the following?
> > > >
> > > > I need a way to identify all tags - subfields, that have stored an
> > > > ISSN number in them.
> > > >
> > > > What would you suggest as a clever approach for this?
> > > >
> > > > Thank you
> > >
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
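
One way to cut down the year-range false positives that Ben warns about is to verify the ISSN check digit after the regex has matched. The following is only a minimal sketch in Python, not part of MARCgrep: the function names and the (tag, code, value) tuple shape are hypothetical, and reading the records out of the .mrc file is left to whatever MARC library you already use. The checksum weights the first seven digits 8 down to 2 and works modulo 11, with "X" standing for 10.

```python
import re

# Same shape as the "middle of a field" pattern quoted above, with groups
# added so the hyphen can be dropped before the checksum is computed.
ISSN_RE = re.compile(r"\b(\d{4})-?(\d{3})([\dxX])\b")


def issn_check_char(first_seven):
    """Return the expected check character for the first seven ISSN digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 over the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def find_issns(text):
    """Return candidates in text that match the pattern and pass the checksum."""
    hits = []
    for m in ISSN_RE.finditer(text):
        digits = m.group(1) + m.group(2)
        check = m.group(3).upper()
        if issn_check_char(digits) == check:
            # Normalise to the hyphenated, upper-case form.
            hits.append(f"{m.group(1)}-{m.group(2)}{check}")
    return hits


def scan_subfields(subfields):
    """Yield (tag, code, issn) for each validated ISSN.

    subfields is an iterable of (tag, code, value) tuples; this shape is an
    assumption made for the sketch, not the API of any MARC library.
    """
    for tag, code, value in subfields:
        for issn in find_issns(value):
            yield tag, code, issn
```

For example, find_issns("ISSN 0378-5955, covering 1990-2000") returns ["0378-5955"], because 1990-2000 fails the checksum. The check is not a guarantee: roughly one in eleven arbitrary eight-character matches will still pass by chance, so hits outside the expected tags are worth a manual look.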