In Catmandu you can do this with the following fix script, which also
extracts only the valid ISSN numbers:
# cpanm Catmandu Catmandu::MARC Catmandu::Identifier
$ cat myfix.fix
marc_map('***',text.$append)
filter(text,'(\b\d{4}-?\d{3}[\dxX]\b)')
replace_all(text.*,'.*(\b\d{4}-?\d{3}[\dxX]\b).*',$1)
do list(path:text)
unless is_valid_issn(.)
reject()
end
end
vacuum()
select exists(text)
join_field(text,' ; ')
retain(_id,text)
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
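
The validity check matters because the pattern alone also matches year
ranges such as 1990-2000; the check character is what rules most of those
out. Below is a minimal Python sketch of that idea. It is not the
Catmandu::Identifier implementation: the function names are made up for
illustration, and the checksum follows the standard ISSN mod-11 rule
(weights 8 down to 2 on the first seven digits, X standing for 10).

```python
import re

# Same candidate pattern as in the fix above: 4 digits, optional hyphen,
# 3 digits, then a digit or X as the check character.
ISSN_CANDIDATE = re.compile(r"\b\d{4}-?\d{3}[\dxX]\b")


def is_valid_issn(candidate):
    """Return True if the check character matches the mod-11 checksum."""
    chars = candidate.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", chars):
        return False
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def find_valid_issns(text):
    """Return all pattern matches in text whose check character is correct."""
    return [m.group(0) for m in ISSN_CANDIDATE.finditer(text)
            if is_valid_issn(m.group(0))]


if __name__ == "__main__":
    sample = "ISSN 0378-5955; published 1990-2000; also 2434-561x"
    print(find_valid_issns(sample))
```

A checksum can still match by coincidence, so a year range will
occasionally slip through; the check reduces false positives but does not
eliminate them.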
Patrick
> On 2 Nov 2016, at 11:29, Sergio Letuche <[email protected]> wrote:
>
> thank you very much
>
> 2016-11-02 12:28 GMT+02:00 Ben Soares <[email protected]>:
> Hi Sergio,
>
> Try
>
> ^\d{4}-\d{3}[\dxX]$
>
> if you know that they will always be formatted with a hyphen in the middle, or
>
> ^\d{4}-?\d{3}[\dxX]$
>
> if you can't be sure of that.
>
> (and if you're interested in spotting ISSNs in the middle of a field use
> \b\d{4}-?\d{3}[\dxX]\b
> but beware this also finds year ranges [e.g. 1990-2000]!)
>
> Ben
>
>
> On Wednesday, 2 November 2016 12:06:15 GMT Sergio Letuche wrote:
> > Thank you dear Stefano,
> >
> > i am aware of this module, it works great.
> >
> > But my problem is, what clever regex to use, in order to identify if a
> > subfield's content, is an ISSN number. Say our mrc has ISSN numbers thrown
> > in any tag you could imagine...
> >
> > So my approach, would be, to search the whole mrc, but i do not know which
> > regex to use...
> >
> > 2016-11-02 11:52 GMT+02:00 Stefano Bargioni <[email protected]>:
> > > Hi, Sergio:
> > > you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
> > > Its help is:
> > >
> > > MARCgrep.pl
> > >
> > > Extracts MARC records that match a condition on fields. Count and
> > > invert are available.
> > >
> > > SYNOPSIS
> > >
> > > MARCgrep.pl [options] [-e condition] file.mrc
> > >
> > > Options:
> > > -h print this help message and exit
> > > -c count only
> > > -e condition
> > > -f comma separated list of fields to print
> > > -o output format "marc" | "line" | "INLINE"
> > > -s separator string for condition, default ","
> > > -v invert match
> > >
> > > Condition:
> > > -e 'tag,indicator1,indicator2,subfield,value'
> > >
> > > OPTIONS
> > >
> > > -h Print this message and exit.
> > >
> > > -c Count and print number of matching records
> > >
> > > -e The condition to match in the record.
> > >
> > > For data fields, the syntax is:
> > > tag,indicator1,indicator2,subfield,value
> > >
> > > where tag, indicator1, indicator2, subfield, and value are
> > > regular expressions patterns.
> > >
> > > Do not put spaces around the separators.
> > >
> > > For control fields, the syntax is:
> > > tag,pos1,pos2,value
> > >
> > > where tag starts with '00' (use '000' or 'LDR' for leader),
> > > pos1 is the starting position, pos2 is the ending position,
> > > both 0-based. Value is a regular expression.
> > >
> > > Default condition (-e not specified) matches any data field.
> > >
> > > For control fields, only the tag is mandatory.
> > >
> > > Examples: -e '100,,,a,^A' will match records that contain
> > > 100$a starting with 'A'
> > >
> > > -e '008,35,37,(ita|eng)' will match records with
> > > language ita or eng in 008
> > >
> > > -e '(1|7)(0|1)(0|1),,2' will match
> > > 100,110,111,700,710,711 with ind2=2
> > >
> > > -f Comma separated list of fields (tags) to print if output format
> > > is "line" or "inline". Default is any field.
> > >
> > > Note that if a tag is preceded by '#' sign (like in '#nnn'), a
> > > count of occurrences will be printed instead.
> > >
> > > Examples: -f '100,245' will print field 100 and 245
> > >
> > > -f '400,#400' will print all occurrences of 400
> > > field as well as the number of its occurrences
> > >
> > > -o Output format: "marc" for ISO2709, "line" for each subfield in
> > > a line, "inline" (default) for each field in a line.
> > >
> > > -s Specify a string separator for condition. Default is ','.
> > >
> > > -v Invert the sense of matching, to select non-matching records.
> > >
> > > -V Print the version and exit.
> > >
> > > file.mrc
> > >
> > > The mandatory ISO2709 file to read. Can be STDIN, '-'.
> > >
> > > DESCRIPTION
> > >
> > > Like grep, the famous Unix utility, MARCgrep.pl allows to filter MARC
> > > bibliographic records based on conditions on tag, indicators, and
> > > field value.
> > >
> > > Conditions can be applied to data fields, control fields or the leader.
> > >
> > > In case of data fields, the condition can specify tag, indicators,
> > > subfield and value using regular expressions. In case of control
> > > fields, the condition must contain the tag name, the starting and
> > > ending position (both 0-based), and a regular expressions for the
> > > value.
> > >
> > > Options -c and -v allow respectively to count matching records and to
> > > invert the match.
> > >
> > > If option -c is not specified, the output format can be "line" or
> > > "inline" (both human readable), or "marc" for MARC binary (ISO2709).
> > > For formats "line" or "inline", the -f option allows to specify
> > > fields to print.
> > >
> > > You can chain more conditions using
> > >
> > > ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e condition2 -
> > >
> > > KNOWN ISSUES
> > >
> > > Performance.
> > >
> > > Accepts and returns only UTF-8.
> > >
> > > Checks are case sensitive.
> > >
> > > AUTHOR
> > >
> > > Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
> > >
> > > Stefano Bargioni <[email protected]>
> > >
> > > SEE ALSO
> > >
> > > marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep> for
> > > filtering large data sets
> > > >
> > > > > On 02 nov 2016, at 09:57, Sergio Letuche <[email protected]> wrote:
> > > > Hello community,
> > > >
> > > > how would you treat the following?
> > > >
> > > > I need a way to identify all tags - subfields, that have stored an ISSN
> > > > number in them.
> > >
> > > > What would you suggest as a clever approach for this?
> > > >
> > > > Thank you
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
Patrick Hochstenbach - digital architect
University Library Ghent
Sint-Hubertusstraat 8 - 9000 Ghent - Belgium
[email protected]
+32 (0)9 264 7980