Re: identify ISSN numbers in an mrc file

Ben Soares Wed, 02 Nov 2016 03:28:40 -0700

Hi Sergio,

Try


^\d{4}-\d{3}[\dxX]$

if you know that they will always be formatted with a hyphen in the middle, or

^\d{4}-?\d{3}[\dxX]$

if you can't be sure of that.
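
As a rough illustration of the two anchored patterns, here is a minimal Python sketch (the list itself is Perl-oriented, but the regexes carry over unchanged). The function name looks_like_issn and the require_hyphen parameter are made up for the example, and it assumes each subfield value is tested on its own:

```python
import re

# The two anchored patterns from above. re.ASCII makes \d mean 0-9 only.
STRICT = re.compile(r"^\d{4}-\d{3}[\dxX]$", re.ASCII)   # hyphen required
LOOSE = re.compile(r"^\d{4}-?\d{3}[\dxX]$", re.ASCII)   # hyphen optional


def looks_like_issn(value, require_hyphen=True):
    """Return True if the whole (stripped) value has the shape of an ISSN.

    This checks the shape only, not the check digit.
    """
    pattern = STRICT if require_hyphen else LOOSE
    return pattern.match(value.strip()) is not None


if __name__ == "__main__":
    for value in ["0378-5955", "03785955", "ISSN 0378-5955"]:
        print(value, looks_like_issn(value), looks_like_issn(value, False))
```

Because both patterns are anchored with ^ and $, a value such as "ISSN 0378-5955" is rejected; only subfields that contain nothing but the number will match.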

(And if you're interested in spotting ISSNs in the middle of a field, use
\b\d{4}-?\d{3}[\dxX]\b
but beware that this also matches year ranges [e.g. 1990-2000]!)
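
One way to weed out most of the year ranges is to verify the ISSN check digit: weight the first seven digits 8 down to 2, sum them, and compare (11 - sum mod 11) mod 11 with the last character, where 10 is written as X. The following is only a sketch in Python, not a tested production filter; the names issn_check_char and find_issns are invented for the example, and the capture groups are added to the pattern above purely for convenience:

```python
import re

# The unanchored pattern from above, with groups added to pull out the parts.
CANDIDATE = re.compile(r"\b(\d{4})-?(\d{3})([\dxX])\b", re.ASCII)


def issn_check_char(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def find_issns(text):
    """Return NNNN-NNNC strings for candidates whose check digit is valid."""
    found = []
    for m in CANDIDATE.finditer(text):
        head, tail, check = m.group(1), m.group(2), m.group(3).upper()
        if issn_check_char(head + tail) == check:
            found.append(f"{head}-{tail}{check}")
    return found


if __name__ == "__main__":
    print(find_issns("Hearing research, ISSN 0378-5955, published 1990-2000"))
```

Here 1990-2000 is dropped because its check character would have to be X rather than 0. A year range can still pass the check by chance, so this reduces false positives without eliminating them.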

Ben


On Wednesday, 2 November 2016 12:06:15 GMT Sergio Letuche wrote:
> Thank you, dear Stefano,
> 
> I am aware of this module; it works great.
> 
> But my problem is what regex to use in order to identify whether a
> subfield's content is an ISSN number. Say our mrc has ISSN numbers thrown
> into any tag you could imagine...
> 
> So my approach would be to search the whole mrc, but I do not know which
> regex to use...
> 
> 2016-11-02 11:52 GMT+02:00 Stefano Bargioni <[email protected]>:
> > Hi, Sergio:
> > you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
> > Its help is:
> > 
> > MARCgrep.pl
> > 
> >        Extracts MARC records that match a condition on fields. Count and
> >        invert are available.
> > 
> > SYNOPSIS
> > 
> >        MARCgrep.pl [options] [-e condition] file.mrc
> >        
> >         Options:
> >           -h   print this help message and exit
> >           -c   count only
> >           -e   condition
> >           -f   comma separated list of fields to print
> >           -o   output format "marc" | "line" | "INLINE"
> >           -s   separator string for condition, default ","
> >           -v   invert match
> >         
> >         Condition:
> >           -e  'tag,indicator1,indicator2,subfield,value'
> > 
> > OPTIONS
> > 
> >        -h      Print this message and exit.
> > 
> >        -c      Count and print the number of matching records.
> > 
> >        -e      The condition to match in the record.
> > 
> >                For data fields, the syntax is:
> >                  tag,indicator1,indicator2,subfield,value
> > 
> >                where tag, indicator1, indicator2, subfield, and value
> >                are regular expression patterns.
> >                Do not put spaces around the separators.
> > 
> >                For control fields, the syntax is:
> >                  tag,pos1,pos2,value
> > 
> >                where tag starts with '00' (use '000' or 'LDR' for the
> >                leader), pos1 is the starting position, and pos2 is the
> >                ending position, both 0-based. Value is a regular
> >                expression.
> > 
> >                The default condition (-e not specified) matches any
> >                data field.
> >                For control fields, only the tag is mandatory.
> > 
> >                Examples:
> >                  -e '100,,,a,^A' will match records that contain
> >                  100$a starting with 'A'
> >                  -e '008,35,37,(ita|eng)' will match records with
> >                  language ita or eng in 008
> >                  -e '(1|7)(0|1)(0|1),,2' will match
> >                  100,110,111,700,710,711 with ind2=2
> > 
> >        -f      Comma separated list of fields (tags) to print if the
> >                output format is "line" or "inline". Default is any
> >                field.
> > 
> >                Note that if a tag is preceded by a '#' sign (as in
> >                '#nnn'), a count of occurrences will be printed instead.
> > 
> >                Examples:
> >                  -f '100,245' will print fields 100 and 245
> >                  -f '400,#400' will print all occurrences of the 400
> >                  field as well as the number of its occurrences
> > 
> >        -o      Output format: "marc" for ISO2709, "line" for each
> >                subfield in a line, "inline" (default) for each field
> >                in a line.
> > 
> >        -s      Specify a string separator for the condition. Default
> >                is ','.
> > 
> >        -v      Invert the sense of matching, to select non-matching
> >                records.
> > 
> >        -V      Print the version and exit.
> > 
> >        file.mrc
> > 
> >                The mandatory ISO2709 file to read. Can be STDIN, '-'.
> > 
> > DESCRIPTION
> > 
> >        Like grep, the famous Unix utility, MARCgrep.pl filters MARC
> >        bibliographic records based on conditions on tag, indicators,
> >        and field value.
> > 
> >        Conditions can be applied to data fields, control fields, or the
> >        leader.
> > 
> >        For data fields, the condition can specify tag, indicators,
> >        subfield, and value using regular expressions. For control
> >        fields, the condition must contain the tag name, the starting
> >        and ending positions (both 0-based), and a regular expression
> >        for the value.
> > 
> >        Options -c and -v respectively count matching records and
> >        invert the match.
> > 
> >        If option -c is not specified, the output format can be "line"
> >        or "inline" (both human readable), or "marc" for MARC binary
> >        (ISO2709). For formats "line" or "inline", the -f option
> >        specifies the fields to print.
> > 
> >        You can chain several conditions using
> > 
> >        ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e condition2 -
> > 
> > KNOWN ISSUES
> > 
> >        Performance.
> >        
> >        Accepts and returns only UTF-8.
> >        
> >        Checks are case sensitive.
> > 
> > AUTHOR
> > 
> >        Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
> >        
> >        Stefano Bargioni <[email protected]>
> > 
> > SEE ALSO
> > 
> >        marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep>
> >        for filtering large data sets
> > > 
> > > On 02 nov 2016, at 09:57, Sergio Letuche <[email protected]> wrote:
> > > Hello community,
> > > 
> > > how would you treat the following?
> > > 
> > > I need a way to identify all tags - subfields, that have stored an ISSN
> > > number in them.
> > > 
> > > What would you suggest as a clever approach for this?
> > > 
> > > Thank you



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
