Re: A library agnostic datastructure for MARC ?

2010-11-11 Thread Marc Chantreux
hi galen,

On Thu, Nov 11, 2010 at 07:40:35PM -0500, Galen Charlton wrote:
> I don't see how a structure like this gets you anywhere closer to an
> abstraction layer that would permit somebody to code in terms of
> semantic concepts like title and author instead of MARC tags,

It doesn't: the fact is we're working on libraries to do that (see
MARC::Mapper from my other mail) and i really would like to interact
with all of

- MARC::Record: it's heavily used in the koha ILS
- Frédéric Demians's MARC lib, which is much more modern
- what we at biblibre call a SimpleRecord, which is just a hash of
  unordered fields

i don't want to write a web of gateways between all those structures and
those to come, so i propose a common way to share data between all of
our tools.

For example: MARC::Template and ISO2709 have internal code to build
MARC::Records and SimpleRecords, so they depend on MARC::Record. I
really would like to drop this code for something more generic and
simple.
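
A minimal sketch of the kind of gateway this would replace, assuming the
nested-array structure proposed at the start of this thread as the common
format. The exact layout of a SimpleRecord is not described here, so the
hash shape below (tag => list of values or subfield hashes) and the name
to_simple are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical converter: common nested-array structure -> hash keyed by tag.
# The target shape is an assumed stand-in for a "SimpleRecord".
sub to_simple {
    my ($rec) = @_;
    my %simple;
    for my $field (@$rec) {
        my ($head, $body) = @$field;
        if (ref $head) {
            # data field: $head is [tag, ind1, ind2], $body is [[code, value], ...]
            my %subs;
            push @{ $subs{ $_->[0] } }, $_->[1] for @$body;
            push @{ $simple{ $head->[0] } }, \%subs;
        }
        else {
            # control field: [tag, value]
            push @{ $simple{$head} }, $body;
        }
    }
    return \%simple;
}

my $simple = to_simple(
    [ [qw/ 001 value /]
    , [ [qw/ 200 0 1 /]
      , [ [qw/ a foo /], [qw/ b bar /], [qw/ a foo2 /] ]
      ]
    ]
);
print join(', ', @{ $simple->{200}[0]{a} }), "\n";
```

Each tool would then need only one importer and one exporter for the
common structure instead of one gateway per pair of libraries. Note that
the hash loses the relative order of fields and subfields, which the
array form keeps.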

> you're looking for a serialization or data structure that is more

i'm not talking about serialization at all, i'm talking about sharing
data between marc related tools as PSGI does for the web thing. sorry if
i wasn't clear.

regards

-- 
Marc Chantreux
BibLibre, expert en logiciels libres pour l'info-doc
http://biblibre.com


Re: A library agnostic datastructure for MARC ?

2010-11-11 Thread Galen Charlton
Hi,

On Thu, Nov 11, 2010 at 7:27 PM, Marc Chantreux wrote:

> simple proposition is:
>
> [ [qw/ 001 value  /] # example of control field
> , [qw/ 005 value  /] # example of control field
> , [ [qw/ 200 0 1  /] # example of data field
>  , [ [qw/ a foo  /]
>    , [qw/ b bar  /]
>    , [qw/ a foo2 /]
[snip]

I don't see how a structure like this gets you anywhere closer to an
abstraction layer that would permit somebody to code in terms of
semantic concepts like title and author instead of MARC tags, but if
you're looking for a serialization or data structure that is more
convenient for you to deal with, you might find searching the code4lib
mailing list archives for "MARC" and "JSON" to be fruitful.

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com


WIP stuff about MARC manipulation

2010-11-11 Thread Marc Chantreux
hello again,

As i wrote in my last mail, we at biblibre really would like to write
tools that make life easier for the programmer in charge of a migration.
MARC::Template is something we're very happy with, but we have more
tools that aren't as polished. We already use them successfully, though,
so we're sharing them with you as they are.

https://github.com/eiro/p5-ISO2709 is a regex-driven ISO2709 parser. The
goal of this library is not performance but flexibility: we were able
to read very badly formatted ISO2709 just by changing a few
things in the regex. I haven't published it so far because i didn't find
time to make things configurable instead of hackable: it's a proof
of concept, but i'm very happy with the results.
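
To illustrate the idea (this is not the code of p5-ISO2709, only a sketch
under stated assumptions): a regex-driven reader can take the tags from
the directory but ignore its length and offset columns, which are often
wrong in badly formatted files, and rely on the field terminators instead.
The function name parse_iso2709 and the sample record are made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical tolerant reader: leader (24 chars), directory (12-char
# entries), then fields separated by \x1E, record ended by \x1D.
sub parse_iso2709 {
    my ($raw) = @_;
    my ($leader, $directory, $body) =
        $raw =~ /\A (.{24}) ([^\x1E]*) \x1E (.*?) \x1D? \z/xs
        or return;

    my @tags   = $directory =~ /(.{3}).{9}/gs;   # keep tags, skip length/offset
    my @chunks = split /\x1E/, $body;

    my @fields;
    for my $i (0 .. $#tags) {
        my ($tag, $chunk) = ($tags[$i], $chunks[$i] // '');
        if ($tag =~ /\A00/) {                    # control field
            push @fields, [ $tag, $chunk ];
        }
        else {                                   # data field
            my ($ind, @subs) = split /\x1F/, $chunk;
            my ($ind1, $ind2) = ($ind // '') =~ /\A(.)(.)/s;
            push @fields,
                [ [ $tag, $ind1, $ind2 ]
                , [ map { [ substr($_, 0, 1), substr($_, 1) ] } @subs ]
                ];
        }
    }
    return \@fields;
}

# Hand-built sample record: dummy leader, two directory entries, two fields.
my $raw = join '',
    ' ' x 24,
    '001000600000', '200001300006', "\x1E",
    '12345', "\x1E",
    "1 \x1Fafoo\x1Fbbar", "\x1E",
    "\x1D";

my $fields = parse_iso2709($raw);
print scalar(@$fields), " fields read\n";
```

Adapting such a reader to a broken file then means editing one or two
patterns, for example the directory entry width or the terminator bytes.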

There is also https://github.com/eiro/p5-MARC-Data, which is a first step towards a
very simple way to deal with CIB using Moose metaprogramming and serialization
definitions. For example, the UNIMARC Biblio 100$a definition is:

( [qw( entered           8 Str      )
  , POSIX::strftime('%Y%m%d',localtime )] # 0-7    Date Entered on File (Mandatory)
, [qw( publication_type  1 Str u    )] # 8      Type of Publication Date
, [qw( publication1      4          )] # 9-12   Publication Date 1
, [qw( publication2      4          )] # 13-16  Publication Date 2
, [qw( audience          3 Str u    )] # 17-19  Target Audience Code
, [qw( government        1 Str u    )] # 20     Government Publication Code
, [qw( modified          1 Str 0    )] # 21     Modified Record Code
, [qw( language          3 Str fre  )] # 22-24  Language of Cataloguing (Mandatory)
, [qw( transliteration   1 Str y    )] # 25     Transliteration Code
, [qw( charset1          4 Str 5050 )] # 26-29  Character Set (Mandatory)
, [qw( charset2          4          )] # 30-33  Additional Character Set
, [qw( title_script      2 Str ba   )] # 34-35  Script of Title
)
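
As a rough illustration of what such a definition buys you (a sketch in
plain Perl, not the Moose-based code of p5-MARC-Data): a list of
[name, length, type, default] entries is enough to serialize a
fixed-width coded field. The helper name pack_fixed is hypothetical, and
the definition is shortened, so the resulting positions do not match the
real 100$a layout.

```perl
use strict;
use warnings;

# Shortened definition in the same [name, length, type, default] shape.
my @def =
( [qw( entered          8 Str     )]
, [qw( publication_type 1 Str u   )]
, [qw( publication1     4         )]
, [qw( language         3 Str fre )]
);

# Hypothetical helper: concatenate the values in definition order, using
# the given value, else the default, else blanks, padded or cut to length.
sub pack_fixed {
    my ($def, %value) = @_;
    return join '', map {
        my ($name, $length, $type, $default) = @$_;
        my $v = $value{$name} // $default // '';
        substr($v . ' ' x $length, 0, $length);
    } @$def;
}

print pack_fixed(\@def, entered => '20101111', publication1 => '2010'), "\n";
```

The same definition could drive the reverse operation (cutting a string
back into named values), which is what makes a declarative description
attractive here.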

Another idea is MARC::Mapper
http://www.tinybox.net/2009/08/06/a-marc-mapper-in-few-lines-of-perl/
but i really think it could be written using the MARC::Template syntax.

regards
-- 
Marc Chantreux
BibLibre, expert en logiciels libres pour l'info-doc
http://biblibre.com


A library agnostic datastructure for MARC ?

2010-11-11 Thread Marc Chantreux
hello world,

Years ago, i wrote MARC::Template (https://github.com/eiro/MARC-Template)
to ease the process of migrating data to koha ILS. We often use it at
biblibre (http://biblibre.com).

For a MARC to MARC migration that only needs some CRUD manipulations
on fields, cleaning some data, moving some fields, an API is an awful waste of
time: learning to manipulate perl structures is much more efficient imho.

For a more complex migration mixing data coming from multiple datasources and
multiple formats, or even to write some migration from MARC to a modern
biblio format, we're convinced that the job can be done better and faster
by adding a level of abstraction over MARC. What i mean by abstraction is
that the business programmer, like the librarian, doesn't care about the
999$x field: he cares about authors, titles, year of edition ... That
can be partially done with YAML-driven Moose metaprogramming.
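
To make the idea concrete, here is a minimal sketch in plain Perl (no
YAML, no Moose): a table maps semantic names to tag/subfield pairs, and
one accessor hides the tags from the caller. The mapping table, the
function name values_of and the sample record are assumptions for the
example; UNIMARC 200$a and 700$a are used as title and author.

```perl
use strict;
use warnings;

# Hypothetical mapping from semantic names to UNIMARC tag/subfield pairs.
my %map =
( title  => [qw/ 200 a /]
, author => [qw/ 700 a /]
);

# Sample record: control fields are [tag, value], data fields are
# [[tag, ind1, ind2], [[code, value], ...]].
my $record =
[ [qw/ 001 12345 /]
, [ [ '200', '1', ' ' ], [ [ a => 'A title' ], [ f => 'J. Doe' ] ] ]
, [ [ '700', ' ', '1' ], [ [ a => 'Doe' ], [ b => 'J.' ] ] ]
];

# Return every value stored under the tag/subfield mapped to $name.
sub values_of {
    my ($rec, $name) = @_;
    my $where = $map{$name} or return;
    my ($tag, $code) = @$where;
    return map  { $_->[1] }
           grep { $_->[0] eq $code }
           map  { @{ $_->[1] } }
           grep { ref($_->[0]) && $_->[0][0] eq $tag }
           @$rec;
}

print "title:  $_\n" for values_of($record, 'title');
print "author: $_\n" for values_of($record, 'author');
```

Switching from UNIMARC to another MARC flavour would then mean swapping
the mapping table, not the code that asks for titles and authors.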

Actually, i personally think that it would be possible to write a complete
GUI-driven ETL able to deal with MARC.

At the very end of the process, we transform everything into MARC::Record
to use the MARC::Record serialization, but Frédéric's lib could be a
good output for our libs too. So is there a chance to specify a
library-agnostic datastructure as a bridge for all your libs, a kind of
PSGI for MARC, so that everyone could import from and export to this
format and we could easily mix all of them ?

simple proposition is: 

[ [qw/ 001 value  /] # example of control field
, [qw/ 005 value  /] # example of control field
, [ [qw/ 200 0 1  /] # example of data field
  , [ [qw/ a foo  /]
    , [qw/ b bar  /]
    , [qw/ a foo2 /]
    , [qw/ b bar2 /]
    ]
  ]
, [ [qw/ 200 0 1  /] # example of data field
  , [ [qw/ a foo  /]
    , [qw/ b bar  /]
    , [qw/ a foo2 /]
    , [qw/ b bar2 /]
    ]
  ]
]
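
A short sketch of how a consumer could walk this structure, assuming the
convention shown above: a field whose first element is a plain string is
a control field, and a field whose first element is an array reference is
a data field. The helper name field_lines is hypothetical.

```perl
use strict;
use warnings;

my $record =
[ [qw/ 001 value  /]
, [qw/ 005 value  /]
, [ [qw/ 200 0 1  /]
  , [ [qw/ a foo  /]
    , [qw/ b bar  /]
    , [qw/ a foo2 /]
    , [qw/ b bar2 /]
    ]
  ]
];

# Hypothetical helper: render each field as one line of text, keeping
# the order of fields and of repeated subfields.
sub field_lines {
    my ($rec) = @_;
    my @lines;
    for my $field (@$rec) {
        my ($head, $body) = @$field;
        if (ref $head) {    # data field: $head is [tag, ind1, ind2]
            my ($tag, $ind1, $ind2) = @$head;
            my $subs = join ' ', map { '$' . $_->[0] . ' ' . $_->[1] } @$body;
            push @lines, "$tag $ind1$ind2 $subs";
        }
        else {              # control field: [tag, value]
            push @lines, "$head $body";
        }
    }
    return @lines;
}

print "$_\n" for field_lines($record);
```

Because everything is plain arrays, no class has to be loaded to read or
build a record, which is the point of the PSGI comparison.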

regards,
-- 
Marc Chantreux
BibLibre, expert en logiciels libres pour l'info-doc
http://biblibre.com




Re: Moose based Perl library for MARC records

2010-11-11 Thread Brad Baxter
2010/11/11 Frédéric DEMIANS :
> Thanks all for your suggestions. I have to choose another name for sure.
> Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
> shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
> etc. Any objection?
>

I can't think of a better choice than MARC::Moose::, e.g., MARC::Moose::Record.
There are a lot of MARC::Something's out there, and what differentiates yours
from those appears to be Moose.  Yes, you might rewrite it in the future using
Marmoset, in which case I'd probably suggest renaming it to
MARC::Marmoset::Record.
That way, Moose enthusiasts could take over maintaining the Moose version.
If you catch my drift.

I'm not crazy about it necessarily.  I just can't think of anything I
like better.
Sleeping on it some more might be in order, maybe.

Just my .10 francs worth,

Brad


Re: Moose based Perl library for MARC records

2010-11-11 Thread Bill Birthisel
CPAN stores distributions under author subdirectories. But the module
namespace is done separately and reflects the function of the module.
In the case of the MARC:: namespace, I think Ed Summers is the only one
who has remained involved since the beginning (back in the 1990's). Had
we used names at the start, it would have been BBIRTH::MARC (which would
have been confusing to absolutely everyone  even back then) since I
uploaded the first release to CPAN.

Also, uppercase MARC:: is the preferred CPAN practice for an acronym of
this sort. Compare to existing module names like CGI, DBI, ODBC, ASP,
and PDL.

-bill

On Thu, 2010-11-11 at 20:39 +0100, Frédéric DEMIANS wrote:
> > butting in an interesting discussion ...
> 
> Thanks for joining the discussion.
> 
>  > Would Org::Demians::MARC::Record ( or Tamil::MARC::Record ) be very
>  > wrong, unless you aim to provide the ultimate collection of MARC
>  > modules that would make all the others obsolete ?
> 
> Yes, I aim to... In the Java world, I would have named it
> fr.tamil.marc... I'm not sure it's the usage in CPAN. And there is this
> suggestion to stay under MARC:: umbrella.
> 
>  > Moose is great and I love it, but it's not forever ... in a few years
>  > we'll use Elk or something else, and you might want to port your
>  > modules ...
> 
> I have other plans for the future...



Re: Moose based Perl library for MARC records

2010-11-11 Thread Frédéric DEMIANS

> butting in an interesting discussion ...

Thanks for joining the discussion.

> Would Org::Demians::MARC::Record ( or Tamil::MARC::Record ) be very
> wrong, unless you aim to provide the ultimate collection of MARC
> modules that would make all the others obsolete ?

Yes, I aim to... In the Java world, I would have named it
fr.tamil.marc... I'm not sure it's the usage in CPAN. And there is this
suggestion to stay under MARC:: umbrella.

> Moose is great and I love it, but it's not forever ... in a few years
> we'll use Elk or something else, and you might want to port your
> modules ...

I have other plans for the future...


Re: Moose based Perl library for MARC records

2010-11-11 Thread Emil-Nicolaie Perhinschi
Hello,

butting in an interesting discussion ...

Would Org::Demians::MARC::Record ( or Tamil::MARC::Record ) be very
wrong, unless you aim to provide the ultimate collection of MARC
modules that would make all the others obsolete ?

Moose is great and I love it, but it's not forever ... in a few years
we'll use Elk or something else, and you might want to port your
modules ...

Emil

2010/11/11 Frédéric DEMIANS :
>
>> I was going to express the same concern. Keeping everything under
>> MARC:: may also make it a tiny bit easier to find the existing
>> alternatives for, well, parsing MARC records. I would +1 MARC::Moose.
>
> I understand this point. I don't like the idea of using 'Moose' in the name
> of object using Moose. As this library is a MARC::Record alternative, as you
> said, why not simply Marc::Alt?
>
>> Also, to be purely pedantic, "MARC" is an acronym for "MAchine-Readable
>> Cataloguing", while "Marc" is a person's name, so where-ever it ends up,
>> please keep it uppercase.
>
> On this point, my convention is just to begin any element of class by an
> uppercase and then lowercase. This way there is no need to think about it:
> is it an acronym? should I say Koha or KOHA? SOLR or SolR? (private joke)
>
> But I will think about it since MARC::Record is so widely used.
>
> Thanks.
>



-- 
==
Emil Perhinschi     http://www.lunch-break.ro
==


Re: Moose based Perl library for MARC records

2010-11-11 Thread Frédéric DEMIANS



> I was going to express the same concern. Keeping everything under
> MARC:: may also make it a tiny bit easier to find the existing
> alternatives for, well, parsing MARC records. I would +1 MARC::Moose.

I understand this point. I don't like the idea of using 'Moose' in the
name of object using Moose. As this library is a MARC::Record
alternative, as you said, why not simply Marc::Alt?

> Also, to be purely pedantic, "MARC" is an acronym for
> "MAchine-Readable Cataloguing", while "Marc" is a person's name, so
> where-ever it ends up, please keep it uppercase.

On this point, my convention is just to begin any element of class by an
uppercase and then lowercase. This way there is no need to think about
it: is it an acronym? should I say Koha or KOHA? SOLR or SolR? (private
joke)

But I will think about it since MARC::Record is so widely used.

Thanks.


Re: Moose based Perl library for MARC records

2010-11-11 Thread Dan Scott
Gah. Replying to all this time instead of just Galen, as I did three
hours ago, for my $0.02...

2010/11/11 Galen Charlton :
> Hi,
>
> 2010/11/11 Frédéric DEMIANS :
>> Thanks all for your suggestions. I have to choose another name for sure.
>> Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
>> shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
>> etc. Any objection?
>
> Not from me, but I'm not sure if the CPAN folks will want yet another
> top-level namespace.

I was going to express the same concern. Keeping everything under
MARC:: may also make it a tiny bit easier to find the existing
alternatives for, well, parsing MARC records. I would +1 MARC::Moose.

Also, to be purely pedantic, "MARC" is an acronym for
"MAchine-Readable Cataloguing", while "Marc" is a person's name, so
where-ever it ends up, please keep it uppercase.

-- 
Dan Scott
Laurentian University


RE: Moose based Perl library for MARC records

2010-11-11 Thread Bryan Baldus
2010/11/11 Frédéric DEMIANS :
>> Thanks all for your suggestions. I have to choose another name for sure.
>> Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
>> shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
>> etc. Any objection?


Since MARC is an acronym, I believe all of its letters should be capitalized. 
Trying to remember to lowercase some of them while coding would make me less 
likely to want to use your modules.

As for adding another top level instead of keeping MARC:: as the primary prefix 
for the modules, since the modules you are working on seem to be dealing with 
manipulating standard MARC records rather than something new called "MarcX", 
I'd say MARC:: would be the place I'd expect to find such modules.

Thursday, November 11, 2010 8:28 AM Dueber, William [dueb...@umich.edu]:
>I think we should revisit "Biblio::". Yes, I know MARC isn't used only for 
>bibliographic data, but it's sure as hell not used to speak of outside the 
>library/museum world. 'Biblio' might not be perfect, but it's certainly not 
>misleading in any meaningful way.

As mentioned above, MARC::* is where I'd be likely to look for modules related 
to manipulating MARC records. Maybe it's because I haven't needed any of the 
Biblio::* modules, but I'd be less likely to look there for MARC manipulation 
modules. Since the modules under discussion appear to be an alternative to the 
current standard modules for MARC manipulation, the MARC::Record family, it 
seems like something within MARC::* would be appropriate (as long as the names 
don't interfere with the existing modules but instead can be used in 
cooperation with them).


Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/



Re: Moose based Perl library for MARC records

2010-11-11 Thread Galen Charlton
Hi,

2010/11/11 Dueber, William :
> I think we should revisit “Biblio::”. Yes, I know MARC isn’t used only for
> bibliographic data, but it’s sure as hell not used to speak of outside the
> library/museum world. ‘Biblio’ might not be perfect, but it’s certainly not
> misleading in any meaningful way.

It's up to Frédéric, of course, but since nearly all of the current
Perl modules used for handling MARC are in the MARC:: namespace,
sticking with the precedent will make it easier for somebody searching
CPAN to find all of the choices.  Anybody from outside librarydom who
realizes that they're stuck dealing with MARC records will presumably
have seen the records identified as "MARC".

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com


Re: Moose based Perl library for MARC records

2010-11-11 Thread Dueber, William
I think we should revisit "Biblio::". Yes, I know MARC isn't used only for 
bibliographic data, but it's sure as hell not used to speak of outside the 
library/museum world. 'Biblio' might not be perfect, but it's certainly not 
misleading in any meaningful way.


On 11/11/10 10:23 AM, "Galen Charlton"  wrote:

Hi,

2010/11/11 Frédéric DEMIANS :
> Thanks all for your suggestions. I have to choose another name for sure.
> Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
> shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
> etc. Any objection?

Not from me, but I'm not sure if the CPAN folks will want yet another
top-level namespace.

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com



Re: Moose based Perl library for MARC records

2010-11-11 Thread Galen Charlton
Hi,

2010/11/11 Frédéric DEMIANS :
> Thanks all for your suggestions. I have to choose another name for sure.
> Marc::Moose seems to be a reasonable choice. But I'm very tempted by a
> shorter option: MarcX, MarcX::Record, MarcX::Parser, MarcX::Reader::Isis,
> etc. Any objection?

Not from me, but I'm not sure if the CPAN folks will want yet another
top-level namespace.

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com