Re: [Crm-sig] Pages reproduced as spreads

2017-03-11 Thread Jim Salmons
To All Fellow SIG Members,

 

The current discussion about print-page numbers and digital item IDs is an 
ideal opportunity for me to update the group on my #DATeCH2017 submissions 
and my ongoing work toward the development of 
a #GTS (Ground Truth Storage) format for magazines, newspapers, and related 
serial publications.

 

As FactMiners, our #CitizenScience research project, I am working with the good 
folks at PRImA, the Pattern Recognition and Image Analysis research lab at the 
U. of Salford. Through this collaboration I am developing FactMiners' MAGAZINE 
#GTS format which is based on a #cidocCRM/FRBRoo/PRESSoo 'ontological stack.' 
Our goal is to provide integrated complex document structure and content 
depiction models in support of eResearch and machine learning access to digital 
collections of historic documents. The FactMiners' MAGAZINE format is being 
evolved as a superset of PRImA's PAGE #GTS format.

 

As some of you may recall from my self-introductory comments upon joining the 
SIG, I believe the complementary branches of "things" and "activity/time" in 
the CRM lend themselves to being used as an executable metamodel for software 
design and development, and not just as a descriptive ontology. This idea 
was central to my first submission to this year's DATeCH 
conference. As we "ground truthed" datasets of Softalk magazine -- curating the 
table of contents, advertiser index, mastheads, etc. of this early 
microcomputer magazine -- we developed a number of structure and content 
revealing datasets that were based on self-referential content within the 
magazine. 

 

For example, in order to begin building a visual repository of Softalk 
advertisements based on a fine-grained pattern language expressed in PRESSoo 
Issuing Rules and Issuing Rule Changes, the 7,158 sightings of advertisements 
in the 48 issues of the magazine required that we be able to transform print 
page number references to the "leaf" digital image IDs used by the Internet 
Archive where the Softalk magazine digital collection is maintained. In fact, 
the print page number (ppg) to 'leaf' ID mapping became an issue with every 
dataset we wanted to explore.

 

This led to our development of our second #DATeCH2017 submission, "Print-Page 
Number to 'Leaf' ID Mapping in Support of eResearch and Machine-Learning at the 
Internet Archive." In this paper we identify the foundational nature of the 
"ppg2leaf" tuple based on an exploration of relevant #cidocCRM functional 
subsystems. We document many of the situations where digitization can result in 
discontinuous mapping of print page numbers to digitized images. Our Softalk 
collection was digitized by the Internet Archive's regional scanning center and 
had an impressive but insufficient "ppg2leaf" map coverage of over 70%. This 
tuple-match was a side effect of the vigilance of our scanners asserting page 
numbers during the center's workflow ingestion process. To determine if our 
experience was typical, we examined 265 computer magazine collections at the 
Archive and found only 29 individual print-page-number-to-leaf-ID tuples in 
over 1.4M pages contained in these collections! So, as described in our second paper, we 
have developed the Python-based "ppg2leaf_ferret" app as a metadata discovery 
and curation tool in support of eResearch and machine learning at the Internet 
Archive.
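
A minimal sketch of the "ppg2leaf" idea described above, in Python. The data layout, function names, and sample values are assumptions made for illustration only; they are not taken from the ppg2leaf_ferret app or from Internet Archive metadata.

```python
from typing import Dict, List, Optional

# Hypothetical per-leaf page assertions for one issue: index = leaf ID,
# value = print page number asserted by the scanner (None = not asserted).
leaf_page_numbers: List[Optional[str]] = [None, None, "1", "2", None, "4", "5"]


def build_ppg2leaf(assertions: List[Optional[str]]) -> Dict[str, int]:
    """Map each asserted print page number to the leaf ID that shows it."""
    return {ppg: leaf for leaf, ppg in enumerate(assertions) if ppg is not None}


def coverage(assertions: List[Optional[str]]) -> float:
    """Fraction of leaves that carry a print page number assertion."""
    if not assertions:
        return 0.0
    return sum(p is not None for p in assertions) / len(assertions)


ppg2leaf = build_ppg2leaf(leaf_page_numbers)
print(ppg2leaf.get("4"))  # leaf ID for print page 4
print(ppg2leaf.get("3"))  # None: page 3 was never asserted, so the map has a gap
```

A gap such as the missing page 3 is the kind of discontinuity that would have to be filled in by curation before the map can be relied on.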

 

Using the ppg2leaf_ferret we curated the print page number to digital image 
ItemID dataset and created the first implementation of FactMiners MAGAZINE 
format #GTS metadata files. The Softalk collection is, as reported in this 
announcement (https://goo.gl/XxMcqe), the first digital magazine collection at 
the Internet Archive to provide a set of magazine-specific Ground Truth Storage 
metadata files including the all-important ppg2leaf_map. At the above 
announcement link, you will find links to ResearchGate.net pre-prints of our 
#DATeCH submissions, embedded video project updates showing progress in the 
development of the "ferret" app, and links to the initial release of our 
publication and issue level MAGAZINE #GTS files in the Softalk collection at 
the Archive.

 

For SIG members who may not have a free ResearchGate user account, here are 
links on my OneDrive cloud storage to our #DATeCH2017 submissions:

 

https://1drv.ms/b/s!AtML1v0eUlpEgdZlN7o9RyadKi-w4A

   

and here:

 

https://1drv.ms/b/s!AtML1v0eUlpEgeJj6hbQKZaBfBscqw



This is an already overly long update, so I will wrap up my current 
contribution to this print page number and itemID conversation. But as will 
become clear to any SIG member who explores the links provided above, I will 
soon be asking for some "best practice" modeling recommendations about how to 
support the linkage between an instance-specific MAGAZINE #GTS metamodel and 
its cited reference models (in our initial case, the 
#cidocCRM/FRBRoo/PRESSoo 'stack').

 

As 

Re: [Crm-sig] Pages reproduced as spreads

2017-03-10 Thread martin
Message: 1
Date: Tue, 7 Mar 2017 18:24:17 +0200
From: martin 
To: crm-sig@ics.forth.gr
Subject: Re: [Crm-sig] Pages reproduced as spreads

Dear Florian,

There is no model without a question. Pages of books constitute a
partitioning of an information object. Each page number can be seen as
an identifier. Paragraphs belong to an alternative partitioning system.
The reproduction has its own partitioning, the scanned double pages.
Each scanned image represents, and actually also incorporates, the text
of two pages of the reproduced book.
Between alternative partitionings, one can define includes/overlaps
relations.

Whether this is elegant depends on what queries or functions you'd like
to support.

Best,

martin

On 7/3/2017 1:36 pm, Florian Kräutli wrote:

Dear all,

I have a collection of Books (F5) that have been reproduced (F33) as
PDFs (E84).
In some cases, books have been digitised as spreads, i.e. one page in
the PDF represents two pages in the book.


Is there an elegant way to model this?

Best,

Florian
___
Crm-sig mailing list
Crm-sig@ics.forth.gr
http://lists.ics.forth.gr/mailman/listinfo/crm-sig




--
Dr. Martin Doerr | Vox: +30(2810)391625
Research Director | Fax: +30(2810)391638
Email: mar...@ics.forth.gr

Center for Cultural Informatics
Information Systems Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH)

N.Plastira 100, Vassilika Vouton,
GR70013 Heraklion, Crete, Greece

Web-site: http://www.ics.forth.gr/isl






Re: [Crm-sig] Pages reproduced as spreads

2017-03-10 Thread Florian Kräutli
same (the
>>>>> information object) but page 1 is spread over two carrier pages, i.e. page
>>>>> 1 is still page 1 as an information object, but in the application Adobe
>>>>> spreads it over two application carrier pages. Is that right? Or is it
>>>>> something else?
>>>>> 
>>>>> If the expression is the same (the same information object), then isn't
>>>>> page 1 still page 1?
>>>>> 
>>>>> Can you clarify?
>>>>> 
>>>>> D
>>>>> 
>>>>> 
>>>>> ________
>>>>> From: Crm-sig [crm-sig-boun...@ics.forth.gr] on behalf of Florian
>>>>> Kräutli [fkraeu...@mpiwg-berlin.mpg.de]
>>>>> Sent: 09 March 2017 10:38
>>>>> To: crm-sig@ics.forth.gr
>>>>> Subject: Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>>>>> 
>>>>> Dear Martin,
>>>>> 
>>>>> many thanks for your input!
>>>>> 
>>>>> Our question at the moment is simply, does a page in the PDF represent
>>>>> one or two pages of the book?
>>>>> 
>>>>> Later on, we might have more specific questions that will require us to
>>>>> define the relationships between these two page identifiers (in the
>>>>> physical book and in the PDF) more explicitly. We would then also need to
>>>>> manually assess each PDF as, for instance, we cannot assume that page n
>>>>> in a book corresponds to page n/2 in a double-spread PDF. A PDF might
>>>>> contain some additional pages with information about the digitisation
>>>>> process.
>>>>> 
>>>>> For now, however, we only need a binary answer: double-spread yes or no.
>>>>> 
>>>>> All the best,
>>>>> 
>>>>> Florian



Re: [Crm-sig] Pages reproduced as spreads

2017-03-07 Thread martin

Dear Florian,

There is no model without a question. Pages of books constitute a 
partitioning of an information object. Each page number can be seen as an 
identifier. Paragraphs belong to an alternative partitioning system. The 
reproduction has its own partitioning, the scanned double pages.
Each scanned image represents, and actually also incorporates, the text of 
two pages of the reproduced book.
Between alternative partitionings, one can define includes/overlaps 
relations.

Whether this is elegant depends on what queries or functions you'd like to 
support.
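
The includes/overlaps relations between alternative partitionings can be sketched in Python as follows. This is only an illustration under stated assumptions: each part is modelled as a half-open interval on an abstract position axis, and the class, function names, and sample values are hypothetical rather than CRM constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One part of a partitioning, as a half-open interval [start, end)."""
    label: str
    start: float
    end: float


def includes(outer: Part, inner: Part) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def overlaps(a: Part, b: Part) -> bool:
    """True if the two parts share some extent."""
    return a.start < b.end and b.start < a.end


# Hypothetical example: four book pages, two double-page scans, and a
# paragraph that runs from the middle of page 2 to the middle of page 3.
pages = [Part(f"page {n}", n - 1, n) for n in range(1, 5)]
scans = [Part("scan 1", 0, 2), Part("scan 2", 2, 4)]
paragraph = Part("paragraph", 1.5, 2.5)

for scan in scans:
    print(scan.label, "includes", [p.label for p in pages if includes(scan, p)])
print("paragraph overlaps", [s.label for s in scans if overlaps(paragraph, s)])
```

Here each scan includes exactly two pages, while the paragraph overlaps both scans without being included in either.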


Best,

martin

On 7/3/2017 1:36 pm, Florian Kräutli wrote:

Dear all,

I have a collection of Books (F5) that have been reproduced (F33) as PDFs (E84).
In some cases, books have been digitised as spreads i.e. one page in the PDF 
represents two pages in the book.

Is there an elegant way to model this?

Best,

Florian



Re: [Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Pierre Choffé
In this case why not use a simple P2 has type E55 Type “double-page spread” on 
your E84 Information Carrier? (In which case it would be good to also type the 
“single-page spread” E84.)

Or specialize the R29 reproduced property into 2 sub-properties, one for the 
single-page reproduction, one for the double-page reproduction?

I would go for the latter. What do you think?
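
As a rough illustration of the first option only (typing the E84 with P2 has type), here is a sketch in plain Python that stores statements as (subject, predicate, object) tuples. The prefixed names follow the CIDOC CRM labels used in this thread; every `ex:` identifier and the helper function are hypothetical.

```python
# Prefixed names follow the CIDOC CRM labels used in the thread;
# the `ex:` identifiers are hypothetical.
DOUBLE = "ex:double_page_spread"
SINGLE = "ex:single_page_spread"

triples = [
    (DOUBLE, "rdf:type", "crm:E55_Type"),
    (SINGLE, "rdf:type", "crm:E55_Type"),
    ("ex:pdf_1", "rdf:type", "crm:E84_Information_Carrier"),
    ("ex:pdf_1", "crm:P2_has_type", DOUBLE),
    ("ex:pdf_2", "rdf:type", "crm:E84_Information_Carrier"),
    ("ex:pdf_2", "crm:P2_has_type", SINGLE),
]


def is_double_spread(pdf: str) -> bool:
    """True if the PDF is typed as a double-page spread."""
    return (pdf, "crm:P2_has_type", DOUBLE) in triples


print(is_double_spread("ex:pdf_1"))  # True
print(is_double_spread("ex:pdf_2"))  # False
```

The second option would instead replace the type statement with one of two sub-properties of R29 reproduced on the reproduction event.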

All best,

Pierre


Re: [Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Florian Kräutli
Hi Pierre,

Yes, that's basically it. I don't want to get too specific about things if it's 
not necessary.

Best,

Florian


Re: [Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Pierre Choffé
Thank you Florian. So, basically what you want to do is signal that a PDF 
contains single-page or double-page spreads, is that it? Or do you want your 
model to be very specific about all the points you mention?

Pierre


Re: [Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Florian Kräutli
Hi Pierre,

Thanks for your reply!

I will try to give you a bit more context. At the centre of the model is the 
book. However, researchers are primarily working with its PDF reproduction.

They want to be able to specify the number of pages in a book, and later also 
want to specify the page range for each chapter in the book.

The PDFs are retrieved from different sources and sometimes contain an extra 
page with source information. In addition, as is usually the case, the page 
numbers in the PDF do not correspond to the page numbers in the book.

Because we cannot safely say how many pages are in the book, we only enter the 
number of pages in the PDF. Also, when we locate the chapters in the book, we 
refer to the page number in the PDF and not to the page number in the book.

Later on we would like to be able to say how many pages a chapter occupies in a 
book. The same chapter will appear in several books, therefore this can vary 
for the same chapter.

Our problem is that when PDFs reproduce a double-page spread on a single page, 
chapters in the PDFs will appear to be half as long as in the book. Therefore, 
we want to be able to tell if the PDF contains single-page or double-page 
spreads.
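
To make the arithmetic concrete, here is a small Python sketch of how a single-page/double-page flag could be used to estimate a chapter's length in book pages from its PDF page range. The function name and parameters are hypothetical, and the estimate rests on the assumptions stated in the docstring.

```python
def book_pages_in_range(first_pdf_page: int, last_pdf_page: int,
                        double_spread: bool) -> int:
    """Estimate how many book pages a PDF page range covers.

    Assumes every PDF page in the range shows either one book page or a
    full two-page spread; ranges that start or end mid-spread, and extra
    pages with source information, would need a manual check.
    """
    if last_pdf_page < first_pdf_page:
        raise ValueError("last_pdf_page must not precede first_pdf_page")
    pdf_pages = last_pdf_page - first_pdf_page + 1
    return pdf_pages * 2 if double_spread else pdf_pages


print(book_pages_in_range(10, 14, double_spread=False))  # 5
print(book_pages_in_range(10, 14, double_spread=True))   # 10
```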

All the best,

Florian


[Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Pierre Choffé
Hi Florian,

What is the granularity level of your present model? The book?

What kind of information would you like to retrieve and for what usage? What 
problem are you trying to solve?

Are there only 2 possible cases? (Either, globally for the book, 1 PDF page = 
1 book page or 1 PDF page = 2 book pages, but never “in some cases a mix of both”.)

All best,

Pierre


[Crm-sig] Pages reproduced as spreads

2017-03-07 Thread Florian Kräutli
Dear all,

I have a collection of Books (F5) that have been reproduced (F33) as PDFs (E84).
In some cases, books have been digitised as spreads, i.e. one page in the PDF 
represents two pages in the book.

Is there an elegant way to model this?

Best,

Florian