Re: [uf-discuss] hoard.it

2008-07-09 Thread Jim O'Donnell

On 8 Jul 2008, at 06:45, Guillaume Lebleu wrote:


Jim O'Donnell wrote:
The recent discussion here about dates has made me wonder if such  
a web service would be useful for microformats parsers. What do  
others think?
It seems to me that this type of date extraction might present  
risks if used by uf parsers to extract date/time from published  
content (and lead to the "people showing up on the wrong date"  
error mentioned in earlier posts).


I don't think it's so risky. The inspiration for this particular work  
was Dan's experience on the 20th century London site:
http://www.20thcenturylondon.org.uk/ which involved parsing and normalising text  
dates across four different collections. Granted it's tedious to  
analyse all the different patterns that have been used, but it isn't  
impossible to extract accurate ISO dates. The fact that the archive was  
created from those four collections is a testament to that.


Museum catalogue records always have some sort of absolute date,  
though, which makes things easier for me. If people are marking up  
phrases like 'this Saturday' or '25th June' then I can see that  
extracting a date would be tricky - the parser would need the context  
within which to place the date, in order to get the year or month.
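
To make the context problem concrete, here is a rough sketch in Python. It assumes the parser is handed some reference date (for example the page's publication date) and simply borrows the year from it. The function name resolve_partial and the choice of reference date are my own illustration, not part of any existing parser, and picking the wrong context would give exactly the wrong-date error mentioned above.

```python
import re
from datetime import date

# Month names mapped to month numbers (English only, for illustration).
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def resolve_partial(text, context):
    """Turn a day-and-month phrase such as '25th June' into an ISO date.

    `context` is a datetime.date supplying the missing year. This is an
    assumption: the text itself does not say which year is meant.
    Returns None for anything the pattern does not recognise.
    """
    m = re.fullmatch(r"\s*(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s*", text)
    if not m:
        return None
    month = MONTHS.get(m.group(2).lower())
    if month is None:
        return None
    try:
        return date(context.year, month, int(m.group(1))).isoformat()
    except ValueError:
        # Impossible day for that month, e.g. '31st June'.
        return None
```

A phrase like 'this Saturday' falls straight through and returns None, which is probably the safest thing a parser can do without more context.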


That said, I don't know how often people use hCalendar to mark up phrases  
like 'next weekend' vs, say, 'Saturday 19th July 2008'. If we had  
some idea of how microformats are being used to mark up dates in  
real, online text, then we could make some meaningful statements  
about how risky, or even impossible, it might be to extract ISO dates  
automatically.



On the other hand, it might be great at the time content is  
authored, to convert ambiguous natural language dates into  
unambiguous microformats, as a way to reduce the pain of  
microformatting content (especially if it can detect dates in plain  
text rather than parsing something it knows is a date). Authors could  
confirm the generated microformats before publishing, in a way  
similar to how the Yahoo! Shortcuts WordPress plugin works [1]


Decent authoring tools would be brilliant. Not just for dates but  
locations and possibly other types of microformatted text. For  
instance, I can link a UK street address to Google maps and get back  
a precise point on a map of the UK. So do I really need to manually  
write a lat/long into the HTML to tell a microformats tool how to  
place the address on a map? The text contains all the necessary  
information to perform this operation already.


I think microformats should be relatively easy for a non-technical  
author to add to their text. Decent tools that generate the machine- 
readable data would be an enormous aid here.


Jim

Jim O'Donnell
[EMAIL PROTECTED]
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens



___
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss


Re: [uf-discuss] hoard.it

2008-07-09 Thread Jim O'Donnell
Thanks. I don't know what Dan did for hoard.it, but our original  
script treated 'about' or 'circa' as the date plus/minus five years.  
So 'circa 1800' would be returned as '1795/1805'. For 'before' or  
'after', you could return a pair of dates with either the first or  
second blank, accordingly. This is assuming we encode time periods as  
per the guidelines in the PNDS application profile:
http://www.ukoln.ac.uk/metadata/pns/pndsdcap/#DctermsTemporalDctermsPeriod
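
As a rough illustration of the rules described above (this is not the original script, which isn't shown here), a minimal Python sketch might look like the following. The function name period_for, the list of qualifiers it accepts and the 'start/end' string form are assumptions made for the example, and it only handles a bare four-digit year.

```python
import re

# Qualifiers treated as approximate, open-start and open-end respectively.
APPROX = ("circa", "about", "abt")
BEFORE = ("before", "bef")
AFTER = ("after", "aft")

PATTERN = re.compile(
    r"\s*(circa|about|abt|before|bef|after|aft)?\.?\s*(\d{4})\s*",
    re.IGNORECASE)

def period_for(text, spread=5):
    """Map a qualified year to a 'start/end' period string.

    'circa'/'about' becomes year plus/minus `spread` years;
    'before' leaves the start blank, 'after' leaves the end blank.
    Returns None if the text is not a (qualified) four-digit year.
    """
    m = PATTERN.fullmatch(text)
    if not m:
        return None
    qualifier = (m.group(1) or "").lower()
    year = int(m.group(2))
    if qualifier in APPROX:
        return f"{year - spread}/{year + spread}"
    if qualifier in BEFORE:
        return f"/{year}"
    if qualifier in AFTER:
        return f"{year}/"
    return str(year)
```

With these assumptions, period_for('circa 1800') gives '1795/1805', and 'BEF 1925' gives '/1925'. Fuller genealogy forms such as 'ABT 7 July 1950' are not matched and return None; they would need a proper date parser in front of the qualifier handling.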


Jim

On 8 Jul 2008, at 06:02, Bob Jonkman wrote:


Sounds great!  How does it deal with dates commonly found in
genealogy, such as "ABT 7 July 1950" or "AFT 25 Dec 2000" or "BEF
Jan 1925"? or even "ABT 2000"?

--Bob.


Jim O'Donnell
[EMAIL PROTECTED]
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens


