The end point of the research was learning once again that microformats suck. Parsing them reliably is next to impossible; they are idiosyncratic, and everybody has a differing interpretation. It is also self-evident why RSS succeeds and almost everything else does not: RSS is the simplest possible thing. A 15-year-old can implement a parser by themselves in a short period of time and get predictable results. We still seem to be far from a world of formal, portable, public structured metadata on the web. Perhaps the Google approach of brute-forcing these problems is best. So yes, it looks like the answer is a combination of writing case-by-case handlers and encouraging sites to be more standard.
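
To make the point about RSS concrete, here is a minimal sketch of such a parser in Python, using only the standard library. The sample feed, the `parse_rss` name, and the choice of fields are illustrative assumptions and not anything taken from Calagator.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example events</title>
    <item>
      <title>Code sprint</title>
      <link>http://example.org/events/1</link>
      <guid>http://example.org/events/1</guid>
    </item>
    <item>
      <title>Ignite night</title>
      <link>http://example.org/events/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item> with its title, link and guid."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid"),  # None when the item has no guid
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Nothing here needs per-site knowledge, which is the contrast with microformats: the element names are fixed, so one loop covers every well-formed feed.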
And yes, of course the hope would be to emit GeoRSS. Our goal was just Portland, Oregon for now though... This town is starting to have a bit of a tech buzz (heck, we had 750 people at the geek love-in 'IgnitePortland' last night).

a

On Feb 6, 2008 9:23 AM, Andrew Turner <[EMAIL PROTECTED]> wrote:
> As I understand it, hCard is the HTML Microformat encoding of a vCard.
> So any markup should be using hCard, or possibly more simply just
> 'adr'.
>
> This is the most common format, as much as it is "uncommon", with
> hopefully growing popularity. Your best bet is just write scrapers on
> a site by site basis.
>
> When I get back from travel can probably help point out useful Event
> feeds and stored searches from Mapufacture.
>
> Also, I assume you'll be outputting your stuff using GeoRSS? Would be
> nice to also include the Upcoming, Eventful, or whatever "guid"s so
> it can be used in other aggregators to identify duplicates.
>
> Andrew
>
>
> On Feb 3, 2008 12:03 AM, Anselm Hook <[EMAIL PROTECTED]> wrote:
> > We're doing a group code-sprint to build an events aggregator for
> > Portland Oregon at http://calagator.org . This is to try provide a
> > single point of presence that comprehensively tracks all the regional
> > events.
> >
> > We've successfully been able to scan both upcoming and eventful for
> > hCal information - getting the venue, the time, a description and
> > facts like that.
> >
> > However one of the challenges we're facing is plucking out location
> > information - which we are not finding well expressed in hcal
> > instances in the wild. Our strategy is two-fold: We will brute force
> > or special case parse websites with events that don't (or refuse to)
> > comply with any kind of standard, but we do want to encourage a
> > standard, and parse whatever standards we can find.
> >
> > It looks like there is another microformat called 'vcard' which has
> > some of this. I am going to look at this next. Both upcoming and
> > eventful use vcard to store some location information, but in the case
> > of eventful it doesn't seem to be as precise as upcoming. { Clearly
> > in the case of eventful we can always use their evdb api but we see
> > this as a good test case of other cases we will have to tackle. }
> >
> > Is there some standard that most or even some fraction greater than
> > say 10% of the sites publishing events subscribe to?
> >
> > Leaving georss out of this question for now - clearly we'll consume
> > that in the cases where we see it - and encourage it.
> >
> > a
> > _______________________________________________
> > Geowanking mailing list
> > [email protected]
> > http://lists.burri.to/mailman/listinfo/geowanking
> >
> >
>
> --
> Andrew Turner
> [EMAIL PROTECTED] 42.2774N x 83.7611W
> http://highearthorbit.com Ann Arbor, Michigan, USA
> Introduction to Neogeography - http://oreilly.com/catalog/neogeography
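
Andrew's suggestion in the thread above, to output GeoRSS while carrying along the source site's "guid" so duplicates can be spotted, could look roughly like the following. This is only a sketch under assumptions: `event_to_item`, `dedupe_by_guid`, the sample coordinates, and the `upcoming:12345` identifier format are all hypothetical. The `georss:point` element holding "lat lon" follows the GeoRSS-Simple encoding.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def event_to_item(title, lat, lon, source_guid):
    """Build an RSS <item> carrying a GeoRSS-Simple point and the source guid."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Reuse the identifier from the originating site (hypothetical format)
    # so that other aggregators can recognise the same event.
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = source_guid
    # GeoRSS-Simple: latitude then longitude, separated by a space.
    point = ET.SubElement(item, "{%s}point" % GEORSS_NS)
    point.text = "%s %s" % (lat, lon)
    return item


def dedupe_by_guid(events):
    """Keep the first event seen for each guid, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["guid"] not in seen:
            seen.add(event["guid"])
            unique.append(event)
    return unique


if __name__ == "__main__":
    # Placeholder coordinates, roughly Portland, Oregon.
    item = event_to_item("Code sprint", 45.52, -122.68, "upcoming:12345")
    print(ET.tostring(item, encoding="unicode"))
```

Matching on the upstream guid sidesteps fuzzy comparison of titles and venues, though it only helps when two aggregators pulled the event from the same source site.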
