Re: [uf-discuss] Human and machine readable data format
> Not sure if this thread is only covering datetimes in abbreviations. The title seems to suggest that it's more general, so I thought I'd chip in with a thought on geo as an example. How would a parser deal with natural (non-English) language here? Would it be expected to be able to parse Manchester or Salford or London or Londres or Londinium? Whilst it's just about possible to imagine NLP of dates, and trickier to imagine NLP of multi-language date formats, it's just beyond the realms of feasibility to consider NLP of place names.

I'm confused. I'm afraid I don't understand the point of this thought exercise.

> I thought the problem was any non-human-readable data where humans can 'see' it - not confined to datetimes.

One step at a time.

>> I find it terribly frustrating how many people cannot see that this set of constraints yields NO solution. At least, when the constraints are held to the level of strictness that this community is holding them to.
>
> Seems to me there are 2 solutions:
>
> 1. Relax the data hiding constraint (tricky because it's fundamental to the uf design philosophy and its relaxation has been rejected many times).
> 2. Maintain the status quo. Keep the abbreviation design pattern for machine friendly data and leave it up to publishers to decide if this is an issue for them - or not. It would probably need the microformats community to promote the design philosophy and potential issues a little higher than at present. But the wiki already documents much of this - just a bit more prominent linking and some padding out of /about to be a little more neutral.

There is another solution that I have been trying to advocate, which is not metadata, and it's not natural language parsing. It is, quite simply, to define a strict date format that IS human readable, which can optionally be used in place of ISO 8601 in the title attribute of an ABBR tag. You can keep the perceived benefits of ISO 8601 for international users, because the current pattern will continue to work.
However, for users in languages with a well-defined date format, a screen reader will not trip up on the date.

Whenever I mention this, though, everyone seems to think I'm advocating natural language processing. Let me just say again that this is not the case.

I'm highly suspicious of the counterargument that such a solution would need to support every language that ISO 8601 supports. This argument does not make sense to me for two reasons.

First, ISO 8601 doesn't support ANY language. It is only one date format among many, based on an anglicised calendar, and its only multilingual benefit is that it happens to be an international standard. To someone with a different calendar, ISO 8601 may make just as much sense as July 1st, 2007; that is, very little. I like ISO 8601, but placing it in the title attribute of the ABBR has clearly been a failure. If not a practical failure, it has been a failure for the public image of microformats, and it has ultimately shown the failure of the microformats community structure to deal with an issue such as this effectively.

The other reason I'm suspicious of this argument is that such a format would practically only need to support as many languages as there are screen readers. Unless a screen reader supports ISO 8601 in a title attribute specifically, it's going to read out gibberish, and if it encounters a date written in the wrong language it will read out gibberish. No difference. However, if, in what I believe is the 80% case, it reads out a date written in the correct language, then we've just improved the experience for more people than we were able to satisfactorily publish to before. What's the counterargument to that?

Another solution is to lobby the screen reader vendors to add explicit support for ISO 8601 dates. It's a popular pattern for markup, and adding support for reading them more humanely would provide a clear benefit for their customers.
I personally feel that this solution would see more success than trying to wrangle the whole of the microformats community into agreement on this issue. ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
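A minimal sketch of the "optional alternative to ISO 8601 in the ABBR title" idea from the message above. The helper name `parseAbbrTitle` and the strict English form "July 1st, 2007" are assumptions made for illustration; neither comes from a microformats spec. ISO 8601 dates are tried first, so existing markup would keep working.

```javascript
// Hypothetical helper: not part of any existing microformats parser.
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

function parseAbbrTitle(title) {
  // 1. Current pattern: a plain ISO 8601 calendar date (YYYY-MM-DD).
  let m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(title);
  if (m) return { year: Number(m[1]), month: Number(m[2]), day: Number(m[3]) };

  // 2. Assumed strict English form: "July 1st, 2007".
  m = /^([A-Z][a-z]+) (\d{1,2})(?:st|nd|rd|th), (\d{4})$/.exec(title);
  if (m) {
    const month = MONTHS.indexOf(m[1]) + 1;
    if (month > 0) return { year: Number(m[3]), month, day: Number(m[2]) };
  }

  // Anything else is rejected rather than guessed at.
  return null;
}
```

With this shape, `parseAbbrTitle("2007-07-01")` and `parseAbbrTitle("July 1st, 2007")` describe the same day, while a misspelling such as "Oktober 1st, 2007" simply fails to parse.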
Re: [uf-discuss] Human and machine readable data format
On Tue, Jul 15, 2008 at 12:36 AM, Michael [EMAIL PROTECTED] wrote:

>> Seems to me there are 2 solutions:
>>
>> 1. Relax the data hiding constraint (tricky because it's fundamental to the uf design philosophy and its relaxation has been rejected many times).
>> 2. Maintain the status quo. Keep the abbreviation design pattern for machine friendly data and leave it up to publishers to decide if this is an issue for them - or not. It would probably need the microformats community to promote the design philosophy and potential issues a little higher than at present. But the wiki already documents much of this - just a bit more prominent linking and some padding out of /about to be a little more neutral.
>
> Actually, the suggestion of splitting the datetime into date, time and timezone marked up in separate elements seems to me like a good compromise. yyyy-mm-dd would certainly not be as scary for humans as a full datetime with timezone, and it would avoid needing to hide data and be much easier to do than trying to cope with lots of different date formats or trying to do NLP. In fact it might even help a human in cases where the human date is ambiguous!

It might be a good compromise, but does it actually solve the problem? If they're all in a row, aren't we right back where we started? The screen reader would read them all in order, would it not? Or would it add extra pauses by virtue of them being in separate elements, or having spaces between them?
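For the parser side of that suggestion, here is a rough sketch of how separately marked-up date, time and timezone parts could be joined back into one ISO 8601 string. The function name `combineDateParts` and the shape of its argument are hypothetical; the sketch says nothing about how a screen reader would voice the separate elements, which is the open question in the reply above.

```javascript
// Hypothetical helper: joins separately marked-up parts into one ISO 8601 string.
function combineDateParts(parts) {
  const { date, time, tz } = parts;

  // The date part is required and must look like YYYY-MM-DD.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date || "")) return null;

  // The time part (HH:MM or HH:MM:SS) is optional.
  if (time === undefined) return date;
  if (!/^\d{2}:\d{2}(:\d{2})?$/.test(time)) return null;

  // The timezone ("Z" or +HH:MM / -HH:MM) is optional and only used with a time.
  if (tz === undefined) return `${date}T${time}`;
  if (!/^(Z|[+-]\d{2}:\d{2})$/.test(tz)) return null;

  return `${date}T${time}${tz}`;
}
```

Each piece stays short and visible on the page, and only the parser ever sees the full machine datetime.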
Re: [uf-discuss] Human and machine readable data format
> Do you have any examples of the non-Gregorian dates being published online? Or any examples of applications that can take non-Gregorian dates as input? I think we've established non-Gregorian calendars exist, but most countries officially adopted the Gregorian calendar several decades before the web existed (e.g. Japan in 1873). Such adoption wasn't exclusive, but it draws into question (for me anyway) whether such calendars are common enough on the web and have enough potential use cases to warrant modeling in microformats. I realize it's difficult to do such research without belonging to the cultures in which it would appear. Unfortunately that just makes it more necessary to avoid mistakes.
>
> Peace,
> Scott

Just to clarify, the original point I was trying to make wasn't that we should model every possible language/calendar in the world. Just that it was unreasonable to expect that from a potential replacement for ISO 8601, since ISO 8601 itself does not meet that requirement. This was in response to David O, who wrote:

> Feel free to get started. I'm sure you can start a wiki page with a listing of language/region codes and the suggested date format for each. Since the current system handles every one of those languages and countries/regions, it would only be logical to expect the same of a suggested replacement.

I hope I have convinced a few people that David O's logic falls down at the premise. But this is not to argue that we should make a replacement format that handles that use case, but rather to consider replacements that don't, since such a replacement would be no worse than the current format, but *would* provide benefits that ISO 8601 does not.
Re: [uf-discuss] Human and machine readable data format
On Tue, Jul 15, 2008 at 12:53 PM, Breton Slivka [EMAIL PROTECTED] wrote:

>> Do you have any examples of the non-Gregorian dates being published online? Or any examples of applications that can take non-Gregorian dates as input? I think we've established non-Gregorian calendars exist, but most countries officially adopted the Gregorian calendar several decades before the web existed (e.g. Japan in 1873). Such adoption wasn't exclusive, but it draws into question (for me anyway) whether such calendars are common enough on the web and have enough potential use cases to warrant modeling in microformats. I realize it's difficult to do such research without belonging to the cultures in which it would appear. Unfortunately that just makes it more necessary to avoid mistakes.
>>
>> Peace,
>> Scott
>
> Just to clarify, the original point I was trying to make wasn't that we should model every possible language/calendar in the world. Just that it was unreasonable to expect that from a potential replacement for ISO 8601, since ISO 8601 itself does not meet that requirement. This was in response to David O, who wrote:
>
>> Feel free to get started. I'm sure you can start a wiki page with a listing of language/region codes and the suggested date format for each. Since the current system handles every one of those languages and countries/regions, it would only be logical to expect the same of a suggested replacement.
>
> I hope I have convinced a few people that David O's logic falls down at the premise. But this is not to argue that we should make a replacement format that handles that use case, but rather to consider replacements that don't, since such a replacement would be no worse than the current format, but *would* provide benefits that ISO 8601 does not.

And just for the record, I would happily construct such a wiki page, but I am overcommitted as it is! Perhaps in time, once some things have calmed down.
Re: [uf-discuss] Human and machine readable data format
On Fri, Jul 11, 2008 at 6:47 PM, Dan Brickley [EMAIL PROTECTED] wrote:

> Toby A Inkster wrote:
>> Paul Wilkins wrote:
>>> We should leverage the computer's ability to do the hard work for us.
>>>
>>> <p>Date <span class="date">Friday, July the 11th 2008</span></p>
>>
>> As I've said before, although my parser does support dates in this format, I strongly recommend *not* allowing these per spec, as it will lead to unpredictable and inconsistent results.
>>
>> Yes, many programming languages do have libraries to do natural language parsing of dates, but these all differ subtly in what formats they support, how they interpret certain ambiguous dates, and how well they internationalise. e.g. I know that Perl's DateTime::Format::Natural, while it can perform very sophisticated parsing (Saturday evening 3 months ago = 2008-05-12T19:00:00, thursday morning last week = 2008-07-03T09:00:00), only includes English in the distributed module (though it has hooks allowing support for other languages). PHP's strtotime function is English only too, and there are differences in how it interprets some natural language dates, not just with Perl, but between different versions of PHP.
>>
>> Natural language parsing is really not the way to go, nor is a limited range of date formats that *look* like NLP, because publishers will believe them to *be* NLP and start publishing in any old date format. ISO8601 is what we must stick with - we just must agree a better way of embedding it than abbr.
>
> Thank you for spelling this out so clearly. Please let's not slip into treating the non-English-speaking Web as a corner case. ISO8601's the thing. And it won't always be what the party reading the page expects (either in terms of language, script or even calendar).
>
> cheers,
> Dan

In what way is ISO 8601 more friendly to non-English speakers than any other date format? Please realise that by insisting that no natural language style will be a solution, you are essentially saying that there is no solution to this problem.

1. Metadata and information hiding is out of the question.
2. Putting ISO 8601 style dates (machine dates) in any place where a human can see them or have them read out is the problem that we are trying to solve, so we can't do that.
3. The date cannot resemble anything a human might want to read.

I find it terribly frustrating how many people cannot see that this set of constraints yields NO solution. At least, when the constraints are held to the level of strictness that this community is holding them to.

>> Natural language parsing is really not the way to go, nor is a limited range of date formats that *look* like NLP, because publishers will believe them to *be* NLP and start publishing in any old date format. ISO8601 is what we must stick with - we just must agree a better way of embedding it than abbr.

The premise that publishers will pick any old format is merely an assertion with no evidence. Please show us an example somewhere else where this has happened, or perhaps a better argument than merely insisting on the obvious truth of it.

The way I see it, if they publish in the wrong format, then the parsers won't pick up the date. This is what happens with microformats already. I don't know about anyone else, but when I publish a microformat, I test whether parsers can read it correctly. I do the same thing with any HTML. If a publisher can't take the time to test and publish in the correct format, then they take the consequences. It's exactly the same with any other technology. Why should microformats be any different? Why do you think making a microformat resemble natural language drastically changes this set of rules?

As to the person who was concerned about forcing a particular format in a place where a human can read it, I have not seen a single proposed solution which does not do this without violating the no information hiding principle.

You may not like it, but too bad. Making a date resemble natural language is the only way to go. I don't say this because it's my opinion. This is merely a fact, due to the nature of the problem and the constraints that the community has enforced on possible solutions. Accept it, or doom yourselves to reasoning around in circles some more, as you have already done.
Re: [uf-discuss] Human and machine readable data format
On Thu, Jul 3, 2008 at 7:04 PM, Dan Brickley [EMAIL PROTECTED] wrote:

> Breton Slivka wrote:
>> I offer the challenge to those developers: If you sincerely believe that simple internationalized date parsing is an unsolvable or difficult problem (which, as I have pointed out, has been solved numerous times already, with two examples), please present your evidence. Why is avoiding this work more important than accessibility? Why is avoiding this work more important than avoiding hidden metadata?
>
> Imagine the English language permutations of "Tuesday the fourteenth of July, next year" in terms of word order. Then allow for all natural languages (in all written scripts). And don't forget we use a variety of calendars. Big job. In theory it could be attempted; but the culture around here is averse to 'theoretical' solutions.

Once again this straw man is trotted out. Who is discussing this type of solution other than to specifically discredit the approach as too hard? I certainly am not suggesting this kind of wide-ranging natural language parser. I haven't seen anyone else seriously suggesting it. It's a foolish undertaking, and it's obviously a foolish undertaking. Then WHY OH WHY does this keep being brought up as though it were being seriously discussed? Where does this idea keep popping out from?

Let me give an example in pseudocode of a parser that would work, and would be simple to write, and whose format could be read by a screen reader.
    function parseDate(dateString, locale) {
      // English month names; index + 1 is the month number.
      var enMonths = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];
      var suffix = "(?:st|nd|rd|th)", m, month, day, year;
      if (locale === "en-us") {
        // e.g. "July 14th, 2009"
        m = new RegExp("^([A-Za-z]+) ([1-3]?[0-9])" + suffix + ", ([0-9]{1,4})$").exec(dateString);
        if (m) { month = m[1]; day = m[2]; year = m[3]; }
      } else if (locale === "en-au" || locale === "en-uk") {
        // e.g. "14th July, 2009"
        m = new RegExp("^([1-3]?[0-9])" + suffix + " ([A-Za-z]+), ([0-9]{1,4})$").exec(dateString);
        if (m) { day = m[1]; month = m[2]; year = m[3]; }
      }
      if (!m) return null; // unknown locale or wrong format: fail, don't guess
      month = enMonths.indexOf(month) + 1;
      if (month === 0) return null; // not an English month name
      return [Number(year), month, Number(day)];
    }

This is a simple example. There are likely better techniques for doing this than regexes (or not), but the point is that you can make a human READABLE format without having to cover the whole spectrum of human expression. Instead, you have ONE precise format for US dates, ONE precise format for UK dates, ONE precise format for Japanese dates, etc.

You stick this format of date in the title of an ABBR, and you can say whatever you want about the date, in whatever language you like, in the contents of the ABBR. The parser shouldn't care about the contents. It's just looking at the title. It already is. The only change from the current pattern is that we'd be using a less geeky and obscure format than ISO 8601. The lang attribute of the ABBR element provides the format in use.

Honestly, how difficult is it for a parser author to collect one format for each locale? I've seen far more heroic efforts on simpler things. How difficult is it for content publishers to learn ONE format (the one for their own locale)? How difficult is it to ask content authors to learn a format like this? We're already asking them to learn a more difficult format!

Yes, it's more complicated than parsing ISO 8601. But it's not boiling the ocean. This isn't a binary decision we're facing.
It's not a choice between "I could implement it in an hour" level of simplicity and human-level AI. Compromise has to be made if we are to make any progress.
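To give a feel for what "collecting one format for each locale" might cost a parser author, here is a table-driven sketch in which the lang attribute selects exactly one strict pattern. Everything in the table is an assumption made for illustration (the lang tags, the exact punctuation, and the French entry); nothing here is an agreed microformats rule.

```javascript
// Illustrative only: one strict pattern per lang tag (all entries are assumptions).
const EN_MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];
const FR_MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin",
  "juillet", "août", "septembre", "octobre", "novembre", "décembre"];

const FORMATS = new Map([
  // "July 14th, 2009"
  ["en-us", { pattern: /^(\p{L}+) (\d{1,2})(?:st|nd|rd|th), (\d{1,4})$/u,
              order: ["month", "day", "year"], months: EN_MONTHS }],
  // "14th July, 2009"
  ["en-gb", { pattern: /^(\d{1,2})(?:st|nd|rd|th) (\p{L}+), (\d{1,4})$/u,
              order: ["day", "month", "year"], months: EN_MONTHS }],
  // "14 juillet 2009" or "1er mai 2008"
  ["fr", { pattern: /^(\d{1,2})(?:er)? (\p{L}+) (\d{1,4})$/u,
           order: ["day", "month", "year"], months: FR_MONTHS }],
]);

function parseByLang(title, lang) {
  const format = FORMATS.get(lang.toLowerCase());
  if (!format) return null;                 // unknown lang: no guessing
  const match = format.pattern.exec(title);
  if (!match) return null;                  // wrong shape for this lang
  const fields = {};
  format.order.forEach((name, i) => { fields[name] = match[i + 1]; });
  const month = format.months.indexOf(fields.month) + 1;
  if (month === 0) return null;             // not a month name in this lang
  return [Number(fields.year), month, Number(fields.day)];
}
```

Supporting another locale is then one more table entry rather than new parsing logic, and a date written in the wrong locale's format is rejected rather than reinterpreted.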
Re: Re: [uf-discuss] Human and machine readable data format
> I honestly believe the bloat to parsers would be significant

Sorry, I meant: I honestly believe the 'bloat' to parsers would _not_ be significant.
Re: [uf-discuss] RE: Microformats and RDFa not as far apart as previously thought
On Mon, Jun 30, 2008 at 5:54 PM, Dan Brickley [EMAIL PROTECTED] wrote:

> Breton Slivka wrote:
>> I think this sort of counter argument is a straw man. The proposal from Guillaume was not to write a natural language parser that can parse any kind of human written date. The proposal was to parse a very specific and standardized format of date. If one were to write Oktober, the specified behavior for parsers should be to fail, and possibly throw errors. I, for one, strongly agree with this approach.
>>
>> Essentially, the ABBR problem that the microformats community faces is that a set of three restrictions, all applied, results in a set of 0 solutions. Every solution I've seen so far only satisfies two of those restrictions, and is immediately shot down by someone in the community who thinks the third restriction is inviolable.
>>
>> The restrictions:
>>
>> 1. No information hiding.
>> 2. Humans first, machines second.
>> 3. It must be in a format that's easily machine parsable.
>>
>> You see the problem here? You guys are going to have to compromise on one of these three damned restrictions, or face irrelevance!
>
> I suggest a 4th should be taken very seriously:
>
> 4. Respect the natural language, calendar, and writing system preferences of the human content author.
>
> cheers,
> Dan

I thought that was implied by restriction #2, and thus leads to proponents of restriction #3 getting in a hoot because perfectly satisfying #2 is too hard. So from there you can either compromise #2 or #1 to satisfy proponents of #3. Violating #2 is a bad idea, but if you violate #1, Tantek steps in and says you can't do that. Since it's difficult to overcome the influence and authority of Tantek in this community, compromising #3 is the only way you can go. Otherwise the argument is just going to go around in circles forever.
Re: [uf-discuss] RE: Microformats and RDFa not as far apart as previously thought
>>> The restrictions:
>>>
>>> 1. No information hiding.
>>> 2. Humans first, machines second.
>>> 3. It must be in a format that's easily machine parsable.
>>>
>>> You see the problem here? You guys are going to have to compromise on one of these three damned restrictions, or face irrelevance!
>>
>> I suggest a 4th should be taken very seriously:
>>
>> 4. Respect the natural language, calendar, and writing system preferences of the human content author.
>>
>> cheers,
>> Dan
>
> I thought that was implied by restriction #2, and thus leads to proponents of restriction #3 getting in a hoot because perfectly satisfying #2 is too hard. So from there you can either compromise #2 or #1 to satisfy proponents of #3. Violating #2 is a bad idea, but if you violate #1, Tantek steps in and says you can't do that. Since it's difficult to overcome the influence and authority of Tantek in this community, compromising #3 is the only way you can go. Otherwise the argument is just going to go around in circles forever.

But really, when you get right down to it, in this community there is at least one strongly influential person who is a proponent of each of the three restrictions, and you're never going to make all three of them perfectly happy. I'm vastly simplifying the case here, but I think that's basically why the community hasn't cracked this nut yet. It's a wicked problem. Any solution is going to be some kind of compromise, and there's going to be someone in the community quite passionately against it, so we're basically paralyzed until we can all decide which of the three rules is not sacred.
Re: [uf-discuss] RE: Microformats and RDFa not as far apart as previously thought
On Tue, Jul 1, 2008 at 9:49 AM, Breton Slivka [EMAIL PROTECTED] wrote:
On Tue, Jul 1, 2008 at 3:11 AM, Ben Ward [EMAIL PROTECTED] wrote:

I'd like to make a very important point.

On 30 Jun 2008, at 10:38, Breton Slivka wrote:

> if you violate #1, Tantek steps in and says you can't do that. Since it's difficult to overcome the influence and authority of Tantek in this community, compromising #3 is the only way you can go. Otherwise the argument is just going to go around in circles forever.

To quote the wiki: "Microformats are not controlled by any individual or organization" (http://microformats.org/wiki/microformats#microformats_are_not).

Disagreement between community members is always likely; such is the nature of community. At this point in this community's life, no one person is more important than another, and if that were ever to be the case, the community and the effort of microformats generally would suffer greatly.

When someone says you 'can't' do something, it's likely in the context of the microformats principles. Someone saying 'no' cannot be backed up only by their reputation and stature. 'Citation needed' is perhaps the most succinct requirement.

The most worrying thing about this message is that anyone should perceive the direction of this community as being dictated by one personality's viewpoint. That is not the case, and the microformats effort would fall apart if it ever were. To make decisions pre-emptively out of this misperception is not going to lead us to the best solutions.

Additionally, it may well be that we're dealing with a problem right now that calls for an exception to a principle. I'm not aware that we've ever consciously made exceptions before, so there's no precedent. As such, the justification for and the scope of such an exception needs to be _very clearly documented_ and approached thoroughly. The justification for making an exception needs to be held to very careful scrutiny.
B

Yes, I know that's the party line, but vehemently insisting on the truth of such an assertion does not make it true. Are you seriously suggesting that there are cases where someone has proposed a solution involving information hiding, and Tantek HASN'T stepped in and immediately put an end to all conversation along those lines? If there is such a case, I'm quite curious to see it, and I'm also quite curious to see what else must have stepped in the way to put an end to that line of solutions.

Yes, restriction #1, no information hiding, is a microformats community principle, but it's quite obviously Tantek's baby, and in the past it's primarily been Tantek who has enforced that rule, and Tantek's enforcement has been effective. If this reality disrupts your rose-colored idealistic view of the microformats community, well, I can't help you. You haven't stated a particularly compelling case. You've only recited community dogma.

That said, I actually agree with the rule, and I'm glad it's being enforced, and I don't mind that it happens to be Tantek that's enforcing it. It's a good rule, and I understand the reasoning behind it. All I'm saying is that if we're not going to hide information, and we're not going to make things difficult for humans reading the microformat, or humans writing the microformat, violating restriction #3 is the *ONLY* way to go, until someone happens to pull a magic bullet out of the air. But I'm honestly not holding out hope. If nobody wants to violate Rule #1, and nobody wants to violate Rule #2, we're going to have to make bulkier, more complicated parsers.

Also, I would like to point out that the restrictions I've listed are not binary. All the solutions I've seen fall somewhere along a continuum, and indeed many existing microformats violate some principle or another to some extent. The ABBR pattern at the center of this debate violates restriction #1 and restriction #2 to some extent.
It's semi-hidden data that is semi-unfriendly to humans. And it's the truth and value of restrictions #1 and #2, I think, that have led to the failure of the ABBR pattern. It failed because of those violations. And it's especially those failures in restrictions #1 and #2 that I think will force a solution that must violate the implicit community rule of avoiding complicated parsers.
Re: [uf-discuss] Human and machine readable data format
> I think approaching ISO dates as metadata rather than content will remove the need to compromise on core principles.

I think you'll find that metadata of any kind is a compromise of the microformats core principles. It's information hiding, and the example that Tantek uses is the meta tag, which is the prototypical failure of the metadata approach.

Let's rewind a bit. The problem isn't necessarily that ISO dates aren't human readable. We've demonstrated that they are, as long as someone is familiar with the format and what it means. It's not fun, it's not friendly, so it violates the principle of "Humans first, machines second". That's an issue, but that's not the most important issue.

The real problem that sparked this whole debate is that they aren't machine readable. More specifically, they are read incorrectly by screen readers. Any solution that involves a quick snap judgement as to whether a piece of text is legible to a sighted human is irrelevant to that problem. We need to focus on solutions that target screen readers specifically, because that's what's wrong with the current solution.

One way to approach this problem is to fix the screen readers. But we can't do that, so in the meantime, how about just an alternative date format that a screen reader converts to speech correctly?
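As a rough illustration of the gap being described, this sketch turns an ISO 8601 calendar date into the kind of English text a speech engine would plausibly read as a date. The function names are hypothetical, the output form "July 1st, 2007" is an assumption, and whether any particular screen reader voices it well has not been tested here.

```javascript
const MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// English ordinal suffix for a day of the month (1 -> "st", 11 -> "th", 22 -> "nd").
function ordinalSuffix(day) {
  if (day >= 11 && day <= 13) return "th";
  switch (day % 10) {
    case 1: return "st";
    case 2: return "nd";
    case 3: return "rd";
    default: return "th";
  }
}

// Hypothetical helper: "2007-07-01" -> "July 1st, 2007"; null if not YYYY-MM-DD.
function speakableDate(iso) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!m) return null;
  const year = Number(m[1]), month = Number(m[2]), day = Number(m[3]);
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  return `${MONTH_NAMES[month - 1]} ${day}${ordinalSuffix(day)}, ${year}`;
}
```

The same conversion could live in a screen reader (the "fix the screen readers" route) or in a publishing tool that writes the speakable form into the title attribute.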
Re: [uf-discuss] RE: Microformats and RDFa not as far apart as previously thought
I think this sort of counter argument is a straw man. The proposal from Guillaume was not to write a natural language parser that can parse any kind of human written date. The proposal was to parse a very specific and standardized format of date. If one were to write Oktober, the specified behavior for parsers should be to fail, and possibly throw errors. I, for one, strongly agree with this approach.

Essentially, the ABBR problem that the microformats community faces is that a set of three restrictions, all applied, results in a set of 0 solutions. Every solution I've seen so far only satisfies two of those restrictions, and is immediately shot down by someone in the community who thinks the third restriction is inviolable.

The restrictions:

1. No information hiding.
2. Humans first, machines second.
3. It must be in a format that's easily machine parsable.

You see the problem here? You guys are going to have to compromise on one of these three damned restrictions, or face irrelevance!

On Sat, Jun 28, 2008 at 9:07 PM, Fil [EMAIL PROTECTED] wrote:

> I'm not a great fan of natural language here. What if I want to write 3l33t (well, not at my age mind you), or punk, maybe use Oktober instead of October cause I'm a (admittedly bad) poet? The human will understand, the computer won't.
>
> -- Fil
Re: [uf-discuss] RE: Microformats and RDFa not as far apart as previously thought
> The restrictions:
>
> 1. No information hiding.
> 2. Humans first, machines second.
> 3. It must be in a format that's easily machine parsable.
>
> You see the problem here? You guys are going to have to compromise on one of these three damned restrictions, or face irrelevance!

To continue: the reason I strongly agree with Guillaume's proposal (go back and read it, this time without attempting to distort it in order to discredit it) is that it compromises on the most ridiculous and disingenuous of the three inviolable restrictions.
Re: [uf-discuss] Apple Data Detectors
I think what was intended, was rather than try to write a parser that picks up most styles of natural language dates, as you suggest- Instead write a parser that only picks up one or two standard styles of dates. Much like the style guides that are used in academia for writing standard forms of citations, and other things. Decide on a freeform text format that you know a machine can pick up, and excludes ambiguous date formats. But you do raise a valid point here: Even if you do that, you invite the assumption from authors, that since it can pick up this format of date, or that format, then perhaps it will pick up THIS format as well. So authors will write badly formed versions of this freeform standard. There will be typos, too. All kinds of things can happen. But much of these bad things can be aleviated by one of the other suggestions in this thread: As-you-type validation. As soon as you type in Feb for instance, autocomplete style routines kick into action, helping the author write the date in exactly the right format. Then as they hit publish it becomes a microformat, proper, with markup and all. On Feb 6, 2008 3:58 PM, Michael MD [EMAIL PROTECTED] wrote: people write dates, addresses, etc on the Web or on their emails. Asking people to write Tuesday, February 5, 2008 in this order, with the commas, etc. is very likely even simpler for normal people than writing you would *think* so - and it would certainly be nice but the behaviour or most people out in the real world does not suggest that this would be easy. Most freeform text dates I see out there are missing the year (how is a machine supposed to work out what year was intended?)... and a lot of them are in useless ambiguous formats like dd/mm/ or mm/dd/yy - ) ... then there all those other variations... to many to list! I've experimented a bit with trying to parse freeform text dates ... 
the problem is that as soon as it's loose enough to pick up most of the common ways people write dates, it then also starts to pick up a lot of other stuff as dates that were not intended to be dates at all!
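To make the "one or two standard styles" idea above concrete, here is a minimal sketch. It assumes, purely for illustration, that the single accepted freeform style is the one quoted in this thread (Tuesday, February 5, 2008); the name parse_strict_date and the choice to reject a mismatched weekday are my own assumptions, not anything specified by a microformat.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]

# Exactly one style is accepted: "Weekday, Month D, YYYY" (hypothetical rule).
STRICT_DATE = re.compile(
    r"^(?P<weekday>[A-Z][a-z]+), (?P<month>[A-Z][a-z]+) "
    r"(?P<day>\d{1,2}), (?P<year>\d{4})$"
)


def parse_strict_date(text):
    """Return a date for the one accepted style, or None for anything else."""
    m = STRICT_DATE.match(text)
    if m is None:
        return None  # wrong shape: missing year, dd/mm/yy, and so on
    if m["month"] not in MONTHS or m["weekday"] not in WEEKDAYS:
        return None  # typo or abbreviation in a name
    try:
        parsed = date(int(m["year"]), MONTHS.index(m["month"]) + 1, int(m["day"]))
    except ValueError:
        return None  # impossible calendar date, e.g. February 30
    # Reject a weekday that contradicts the calendar date instead of guessing.
    if WEEKDAYS[parsed.weekday()] != m["weekday"]:
        return None
    return parsed
```

Because anything that does not match is rejected rather than guessed at, the same function could sit behind the as-you-type validation mentioned above: an editor would keep prompting until it returns a date.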
Re: [uf-discuss] Storing Microformats
I would say that a relational database would be the best option here- if it weren't for SQL. It seems all these suggestions are all more about avoiding the pain of dealing with SQL than doing the right thing. The fact is, that microformat attributes all have well defined relations which can easily be modeled with the relational model. The fact that SQL makes the relational model so difficult to actually do is the huge barrier here (that's why you were using a flat table, and also why it didn't work). One easy option is to simply serialize the microformat into json,xml, or as the original html markup, and store it in a text blob. This is a perfectly legitimate solution which is often avoided due to some misunderstandings about atomicity in databases, or due to performance (most often these performance concerns are 'premature optimization'). if you have more questions about the specifics of how I would do the more difficult (but more flexible) relational solution, which I fear may be straying off topic here, and would take some significant effort to write out, please feel free to email me. On 9/18/07, Philip Tellis [EMAIL PROTECTED] wrote: On 18/09/2007, Paul Kinlan [EMAIL PROTECTED] wrote: One of the other ideas that I am toying with is a Microformat spider, that crawls the web looking for microformats, storing them and then allowing them to be searched. My question is: How are people storing the data present in microformats so that they can be searched and maintained and consumed in applications etc? You may want to look at either an Object Oriented Database or an XML Database. Short of that, you're probably best of just using flat text files (each file is an object here) and letting your search engine index them. You'll quickly run into scale issues wrt number of files per directory, so take care of that. 
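For what it's worth, the "serialize it and store it in a text blob" option can be sketched in a few lines. This is only an illustration under my own assumptions: the table name microformats, its columns, and the helper names open_store, save and load are made up here, and SQLite stands in for whatever database you actually use.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per extracted microformat."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS microformats ("
        " id INTEGER PRIMARY KEY,"
        " source_url TEXT NOT NULL,"   # page the microformat was found on
        " kind TEXT NOT NULL,"         # e.g. 'hcard', 'hcalendar'
        " body TEXT NOT NULL)"         # the microformat serialized as JSON
    )
    return conn


def save(conn, source_url, kind, properties):
    """Serialize the properties dict to JSON and store it as a text blob."""
    cur = conn.execute(
        "INSERT INTO microformats (source_url, kind, body) VALUES (?, ?, ?)",
        (source_url, kind, json.dumps(properties, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def load(conn, row_id):
    """Return the stored properties for a row, or None if it does not exist."""
    row = conn.execute(
        "SELECT body FROM microformats WHERE id = ?", (row_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

The kind and source_url columns stay queryable with plain SQL, while everything inside the blob is opaque to the database; searching inside it is left to an external index, which is the trade-off the blob approach accepts.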
Re: [uf-discuss] Re: Microformats UI in Firefox 3
Sorry for busting in late on this conversation, but let me get this straight, as I'm not sure I follow. 1. You guys are proposing a radical change in microformats, and in the way microformats work, and have given us just a week to discuss/object. 2. If the radical change is implemented in Firefox, all existing microformatted content will fail to work in Firefox 3. 3. Said radical change includes inline styles, functionally identical to presentational HTML tags. 4. In order to play nice with Firefox 3, all publishers of microformatted content would need to add extra stuff to their markup. 5. That extra stuff would *only* be necessary for Firefox. Are any of the above points incorrect?
Re: [uf-discuss] Re: Microformats UI in Firefox 3
Okay well, that's a relief. It's amazing though, that we're talking about enabling designers to design, but have only so far mentioned html, javascript and urls. What about putting the design into the design code: css? Would it not be a simple matter of adding selectors for the firefox mf ui elements? example: x-mozilla-add-hcard { visibility:visible; background:orange; border: 1px solid black; } which would select whatever element is the button/link for adding an hcard. -breton On 04/09/2007, at 9:50 PM, Pelle W wrote: Breton Slivka wrote: 1. You guys are proposing a radical change in microformats, and in the way microformats work, and have given us just a week to discuss/object 2. If radical change is implemented in firefox, all existing microformatted content will fail to work in firefox3 3. said radical change includes inline styles- functionally identical to presentational html tags. 4. In order to play nice with firefox 3, all publishers of microformatted content would need to add extra stuff to their markup. 5. That extra stuff would *only* be necessary for firefox It's more of an addition than a radical change to the microformats which enables the designer to add Firefox-actions right into their own design although such actions will always also be available through Firefox own UI and the suggested addition wouldn't change how any existing microformats would work or should work. It would be totally voluntarily. If it would be part of microformat standard it would work in any tool which implements it. Although I think the suggestion that was made at first wasn't that good, the core problem it tries to solve is relevant: A need for a standardized way for a webdesigner to add interaction between the microformatted data and the parsers actions into their own designs. Could the Microformat community come up witha standard way of interacting with the parsers through JavaScripts or perhaps through new URL:s like mailto: or feed: or in another way? 
Or is such a standard perhaps out of this community's scope? / Pelle W
Re: [uf-discuss] Proposal: hArgument Microformat
as stated before, proposals go on microformats-new, not this list. Aside from that- microformats tend to be based on existing practice. Wouldn't it be nice if people stated their assumptions straight off? Microformats or no. Unfortunately, the persuasive power of many arguments depend on the assumptions being kept secret. If the assumptions were stated straight off, it would be so much easier to debunk the argument by simply showing an assumption to be false. In an ideal world, everyone would welcome this level of scrutiny and criticism. Unfortunately, this world is populated with humans rather than ideals, so we're stuck debunking arguments by sussing out the assumptions, or spotting fallacies ourselves. Not everyone is schooled in the nuances of logic. On 21/05/2007, at 7:37 AM, Costello, Roger L. wrote: Hi Folks, Michael Crichton says: The greatest challenge facing mankind is the challenge of distinguishing reality from fantasy, truth from propaganda. Perceiving the truth has always been a challenge to mankind, but in the information age (or as I think of it, the disinformation age) it takes on a special urgency and importance. One of the keys to distinguishing information from disinformation is to have a clear understanding of the assumptions an author is making. Typically, it takes a great deal of effort to distill an author's assumptions. Bring clearly to light the assumptions being made would go a long way towards facilitating a web of trust. I propose an hArgument Microformat with two properties: hArgument assumption (repeatable): a statement of what the author assumes to be true, and upon which his/her conclusion follows. [If it can be demonstrated that the assumption is false, then the conclusion is invalid] conclusion (repeatable): a statement that derives from the assumption(s) Example: below is an example of an argument. 
The argument can be immediately discredited because the assumptions can be shown to be fallacious: <p class="hArgument"> <span class="assumption">Microformats are a disruptive technology</span> <span class="assumption">Microformats are attempting to supplant XML documents with HTML and XHTML documents</span> <span class="assumption">The main benefit of Microformats is that it allows graceful degradation</span> <span class="conclusion">Microformats go too far.</span> <span class="conclusion">It's almost better to use a more suited format in such cases</span> </p> The advantage of this is that there is no need to guess what are the author's assumptions. They are clearly identified. Use Cases: any web page that tries to convince you of something. The examples are endless. Comments? /Roger
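To show what a consumer of the proposed markup might look like, here is a rough sketch that pulls the assumption and conclusion statements out of a page. The class names hArgument, assumption and conclusion come from Roger's proposal; the HArgumentParser class is hypothetical, and it deliberately ignores nested spans and nested hArgument elements to stay short.

```python
from html.parser import HTMLParser


class HArgumentParser(HTMLParser):
    """Collect assumption/conclusion texts from class="hArgument" elements."""

    def __init__(self):
        super().__init__()
        self.arguments = []     # one dict per hArgument element found
        self._current = None    # the argument currently being collected
        self._container = None  # tag name of the open hArgument element
        self._field = None      # "assumption" or "conclusion" while inside one
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "hArgument" in classes:
            self._current = {"assumption": [], "conclusion": []}
            self._container = tag
            self.arguments.append(self._current)
        elif self._current is not None:
            for name in ("assumption", "conclusion"):
                if name in classes:
                    self._field, self._text = name, []

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._field is not None:
            # Simplification: the first closing tag ends the current statement.
            self._current[self._field].append("".join(self._text).strip())
            self._field = None
        elif tag == self._container:
            # Leaving the hArgument element: later stray spans are ignored.
            self._current = None
            self._container = None
```

Feeding it the example above yields one argument with three assumptions and two conclusions, which is all a "debunking" tool would need in order to list the assumptions for inspection.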
Re: [uf-discuss] Work-of-art/Tim Gambell
I believe it was agreed to use the also stalled hCite instead. -Breton On 15/05/2007, at 7:03 PM, Ottevanger, Jeremy wrote: Dear all, Raising my head above what I hope is the correct parapet to ask, does anyone know if Tim Gambell, who was seemingly leading work on the proposed work-of-art uf a year ago, is still taking it forward? It looks as though the last thing that happened with it was April 2006. If it's croaked then I'll take myself over to the new ufs list and perhaps propose something over there, but if it's alive, or if anyone here is interested in it, do let me know. Thanks, Jeremy Jeremy Ottevanger Web Developer, Museum Systems Team Museum of London Group 46 Eagle Wharf Road London. N1 7ED Tel: 020 7410 2207 Fax: 020 7600 1058 Email: [EMAIL PROTECTED] www.museumoflondon.org.uk Museum of London is changing; our lower galleries will be closed while they undergo a major new development. Visit www.museumoflondon.org.uk to find out more. London's Burning - explore how the Great Fire of London shaped the city we see today www.museumoflondon.org.uk/londonsburning ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] human readable date parsing
And yet, to not do so means breaking another restriction. It's about give and take. Is it better to make it easier for publishers, and harder for parsers, or is it better to store the same date twice, and let one go out of sync? Another solution is to just store ISO dates free and clear, and offer a javascript library to parse it into a variety of common/ international date formats. My basic point is, that it is impossible to satisfy all the restrictions with one format. Perhaps it is better to have several ways to mark it up, depending on the situation. Which restriction is it important to NOT break for a particular situation? On 04/05/2007, at 12:42 PM, Michael MD wrote: I don't think this will work, for the same reason tel-type and adr- type don't work: l10n/i18n. They require displayed machine values to be in English. span class=vmonth lang=enJuly/span span class=vmonth lang=esjulio/span span class=vmonth lang=jp7 月/span span class=vmonth lang=ruиюль/span good point ... parsing it might end up needing a database of day and month names and character sets and numbers in every known language! (possibly also other types of calendars that might be used in some parts of the world ... this could get very complicated very quickly!) ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] human readable date parsing
This is a very difficult problem. Difficult problems need as many potential solutions as possible to be presented: the more solutions, the more chance of arriving at a good one. The tricky part here is creating a solution which is in line with common usage. It seems to me that basing hCalendar on a single existing format, then expecting it to conform to some wider sense of principles conceived well after that format was created, is a bit counterproductive. The ISO date format itself does not fall in line with common usage, unless you consider the iCalendar format posted in the raw on an HTML page, or any ISO date, to be common. So basically we are presented with a number of restrictions, which define the range of possible solutions. It seems to me that in order to solve this problem more effectively, this set of restrictions should be clarified. Here's what I've got so far; correct me if I'm wrong. Date markup must: 1 Be capable of marking up dates from multiple cultures and languages 2 Follow the DRY principle 3 Be completely visible 4 Follow common usage 5 Be machine readable 6 Be unambiguous and the unstated (and perhaps unconscious) restriction 7 Be as similar to iCalendar as possible in form and function. At least two of these restrictions conflict. Most obvious are numbers 4 and 6. Common usage is frequently ambiguous, so we should perhaps acknowledge that a microformat that marks up a date is going to either force common usage to be unambiguous (by requiring the inclusion of a year in all dates), or instead allow ambiguity through sophisticated (or unsophisticated) guessing on the part of the parser (if this course is taken, the process of guessing should be documented and standardized), or violate restrictions 2 and 3, which is the current solution. So, are those all the restrictions? In order to arrive at a solution, at least one of them must be violated. Are we violating the right one? Here's my contribution to the solutions pool. Violate number 7. 
Example: July 26th, 2005 <span class="vmonth">July</span> <span class="vday">26</span>th, <span class="vyear">2005</span> This solution is certainly more verbose, but note that it follows all restrictions except for 7. Which restrictions do you want to violate?
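As a sanity check that the markup in the example is machine readable, here is a small sketch of a parser for it. The class names vmonth, vday and vyear are the ones proposed above; everything else (the VDateParser class, the to_iso function, the English-only month table, and refusing to guess when a part is missing) is an assumption made for illustration.

```python
from datetime import date
from html.parser import HTMLParser

# English month names only; other languages would need their own tables.
MONTH_NAMES = {
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "august": 8, "september": 9, "october": 10, "november": 11,
    "december": 12,
}


class VDateParser(HTMLParser):
    """Collect the text of elements classed vmonth, vday and vyear."""

    FIELDS = ("vmonth", "vday", "vyear")

    def __init__(self):
        super().__init__()
        self.parts = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in self.FIELDS:
            if name in classes:
                self._field = name
                self.parts[name] = ""

    def handle_data(self, data):
        if self._field is not None:
            self.parts[self._field] += data

    def handle_endtag(self, tag):
        self._field = None  # text such as "th, " between elements is skipped


def to_iso(markup):
    """Return an ISO 8601 date string, or None if the markup is incomplete."""
    parser = VDateParser()
    parser.feed(markup)
    parts = {k: v.strip() for k, v in parser.parts.items()}
    if set(parts) != set(VDateParser.FIELDS):
        return None  # e.g. the year is missing: refuse to guess
    month = MONTH_NAMES.get(parts["vmonth"].lower())
    if month is None:
        return None
    try:
        return date(int(parts["vyear"]), month, int(parts["vday"])).isoformat()
    except ValueError:
        return None  # non-numeric day/year or an impossible date
```

The month table is where the multi-language concern raised elsewhere in this thread would bite: this sketch only satisfies restriction 1 for English unless more tables are added.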
Re: [uf-discuss] human readable date parsing
I should perhaps add that my solution must also violate either restriction 3, or 4- that is, you can hide the year element with CSS. If you leave it visible, then it may follow common usage in a lot of situations. Or you might end up using a year in situations where you may not usually specify a year, violating the common usage in that situation. If you hide it, then you violate 3. But, the choice of which principle to violate is left in the hands of the author. On 04/05/2007, at 9:49 AM, Breton Slivka wrote: This is a very difficult problem. Difficult problems need as many potential solutions as possible to be presented- The more solutions, the more chance of arriving at a good one. The tricky part here is creating a solution which is in line with common usage. It seems to me that by basing hcalendar on a single existing format, then expecting it to conform to some wider sense of principles concieved well after that format was created- It's a bit counter productive. the ISO date format itself does not fall in line with common usage, unless you consider the iCalendar format- posted in the raw on an html page to be common, or any ISO date. So basically we are presented with a number of restrictions, which define the range of possible solutions. It seems to me that in order to more effectively solve this problem, this set of restrictions should be clarified- Here's what I've got so far, correct me if I'm wrong. Date markup must: 1 be capable of marking up dates from multiple cultures and languages 2 Follow the DRY principle 3 Be completely visible 4 Follow common usage 5 Be machine readable 6 Be unambiguous and the unstated (and perhaps unconcious) restriction 7 Be as similar to iCalendar as possible in form and function. At least two of these restrictions conflict. Most obvious is number 4 and 6. 
Common usage is frequently ambiguous, so we should perhaps acknowledge that a microformat that marks up a date is going to either force common usage to be unambiguous (By requiring the inclusion of a year in all dates) Or instead, allow ambiguity through sophisticated (or unsophisticated) guessing on the part of the parser. If this course is taken, this process of guessing should be documented and standardized Or, violate restrictions 2 and 3, which is the current solution. So, are those all the restrictions? In order to arrive at a solution, at least one of them must be violated- are we violating the right one? Here's my contribution to the solutions pool. Violate number 7. Example: July 26th, 2005 span class=vmonthJuly/span span class=vday26/spanth, span class=vyear2005/span This solution is certainly more verbose, but note that it follows all restrictions except for 7. Which restrictions do you want to violate? ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] Include-pattern issues
If I remember correctly, these issues can be dealt with by using an a element instead of an object element. This is endorsed in the spec for the pattern, I believe. On 05/02/2007, at 4:00 PM, Jason Karns wrote: I have two issues with the include-pattern, though they are less with the pattern itself and more with simply implementing it. 1) When using IE (6 and 7) there are many styling issues involved with hiding the object element. Simply display:none is not sufficient. 2) Many accessibility 'validators' will flag empty object elements as errors if no fallback text is supplied. Should these issues be listed on the wiki under include-pattern issues, or on a page as special notes about authoring with the include-pattern? Jason Karns ~~~ The Ohio State University [www.osu.edu] Computer Science Engineering [www.cse.osu.edu] ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] Extending hCard and hCalendar vs. strict adherence to vcard and vCalendar.
On Jan 4, 2007, at 8:52 AM, Brian Suda wrote: On 1/4/07, Breton Slivka [EMAIL PROTECTED] wrote: div class=vcard span class=fnAbraham Lincoln/span div class=orgUnited States/div div class=adr div class=street-address1600 Pennsylvania Ave./div span class=localityWashington/span , span class=regionDC/span abbr class=dod title=18650415April 15, 1865/abbr /div Then, someone can correct me if this is incorrect, when a client written to deal with DoD encounters class=dod, it can import it with an x- prefix (for vendor specific properties, as allowed by vcard, I think) rather than try and do fancy things with notes. (see note above about client author disagreements). --- i'll keep this breif because we are toeing that fine-line between discuss and dev lists. If you want to chat more about this, we can take this to the dev list. The problem with random x- prefixes is that a parse can NOT determine if the value 'dod' is meant to convey semantics (date of death) or that is purely a CSS style. For instance: div class=vcard span class=fn president blue-box call-out alertAbraham Lincoln/ span /div what becomes 'x-???' in vcard and what doesn't? BEGIN:VCARD FN:Abraham Lincoln X-PRESIDENT:Abraham Lincoln X-BLUE-BOX:Abraham Lincoln X-CALL-OUT:Abraham Lincoln X-ALERT:Abraham Lincoln END:VCARD Because of this, there has not been any attempt to add in the 'X-' parameters into the parsing rules. If you (or anyone) is still interested, feel free to email the dev- list. Thanks, -brian Since I don't have access to the dev list, and only have a passing interest in this, I will simply quickly clarify the point I was attempting to make in the first post. x- extensions being vendor specific, the decision of which classes become x-___ would be vendor specific, and only if a specific application really *really* needs it. Since such extensions are unlikely to become globally adapted, the problem of global application doesn't come into it. 1 website, 1 vendor, 1 application. 
The problem of adopting extensions to the format as a standard is too big to take so lightly though, so I wouldn't recommend it beyond specific non-standard applications. Applications still need to work together for the most part, and allowing this just opens a big complicated can of worms.
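A sketch of what "vendor specific" could mean in practice: the client carries its own explicit list of the extension classes it understands, and every other class name is treated as styling and ignored. The function name vendor_extensions and the known_extensions parameter are hypothetical, and real vCard output would also need value escaping and line folding, which this leaves out.

```python
def vendor_extensions(class_attr, text, known_extensions):
    """Map only explicitly whitelisted class names to X- property lines.

    class_attr       -- raw value of the element's class attribute
    text             -- the value to export (e.g. the abbr title)
    known_extensions -- class names this one client chooses to understand
    """
    lines = []
    for name in class_attr.split():
        if name in known_extensions:
            lines.append("X-%s:%s" % (name.upper(), text))
    return lines
```

With known_extensions set to {"dod"}, the class string "fn president blue-box call-out alert" from Brian's example produces no X- lines at all, while class "dod" with the value 18650415 produces X-DOD:18650415. That sidesteps the ambiguity he describes, at the cost of the extension being meaningful to that one client only.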
Re: [uf-discuss] Extending hCard and hCalendar vs. strict adherence to vcard and vCalendar.
On Dec 29, 2006, at 11:54 AM, Andy Mabbett wrote: In message [EMAIL PROTECTED], Breton Slivka [EMAIL PROTECTED] writes There's a few rather obvious problems with this idea that I can see. However, before I point them out, I will note that if the benefits of such a plan outweigh the problems, then go for it. However I suggest very carefully thinking about this before going nuts with extensions. Who is advocating going nuts? It's a rhetorical device. The intent is to warn against unrestricted extensions to the standard with no problem cases to back up such solutions. Nobody was advocating such a thing, but it is one possible consequence of what you suggested. #1. More work for implementors. While this rarely is seen as an issue for people on this list, (Tantek promotes that it's far more important to make it easier for publishers), one has to consider that if you specify some extension such as date of death, how likely is it to be implemented by anyone other than yourself? #2. In such an implementation, what specific benefit would having a specific field offer over just adding a note? Are there specific use cases when sorting contact information by date of death, for example, is important? You're criticising a wide concept by considering one suggested example. No, I'm using the example you suggested as an example, of the sort design as problem solving thought process one should be using while considering extensions to a format. Namely, asking the questions what problem is this solving? Does it actually need to be solved?, How badly does it need to be solved? What are the consequences of solving it in this way? Are there alternative ways of solving it? etc... Nonetheless, there are sufficient dates of death on the web to suggest that marking them up, semantically, would be useful, and incorporating them in hCards, ditto. useful for what? What problem would such a thing solve? 
I've never needed to find a person via their date of death, but then, I'm not a mortician or a police investigator, so it may very well be an actual problem for 80%, but this needs to be considered before creating an extension specifically for it, and adding complexity to an already somewhat difficult format. I am picking on your date of death example, but similar questions would need to be asked for every extension. This is especially relevant when incorporating hCards into other uFs, such as those for citations and reviews why? #3. Unreliable round tripping: This would be a fairly minor annoyance, but an annoyance nonetheless. What do you mean by Unreliable round tripping? Client X supports features A, B, C. Client Y supports features A, C, D. How would you deal with exchanging data between these two programs, and maintaining self consistent database structures? Admittedly this is an issue for developers, but it becomes a problem for users when the author of client X and client Y don't agree on how to solve it. #4. Divergent standards: Are there any other extensions to icalendar or vcard being done by other groups and/or vendors? Is there likely to be in the future? No, ad no. See previous discussion. [...] Do you mind linking to any specific posts? The hCard and iCalendar standard allow for vendor specific extensions, anyway, if you really really need feature X for a specific problem. With a clever enough client, and publishing implementation, this can probably be done with hCard and hCalendar as is, while maintaining backward compatibility. How? Feel free to use DoD as an example. 
-- Andy Mabbett div class=vcard span class=fnAbraham Lincoln/span div class=orgUnited States/div div class=adr div class=street-address1600 Pennsylvania Ave./div span class=localityWashington/span , span class=regionDC/span abbr class=dod title=18650415April 15, 1865/abbr /div Then, someone can correct me if this is incorrect, when a client written to deal with DoD encounters class=dod, it can import it with an x- prefix (for vendor specific properties, as allowed by vcard, I think) rather than try and do fancy things with notes. (see note above about client author disagreements). -Breton ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: Banning for meta-discusion [was RE: [uf-discuss] previously non-referenced in the specReferences]
On Jan 3, 2007, at 8:26 PM, Tantek Çelik wrote: The big difference here (in contrast to Usenet, other lists etc.) is that this community has retained a remarkably positive and inviting tone of discussion for quite a long time, much much more so than those other forums, and those involved with this community very much value that and have chosen to protect that over accommodating individuals whose method/manner of communication is harsher, noisier etc., in spite of well- intentions, good points, and heck, even positive contributions. As I read what's been going on in the list, the issue with Andy hasn't been so much his tone. This being text only medium, tone is very difficult to read into text, and most of the perceived tone of a post comes from personal interpretation. I think the reason Andy is now rubbing people the wrong way is a matter of the lack of substance in his posts. Strip away the emotional appeals, and there's virtually nothing left! If an argument can't be reduced to standard form (http://en.wikipedia.org/wiki/Argument_form), then there is little point to the post, and it becomes like talking to a brick wall. My suggestion then is that in a list which is primarily an impersonal and intellectual discussion on problem solving in a specific domain, the judgement call about whether someone is being disruptive should be based on whether there's actual (not emotional or personal) content in the post. Can the argument be restated in standard form? Considering the nature of this list, posts consisting primarily of emotional appeals and personal attacks just don't fit, and can easily escalate, unless cooler heads prevail. In this case, I think Tantek made the right call under these criteria, whether it was done knowingly or intuitively. ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] Extending hCard and hCalendar vs. strict adherence to vcard and vCalendar.
There's a few rather obvious problems with this idea that I can see. However, before I point them out, I will note that if the benefits of such a plan outweigh the problems, then go for it. However I suggest very carefully thinking about this before going nuts with extensions. #1. More work for implementors. While this rarely is seen as an issue for people on this list, (Tantek promotes that it's far more important to make it easier for publishers), one has to consider that if you specify some extension such as date of death, how likely is it to be implemented by anyone other than yourself? #2. In such an implementation, what specific benefit would having a specific field offer over just adding a note? Are there specific use cases when sorting contact information by date of death, for example, is important? #3. Unreliable round tripping: This would be a fairly minor annoyance, but an annoyance nonetheless. #4. Divergent standards: Are there any other extensions to icalendar or vcard being done by other groups and/or vendors? Is there likely to be in the future? This probably won't lead to as feirce a battle as the browser wars in the 90's, but is a potential avenue of pain for new application authors who are asked to implement contradictory features, and I think we all know how this turned out for web browsers in the end. Again, it's more important to make it easier for publishers, than for application authors, but I would ask, how easy has the divergent feature-sets of browsers made it for publishers? I'm sure there's less obvious problems, and just as compelling arguments for extensions, but my feeling is that hCard needs to go in the direction of becoming more simple for publishers, more easy to implement, not more complex. The hCard and iCalendar standard allow for vendor specific extensions, anyway, if you really really need feature X for a specific problem. 
With a clever enough client, and publishing implementation, this can probably be done with hCard and hCalendar as is, while maintaining backward compatibility. On Dec 22, 2006, at 8:55 AM, Andy Mabbett wrote: It has been made clear [1] that vCard and, presumably, vCalendar are unlikely to be developed or extended in the foreseeable future. It is my belief that we should not let this prevent the development of hCard and hCalendar; and that to do so would not harm compatibility with the former. For example, we could add a date of death field to hCard, and simply mandate that it is ignored (or perhaps treated as a 'note') by parsers which convert hCards to vCards. Does anyone foresee any problems with extensions being made in this manner? [1] http://microformats.org/wiki/vcard-suggestions#Note -- Andy Mabbett * Say NO! to compulsory ID Cards: http:// www.no2id.net/ * Free Our Data: http://www.freeourdata.org.uk * Are you using Microformats, yet: http:// microformats.org/ ? ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss ___ microformats-discuss mailing list microformats-discuss@microformats.org http://microformats.org/mailman/listinfo/microformats-discuss
Re: [uf-discuss] Easy book citations
Lead by example. If you can get some use out of authoring your own XHTML semantics, do it! Document your process and add it to the appropriate wiki pages. The citation format suffers so much from rhetorical discussion that I think an account of actual experience in implementation would do nothing but help push the process further towards something useful. The citation microformat is one cowpath that has not quite yet been paved, it would seem. On Jul 30, 2006, at 2:53 AM, Simon Cozens wrote: If the code and the comments disagree, then both are probably wrong. -- Norm Schryer
Re: validating microformats (was Re: [uf-discuss] Google Gdata new syndication protocol!)
Norman Walsh recently posted in his blog about this very issue: http://norman.walsh.name/2006/04/13/validatingMicroformats
Re: [uf-discuss] Format-of-Formats?
I mostly agree with Tantek, but I would like to point out a few more things to look at as far as this sort of effort goes. XSLT provides more than enough power to describe and extract information out of pages with microformats embedded. x2v demonstrates this. If you're looking for a single implementation for microformats, look no further than libxslt, or Sablotron, or whatever your favorite XSLT engine is. The whole model for this sort of thing is laid out in GRDDL on the W3C's website. Tim Berners-Lee seems to advocate using the GRDDL model to transform microformats into RDF, using XSLT. RDF is about as neutral a format for data as you're going to get. So pretty much all the difficult problems for the sort of thing you want have already been solved as best they can be. The difficult part now is adoption. On Mar 30, 2006, at 2:54 PM, Chris Messina wrote: Yeah, I didn't really think that this topic could be solved (or even discussed) herein. It's a nice pipedream, but I do agree it falls outside the boundaries of the achievable goals that we've set out w/ microformats. Chris On 3/30/06, Paul Bryson [EMAIL PROTECTED] wrote: Tantek Çelik wrote... In practice, this never[*] happens. It's been tried *numerous* times. DTD, XML Schema, etc. In practice, key portions/features of really *useful* specific formats (like HTML) *always* fall outside of the meta-format, and *must* be specified in prose of a specification. This is specifically why I designed XMDP to be the absolute minimum of what is necessary to define/recognize a vocabulary. I'm working on some extensions for includes (to transclude multiple XMDP profiles or portions thereof into a single profile), but other than that, I consider XMDP done. In the spirit of don't reinvent what you can re-use, anyone seriously desiring to work on a format-of-formats should *first* teach themselves DTD, and XML Schema *at a minimum*, before having the arrogance to think they can do better. 
Why aren't they just using DTD or XML Schema for this? That was the first thing I thought of when Joe first posted. Atamido
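For readers who want to see what "extracting information out of pages with microformats embedded" amounts to, here is a minimal sketch. The thread recommends XSLT (as x2v does); this sketch swaps in Python's standard-library ElementTree instead, purely because it runs without an XSLT engine. The sample markup, the function names `has_class` and `extract_hcards`, and the choice of the `fn` and `url` properties are all assumptions made for illustration, and the input must be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the markup and values are made up for illustration.
SAMPLE = """
<div>
  <div class="vcard">
    <span class="fn">Jane Example</span>
    <a class="url" href="http://example.org/">home page</a>
  </div>
  <p>Unrelated text.</p>
</div>
"""

def has_class(element, name):
    """True if `name` is one of the space-separated class names on `element`."""
    return name in element.get("class", "").split()

def extract_hcards(xhtml, properties=("fn", "url")):
    """Return one dict per element carrying class="vcard" (well-formed input only)."""
    root = ET.fromstring(xhtml)
    cards = []
    for container in root.iter():
        if not has_class(container, "vcard"):
            continue
        card = {}
        for node in container.iter():
            for prop in properties:
                if has_class(node, prop):
                    # Links carry their value in href; other elements in their text.
                    if prop == "url":
                        card[prop] = node.get("href")
                    else:
                        card[prop] = "".join(node.itertext()).strip()
        cards.append(card)
    return cards

print(extract_hcards(SAMPLE))
```

The only format-specific parts are the container class name and the property list, which is roughly the argument made above: the generic walking machinery already exists, whether in an XSLT engine or elsewhere.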
Re: [uf-discuss] Format-of-Formats?
Allow me to point you directly to the GRDDL site. http://www.w3.org/TeamSubmission/grddl/ Along with xmdp, I believe it thoroughly addresses all the issues you raise about as well as they can possibly be addressed. On Mar 30, 2006, at 4:01 PM, Joe Reger, Jr. wrote: before having the arrogance to think they can do better. I'm not proposing that we create a replacement for XML Schema or any of the other great technologies out there... just that we agree on one as the most frequently used, most standard, most common, baseline, generally accepted but not perfect way to describe a microformat. As you note, there are a lot of ways to crack this nut. And this is the fact that I'm having trouble with. Toolmakers, aggregators and innovators are having a tough time with microformats because each new one that pops up requires custom code. Instead of taking a leadership role, choosing one and advocating adoption, you seem to revel in the establishment of many microformats. I'm questioning where the customization should be... at the user level where apps are differentiated? Or at the format level? Why should each format have to start at ground zero, write custom plugins, force users to install them and then gain adoption? Why should Technorati have to write custom code at the format level for each format (of course it needs to write custom code at the business logic layer... that's how we all differentiate). If we agree to a framework, even with all of the limitations of whatever framework we choose, aren't we helping users use microformats more? What about the people from National Geographic who want to set up a format to track wildlife? Should they have to understand XML Schema to take part in the microformat revolution? And what about the people in middle Iowa who like to count hay stacks? Should they have to learn arcane programming languages just to define a two field microformat (hay stack color, hay stack size)? 
I understand your desire to not standardize on a definition language. Because doing so will inherently create limitations to what can be done. And some things just can't be done with a basic approach. And those things that gain massive adoption probably shouldn't be done with a simple approach. I'm talking about the long tail of microformats... who's looking out for all those users? Users are crying out, on this very mailing list, every single day for an easier way to create and use microformats. Maybe we should see microformats.org as the high-end solution with the flexibility to cover everything. But I think we also need a microformats Light that enables most of the functionality that most of the people are looking for. In the last 5 days I've seen these microformats proposed: Bookmark Exchange Format Attention Microformat Citation Format MicroId Plants Format Work of Art Conversation Following this list you see these requests all the time. This week's performance would predict 260 microformats in a year. And really, if somebody's posting to this mailing list they're probably hyper-plugged in to geekland. If we think about our users... the millions of people we rely on to make all of our geeky stuff actually useful... how many formats do you think are out there with pent-up demand? I'd say... um... a lot. And how many formats has microformats.org created/sanctioned so far throughout its history? I see nine specs. Eleven drafts. Thirty seven exploratory discussions. That's 21% of the requested formats we're seeing on this board. And I'd argue that it's about .01% of the total number of microformats that our users would like to see and be able to use. Think of all of the hobbies out there... all of the interest groups... they all track custom data of some sort. Sure, we don't care about that data type... but it's their life... they're passionate about it. Who's serving them? Who's enabling them? 
Who's letting them publish so that smart entrepreneurs can leverage that data into the next aggregation phenomenon? To me this user-oriented analysis paints an obvious argument for a format-of-formats. The current microformat mailing list and developer community is doing great work but it's not supporting the users who want a quicker means of creating and using microformats. I could be wrong on this... please prove me so. Microformats should be the plumbing and grease for this thing we all (begrudgingly) call Web 2.0. I want to be clear on one thing: I love the work being done on microformats.org. It is truly valuable and innovative. The process and ideals are wonderful. The people doing the work are collaborative and productive. I am in no way against what's being done. And I appreciate and completely understand Tantek's strong desire to squash my ideas quickly before I distract people from the work already being done. I simply see a big gaping hole in what's being done today. What I've been told is essentially that I can take my hole and go play
Re: [uf-discuss] Format-of-Formats?
Yes, as a matter of fact, I do. The w3c. Thanks, I completely agree. What I'm looking for is the best way to get some degree of sanctioning of RDF/XMLSchema/XSL/whatever and then use that sanctioning to gain toolmaker adoption. It would seem to me that this mailing list is the place to do that, but I guess I'm wrong. Do you know of another group that's lobbying toolmakers to support something like this? Best, Joe
Re: [uf-discuss] Format-of-Formats?
This is where you have completely lost me. You are not making it particularly clear what problem it is that you actually want to solve. Here are some more links. I truly believe this problem is much smaller than you believe it is. http://dannyayers.com/2005/08/01/microformats-on-the-grddl/ http://people.w3.org/~dom/archives/2005/05/grddl-specification-updated/ http://b4mad.net/datenbrei/archives/2005/12/13/grddl-vcard-and-microsformats-a-ballet/ These are not extremely obscure technologies, they solve the problem, the W3C advocates their usage, and blog makers that have any interest in standards and the semantic web *will* adopt them sooner or later. So will /browsers/, and /search engines/. And if they don't, it's not rocket science to write a plugin that makes it work for whatever problem you happen to want to solve. It's very simple, and it's not hidden knowledge on microformats.org. If you want to describe a microformat, use XMDP. If you want to do something with a microformat, write an XSLT. This is the standard, this is advocated, and it works today. If you want to help out the effort for adoption of these technologies... adopt them! You don't have to go any further than that. If it works, and does something sexy, then other people will try and do what you did. Very simple. On Mar 30, 2006, at 4:33 PM, Joe Reger, Jr. wrote: Yes, as a matter of fact, I do. The w3c. Nah... I appreciate your effort. But the w3c is not forging relationships with blogging toolmakers and trying to gain adoption of a long tail microformat framework. But I know that those on this list have relationships in place. This critical piece of Web 2.0 plumbing should be in place as soon as possible and that's going to take advocacy. I thought that microformats.org would be interested in being the one to define this piece of the puzzle... it seems a natural extension. And microformats.org can accomplish this much more quickly than little old Joe Reger can. 
If we all generally understand that it's going to happen, why aren't we taking the leadership role in making it happen? I know what microformats are not but maybe microformats.org can embrace a sub-project to make this happen. We don't have to call them microformats... users don't really care what they're called... as long as they can spin them up easily and leverage the power of the blogosphere.
Re: [uf-discuss] Citation format straw proposal on the wiki
True, but a mechanism for this sort of thing already exists for microformats in XMDP, and in a somewhat more flexible form, in that one does not need a monolithic profile for all the modules involved; one can have a separate profile for each module and link to each separately. The basic thrust of this is to follow the microformat principle of solving the simple problem first. Out of all these specific domains exists a definite simplest problem. The only dispute that I see is that the simplest problem doesn't solve all the domain specific problems. You wouldn't expect it to! So you make additional microformats to solve the domain specific issues. Thus the micro in microformats, as I understand it. On Mar 29, 2006, at 12:13 PM, Alf Eaton wrote: On 29 Mar 2006, at 14:02, Breton Slivka wrote: If we are for the moment to entertain the idea of modularization, couldn't type then be simply inferred by which module(s) are in use? If you go with a nesting microformat model for that, type is encapsulated entirely in the container class of specific modules, and the modules which are in use determine behavior, much the same as embedded svg/mathml does today, or a more direct comparison in the modularization of xhtml. If you embed MathML and SVG in XHTML you still have to use the right DOCTYPE, so that the validator knows which modules are allowed (though admittedly you don't necessarily need the precise DOCTYPE just for displaying/interpreting the document): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd"> alf.
Re: [uf-discuss] Enumerating Microformats on a Page
I've only just recently figured out what XMDP profiles are, and what they're capable of. I notice the structure of your hcard/hreview/hcalendar thing is such that you can pick an attribute, and search for bits of data which contain that attribute. Have you attempted to detect and parse XMDP profiles not for validation, but to discover new microformats not documented at microformats.org? Now don't get me wrong, I understand the importance of strong standardization, but say someone creates a niche format for their own site, and related sites in their community, with an XMDP profile: do you think your aggregator could use that profile to create new searchable attributes in your search engine? Sorry this is a bit of a tangent, but the idea of it kind of fascinated me. On Mar 24, 2006, at 5:40 PM, Scott Reynen wrote: On Mar 24, 2006, at 4:20 PM, Ryan King wrote: Hmm, this sounds to me like a theoretical argument. I'd like to hear what experience people have had here. Has anyone here worked on crawling to index microformats? If so, what challenges did you face? Yes. The two I know of are reevoo, which aggregates hreviews: http://www.reevoo.com/ and my own effort, which aggregates hcards, hcalendars, and hreviews: http://randomchaos.com/microformats/base/ My main challenges have been a lack of space to store the data (which has nothing to do with microformats) and the lack of a parser that can read invalid X(HT)ML (which is only an issue because I haven't installed Tidy on my server). If microformat site maps existed, I would use them as starting points to know where to look, but I wouldn't trust them as any sort of accurate listing of what's on a domain just because I know I would likely forget to update my own if I had one. So I'd still be reading the same number of documents, just in a different order. 
Peace, Scott
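The question above, whether an aggregator could read an XMDP profile to discover searchable attributes, can be sketched roughly as follows. This assumes the profile is well-formed XHTML whose root is a `dl class="profile"` definition list, with each top-level `dt` naming an HTML attribute and the following `dd` holding a nested `dl` of the values defined for it. The "haystack" profile, its property names, and the function `discover_vocabulary` are hypothetical; nothing here is taken from a real aggregator.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMDP-style profile for a made-up niche format.
PROFILE = """
<dl class="profile">
  <dt id="class">class</dt>
  <dd>
    <p>Hypothetical class names for a niche format.</p>
    <dl>
      <dt id="haystack">haystack</dt>
      <dd>Container for one hay stack.</dd>
      <dt id="color">color</dt>
      <dd>Color of the hay stack.</dd>
      <dt id="size">size</dt>
      <dd>Size of the hay stack.</dd>
    </dl>
  </dd>
</dl>
"""

def discover_vocabulary(profile_xhtml):
    """Map each attribute named in the profile to the values defined for it.

    Assumes the root element is the profile <dl> and that its children
    alternate strictly between <dt> and <dd>.
    """
    root = ET.fromstring(profile_xhtml)
    children = list(root)
    vocabulary = {}
    for dt, dd in zip(children[0::2], children[1::2]):
        if dt.tag != "dt" or dd.tag != "dd":
            continue  # skip anything that is not a dt/dd pair
        nested = dd.find("dl")
        values = []
        if nested is not None:
            values = [t.get("id") for t in nested.findall("dt") if t.get("id")]
        vocabulary[dt.get("id")] = values
    return vocabulary

print(discover_vocabulary(PROFILE))
```

An aggregator could then treat each discovered class name as a candidate searchable attribute. Whether it should trust profiles from arbitrary sites is a separate question this sketch does not address.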