Re: [whatwg] Suggest making dt and dd valid in ol
2012/7/15 Jukka K. Korpela jkorp...@cs.tut.fi 2012-07-14 18:51, Ian Yang wrote: If ol is no more and no less ordered than ul, what's the purpose of its introduction? The real purposes, in the dawn of HTML, were that ol and ul correspond to numbered and bulleted lists, respectively, reflecting two very common concepts in word processors. This is how they have been used, though some authors have started overusing ul for things like lists of links even when they specifically don't want them to appear as bulleted. Even W3C specifications, in their markup, switch to ul in the midst of hierarchy when they want bullets and not numbers. HTML5 tries to stick to the theoretical idea of ordered vs. unordered list, but it does not really change anything, and it is not supposed to change anything - any ul will still be rendered in the order written. More on this: http://www.cs.tut.fi/~jkorpela/html/ul-ol.html Thanks. I'm not sure if I understand it correctly. I just couldn't find solid evidence in the article to prove that ol is no more and no less ordered than ul. Throughout the article, I saw bullets and numbers mentioned frequently. However, that's just browsers' default rendering of ul and ol. As a coder, personally I don't care how browsers render them by default. What I care about is the meaning of the code I write. That is, when I want an unordered list, I write ul; when I want an ordered list, I write ol. ul means unordered list, and ol means ordered list. It's that simple. Although some people may misuse them (like the example mentioned in the article), that's not ul and ol's problem. If I missed anything, please let me know. Thanks again. Sincerely, Ian Yang
Re: [whatwg] Suggest making dt and dd valid in ol
2012-07-15 17:40, Ian Yang wrote: Throughout the article, I saw bullets and numbers mentioned frequently. However, that's just browsers' default rendering of ul and ol. It's the only real difference between the two. As a coder, personally I don't care how browsers render them by default. You should. Check out the Usual CSS Caveats. What I care about is the meaning of the code I write. That is, when I want an unordered list, I write ul; when I want an ordered list, I write ol. ul means unordered list, and ol means ordered list. And what does that mean? Does it mean that a browser may or will treat ul as unordered in the sense that it can render the items in any order? If not, what *is* the difference? Just some people's *calling* it unordered. Yucca
Re: [whatwg] Suggest making dt and dd valid in ol
Sat, 14 Jul 2012 23:53:32 +0800, from Ian Yang: Okay, it seems that one of the ideas I mentioned in my original email needs to be revamped. I was saying that using general heading (H1) and paragraph (p) loses the meaning of definition term and definition description, but I didn't realize that using ol loses the meaning of definition list. That is, the following code is, in fact, improper:

    <!-- The following code is improper as it loses the meaning of definition list. -->
    <ol>
      <li> <dt></dt> <dd></dd> </li>
      <li> <dt></dt> <dd></dd> </li>
      <li> <dt></dt> <dd></dd> </li>
    </ol>

An XOXO list should solve this: http://microformats.org/wiki/xoxo#Properties_of_Outline_Items Or just add a dl wrapper around the dt/dd elements in your code above. -- Leif H Silli
[whatwg] Security restriction allows content thievery
Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme. I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code was left unmodified. But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all if he doesn't sniff the URI with JavaScript, because image files would have no referrer. My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:

    if (top.location.protocol === 'data:') {
        top.location.href = 'http://example.com/trap/';
    }

But even then the referrer will not be passed to the server. So my proposal is that the data URI scheme gets an exception on this security behavior.

Kind Regards
Robert Eisele
http://www.xarg.org/
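The check in the message above can be sketched as a small, testable function. This is only a minimal sketch under stated assumptions: the helper names `detectDataEmbedding` and `redirectIfDataUri` are hypothetical, the `win` parameter stands in for the browser's `window`, and it assumes that reading `top.location.protocol` throws when the top-level document is cross-origin, which is why the read is wrapped in try/catch. The trap URL is the one from the message.

```javascript
// Hypothetical helper: classify how the current page is being displayed.
// `win` is any window-like object; in a browser you would pass `window`.
function detectDataEmbedding(win) {
  try {
    // Readable when the top-level document is accessible to this script,
    // e.g. when the whole page was inlined into a data: URI.
    return win.top.location.protocol === 'data:' ? 'data-uri' : 'normal';
  } catch (err) {
    // Assumption: cross-origin reads of Location properties throw.
    return 'cross-origin';
  }
}

// Hypothetical helper: navigate the top-level page to a trap URL
// only when a data: top-level document was detected.
function redirectIfDataUri(win, trapUrl) {
  if (detectDataEmbedding(win) === 'data-uri') {
    win.top.location.href = trapUrl;
    return true;
  }
  return false;
}
```

In a page this would be called as `redirectIfDataUri(window, 'http://example.com/trap/')`. As the message notes, the redirect alone does not recover a referrer; the sketch only makes the three possible outcomes of the check explicit.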
Re: [whatwg] Security restriction allows content thievery
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote: Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest to weaken this restriction for the data: URI schema. I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications. But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer. My final approach to convict the abuser is based on the fact, that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code: if (top.location.protocol === 'data:') { top.location.href = 'http://example.com/trap/'; } But even then the referrer will not be passed to the server. So my proposal is that the data URI schema gets an exception on this security behavior. The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves. 
The correct solution to this kind of problem is legal - this is simple copyright violation. I'm not sure about the merits of your suggestion otherwise. It's reasonable to make data: pages same-origin with their parent when they're contained within something, but it seems dodgy to make them same-origin with their *contained* pages as well. If not done carefully, that could allow contained pages access to the data: page's parent as well, or other cross-origin pages that the data: page is containing. ~TJ
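To make the same-origin question concrete, here is a minimal sketch of an origin comparison using the standard `URL` API. The function name `sameOrigin` is hypothetical. The sketch only shows how the URL API serialises origins today (a data: URL gets an opaque origin, serialised as the string "null"); it does not model the origin-inheritance rule being debated in this thread.

```javascript
// Hypothetical helper: compare two URLs by their serialised origin.
function sameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  // An opaque origin serialises as the string "null" and is treated
  // here as matching nothing, not even another opaque origin.
  return originA !== 'null' && originA === originB;
}

// Same scheme, host and port: same origin.
console.log(sameOrigin('http://example.com/a', 'http://example.com/b'));
// Different scheme: different origin.
console.log(sameOrigin('http://example.com/', 'https://example.com/'));
// A data: URL has an opaque origin, so it matches neither case.
console.log(sameOrigin('data:text/html,hi', 'http://example.com/'));
```

Under this comparison a data: page is cross-origin with both its parent and the pages it contains, which is the restriction the proposal asks to relax.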
Re: [whatwg] Security restriction allows content thievery
2012/7/16 Tab Atkins Jr. jackalm...@gmail.com On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote: Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest to weaken this restriction for the data: URI schema. I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications. But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer. My final approach to convict the abuser is based on the fact, that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code: if (top.location.protocol === 'data:') { top.location.href = 'http://example.com/trap/'; } But even then the referrer will not be passed to the server. So my proposal is that the data URI schema gets an exception on this security behavior. The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves. 
But you would get a valid referrer if the tracking code wasn't removed. The data: protects the abuser in an unnecessary way. But you're absolutely right that the solution I present isn't entirely tied to the problem. The correct solution to this kind of problem is legal - this is simple copyright violation. But if you don't have a chance to get information about the attacker, you can't sue him. I had the strange idea to use a prompt to ask the user for the original URL in his address bar. But as I said, that's strange. I'm not sure about the merits of your suggestion otherwise. It's reasonable to make data: pages same-origin with their parent when they're contained within something, but it seems dodgy to make them same-origin with their *contained* pages as well. If not done carefully, that could allow contained pages access to the data: page's parent as well, or other cross-origin pages that the data: page is containing. Very intuitive thought; one could assume that data: pages are same-origin, or better that embedded data: pages are part of the current page. In this way, you wouldn't have the chance to escape the sandbox and access the parent. What would be a situation where same-origin treatment could be dangerous? ~TJ
Re: [whatwg] Security restriction allows content thievery
On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele rob...@xarg.org wrote: 2012/7/16 Tab Atkins Jr. jackalm...@gmail.com On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote: Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest to weaken this restriction for the data: URI schema. I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications. But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer. My final approach to convict the abuser is based on the fact, that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code: if (top.location.protocol === 'data:') { top.location.href = 'http://example.com/trap/'; } But even then the referrer will not be passed to the server. So my proposal is that the data URI schema gets an exception on this security behavior. The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. 
This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves. But you would get a valid referrer if the tracking code wasn't removed. The data: protects the abuser in an unnecessary way. But you're absolutely right that the solution I present isn't entirely tied to the problem. The embedder can easily remove the tracking code. Better yet, the embedder can host the content on his server and disallow access to all external resources to cripple your tracking code. The correct solution to this kind of problem is legal - this is simple copyright violation. But if you don't have a chance to get information about the attacker, you can't sue him. I had the strange idea to use a prompt to ask the user for the original URL in his address bar. But as I said, that's strange. That sounds like a problem we can't solve. - Ryosuke
Re: [whatwg] Suggest making dt and dd valid in ol
2012/7/16 Jukka K. Korpela jkorp...@cs.tut.fi 2012-07-15 17:40, Ian Yang wrote: Throughout the article, I saw bullets and numbers mentioned frequently. However, that's just browsers' default rendering of ul and ol. It's the only real difference between the two. Sorry, I still don't get it. ul means unordered list; ol means ordered list. They are quite different, aren't they? As a coder, personally I don't care how browsers render them by default. You should. Check out the Usual CSS Caveats. Okay, actually I should say that browsers' default rendering is not my *main concern*. I know browsers surely have their different default renderings of different list elements to help readers distinguish them. But as a coder, my *main concern* is whether the meaning of the code I write corresponds to the content, not the default renderings (because browsers will handle that). What I care about is the meaning of the code I write. That is, when I want an unordered list, I write ul; when I want an ordered list, I write ol. ul means unordered list, and ol means ordered list. And what does that mean? Does it mean that a browser may or will treat ul as unordered in the sense that it can render the items in any order? If not, what *is* the difference? Just some people's *calling* it unordered. Imo, ul means the order of the items is unimportant, not that browsers can render the items in any order. If there were a browser that wanted to render the items of ul in any order, okay, it may do that. Anyway, that's not my main concern. Sincerely, Ian Yang
Re: [whatwg] Suggest making dt and dd valid in ol
2012/7/16 Leif H Silli xn--mlform-...@xn--mlform-iua.no Sat, 14 Jul 2012 23:53:32 +0800, from Ian Yang: Okay, it seems that one of the ideas I mentioned in my original email needs to be revamped. I was saying that using general heading (H1) and paragraph (p) loses the meaning of definition term and definition description, but I didn't realize that using ol loses the meaning of definition list. That is, the following code is, in fact, improper:

    <!-- The following code is improper as it loses the meaning of definition list. -->
    <ol>
      <li> <dt></dt> <dd></dd> </li>
      <li> <dt></dt> <dd></dd> </li>
      <li> <dt></dt> <dd></dd> </li>
    </ol>

An XOXO list should solve this: http://microformats.org/wiki/xoxo#Properties_of_Outline_Items Or just add a dl wrapper around the dt/dd elements in your code above. Thanks for the useful information. I didn't know the XOXO thing before. However, after reading the examples they provided, I still couldn't understand its use. Could you please provide me with an example of the use of XOXO, using the life cycle of the butterfly I mentioned above? Thank you very much. Sincerely, Ian Yang
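The second suggestion quoted above (wrapping each dt/dd pair in a dl inside the list item) can be sketched for the butterfly life cycle as follows. This is only a sketch of that one option, not of XOXO. The stage names and descriptions come from the thread (descriptions shortened); the helper names `escapeHtml` and `renderOlWithDl` are hypothetical.

```javascript
// Stages of the life cycle discussed in the thread, in order.
const stages = [
  { term: 'Egg', description: 'A white egg.' },
  { term: 'Caterpillar', description: 'The egg hatches into a caterpillar.' },
  { term: 'Pupa', description: 'The caterpillar forms a hard outer shell.' },
  { term: 'Butterfly', description: 'Butterflies live for only a short time.' },
];

// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render an ordered list in which every item holds its own one-group dl,
// so dt and dd keep a valid dl parent while ol carries the ordering.
function renderOlWithDl(items) {
  const lis = items.map(({ term, description }) =>
    `  <li><dl><dt>${escapeHtml(term)}</dt><dd>${escapeHtml(description)}</dd></dl></li>`);
  return ['<ol>', ...lis, '</ol>'].join('\n');
}

console.log(renderOlWithDl(stages));
```

The design trade-off is verbosity: each item gets its own small dl, which keeps the markup valid at the cost of four separate definition lists instead of one.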
Re: [whatwg] Suggest making dt and dd valid in ol
On Sat, 14 Jul 2012, Ian Yang wrote: Recently I was involved in a project. One of its pages has a special content which is like a life cycle. There are several stages in the cycle, each stage has a term followed by some text describing the term. Let's take the life cycle of butterfly for example:

    Egg
        A white egg.
    Caterpillar
        The egg hatches into a caterpillar. The caterpillar eats and grows a tremendous amount.
    Pupa
        The caterpillar forms a hard outer shell. Inside the shell, the caterpillar changes into a butterfly.
    Butterfly
        Butterflies live for only a short time. They will fly, mate, and reproduce. The female lays an egg that was fertilized by the male.

By seeing such contents, we usually code it using definition list (dl). At first, I was thinking the same idea. But then I realized that stages in a life cycle should be regarded as ordered contents. So ordered list (ol) would be more appropriate. ol and dl would both be fine here. I'd probably go with ol, because it's a list of states, each of which has a name, rather than a list of names, but both are reasonable. With ol, I'd probably write:

    <ol>
     <li><dfn>Egg</dfn>: A white egg.
     <li><dfn>Caterpillar</dfn>: The egg hatches...

...and so on. If we could make dt and dd being not restricted to dl only, but could also exist in ol, the problem will be solved perfectly. It's not clear that there's a problem to be solved. :-) (Also, there are parsing issues that make changing this area of the spec be rather fraught with peril.) On Sat, 14 Jul 2012, Anne van Kesteren wrote: I would recommend not over-thinking the matter. Otherwise soon you will start wrapping your <p>s in <ol>/<li>s too to ensure they stay in the correct order. True! Using dl for ordered groups is perfectly fine. (The specification points this out as well: The order of the list of groups, and of the names and values within each group, may be significant.) Indeed. On Sat, 14 Jul 2012, Jukka K. Korpela wrote: Indeed.
The ol element is no more and no less ordered than ul or any other element. Many HTML tag names are misleading. It's certainly true that many element names are derived more from historical accidents than their current semantics, but ol and ul are semantically quite different, as the spec describes. Specifically, ol implies that the order of the list cannot be changed without affecting the meaning of the page, whereas the order in a ul list is merely aesthetic. (The specification points this out as well: The order of the list of groups, and of the names and values within each group, may be significant.) That's actually a questionable statement there, since it may make the [reader] ask whether the order of sub-elements is *generally* significant. That is a good question to ask oneself. It's as questionable as it would be to write The order of successive p elements may be significant or The order of successive section elements may be significant. They indeed _are_ significant. The spec doesn't mention this, though, because it's blatantly obvious and nobody in their right mind will question it. :-) With dl, we do get people asking whether it's ok to have the order matter, so having an explicit statement in the spec allowing it is useful. (Witness this very thread for such an example.) On Sat, 14 Jul 2012, Ian Yang wrote: So based on the ul and the ol, we could have unordered definition list (udl) and ordered definition list (odl). I don't really understand what problem this solves. On Sat, 14 Jul 2012, Ian Yang wrote: 2012/7/14 Jukka K. Korpela jkorp...@cs.tut.fi Indeed. The ol element is no more and no less ordered than ul or any other element. Many HTML tag names are misleading. That's interesting. If ol is no more and no less ordered than ul, what's the purpose of its introduction? Could you provide detailed explanations or examples? Thanks. Jukka is incorrect in his statement.
The difference between ol and ul is specifically that the order of elements in ol matters and the order of elements in ul does not. From the spec:

# The ol element represents a list of items, where the items have been
# intentionally ordered, such that changing the order would change the
# meaning of the document.

# The ul element represents a list of items, where the order of the items
# is not important -- that is, where changing the order would not
# materially change the meaning of the document.

There are examples in the two sections that illustrate the quite serious semantic difference between the two. On Sat, 14 Jul 2012, Jukka K. Korpela wrote: The real purposes, in the dawn of HTML, were that ol and ul correspond to numbered and bulleted lists, respectively, reflecting two very common concepts in word processors. This is how they have been used, though some authors have started overusing ul for things like lists of links even when they specifically
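The ol markup with dfn suggested earlier in this message can be sketched for the full life cycle as follows. It is a minimal sketch: the helper names `escapeHtml` and `renderOlWithDfn` are hypothetical, the stage descriptions are shortened from the thread, and the closing li tags are written out even though the message's own snippet omits them.

```javascript
// Stages of the butterfly life cycle from the thread, in order.
const lifeCycle = [
  { name: 'Egg', text: 'A white egg.' },
  { name: 'Caterpillar', text: 'The egg hatches into a caterpillar.' },
  { name: 'Pupa', text: 'The caterpillar forms a hard outer shell.' },
  { name: 'Butterfly', text: 'Butterflies live for only a short time.' },
];

// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render each stage as one list item: the name in dfn, then the text.
// Array order is preserved, which is the point of using ol here.
function renderOlWithDfn(items) {
  const lis = items.map(({ name, text }) =>
    ` <li><dfn>${escapeHtml(name)}</dfn>: ${escapeHtml(text)}</li>`);
  return ['<ol>', ...lis, '</ol>'].join('\n');
}

console.log(renderOlWithDfn(lifeCycle));
```

Reordering the array would change the rendered sequence of stages, which is the sense in which the spec text quoted above treats an ol's order as meaningful.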
Re: [whatwg] Suggest making dt and dd valid in ol
2012-07-16 5:36, Ian Yang wrote: Imo, ul means the order of the items is unimportant, not that browsers can render the items in any order. But if the order is unimportant, there still _is_ an order. Being unordered would be something else. And what would it matter to indicate the order as important if you only do that in markup, without affecting rendering, search engines, etc., at all? It's like invisible ink in a book. If it is somehow relevant to say that the order is unimportant, you have to, well, *say* it (in words). The only reason for this unordered list idea (a list is by definition ordered; a set, or a multiset, is not) is the willingness to keep ul and ol in HTML (it would be very impractical to omit one of them) without admitting that they were introduced, and are being used, simply for bulleted and numbered lists. So this resembles the confusing play with words regarding i and b. Yucca