Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Ian Yang
2012/7/15 Jukka K. Korpela jkorp...@cs.tut.fi

 2012-07-14 18:51, Ian Yang wrote:

  If ol is no more and no less ordered than ul,
 what's the purpose of its introduction?


 The real purposes, in the dawn of HTML, were that ol and ul correspond
 to numbered and bulleted lists, respectively, reflecting two very common
 concepts in word processors. This is how they have been used, though some
 authors have started overusing ul for things like lists of links even
 when they specifically don't want them to appear as bulleted. Even W3C
 specifications, in their markup, switch to ul in the midst of hierarchy
 when they want bullets and not numbers.

 HTML5 tries to stick to the theoretical idea of ordered vs. unordered
 list, but it does not really change anything, and it is not supposed to
 change anything - any ul will still be rendered in the order written.

 More on this:
 http://www.cs.tut.fi/~jkorpela/html/ul-ol.html



Thanks. I'm not sure if I understand it correctly. I just couldn't find any
solid information in the article to prove that ol is no more and no
less ordered than ul.

Throughout the article, I saw it mentioned bullets and numbers
frequently. However, that's just browsers' default rendering of ul and
ol. As a coder, personally I don't care how browsers render them by
default. What I care about is the meaning of the code I write. That is, when I
want an unordered list, I write ul; when I want an ordered list, I write
ol. ul means unordered list, and ol means ordered list. It's that
simple.

Although there may be some people who misuse them (like the example mentioned
in the article), that's not ul and ol's problem.

If I missed anything, please let me know. Thanks again.


Sincerely,
Ian Yang


Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Jukka K. Korpela

2012-07-15 17:40, Ian Yang wrote:

 Throughout the article, I saw it mentioned bullets and numbers
 frequently. However, that's just browsers' default rendering of ul and
 ol.

It's the only real difference between the two.

 As a coder, personally I don't care how browsers render them by
 default.

You should. Check out the Usual CSS Caveats.

 What I care about is the meaning of the code I write. That is, when I
 want an unordered list, I write ul; when I want an ordered list, I
 write ol. ul means unordered list, and ol means ordered list.

And what does that mean? Does it mean that a browser may or will treat
ul as unordered in the sense that it can render the items in any
order? If not, what *is* the difference? Just some people's *calling* it
unordered.


Yucca



Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Leif H Silli

Sat, 14 Jul 2012 23:53:32 +0800, from Ian Yang

Okay, it seems that one of the ideas I mentioned in my original email
needs to be revamped.

I was saying that using general heading (H1) and paragraph (p) loses
the meaning of definition term and definition description, but I didn't
realize that using ol loses the meaning of definition list. That is,
the following code is, in fact, improper:


<!-- The following code is improper as it loses the meaning of definition
list. -->

<ol>
   <li>
      <dt></dt>
      <dd></dd>
   </li>
   <li>
      <dt></dt>
      <dd></dd>
   </li>
   <li>
      <dt></dt>
      <dd></dd>
   </li>
</ol>


An XOXO list should solve this:

http://microformats.org/wiki/xoxo#Properties_of_Outline_Items

Or just add a dl wrapper around the dt/dd elements in your code above.
--
Leif H Silli 
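
As a rough sketch of the dl-wrapper suggestion, the following JavaScript builds the markup as a string. The function name renderOrderedDefinitions and the { term, description } item shape are assumptions made for illustration and do not come from the thread; the sketch also does no HTML escaping.

```javascript
// Hypothetical helper: wraps each dt/dd pair in its own dl inside an li,
// so the ol carries the ordering and each dl carries one term/description pair.
// Assumes the input strings are already safe to embed (no escaping is done).
function renderOrderedDefinitions(items) {
  const rows = items.map(({ term, description }) =>
    [
      '  <li>',
      '    <dl>',
      `      <dt>${term}</dt>`,
      `      <dd>${description}</dd>`,
      '    </dl>',
      '  </li>',
    ].join('\n')
  );
  return ['<ol>', ...rows, '</ol>'].join('\n');
}

console.log(renderOrderedDefinitions([
  { term: 'Egg', description: 'A white egg.' },
  { term: 'Pupa', description: 'The caterpillar forms a hard outer shell.' },
]));
```

Each li then holds a complete dl, which keeps dt and dd inside the only parent that allows them.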


[whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
Browsers are very restrictive when one tries to access the contents of
different domains (including the scheme), embedded via framesets. This is
normally a good practice, but I'd suggest weakening this restriction for
the data: URI scheme.

I'm currently building an analysis system like Google Analytics, which gets
embedded into a website via a small JavaScript snippet. When I analyzed the
data, I came across a very interesting trick because I got a lot of
requests (with the data from location.href) where the entire website was
embedded into a data:text/html URI - except that all ads of the page were
replaced. Fortunately, my tracking code has been left without
modifications.

But the scary thing is that this way you can monetize foreign content by
simply embedding it somewhere you can direct traffic to. That's pretty
clever, because the original site owner doesn't notice this abuse due to
the fact that top.location.href isn't readable. Or even worse, he would
never notice it at all when he doesn't sniff the URI with JavaScript,
because image files would have no referrer.

My final approach to catching the abuser is based on the fact that the
JavaScript was dynamically loaded from my server and that I can write to
location.href. So I added this piece of code:

// If the page has been embedded in a data: URI, send the top frame to a trap URL.
if (top.location.protocol === 'data:') {
    top.location.href = 'http://example.com/trap/';
}

But even then the referrer will not be passed to the server. So my proposal
is that the data URI scheme gets an exception to this security behavior.



Kind Regards

Robert Eisele
http://www.xarg.org/
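
A slightly more defensive variant of the check above might look like the sketch below. It assumes that reading the top frame's location can throw when the frames are cross-origin, and treats that case as "not detected". The helper names topFrameProtocol and isEmbeddedInDataUri, and the win parameter standing in for window, are hypothetical.

```javascript
// Hypothetical helper; `win` stands in for the browser's `window` object.
function topFrameProtocol(win) {
  try {
    return win.top.location.protocol;
  } catch (err) {
    // Browsers normally throw a SecurityError when a frame reads the
    // location of a cross-origin top-level document.
    return null;
  }
}

function isEmbeddedInDataUri(win) {
  return topFrameProtocol(win) === 'data:';
}

// Plain-object stand-ins so the sketch also runs outside a browser.
const framedByDataUri = { top: { location: { protocol: 'data:' } } };
const blockedAccess = {
  top: { get location() { throw new Error('SecurityError'); } },
};

console.log(isEmbeddedInDataUri(framedByDataUri)); // true
console.log(isEmbeddedInDataUri(blockedAccess));   // false
```

In a page, the call would be isEmbeddedInDataUri(window); whether the read succeeds at all depends on the browser's cross-origin rules.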


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Tab Atkins Jr.
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote:
 Browsers are very restrictive when one tries to access the contents of
 different domains (including the scheme), embedded via framesets. This is
 normally a good practice, but I'd suggest weakening this restriction for
 the data: URI scheme.

 I'm currently building an analysis system like Google Analytics, which gets
 embedded into a website via a small JavaScript snippet. When I analyzed the
 data, I came across a very interesting trick because I got a lot of
 requests (with the data from location.href) where the entire website was
 embedded into a data:text/html URI - except that all ads of the page were
 replaced. Fortunately, my tracking code has been left without
 modifications.

 But the scary thing is that this way you can monetize foreign content by
 simply embedding it somewhere you can direct traffic to. That's pretty
 clever, because the original site owner doesn't notice this abuse due to
 the fact that top.location.href isn't readable. Or even worse, he would
 never notice it at all when he doesn't sniff the URI with JavaScript,
 because image files would have no referrer.

 My final approach to catching the abuser is based on the fact that the
 JavaScript was dynamically loaded from my server and that I can write to
 location.href. So I added this piece of code:

 if (top.location.protocol === 'data:') {
 top.location.href = 'http://example.com/trap/';
 }

 But even then the referrer will not be passed to the server. So my proposal
 is that the data URI scheme gets an exception to this security behavior.

The problem you outline is not directly tied to the solution you
present.  You can scrape a site and display it as your own without any
fancy tricks, just by downloading all the resources and hosting them
yourself.  This merely consumes a little more bandwidth for the
attacker, since they're hosting the images/etc themselves.

The correct solution to this kind of problem is legal - this is simple
copyright violation.

I'm not sure about the merits of your suggestion otherwise.  It's
reasonable to make data: pages same-origin with their parent when
they're contained within something, but it seems dodgy to make them
same-origin with their *contained* pages as well.  If not done
carefully, that could allow contained pages access to the data: page's
parent as well, or other cross-origin pages that the data: page is
containing.

~TJ
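
For readers who want to see how an origin is reported for a data: URL, the URL class (a global in Node.js and in browsers) can be queried as in the sketch below. It only shows what that API returns; it says nothing about how browsers at the time of this thread assigned origins to data: documents. The helper name originOf is hypothetical.

```javascript
// Uses the standard URL class, available as a global in Node.js and browsers.
// Hypothetical helper: returns the serialized origin of a URL string.
function originOf(url) {
  return new URL(url).origin;
}

console.log(originOf('http://example.com/trap/')); // "http://example.com"
console.log(originOf('data:text/html,hello'));     // "null" (an opaque origin)
```

The string "null" marks an opaque origin, which compares as different from every other origin.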


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
2012/7/16 Tab Atkins Jr. jackalm...@gmail.com

 The problem you outline is not directly tied to the solution you
 present.  You can scrape a site and display it as your own without any
 fancy tricks, just by downloading all the resources and hosting them
 yourself.  This merely consumes a little more bandwidth for the
 attacker, since they're hosting the images/etc themselves.


But you would get a valid referrer if the tracking code wasn't removed. The
data: protects the abuser in an unnecessary way. But you're absolutely right
that the solution I present isn't entirely tied to the problem.


 The correct solution to this kind of problem is legal - this is simple
 copyright violation.


But if you don't have a chance to get information about the attacker, you
can't sue him. I had the strange idea to use a prompt to ask the user for
the original URL in his address bar. But as I said, that's strange.



 I'm not sure about the merits of your suggestion otherwise.  It's
 reasonable to make data: pages same-origin with their parent when
 they're contained within something, but it seems dodgy to make them
 same-origin with their *contained* pages as well.  If not done
 carefully, that could allow contained pages access to the data: page's
 parent as well, or other cross-origin pages that the data: page is
 containing.


A very intuitive thought: one could assume that data: pages are same-origin,
or better, that embedded data: pages are part of the current page. That
way, you wouldn't have the chance to escape the sandbox and access the
parent. What would be a situation where same-origin could be dangerous?



 ~TJ



Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Ryosuke Niwa
On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele rob...@xarg.org wrote:

 2012/7/16 Tab Atkins Jr. jackalm...@gmail.com
  The problem you outline is not directly tied to the solution you
  present.  You can scrape a site and display it as your own without any
  fancy tricks, just by downloading all the resources and hosting them
  yourself.  This merely consumes a little more bandwidth for the
  attacker, since they're hosting the images/etc themselves.
 

 But you would get a valid referrer if the tracking code wasn't removed. The
 data: protects the abuser in an unnecessary way. But you're absolutely right
 that the solution I present isn't entirely tied to the problem.


The embedder can easily remove the tracking code. Better yet, the embedder
can host the content on his server and disallow access to all external
resources to cripple your tracking code.

 The correct solution to this kind of problem is legal - this is simple
  copyright violation.

 But if you don't have a chance to get information about the attacker, you
 can't sue him. I had the strange idea to use a prompt to ask the user for
 the original URL in his address bar. But as I said, that's strange.


That sounds like a problem we can't solve.

- Ryosuke


Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Ian Yang
2012/7/16 Jukka K. Korpela jkorp...@cs.tut.fi

 2012-07-15 17:40, Ian Yang wrote:
  Throughout the article, I saw it mentioned bullets and numbers
  frequently. However, that's just browsers' default rendering of ul and
  ol.

 It's the only real difference between the two.


Sorry, I still don't get it. ul means unordered list; ol means ordered
list. They are quite different, aren't they?


  As a coder, personally I don't care how browsers render them by
  default.

 You should. Check out the Usual CSS Caveats.


Okay, actually I should say that browsers' default rendering is not my *main
concern*.

I know browsers surely have different default renderings for different
list elements to help readers distinguish them. But as a coder, my *main
concern* is whether the meaning of the code I write corresponds to the content,
not the default renderings (because browsers will handle that).

  What I care about is the meaning of the code I write. That is, when I
  want an unordered list, I write ul; when I want an ordered list, I
  write ol. ul means unordered list, and ol means ordered list.

 And what does that mean? Does it mean that a browser may or will treat ul
 as unordered in the sense that it can render the items in any order? If
 not, what *is* the difference? Just some people's *calling* it unordered.


Imo, ul means the order of the items is unimportant, not that browsers can
render the items in any order.

If there were a browser that wanted to render the items of ul in any
order, okay, it may do that. Anyway, that's not my main concern.


Sincerely,
Ian Yang


Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Ian Yang
2012/7/16 Leif H Silli xn--mlform-...@xn--mlform-iua.no

 Sat, 14 Jul 2012 23:53:32 +0800, from Ian Yang

 Okay, it seems that one of the ideas I mentioned in my original email
 needs to be revamped.


 I was saying that using general heading (H1) and paragraph (p) loses
 the meaning of definition term and definition description, but I didn't
 realize that using ol loses the meaning of definition list. That is,
 the following code is, in fact, improper:


 <!-- The following code is improper as it loses the meaning of
 definition list. -->

 <ol>
   <li>
     <dt></dt>
     <dd></dd>
   </li>
   <li>
     <dt></dt>
     <dd></dd>
   </li>
   <li>
     <dt></dt>
     <dd></dd>
   </li>
 </ol>


 An XOXO list should solve this:

 http://microformats.org/wiki/xoxo#Properties_of_Outline_Items

 Or just add a dl wrapper around the dt/dd elements in your code above.


Thanks for the useful information. I didn't know the XOXO thing before.

However, after reading the examples they provided, I still couldn't
understand its use. Could you please provide me with an example of the use
of XOXO, using the life cycle of the butterfly I mentioned above? Thank you
very much.


Sincerely,
Ian Yang


Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Ian Hickson
On Sat, 14 Jul 2012, Ian Yang wrote:
 
 Recently I was involved in a project. One of its pages has a special 
 content which is like a life cycle. There are several stages in the 
 cycle, each stage has a term followed by some text describing the term. 
 Let's take the life cycle of butterfly for example:
 
 Egg
 A white egg.
 
 Caterpillar
 The egg hatches into a caterpillar. The caterpillar eats and grows a
 tremendous amount.
 
 Pupa
 The caterpillar forms a hard outer shell. Inside the shell, the caterpillar
 changes into a butterfly.
 
 Butterfly
 Butterflies live for only a short time. They will fly, mate, and reproduce.
 The female lays an egg that was fertilized by the male.
 
 By seeing such contents, we usually code it using definition list 
 (dl). At first, I was thinking the same idea. But then I realized that 
 stages in a life cycle should be regarded as ordered contents. So 
 ordered list (ol) would be more appropriate.

ol and dl would both be fine here. I'd probably go with ol, because 
it's a list of states, each of which has a name, rather than a list of 
names, but both are reasonable.

With ol, I'd probably write:

   <ol>
    <li><dfn>Egg</dfn>: A white egg.
    <li><dfn>Caterpillar</dfn>: The egg hatches...

...and so on.
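
Filled in for all four stages of the butterfly example, that ol/dfn markup could be generated along these lines. This is only a sketch: the function name renderLifeCycle and the { name, text } shape are assumptions, the stage descriptions are shortened from the example quoted above, and no HTML escaping is done.

```javascript
// Hypothetical helper: one li per stage, with the stage name marked up as dfn.
// Assumes the input strings are already safe to embed (no escaping is done).
function renderLifeCycle(stages) {
  const items = stages.map(
    ({ name, text }) => `  <li><dfn>${name}</dfn>: ${text}</li>`
  );
  return ['<ol>', ...items, '</ol>'].join('\n');
}

// Stage texts shortened from the life-cycle example in the thread.
const butterfly = [
  { name: 'Egg', text: 'A white egg.' },
  { name: 'Caterpillar', text: 'The egg hatches into a caterpillar.' },
  { name: 'Pupa', text: 'The caterpillar forms a hard outer shell.' },
  { name: 'Butterfly', text: 'Butterflies live for only a short time.' },
];

console.log(renderLifeCycle(butterfly));
```

The array order is what ends up as the list order, so the stage sequence lives in the data rather than in the rendering code.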


 If we could make dt and dd not restricted to dl only, so that they
 could also exist in ol, the problem would be solved perfectly.

It's not clear that there's a problem to be solved. :-)

(Also, there are parsing issues that make changing this area of the spec
rather fraught with peril.)


On Sat, 14 Jul 2012, Anne van Kesteren wrote:
 
 I would recommend not over-thinking the matter. Otherwise soon you will 
 start wrapping your ps in ol/lis too to ensure they stay in the 
 correct order.

True!


 Using dl for ordered groups is perfectly fine.
 
 (The specification points this out as well: "The order of the list of
 groups, and of the names and values within each group, may be
 significant.")

Indeed.


On Sat, 14 Jul 2012, Jukka K. Korpela wrote:
 
 Indeed. The ol element is no more and no less ordered than ul or any 
 other element. Many HTML tag names are misleading.

It's certainly true that many element names are derived more from 
historical accidents than their current semantics, but ol and ul are 
semantically quite different, as the spec describes.

Specifically, ol implies that the order of the list cannot be changed 
without affecting the meaning of the page, whereas the order in a ul 
list is merely aesthetic.


  (The specification points this out as well: "The order of the list of
  groups, and of the names and values within each group, may be
  significant.")
 
 That's actually a questionable statement there, since it may make the 
 [reader] ask whether the order of sub-elements is *generally* 
 significant.

That is a good question to ask oneself.


 It's as questionable as it would be to write "The order of successive p
 elements may be significant" or "The order of successive section
 elements may be significant."

They indeed _are_ significant. The spec doesn't mention this, though,
because it's blatantly obvious and nobody in their right mind will
question it. :-)

With dl, we do get people asking whether it's ok to have the order 
matter, so having an explicit statement in the spec allowing it is useful. 
(Witness this very thread for such an example.)


On Sat, 14 Jul 2012, Ian Yang wrote:
 
 So based on the ul and the ol, we could have unordered definition 
 list (udl) and ordered definition list (odl).

I don't really understand what problem this solves.


On Sat, 14 Jul 2012, Ian Yang wrote:
 2012/7/14 Jukka K. Korpela jkorp...@cs.tut.fi
  
  Indeed. The ol element is no more and no less ordered than ul or 
  any other element. Many HTML tag names are misleading.
 
 That's interesting. If ol is no more and no less ordered than ul, 
 what's the purpose of its introduction? Could you provide detailed 
 explanations or examples? Thanks.

Jukka is incorrect in his statement. The difference between ol and ul 
is specifically that the order of elements in ol matters and the order 
of elements in ul does not.

From the spec:

# The ol element represents a list of items, where the items have been 
# intentionally ordered, such that changing the order would change the 
# meaning of the document.

# The ul element represents a list of items, where the order of the items 
# is not important -- that is, where changing the order would not 
# materially change the meaning of the document.

There are examples in the two sections that illustrate the quite serious 
semantic difference between the two.
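
One loose way to picture the two definitions is to compare lists with and without regard to order, as in the sketch below. This is an analogy only, not something the spec defines, and the function names sameOrderedList and sameUnorderedList are hypothetical.

```javascript
// Order-sensitive comparison: models the ol reading, where reordering
// the items produces a different list.
function sameOrderedList(a, b) {
  return a.length === b.length && a.every((item, i) => item === b[i]);
}

// Order-insensitive comparison: models the ul reading, where only the
// items matter. Sorted copies are compared, so duplicates are still counted.
function sameUnorderedList(a, b) {
  return sameOrderedList([...a].sort(), [...b].sort());
}

const stages = ['Egg', 'Caterpillar', 'Pupa', 'Butterfly'];
const shuffled = ['Pupa', 'Egg', 'Butterfly', 'Caterpillar'];

console.log(sameOrderedList(stages, shuffled));   // false
console.log(sameUnorderedList(stages, shuffled)); // true
```

Under the ol reading the shuffled life cycle is a different document; under the ul reading it is the same one.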


On Sat, 14 Jul 2012, Jukka K. Korpela wrote:
 
 The real purposes, in the dawn of HTML, were that ol and ul 
 correspond to numbered and bulleted lists, respectively, reflecting two 
 very common concepts in word processors. This is how they have been 
 used, though some authors have started overusing ul for things like 
 lists of links even when they specifically 

Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Jukka K. Korpela

2012-07-16 5:36, Ian Yang wrote:


Imo, ul means the order of the items is unimportant, not that browsers can
render the items in any order.


But if the order is unimportant, there still _is_ an order. Being 
unordered would be something else. And what would it matter to indicate 
the order as important if you only do that in markup, without affecting 
rendering, search engines, etc., at all? It's like invisible ink in a 
book. If it is somehow relevant to say that the order is unimportant, 
you have to, well, *say* it (in words).


The only reason for this "unordered list" idea (a list is by definition
ordered; a set, or a multiset, is not) is the willingness to keep ul
and ol in HTML (it would be very impractical to omit one of them) 
without admitting that they were introduced, and are being used, simply 
for bulleted and numbered lists. So this resembles the confusing play 
with words regarding i and b.


Yucca