Re: [whatwg] Security restriction allows content thievery

2012-09-07 Thread Adam Barth
On Thu, Sep 6, 2012 at 9:53 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 7 Sep 2012, Fred Andrews wrote:
 I think the aim is to have the URL of the page that includes these data:
 URLs sent to the tracking server?

 Ah, I see. So say you have a page A, which itself contains a data: URL,
 and you load that data: URL as page B, and in B there is a link to another
 resource C, the argument here is that in the network request for C, the
 referrer information should be of A, rather than B?

 That's an interesting idea... Any browser vendors want to chip in on this?

We're unlikely to implement that in WebKit.  We'd like to keep
documents created by data URLs in a unique origin and avoid leaking
privileges (including the privilege to send a certain Referer into the
iframe).

Adam
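
A minimal sketch of the "unique origin" idea above, not WebKit's
implementation: the standard URL API serializes the origin of a data: URL as
the string "null" (an opaque origin), while an http: URL gets a scheme-and-host
origin. The helper name originOf and the example URLs are assumptions made for
illustration. How a browser assigns an origin to a document loaded from a
data: URL is a separate matter and may differ between engines and versions.

```javascript
// Hypothetical helper: return the serialized origin of a URL string.
function originOf(url) {
  return new URL(url).origin;
}

// Illustrative URLs, not taken from the thread.
const pageUrl = 'http://example.com/page.html';
const dataUrl = 'data:text/html,' + encodeURIComponent('<p>hello</p>');

console.log(originOf(pageUrl)); // scheme-and-host origin
console.log(originOf(dataUrl)); // opaque origin, serialized as "null"
```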


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Mon, 16 Jul 2012, Robert Eisele wrote:

 Browsers are very restrictive when one tries to access the contents of 
 different domains (including the scheme), embedded via framesets. This 
 is normally a good practice, but I'd suggest to weaken this restriction 
 for the data: URI schema.

It already is. The origin of documents and images using data: URLs is 
essentially the origin of wherever you found the URL.


 I'm currently building an analysis system like Google Analytics, which 
 gets embedded into a website via a small JavaScript snippet. When I 
 analyzed the data, I came across a very interesting trick because I got 
 a lot of requests (with the data from location.href) where the entire 
 website was embedded into a data:text/html URI - except that all ads of 
 the page were replaced. Fortunately, my tracking code has been left 
 without modifications.

Weird.


 But the scary thing is that this way you can monetize foreign content by 
 simply embedding it somewhere you can direct traffic to. That's pretty 
 clever, because the original site owner doesn't notice this abuse due to 
 the fact that top.location.href isn't readable. Or even worse, he would 
 never notice it at all when he doesn't sniff the URI with JavaScript, 
 because image files would have no referrer.
 
 My final approach to convict the abuser is based on the fact, that the 
 JavaScript was dynamically loaded from my server and that I can write to 
 location.href. So I added this piece of code:
 
 if (top.location.protocol === 'data:') {
 top.location.href = 'http://example.com/trap/';
 }
 
 But even then the referrer will not be passed to the server. So my 
 proposal is that the data URI schema gets an exception on this security 
 behavior.

I don't understand. What referrer are you trying to set? To what?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
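
A hedged sketch of how a tracking server might flag the reports described in
the quoted message, where location.href arrives as a data:text/html URI. The
function name summarizeReportedLocation and the shape of its result are
hypothetical; the parsing follows the general data:[<mediatype>][;base64],<data>
layout and is not a full parser.

```javascript
// Hypothetical helper: classify a location.href value reported by the tracker.
function summarizeReportedLocation(href) {
  // Capture everything between "data:" and the first comma (media type + params).
  const match = /^data:([^,]*),/i.exec(href);
  if (!match) {
    return { embedded: false, href };
  }
  const params = match[1].split(';');
  // An omitted media type defaults to text/plain.
  const mediaType = (params[0] || 'text/plain').toLowerCase();
  const base64 = params.slice(1).some((p) => p.toLowerCase() === 'base64');
  // Store the length rather than the whole payload, which can be a full page.
  return { embedded: true, mediaType, base64, length: href.length };
}

console.log(summarizeReportedLocation('data:text/html;charset=utf-8,%3Cp%3Ehi%3C%2Fp%3E'));
console.log(summarizeReportedLocation('http://example.com/article'));
```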


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Fred Andrews

  I'm currently building an analysis system like Google Analytics, which 
  gets embedded into a website via a small JavaScript snippet. When I 
  analyzed the data, I came across a very interesting trick because I got 
  a lot of requests (with the data from location.href) where the entire 
  website was embedded into a data:text/html URI - except that all ads of 
  the page were replaced. Fortunately, my tracking code has been left 
  without modifications.
 
 Weird.

Perhaps the concern is that content has been copied into a data: URL in 
violation of copyrights and used to obtain Ad revenue. However the content 
could very well be used with permission.  Ads are dynamic and do change on 
otherwise static content pages.  Thus this could well be an honest use of 
technology. It would be interesting to know if the search engines actually look 
at content in data: URLs - if not then the 'copied' content would seem to bring 
little advantage.

Or perhaps the concern is just that it thwarts efforts to track the referer.
 
  But the scary thing is that this way you can monetize foreign content by 
  simply embedding it somewhere you can direct traffic to. That's pretty 
  clever, because the original site owner doesn't notice this abuse due to 
  the fact that top.location.href isn't readable. Or even worse, he would 
  never notice it at all when he doesn't sniff the URI with JavaScript, 
  because image files would have no referrer.
  
  My final approach to convict the abuser is based on the fact, that the 
  JavaScript was dynamically loaded from my server and that I can write to 
  location.href. So I added this piece of code:
  
  if (top.location.protocol === 'data:') {
  top.location.href = 'http://example.com/trap/';
  }
  
  But even then the referrer will not be passed to the server. So my 
  proposal is that the data URI schema gets an exception on this security 
  behavior.
 
 I don't understand. What referrer are you trying to set? To what?

I think the aim is to have the URL of the page that includes these data: URLs 
sent to the tracking server?

I can't see any technical issues raised here?

Some think trackers are 'scary' and consider user privacy and safety more 
important, and would prefer to not send a referer and to even have such  
Javascript sandboxed so that it can't leak private information.

cheers
Fred


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Fri, 7 Sep 2012, Fred Andrews wrote:
 
 I think the aim is to have the URL of the page that includes these data: 
 URLs sent to the tracking server?

Ah, I see. So say you have a page A, which itself contains a data: URL, 
and you load that data: URL as page B, and in B there is a link to another 
resource C, the argument here is that in the network request for C, the 
referrer information should be of A, rather than B?

That's an interesting idea... Any browser vendors want to chip in on this?

Unless there is browser-vendor interest in implementing this, I don't 
intend to add it to the spec, since it seems a little esoteric and could 
leak referrers in cases where authors had previously assumed they'd be 
safe (e.g. if a Webmail app is opening e-mails in iframes using data: URLs 
to prevent the e-mail's images from including the user's webmail client's 
URL in the referrer information, or something).

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
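
The A/B/C scenario above can be sketched as a toy model. This is an
illustration under stated assumptions, not browser code: documents are plain
objects with a url and a parent, referrerToday models the behaviour reported
in this thread (a data: document sends no referrer), and referrerProposed
models the suggested change (use the nearest ancestor that is not a data:
URL). All names and URLs are hypothetical.

```javascript
// Hypothetical model of a document: its URL and the document that loaded it.
function isDataUrl(url) {
  return url.toLowerCase().startsWith('data:');
}

// Behaviour as reported in the thread: requests from a data: document carry no referrer.
function referrerToday(doc) {
  return isDataUrl(doc.url) ? null : doc.url;
}

// Proposed behaviour: walk up to the nearest ancestor that is not a data: URL.
function referrerProposed(doc) {
  let current = doc;
  while (current && isDataUrl(current.url)) {
    current = current.parent;
  }
  return current ? current.url : null;
}

const pageA = { url: 'http://example.com/a.html', parent: null };
const pageB = { url: 'data:text/html,<a href="http://example.org/c">C</a>', parent: pageA };

console.log(referrerToday(pageB));    // referrer for the request to C today
console.log(referrerProposed(pageB)); // referrer for the request to C under the proposal
```

In the Webmail case, pageA would be the mail client's URL, so the proposed
rule would expose exactly the URL the author meant to hide.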


[whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
Browsers are very restrictive when one tries to access the contents of
different domains (including the scheme), embedded via framesets. This is
normally a good practice, but I'd suggest weakening this restriction for
the data: URI scheme.

I'm currently building an analysis system like Google Analytics, which gets
embedded into a website via a small JavaScript snippet. When I analyzed the
data, I came across a very interesting trick because I got a lot of
requests (with the data from location.href) where the entire website was
embedded into a data:text/html URI - except that all ads of the page were
replaced. Fortunately, my tracking code has been left without
modifications.

But the scary thing is that this way you can monetize foreign content by
simply embedding it somewhere you can direct traffic to. That's pretty
clever, because the original site owner doesn't notice this abuse due to
the fact that top.location.href isn't readable. Or even worse, he would
never notice it at all if he doesn't sniff the URI with JavaScript,
because requests for image files would carry no referrer.

My final approach to identify the abuser is based on the fact that the
JavaScript is dynamically loaded from my server and that I can write to
location.href. So I added this piece of code:

if (top.location.protocol === 'data:') {
    top.location.href = 'http://example.com/trap/';
}

But even then the referrer will not be passed to the server. So my proposal
is that the data: URI scheme gets an exception to this security behavior.



Kind Regards

Robert Eisele
http://www.xarg.org/
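
A slightly more defensive variant of the check above, offered as a sketch
rather than as the author's code. It assumes that reading top.location from a
cross-origin frame can throw a security error, so the read is wrapped in
try/catch. The function names topIsDataUrl and redirectIfEmbedded are
hypothetical; the trap URL is the one from the message.

```javascript
// Returns true only when the top-level location is readable and uses the data: scheme.
function topIsDataUrl(win) {
  try {
    return win.top.location.protocol === 'data:';
  } catch (err) {
    // Reading a cross-origin top.location may throw; treat that as "not known to be data:".
    return false;
  }
}

// Navigates the top-level window to trapUrl when it was loaded from a data: URL.
// Returns whether a redirect was attempted.
function redirectIfEmbedded(win, trapUrl) {
  if (topIsDataUrl(win)) {
    win.top.location.href = trapUrl;
    return true;
  }
  return false;
}
```

In a page this would be called as
redirectIfEmbedded(window, 'http://example.com/trap/'). As the message notes,
the redirect by itself does not make the browser send a referrer.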


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Tab Atkins Jr.
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote:
 Browsers are very restrictive when one tries to access the contents of
 different domains (including the scheme), embedded via framesets. This is
 normally a good practice, but I'd suggest to weaken this restriction for
 the data: URI schema.

 I'm currently building an analysis system like Google Analytics, which gets
 embedded into a website via a small JavaScript snippet. When I analyzed the
 data, I came across a very interesting trick because I got a lot of
 requests (with the data from location.href) where the entire website was
 embedded into a data:text/html URI - except that all ads of the page were
 replaced. Fortunately, my tracking code has been left without
 modifications.

 But the scary thing is that this way you can monetize foreign content by
 simply embedding it somewhere you can direct traffic to. That's pretty
 clever, because the original site owner doesn't notice this abuse due to
 the fact that top.location.href isn't readable. Or even worse, he would
 never notice it at all when he doesn't sniff the URI with JavaScript,
 because image files would have no referrer.

 My final approach to convict the abuser is based on the fact, that the
 JavaScript was dynamically loaded from my server and that I can write to
 location.href. So I added this piece of code:

 if (top.location.protocol === 'data:') {
 top.location.href = 'http://example.com/trap/';
 }

 But even then the referrer will not be passed to the server. So my proposal
 is that the data URI schema gets an exception on this security behavior.

The problem you outline is not directly tied to the solution you
present.  You can scrape a site and display it as your own without any
fancy tricks, just by downloading all the resources and hosting them
yourself.  This merely consumes a little more bandwidth for the
attacker, since they're hosting the images/etc themselves.

The correct solution to this kind of problem is legal - this is simple
copyright violation.

I'm not sure about the merits of your suggestion otherwise.  It's
reasonable to make data: pages same-origin with their parent when
they're contained within something, but it seems dodgy to make them
same-origin with their *contained* pages as well.  If not done
carefully, that could allow contained pages access to the data: page's
parent as well, or other cross-origin pages that the data: page is
containing.

~TJ


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
2012/7/16 Tab Atkins Jr. jackalm...@gmail.com

 On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote:
  Browsers are very restrictive when one tries to access the contents of
  different domains (including the scheme), embedded via framesets. This is
  normally a good practice, but I'd suggest to weaken this restriction for
  the data: URI schema.
 
  I'm currently building an analysis system like Google Analytics, which gets
  embedded into a website via a small JavaScript snippet. When I analyzed the
  data, I came across a very interesting trick because I got a lot of
  requests (with the data from location.href) where the entire website was
  embedded into a data:text/html URI - except that all ads of the page were
  replaced. Fortunately, my tracking code has been left without
  modifications.
 
  But the scary thing is that this way you can monetize foreign content by
  simply embedding it somewhere you can direct traffic to. That's pretty
  clever, because the original site owner doesn't notice this abuse due to
  the fact that top.location.href isn't readable. Or even worse, he would
  never notice it at all when he doesn't sniff the URI with JavaScript,
  because image files would have no referrer.
 
  My final approach to convict the abuser is based on the fact, that the
  JavaScript was dynamically loaded from my server and that I can write to
  location.href. So I added this piece of code:
 
  if (top.location.protocol === 'data:') {
  top.location.href = 'http://example.com/trap/';
  }
 
  But even then the referrer will not be passed to the server. So my proposal
  is that the data URI schema gets an exception on this security behavior.

 The problem you outline is not directly tied to the solution you
 present.  You can scrape a site and display it as your own without any
 fancy tricks, just by downloading all the resources and hosting them
 yourself.  This merely consumes a little more bandwidth for the
 attacker, since they're hosting the images/etc themselves.


But you would get a valid referrer if the tracking code wasn't removed. The
data: protects the abuser in an unnecessary way. But you're absolutely right
that the solution I present isn't entirely tied to the problem.


 The correct solution to this kind of problem is legal - this is simple
 copyright violation.


But if you don't have a chance to get information about the attacker, you
can't sue him. I had the strange idea to use a prompt to ask the user for
the original URL in his address bar. But as I said, that's strange.



 I'm not sure about the merits of your suggestion otherwise.  It's
 reasonable to make data: pages same-origin with their parent when
 they're contained within something, but it seems dodgy to make them
 same-origin with their *contained* pages as well.  If not done
 carefully, that could allow contained pages access to the data: page's
 parent as well, or other cross-origin pages that the data: page is
 containing.


An intuitive thought: one could assume that data: pages are same-origin or,
better, that embedded data: pages are part of the current page. That way,
you wouldn't have the chance to break out of the sandbox and access the
parent. What would be a situation where same-origin could be dangerous?



 ~TJ



Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Ryosuke Niwa
On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele rob...@xarg.org wrote:

 2012/7/16 Tab Atkins Jr. jackalm...@gmail.com
  On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele rob...@xarg.org wrote:
   Browsers are very restrictive when one tries to access the contents of
   different domains (including the scheme), embedded via framesets. This is
   normally a good practice, but I'd suggest to weaken this restriction for
   the data: URI schema.
  
   I'm currently building an analysis system like Google Analytics, which gets
   embedded into a website via a small JavaScript snippet. When I analyzed the
   data, I came across a very interesting trick because I got a lot of
   requests (with the data from location.href) where the entire website was
   embedded into a data:text/html URI - except that all ads of the page were
   replaced. Fortunately, my tracking code has been left without
   modifications.
  
   But the scary thing is that this way you can monetize foreign content by
   simply embedding it somewhere you can direct traffic to. That's pretty
   clever, because the original site owner doesn't notice this abuse due to
   the fact that top.location.href isn't readable. Or even worse, he would
   never notice it at all when he doesn't sniff the URI with JavaScript,
   because image files would have no referrer.
  
   My final approach to convict the abuser is based on the fact, that the
   JavaScript was dynamically loaded from my server and that I can write to
   location.href. So I added this piece of code:
  
   if (top.location.protocol === 'data:') {
   top.location.href = 'http://example.com/trap/';
   }
  
   But even then the referrer will not be passed to the server. So my proposal
   is that the data URI schema gets an exception on this security behavior.
 
  The problem you outline is not directly tied to the solution you
  present.  You can scrape a site and display it as your own without any
  fancy tricks, just by downloading all the resources and hosting them
  yourself.  This merely consumes a little more bandwidth for the
  attacker, since they're hosting the images/etc themselves.
 

 But you would get a valid referrer if the tracking code wasn't removed. The
 data: protects the abuser in an unnecessary way. But you're absolutely right
 that the solution I present isn't entirely tied to the problem.


The embedder can easily remove the tracking code. Better yet, the embedder
can host the content on his server and disallow access to all external
resources to cripple your tracking code.

 The correct solution to this kind of problem is legal - this is simple
  copyright violation.

 But if you don't have a chance to get information about the attacker, you
 can't sue him. I had the strange idea to use a prompt to ask the user for
 the original URL in his address bar. But as I said, that's strange.


That sounds like a problem we can't solve.

- Ryosuke