Re: Concepts of Unique Tracking

2001-05-27 Thread Brian Reichert

On Fri, May 25, 2001 at 10:03:04AM -0700, Jonathan Hilgeman wrote:
> Now, I'm assuming that Apache has full access to these incoming packets.
> Therefore, they must also have access to this invisible identifier. Is it
> possible to extract that identifier somehow by tinkering with Apache?

Most NAT implementations keep a hash of destination ports -> internal IP.

To wit:

> 1) Person behind the firewall sends out a request to a web server.

Person _really_ establishes an outgoing TCP session with his NAT
box.  The NAT box notes his internal_IP:dest_port, sets up an
outgoing TCP session to the web server, and notes its own source port
for that leg.

> 4) The firewall receives the packets of data first, but now must send those
> data packets to someone inside the firewall.

Returning packets from the web server come to that source port; the NAT
box looks up the hash of external_IP:source_port -> internal_IP:dest_port
and hands the packet in.

> 5) The packets of data MUST have some unique identifier to let the firewall

That would be the source port of the NAT box's outgoing connection.

But:

- each outgoing TCP connection from the internal host will use a
  different source port.

- the request your web server is receiving may actually (likely)
  be coming from a web cache somewhere.
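
As a rough illustration of the lookup described above, here is a minimal
Python sketch of a port-translating NAT table. The class name NatTable, the
starting port 61000 and the addresses are invented for the example, and a
real NAT box also keys on protocol and remote endpoint. It only shows why
the external source port identifies one connection rather than one person.

```python
import itertools


class NatTable:
    """Toy model of a port-translating NAT box (illustrative only)."""

    def __init__(self, external_ip, first_port=61000):
        self.external_ip = external_ip
        self._next_port = itertools.count(first_port)
        # external source port -> (internal IP, internal source port)
        self._by_external_port = {}

    def outbound(self, internal_ip, internal_port):
        """Record a new outgoing connection; return what the web server sees."""
        external_port = next(self._next_port)
        self._by_external_port[external_port] = (internal_ip, internal_port)
        return (self.external_ip, external_port)

    def inbound(self, external_port):
        """Find the internal host a returning packet belongs to, if any."""
        return self._by_external_port.get(external_port)


nat = NatTable("203.0.113.7")
# Two connections from the same internal machine...
first = nat.outbound("192.168.1.42", 49372)
second = nat.outbound("192.168.1.42", 49373)
# ...show up at the web server with two different source ports.
print(first, second)
# Only the NAT box can map a port back to the internal host.
print(nat.inbound(first[1]))
```

The mapping lives only inside the NAT box, so nothing in it reaches the web
server beyond the external address and port.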

 
> Jonathan
 

-- 
Brian 'you Bastard' Reichert    [EMAIL PROTECTED]
37 Crystal Ave. #303            Daytime number: (603) 434-6842
Derry NH 03038-1713 USA         Intel architecture: the left-hand path



Re: Concepts of Unique Tracking

2001-05-25 Thread James G Smith

Jonathan Hilgeman [EMAIL PROTECTED] wrote:
> Okay, after I think about it, there must be a way to identify a unique user,
> even if they are behind a firewall. Let's run through this process:
>
> 1) Person behind the firewall sends out a request to a web server.
> 2) The firewall intercepts that request, masks the person's IP address and
> lets the request keep going out.
> 3) The web server receives the request and sends back packets of data to the
> IP of the user, which is really the IP of the firewall now.
> 4) The firewall receives the packets of data first, but now must send those
> data packets to someone inside the firewall.
> 5) The packets of data MUST have some unique identifier to let the firewall
> know who requested the data in the first place.
>
> Now, I'm assuming that Apache has full access to these incoming packets.
> Therefore, they must also have access to this invisible identifier. Is it
> possible to extract that identifier somehow by tinkering with Apache?

No.  What happens is more like this:

(1) Browser opens socket for connecting to remote server.  This assigns a
unique identifier to the TCP connection - IP + port on the client side.
(2) Browser connects to remote server, which actually ends up connecting to
the firewall.  The firewall has a unique number on its side - its IP + port
(80 or 443 most likely).
(3) Firewall opens socket for connecting to remote server.  This assigns a
unique identifier to the TCP connection - firewall's public IP + port.
Firewall remembers this and will transfer any data coming from the client to
this connection, and any data from this connection to the client.  This is
part of what is meant by a firewall which saves state information.

All the information needed to connect the client and server via the firewall
is kept within the firewall.  Neither the client nor the server need be aware
of any of it, nor, afaik, can they be aware of it without putting an HTTP
proxy on the firewall.

The server is seeing the firewall's IP and port, not the actual client's.
This will change with each new connection, which happens whenever the
keepalive timeout expires.
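
To make the last point concrete, here is a small Python sketch that opens two
connections from one client to a listener on the loopback interface and
records what the server side sees. The function name observe_peers is made up
for the example, and loopback stands in for the firewall here; it is an
illustration of the peer address a server gets, not of Apache itself.

```python
import socket


def observe_peers(connections=2):
    """Open several client connections to a local listener and return the
    (address, port) pairs the server side sees for each of them."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(connections)
    target = server.getsockname()

    open_sockets = []
    peers = []
    try:
        for _ in range(connections):
            client = socket.create_connection(target)
            open_sockets.append(client)
            accepted, peer = server.accept()
            open_sockets.append(accepted)   # keep it open until the end
            peers.append(peer)
    finally:
        for sock in open_sockets:
            sock.close()
        server.close()
    return peers


peers = observe_peers()
for address, port in peers:
    # Same address every time, but a different source port per connection.
    print(address, port)
```

Both connections stay open at once, so the operating system has to give them
different source ports; the address is the same for both.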
-- 
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix





Re: Concepts of Unique Tracking

2001-05-25 Thread Wim Kerkhoff

Jonathan Hilgeman wrote:
> Now, I'm assuming that Apache has full access to these incoming packets.
> Therefore, they must also have access to this invisible identifier. Is it
> possible to extract that identifier somehow by tinkering with Apache?

The only thing that you can access from the webserver side is the
REMOTE_ADDR and REMOTE_PORT. IP masquerading is handled only by the
firewall that is doing the masquerading: the web server and browser have
no idea that this is happening.  The firewall has a table that keeps
track of open TCP connections, so that when it receives data on the
outside port (e.g. 61172) it knows to rewrite the packet and send it off
back to the inside client (e.g. 192.168.1.42:49372) that created the
initial TCP connection.

This is one of the primary reasons that cookies exist.
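
A minimal sketch of the cookie approach, in Python using the standard
http.cookies module. The cookie name visitor_id and the function
identify_visitor are hypothetical, and wiring this into an actual Apache
handler is left out; it only shows the idea of handing the browser an
opaque ID and reading it back on later requests.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"   # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value).

    cookie_header is the raw Cookie request header (or None).
    set_cookie_value is None when the browser already sent an ID,
    otherwise it is the value for a Set-Cookie response header.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None

    # First visit: mint a random ID and ask the browser to keep it.
    visitor_id = uuid.uuid4().hex
    fresh = SimpleCookie()
    fresh[COOKIE_NAME] = visitor_id
    fresh[COOKIE_NAME]["path"] = "/"
    return visitor_id, fresh[COOKIE_NAME].OutputString()


# First request carries no cookie, so an ID is issued.
new_id, set_cookie = identify_visitor(None)
# A later request returns the cookie, so the same ID is recognised.
same_id, nothing = identify_visitor(COOKIE_NAME + "=" + new_id)
```

This works regardless of NAT or proxies, but only for browsers that accept
and return the cookie.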

-- 

Regards,

Wim Kerkhoff, Software Engineer
Merilus, Inc.  -|- http://www.merilus.com
Email: [EMAIL PROTECTED]



RE: Concepts of Unique Tracking

2001-05-25 Thread Jonathan Hilgeman

Actually, I had come up with a similar idea after I sent that one off. My
idea was that packets had packet identifiers in their header or footer, and
the packet identifiers were stored in the firewall and referenced to the
computer inside the firewall, so whenever packets with that identifier came
back, the firewall knew which computer to send it to.

Oh well. 

What about client-specific information available in Javascript, like screen
resolution, size, etc...? Can that be accessed by tinkering with Apache a
bit, or is it something only available because of the browser, since
Javascript is dependent on the browser? 

Jonathan




RE: Concepts of Unique Tracking

2001-05-25 Thread Jonathan Hilgeman

Actually, someone suggested HTTP authorization - does that require a cookie
to work? Or after they are authorized, it simply keeps the session open in
the browser...?

Jonathan




Re: Concepts of Unique Tracking

2001-05-25 Thread Wim Kerkhoff

Jonathan Hilgeman wrote:
 
> What about client-specific information available in Javascript, like screen
> resolution, size, etc...? Can that be accessed by tinkering with Apache a
> bit, or is it something only available because of the browser, since
> Javascript is dependent on the browser?

I briefly thought about suggesting something like that, or in
combination with the other headers that get sent in the HTTP request for
language, encoding, etc. However, think of situations such as
computer labs, internet cafes, etc., where all computers are identical in
every aspect, with the exact same version of the browser, hard-coded
screen resolutions (e.g. 800x600), etc., that the user cannot change.
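
The collision is easy to see in a small Python sketch. The fingerprint
function, the choice of headers and the use of a truncated SHA-256 digest
are all assumptions made for illustration; the screen size would also have
to be reported back by a script in the browser, since the server never sees
it on its own.

```python
import hashlib


def fingerprint(headers, screen):
    """Combine a few request headers and a reported screen size into a
    short identifier. Purely illustrative, not a robust scheme."""
    parts = [
        headers.get("User-Agent", ""),
        headers.get("Accept-Language", ""),
        headers.get("Accept-Encoding", ""),
        screen,
    ]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


lab_headers = {
    "User-Agent": "Mozilla/4.08 [en] (Win98; I ;Nav)",
    "Accept-Language": "en",
    "Accept-Encoding": "gzip",
}

# Two different people on identically configured lab machines...
pc_1 = fingerprint(lab_headers, "800x600")
pc_2 = fingerprint(dict(lab_headers), "800x600")
# ...versus the same browser build on a machine with another screen size.
home = fingerprint(lab_headers, "1024x768")

print(pc_1 == pc_2)   # the lab machines are indistinguishable
print(pc_1 == home)
```

Identical inputs give identical fingerprints, so the two lab users look like
one visitor.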

-- 

Regards,

Wim Kerkhoff, Software Engineer
Merilus, Inc.  -|- http://www.merilus.com
Email: [EMAIL PROTECTED]



RE: Concepts of Unique Tracking

2001-05-25 Thread Jonathan Hilgeman

Dialup users will be given high-speed connections using network cards and
modems will be burned. It'll be like book-burning sessions all over again. 

Jonathan

-Original Message-
From: Ilya Martynov [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 25, 2001 10:53 AM
To: Jonathan Hilgeman
Cc: '[EMAIL PROTECTED]'
Subject: Re: Concepts of Unique Tracking



JH> Let's take over the world and recompile all browsers to have them send
JH> out the MAC address of the network card.

.. and if I'm dialup user :)

JH> Jonathan



-- 
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Ilya Martynov (http://martynov.org/)|
| GnuPG 1024D/323BDEE6 D7F7 561E 4C1D 8A15 8E80  E4AE BE1A 53EB 323B DEE6 |
| AGAVA Software Company (http://www.agava.com/)  |
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



RE: Concepts of Unique Tracking

2001-05-25 Thread Stephen Adkins


How quickly we forget ...

Don't we remember the huge outcry over Intel putting a unique ID in every
CPU, which could be transmitted via web browser and destroy all of our
privacy?

The frustration we feel as programmers who are trying to identify anonymous
visitors is exactly what privacy is all about.
And I am thankful for it.

Get used to it.
People need to opt in to be identified.
The closest thing we can get to this is people leaving their cookies
enabled on their browser.

Stephen

At 10:43 AM 5/25/2001 -0700, Jonathan Hilgeman wrote:
> Let's take over the world and recompile all browsers to have them send out
> the MAC address of the network card.
>
> Jonathan






RE: Concepts of Unique Tracking

2001-05-25 Thread Alex Porras


Although I agree about privacy issues, I will keep it short by stating that
there is a difference between identifying you as unique user 1309850825
(assuming no personally identifiable information is also collected) versus
identifying you as Stephen Adkins.  You can use the first method to
collect aggregate information about what percentage of your users are
accessing what parts of your website the most/least, so you could customize
your website appropriately.  That does not require me to know who everyone
is, personally speaking.
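
Something along these lines, as a rough Python sketch. The function name
section_reach and the sample hits are made up, and the visitor IDs are
assumed to be opaque numbers (for example from a cookie) with no personal
data attached.

```python
from collections import defaultdict


def section_reach(hits):
    """hits: iterable of (visitor_id, path) pairs.

    Returns {section: percentage of unique visitors who viewed it},
    where a section is the first component of the path.
    """
    visitors_by_section = defaultdict(set)
    all_visitors = set()
    for visitor_id, path in hits:
        section = "/" + path.strip("/").split("/")[0]
        visitors_by_section[section].add(visitor_id)
        all_visitors.add(visitor_id)
    if not all_visitors:
        return {}
    return {
        section: 100.0 * len(visitors) / len(all_visitors)
        for section, visitors in visitors_by_section.items()
    }


hits = [
    (1309850825, "/docs/intro.html"),
    (1309850825, "/download/"),
    (42, "/docs/faq.html"),
    (7, "/docs/"),
    (99, "/contact.html"),
]
reach = section_reach(hits)
for section, percent in sorted(reach.items()):
    print(section, percent)
```

The report only ever contains percentages per section; the IDs are used for
de-duplication and never appear in the result.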

--Alex




RE: Concepts of Unique Tracking

2001-05-25 Thread Joe Breeden

ASCEND SOAPBOX

I agree with Alex (and it's not just because we work together). Companies
have been doing the kind of data collecting Alex is talking about for years.
As a matter of fact, some Cultural Anthropologists specialize in Corporate
Anthropology (for a recent related news item, see
http://www.cnn.com/2001/CAREER/dayonthejob/05/23/corp.anthropologist.idg/index.html).
Collecting anonymous information about users is something almost
all websites do - I'm hesitant to say all, because I'm sure one website out
there doesn't keep a usage log (i.e. /usr/local/apache/logs/access_log or
/usr/local/apache/logs/error_log). It would be almost impossible to run a
good website that changes based on user trends and preferences and not do
some form of user tracking.

Of course the real problem is when the website tries to link the collected
data in some way to real people. Knowing that 15% of your users' HTTP_REFERER
is www.porn.com is one thing; knowing that Persons X, Y, and Z came from
www.porn.com and acting on that knowledge to send them information about the
latest sale on leather underwear and selling their names to the porn_users
mailing list is completely wrong.

In my opinion, a good website has to track generalizations about user
preferences so it can react to add to the user experience in positive ways.
One way to do this is to collect anonymous data about the things a user does
on the site. This can be done and still protect a user's privacy.
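
For example, a referrer breakdown like the 15% figure can be produced from
the access log without keeping anything per person. The Python sketch below
assumes the log is written in Apache's "combined" format (which includes the
Referer header); the regular expression, the function name referer_shares
and the sample lines are illustrative, and lines with escaped quotes inside
a field are simply skipped.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"$'
)


def referer_shares(lines):
    """Return {referring host: percentage of parsed requests}."""
    hosts = Counter()
    for line in lines:
        match = COMBINED.match(line.strip())
        if not match:
            continue   # not in the assumed format
        host = urlsplit(match.group("referer")).hostname or "(none)"
        hosts[host] += 1
    total = sum(hosts.values())
    if not total:
        return {}
    return {host: 100.0 * count / total for host, count in hosts.items()}


sample = [
    '192.0.2.1 - - [25/May/2001:10:03:04 -0700] "GET / HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '192.0.2.2 - - [25/May/2001:10:04:10 -0700] "GET /a.html HTTP/1.0" 200 512 '
    '"-" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '192.0.2.3 - - [25/May/2001:10:05:30 -0700] "GET / HTTP/1.0" 200 2326 '
    '"http://www.example.com/other.html" "Mozilla/4.0 (compatible)"',
    '192.0.2.4 - - [25/May/2001:10:06:45 -0700] "GET /b.html HTTP/1.0" 200 100 '
    '"http://search.example.org/?q=x" "Mozilla/4.0 (compatible)"',
]
shares = referer_shares(sample)
for host, percent in sorted(shares.items()):
    print(host, percent)
```

The client address in each line is matched but thrown away, so the output is
a per-site percentage and nothing more.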

DESCEND SOAPBOX

Joe Breeden
--
Sent from my Outlook 2000 Wired Deskheld (www.microsoft.com)

