Re: a simple formatter

2008-02-07 Thread Tsantilas Christos

 Tested.

 I have made a (small) file of the structures I've noticed past formatting
 errors on. With a manual fix-up to what I think it should look like in
 readable C++ under the published squid formatting description.

 The output of your script does this:
 ..

It is the --break-blocks option of the astyle. I think it can removed.

 Also, I noticed that the formatter will accept _anything_ given to it as a
 filename and create the files. Thats rather nasty when non-valid filename
 sequences are entered. ie --help


OK fixed :-).  Now accepts only files with .cc,.h,.cci and .c extensions
and ignore all other files.
I am re-sending the fixed script . In this version also I removed the
--break-blocks astyle option. Formated code looks better without it.

--
  Christos

#!/usr/bin/perl
#
# Author: Tsantilas Christos
# email:  [EMAIL PROTECTED]
#
# Distributed under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# The ldap_manager library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Library General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# See LICENSE or http://www.gnu.org/licenses/gpl.html for details .
#


use IPC::Open2;

$ASTYLE_BIN=/usr/bin/astyle;
$ASTYLE_ARGS =--mode=c -s4 -O -l;

#$ASTYLE_ARGS=--mode=c -s4 -O --break-blocks -l;
#$ASTYLE_BIN=/usr/local/src/astyle-1.21/bin/astyle;



if(! -e $ASTYLE_BIN){
print \nFile $ASTYLE_BIN not found\n;
print Please fix the ASTYLE_BIN variable in this script!\n\n;
exit -1;
}
$ASTYLE_BIN=$ASTYLE_BIN. .$ASTYLE_ARGS;

$INDENT = ;

$out = shift @ARGV;
#read options, currently no options available
while($out eq  ||  $out =~ /^-\w+$/){
   if($out eq -h) {
usage($0);
exit 0;
   }
   else {
   usage($0);
   exit -1;
   }
}


while($out){

if( $out !~ /\.cc$|\.cci$|\.h$|\.c$/) {
 print Unknown suffix for file $out, ignoring\n;
 $out = shift @ARGV;
 next;
}

$in= $out.astylebak;
my($new_in) = $in;
my($i) = 0;
while(-e $new_in) {
	$new_in=$in...$i;
	$i++;
}
$in=$new_in;
rename($out, $in);

local (*FROM_ASTYLE, *TO_ASTYLE);
$pid_style=open2(\*FROM_ASTYLE, \*TO_ASTYLE, $ASTYLE_BIN);

if(!$pid_style){
	print An error while open2\n;
	exit -1;
}


if($pid=fork()){
	#do parrent staf
	close(FROM_ASTYLE);
	
	if(!open(IN, $in)){
	print Can not open input file: $in\n;
	exit -1;
	}
	my($line) = '';
	while(IN){
	$line=$line.$_;
	if(input_filter(\$line)==0){
		next;
	}
	print TO_ASTYLE $line;
	$line = '';
	}
	if($line){
	print TO_ASTYLE $line;
	}
	close(TO_ASTYLE);
	waitpid($pid,0);
}
else{
	# child staf
	close(TO_ASTYLE);
	
	if(!open(OUT,$out)){
	print Can't open output file: $out\n;
	exit -1;
	}
	my($line)='';
	while(FROM_ASTYLE){
	$line = $line.$_;
	if(output_filter(\$line)==0){
		next;
	}
	print OUT $line;
	$line = '';
	}
	if($line){
	print OUT $line;
	}
	close(OUT);
	exit 0;
}

$out = shift @ARGV;
}


sub input_filter{
my($line)[EMAIL PROTECTED];
 #if we have integer declaration, get it all before processing it..
if($$line =~/.*\s*int\s+.*/  ){ 
	if(index($$line,;) == -1){
	return 0;
	}
	if($$line =~ /(.*)\s*int\s+([^:]*):\s*(\w+)\s*\;(.*)/s){
#	print  .$$line.($1)\n;
	$$line= $1. int .$2.__FORASTYLE__.$3.;.$4;
#	print -.$$line.\n;
	}
	return 1;
}

return 1;
}

sub output_filter{
my($line)[EMAIL PROTECTED];

if($$line =~ /^\s*$/){
	return 1;
}

#if line is not preprocessor directive keep indentation
if($$line !~  /^(\s*)\#/){
	$$line =~ /^(\s*)./;
	$INDENT = $1;
# here we can handle only the case we have a one line C++/C command.
# but looks enougth..
}

#   The  #preprocessor_directive { case
 if($$line =~ /^(\s*)\#(.*){\s*$/ ){
	$$line=$1.#.$2.\n.$INDENT.{\n;
}
	
#   The  { #preprocessor directive case
if($$line =~ /^(\s*)\{(.*)#(if|else|endif)(.*)$/ ){
#  print From :.$$line.\n;
  $$line= $1.{.$2.\n#.$3.$4.\n;
#  print --.$$line.\n;
}

   # The unsigned int:1; case .
$$line =~ s/__FORASTYLE__/:/;

return 1;
}

sub usage{
my($name)[EMAIL PROTECTED];
print Usage:\n   $name file1 file2 file3 \n;
}

Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries

 On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:

 Well, I haven't removed the temporary malloc/free pair, whatever its
 called;
 I've just removed Amos' workaround in src/comm.cc so it doesn't leak
 on my
 system whilst I profile.

 Still, this is one of those death of a thousand cuts method of
 killing performance..

 Right, I haven't seen the commit; care to mail the diff?



Here you go.

--- src/comm.cc 2008-01-19 20:50:42.0 +1300
+++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
@@ -1381,7 +1381,7 @@

 }

-#ifdef _SQUID_LINUX_
+#ifdef 0 // _SQUID_LINUX_
 /* 2007-11-27:
  * Linux Debian replaces our allocated AI pointer with garbage when
  * connect() fails. This leads to segmentation faults deallocating



In reply to your last few emails Adrian.

re: can we join up on IRC?

  I may not be able to for the next few days, too much else on.


re: Why are you trying to allocate the structure on invocation of
GetAddrInfo() ?

  The design was to follow the well-known structure of the kernel call
getaddrinfo(), which fills kernel-managed memory in a thread-safe way
down a dynamic structure. The addrinfo structs are too nasty to use
naked for anything like the amount of comm usage squid has. They are not
an object per-se, but the root of a pointer tree to various types of
node, which depend on the flags for their memory sizes.
  We could make a variant of the call which takes an addrinfo object and
sets it up to point at the IPAddress internals correctly. BUT the bug we
are looking at would then cause the IPAddress to become garbage, maybe
have the kernel free'ing stack memory, etc.


re: Argh, this temporary malloc/free pair is peppered throughout the
codebase! Grr.

 You already know there are naked comm calls everywhere. :-( Most of them
calls addrinfo was desgined for use in.


re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
x86_64.

 Yes, I found no problem on ubuntu also. I simply could not (after very
little searching) find a #if test that would only work for Debian.


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
Uhm, the kernel won't be free'ing userland memory at all; it doesn't give
a rats how its managed.

You're probably confused with the library - the libc is fumbling with
all the struct addrinfo things.

Also, if the bug bugs you, create a temporary pointer and pass that in. ;)



Adrian

On Fri, Feb 08, 2008, Amos Jeffries wrote:
 
  On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
 
  Well, I haven't removed the temporary malloc/free pair, whatever its
  called;
  I've just removed Amos' workaround in src/comm.cc so it doesn't leak
  on my
  system whilst I profile.
 
  Still, this is one of those death of a thousand cuts method of
  killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 
 
 
 Here you go.
 
 --- src/comm.cc 2008-01-19 20:50:42.0 +1300
 +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
 @@ -1381,7 +1381,7 @@
 
  }
 
 -#ifdef _SQUID_LINUX_
 +#ifdef 0 // _SQUID_LINUX_
  /* 2007-11-27:
   * Linux Debian replaces our allocated AI pointer with garbage when
   * connect() fails. This leads to segmentation faults deallocating
 
 
 
 In reply to your last few emails Adrian.
 
 re: can we join up on IRC?
 
   I may not be able to for the next few days, too much else on.
 
 
 re: Why are you trying to allocate the structure on invocation of
 GetAddrInfo() ?
 
   The design was to follow the well-known structure of the kernel call
 getaddrinfo(), which fills kernel-managed memory in a thread-safe way
 down a dynamic structure. The addrinfo structs are too nasty to use
 naked for anything like the amount of comm usage squid has. They are not
 an object per-se, but the root of a pointer tree to various types of
 node, which depend on the flags for their memory sizes.
   We could make a variant of the call which takes an addrinfo object and
 sets it up to point at the IPAddress internals correctly. BUT the bug we
 are looking at would then cause the IPAddress to become garbage, maybe
 have the kernel free'ing stack memory, etc.
 
 
 re: Argh, this temporary malloc/free pair is peppered throughout the
 codebase! Grr.
 
  You already know there are naked comm calls everywhere. :-( Most of them
 calls addrinfo was desgined for use in.
 
 
 re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
 x86_64.
 
  Yes, I found no problem on ubuntu also. I simply could not (after very
 little searching) find a #if test that would only work for Debian.
 
 
 Amos
 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries

 On Thu, 2008-02-07 at 14:08 +0900, Adrian Chadd wrote:
 On Thu, Feb 07, 2008, Robert Collins wrote:

   Still, this is one of those death of a thousand cuts method of
   killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?

 Which? I just looked at the places where Amos is calling GetAddrInfo()
 and FreeAddrInfo(); more then one are:

 * GetAddrInfo(temp, );
 * F-{something} = temp;
 * FreeAddrInfo(temp);

 Wheee, surely that would be better as
 F-{something} = GetAddrInfo(...)


F-something should all be IPAddress type. Assignment to update based on
some results kernel places in temp. That usage above is a guaranteed leak
of N bytes where N can be up to the size of a small DNS zone!

Please read this for a short synopsis of the requirements of addrinfo usage:
  http://linux.die.net/man/3/getaddrinfo

Amos



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
Have you tried running valgrind?

Also, wait. What exactly are you using addr_info for? Can you explain it
to me?

There's a netural sock addr size type - its called sockaddr_storage.
addr_info is for return results from hostname-ip queries..


Adrian


On Fri, Feb 08, 2008, Amos Jeffries wrote:
  On Thu, Feb 07, 2008, Robert Collins wrote:
 
   Still, this is one of those death of a thousand cuts method of
   killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 
  Which? I just looked at the places where Amos is calling GetAddrInfo()
  and FreeAddrInfo(); more then one are:
 
  * GetAddrInfo(temp, );
  * F-{something} = temp;
  * FreeAddrInfo(temp);
 
 Where?!
 
  I hate addrinfo enough that I only added addrinfo where it needed:
 
  GetAddrInfo(temp)
  syscall_needing_neutral_sockaddr_size_family(temp)
  FreeAddrInfo(temp)
 
 OR:
 
  GetAddrInfo(temp)
  syscall_altering_addrinfo(temp)
  F-something = temp;
  FreeAddrInfo(temp)
 
 
 Amos
 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 Uhm, the kernel won't be free'ing userland memory at all; it doesn't give
 a rats how its managed.

 You're probably confused with the library - the libc is fumbling with
 all the struct addrinfo things.

Possibly. Once an external call is made I no longer care if its kernel or
library.  Bad effects are equally dangerous to squid.


 Also, if the bug bugs you, create a temporary pointer and pass that in. ;)


Try it. You will be horrified.
The pointer-tree needs by the OS to work with screws that up. We would
need to allocate a new addrinfo for each master node and set its pointers
to new memory, copy the original data into it ... and hey! thats what
IPAddress::GetAddrInfo() does!

Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
'full' one-IP addrinfo* tree quickly which is about to be used in place of
a blocking getaddrinfo() system call.

The second-best alternative in many of these places is hard-coding
sockaddr_in* types and using #if USE_IPV6 to alternate the calls.

With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
ready for new data to be inserted and only sets the flags up.

Amos



 Adrian

 On Fri, Feb 08, 2008, Amos Jeffries wrote:
 
  On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
 
  Well, I haven't removed the temporary malloc/free pair, whatever its
  called;
  I've just removed Amos' workaround in src/comm.cc so it doesn't leak
  on my
  system whilst I profile.
 
  Still, this is one of those death of a thousand cuts method of
  killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 


 Here you go.
 
 --- src/comm.cc 2008-01-19 20:50:42.0 +1300
 +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
 @@ -1381,7 +1381,7 @@

  }

 -#ifdef _SQUID_LINUX_
 +#ifdef 0 // _SQUID_LINUX_
  /* 2007-11-27:
   * Linux Debian replaces our allocated AI pointer with garbage when
   * connect() fails. This leads to segmentation faults deallocating
 


 In reply to your last few emails Adrian.

 re: can we join up on IRC?

   I may not be able to for the next few days, too much else on.


 re: Why are you trying to allocate the structure on invocation of
 GetAddrInfo() ?

   The design was to follow the well-known structure of the kernel call
 getaddrinfo(), which fills kernel-managed memory in a thread-safe way
 down a dynamic structure. The addrinfo structs are too nasty to use
 naked for anything like the amount of comm usage squid has. They are not
 an object per-se, but the root of a pointer tree to various types of
 node, which depend on the flags for their memory sizes.
   We could make a variant of the call which takes an addrinfo object and
 sets it up to point at the IPAddress internals correctly. BUT the bug we
 are looking at would then cause the IPAddress to become garbage, maybe
 have the kernel free'ing stack memory, etc.


 re: Argh, this temporary malloc/free pair is peppered throughout the
 codebase! Grr.

  You already know there are naked comm calls everywhere. :-( Most of
 them
 calls addrinfo was desgined for use in.


 re: I've removed that hack, and things work fine for me. Ubuntu 7.01
 here,
 x86_64.

  Yes, I found no problem on ubuntu also. I simply could not (after very
 little searching) find a #if test that would only work for Debian.


 Amos


 --
 - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
 Support -
 - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -





Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 On Thu, Feb 07, 2008, Robert Collins wrote:

  Still, this is one of those death of a thousand cuts method of
  killing performance..

 Right, I haven't seen the commit; care to mail the diff?

 Which? I just looked at the places where Amos is calling GetAddrInfo()
 and FreeAddrInfo(); more then one are:

 * GetAddrInfo(temp, );
 * F-{something} = temp;
 * FreeAddrInfo(temp);

Where?!

 I hate addrinfo enough that I only added addrinfo where it needed:

 GetAddrInfo(temp)
 syscall_needing_neutral_sockaddr_size_family(temp)
 FreeAddrInfo(temp)

OR:

 GetAddrInfo(temp)
 syscall_altering_addrinfo(temp)
 F-something = temp;
 FreeAddrInfo(temp)


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 Have you tried running valgrind?

 Also, wait. What exactly are you using addr_info for? Can you explain it
 to me?

The basics:

  addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
sockaddr_in6, sockaddr_storage} ) while adding the size values and
moving the flags and protocol details.

I'm using it in three ways.

First Usage: [simplest via InitAddrInfo() paired with an assignment
somewhere ]
 - to create an empty addrinfo tree root and pass to the system to be
filled out. The results are copied to the fde or xxData objects for later
async use.


Second usage: [ hiding complex setup via GetAddrInfo() ]
  - to create addrinfo tree containing the current IPAddress so system or
squid can use the structure fields in some comm call.
  Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
results in one package with fixed field names/locations for squid to
send individually to a system call without mode-specific #ifdef's.
  Variant 2: whole addrinfo goes to system for use.


Third Usage: [ hiding complex usage via GetAddrInfo() paired with an
assignment somewhere ]
  - to create addrinfo tree containing the current IP and socket
information so system can update/add its knowledge of the FQDN,
hostname, alternate IPs, socket options, or original destination (in NAT
case).
(both above Variantions apply here too)
  - which then gets saved somewhere for squid to use later.


All three uses above end up with a tree of dynamic data squid is expected
to cleanup. So all uses are terminated with a FreeAddrInfo() to perform
that cleanup.


 There's a netural sock addr size type - its called sockaddr_storage.
 addr_info is for return results from hostname-ip queries..


struct addrinfo extends struct sockaddr_storage to handle multiple IPs,
hostnames, and removes the need for many type-castings in user-space code.

Amos


 Adrian


 On Fri, Feb 08, 2008, Amos Jeffries wrote:
  On Thu, Feb 07, 2008, Robert Collins wrote:
 
   Still, this is one of those death of a thousand cuts method of
   killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 
  Which? I just looked at the places where Amos is calling GetAddrInfo()
  and FreeAddrInfo(); more then one are:
 
  * GetAddrInfo(temp, );
  * F-{something} = temp;
  * FreeAddrInfo(temp);

 Where?!

  I hate addrinfo enough that I only added addrinfo where it needed:

  GetAddrInfo(temp)
  syscall_needing_neutral_sockaddr_size_family(temp)
  FreeAddrInfo(temp)

 OR:

  GetAddrInfo(temp)
  syscall_altering_addrinfo(temp)
  F-something = temp;
  FreeAddrInfo(temp)


 Amos


 --
 - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
 Support -
 - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -





Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
On Fri, Feb 08, 2008, Amos Jeffries wrote:

 Try it. You will be horrified.

Why? Whats being trashed, AI, or AI-somemember?

If AI is being trashed, then just create a temporary pointer copy
of AI, use that in the socket call, and then throw it away.
if its trashing the memory -at- the AI rather than the AI pointer
itself then you should probably spend some time with valgrind.

 The pointer-tree needs by the OS to work with screws that up. We would
 need to allocate a new addrinfo for each master node and set its pointers
 to new memory, copy the original data into it ... and hey! thats what
 IPAddress::GetAddrInfo() does!

It doesn't do that in C! I don't get it.

 Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
 'full' one-IP addrinfo* tree quickly which is about to be used in place of
 a blocking getaddrinfo() system call.
 
 The second-best alternative in many of these places is hard-coding
 sockaddr_in* types and using #if USE_IPV6 to alternate the calls.

 With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
 ready for new data to be inserted and only sets the flags up.

Are there any situations where you iterate over a list of AddrInfo's ?

You're not passing in a struct addr_info which you've allocated yourself into
a glibc call which then frobs it and possibly allocates another?




Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 On Fri, Feb 08, 2008, Amos Jeffries wrote:

 Try it. You will be horrified.

 Why? Whats being trashed, AI, or AI-somemember?

AI-somemember.


 If AI is being trashed, then just create a temporary pointer copy
 of AI, use that in the socket call, and then throw it away.
 if its trashing the memory -at- the AI rather than the AI pointer
 itself then you should probably spend some time with valgrind.

 The pointer-tree needs by the OS to work with screws that up. We would
 need to allocate a new addrinfo for each master node and set its
 pointers
 to new memory, copy the original data into it ... and hey! thats what
 IPAddress::GetAddrInfo() does!

 It doesn't do that in C! I don't get it.

 Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
 'full' one-IP addrinfo* tree quickly which is about to be used in place
 of
 a blocking getaddrinfo() system call.

 The second-best alternative in many of these places is hard-coding
 sockaddr_in* types and using #if USE_IPV6 to alternate the calls.

 With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
 ready for new data to be inserted and only sets the flags up.

 Are there any situations where you iterate over a list of AddrInfo's ?

Yes, in all those F-somewhere = *AI calls. Robert pointed out.
In all results from get*() system calls.
In all results from xgetaddrinfo() system calls.


 You're not passing in a struct addr_info which you've allocated yourself
 into
 a glibc call which then frobs it and possibly allocates another?

I suspect thats whats happening. Or, errors changing it to some sockaddr*
thing representing a bad-IP state. Same diff to squid.
When it should NOT be altering the pointers, but writing to the memory
provided.

Look at this:

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

in squid ...

  AI = { ...
ai_addr (0xfff...) my pointer somewhere to sockaddr
ai_addrlen = 16
... };

   connect(fd, AI-ai_addr, AI-ai_addrlen);

afterwards:

  AI = { ...
ai_addr (0x80...) definately NOT my pointer
ai_addrlen = 16
... };

the rest of AI is untouched.

So its altering a CONST parameter. And worse. It's taking the pointer
by-value as if it was accepting by-ref and alocating its own.


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
And which version of debian on which platform is showing the busted behaviour?



Adrian

On Fri, Feb 08, 2008, Amos Jeffries wrote:
 
  On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
 
  Well, I haven't removed the temporary malloc/free pair, whatever its
  called;
  I've just removed Amos' workaround in src/comm.cc so it doesn't leak
  on my
  system whilst I profile.
 
  Still, this is one of those death of a thousand cuts method of
  killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 
 
 
 Here you go.
 
 --- src/comm.cc 2008-01-19 20:50:42.0 +1300
 +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
 @@ -1381,7 +1381,7 @@
 
  }
 
 -#ifdef _SQUID_LINUX_
 +#ifdef 0 // _SQUID_LINUX_
  /* 2007-11-27:
   * Linux Debian replaces our allocated AI pointer with garbage when
   * connect() fails. This leads to segmentation faults deallocating
 
 
 
 In reply to your last few emails Adrian.
 
 re: can we join up on IRC?
 
   I may not be able to for the next few days, too much else on.
 
 
 re: Why are you trying to allocate the structure on invocation of
 GetAddrInfo() ?
 
   The design was to follow the well-known structure of the kernel call
 getaddrinfo(), which fills kernel-managed memory in a thread-safe way
 down a dynamic structure. The addrinfo structs are too nasty to use
 naked for anything like the amount of comm usage squid has. They are not
 an object per-se, but the root of a pointer tree to various types of
 node, which depend on the flags for their memory sizes.
   We could make a variant of the call which takes an addrinfo object and
 sets it up to point at the IPAddress internals correctly. BUT the bug we
 are looking at would then cause the IPAddress to become garbage, maybe
 have the kernel free'ing stack memory, etc.
 
 
 re: Argh, this temporary malloc/free pair is peppered throughout the
 codebase! Grr.
 
  You already know there are naked comm calls everywhere. :-( Most of them
 calls addrinfo was desgined for use in.
 
 
 re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
 x86_64.
 
  Yes, I found no problem on ubuntu also. I simply could not (after very
 little searching) find a #if test that would only work for Debian.
 
 
 Amos
 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 And which version of debian on which platform is showing the busted
 behaviour?


 Dell i386
 Kernels 2.6.18-4 thru 2.6.22-2
 libstdc6 with g++ 4.1 thru 4.3

All the debian setups I have had to test with basically since I first
found it.

Amos


 On Fri, Feb 08, 2008, Amos Jeffries wrote:
 
  On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
 
  Well, I haven't removed the temporary malloc/free pair, whatever its
  called;
  I've just removed Amos' workaround in src/comm.cc so it doesn't leak
  on my
  system whilst I profile.
 
  Still, this is one of those death of a thousand cuts method of
  killing performance..
 
  Right, I haven't seen the commit; care to mail the diff?
 


 Here you go.
 
 --- src/comm.cc 2008-01-19 20:50:42.0 +1300
 +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
 @@ -1381,7 +1381,7 @@

  }

 -#ifdef _SQUID_LINUX_
 +#ifdef 0 // _SQUID_LINUX_
  /* 2007-11-27:
   * Linux Debian replaces our allocated AI pointer with garbage when
   * connect() fails. This leads to segmentation faults deallocating
 


 In reply to your last few emails Adrian.

 re: can we join up on IRC?

   I may not be able to for the next few days, too much else on.


 re: Why are you trying to allocate the structure on invocation of
 GetAddrInfo() ?

   The design was to follow the well-known structure of the kernel call
 getaddrinfo(), which fills kernel-managed memory in a thread-safe way
 down a dynamic structure. The addrinfo structs are too nasty to use
 naked for anything like the amount of comm usage squid has. They are not
 an object per-se, but the root of a pointer tree to various types of
 node, which depend on the flags for their memory sizes.
   We could make a variant of the call which takes an addrinfo object and
 sets it up to point at the IPAddress internals correctly. BUT the bug we
 are looking at would then cause the IPAddress to become garbage, maybe
 have the kernel free'ing stack memory, etc.


 re: Argh, this temporary malloc/free pair is peppered throughout the
 codebase! Grr.

  You already know there are naked comm calls everywhere. :-( Most of
 them
 calls addrinfo was desgined for use in.


 re: I've removed that hack, and things work fine for me. Ubuntu 7.01
 here,
 x86_64.

  Yes, I found no problem on ubuntu also. I simply could not (after very
 little searching) find a #if test that would only work for Debian.


 Amos


 --
 - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
 Support -
 - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -





Re: a simple formatter

2008-02-07 Thread Amos Jeffries

 Tested.

 I have made a (small) file of the structures I've noticed past
 formatting
 errors on. With a manual fix-up to what I think it should look like in
 readable C++ under the published squid formatting description.

 The output of your script does this:
 ..

 It is the --break-blocks option of the astyle. I think it can removed.

 Also, I noticed that the formatter will accept _anything_ given to it as
 a
 filename and create the files. Thats rather nasty when non-valid
 filename
 sequences are entered. ie --help


 OK fixed :-).  Now accepts only files with .cc,.h,.cci and .c extensions
 and ignore all other files.
 I am re-sending the fixed script . In this version also I removed the
 --break-blocks astyle option. Formated code looks better without it.


Excellent. That dropit it down to:

--- astyle-test-cases.cc2008-02-08 12:37:14.0 +1300
+++ astyle-test-cases.cc-original   2008-02-07 12:13:07.0 +1300
@@ -10,9 +10,9 @@

 /// structures with bit-fields
 struct t1 {
-unsigned  int a:1;
-unsigned  int b:1;
-unsigned  int c:1;
+unsigned int a:1;
+unsigned int b:1;
+unsigned int c:1;
 };


I suppose we could live with that. Though I think it may be a problem with
the formatter.pl final replacement.

Amos



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 11:46 +1300, Amos Jeffries wrote:

 The basics:
 
   addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
 sockaddr_in6, sockaddr_storage} ) while adding the size values and
 moving the flags and protocol details.

Hi Amos,

I understand that we need to support IPv4 and IPv6 addresses. I do
not understand why we are suddenly talking about pointers and trees?

I can understand if wherever Squid2 was using an IPv4 address, Squid3
would use an (ip4, ip6, whichOneFlag) class with appropriate methods
that check the flag and return or set the right thing. I know that would
not be the best way to implement it, but it would work and I would
understand what is going on.

Can you explain AddrInfo in the above simple (ip4, ip6, whichOneFlag)
class terms? Why do we need AddrInfo now and why are we talking about a
tree with pointers and alloc/free operations?

Did Squid2 use an equivalent of AddrInfo? If yes, did it use it as often
as Squid3 does now? Did Squid2 allocated and freed it?

Thank you,

Alex.


 I'm using it in three ways.
 
 First Usage: [simplest via InitAddrInfo() paired with an assignment
 somewhere ]
  - to create an empty addrinfo tree root and pass to the system to be
 filled out. The results are copied to the fde or xxData objects for later
 async use.
 
 
 Second usage: [ hiding complex setup via GetAddrInfo() ]
   - to create addrinfo tree containing the current IPAddress so system or
 squid can use the structure fields in some comm call.
   Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
 results in one package with fixed field names/locations for squid to
 send individually to a system call without mode-specific #ifdef's.
   Variant 2: whole addrinfo goes to system for use.
 
 
 Third Usage: [ hiding complex usage via GetAddrInfo() paired with an
 assignment somewhere ]
   - to create addrinfo tree containing the current IP and socket
 information so system can update/add its knowledge of the FQDN,
 hostname, alternate IPs, socket options, or original destination (in NAT
 case).
 (both above Variantions apply here too)
   - which then gets saved somewhere for squid to use later.
 
 
 All three uses above end up with a tree of dynamic data squid is expected
 to cleanup. So all uses are terminated with a FreeAddrInfo() to perform
 that cleanup.
 
 
  There's a netural sock addr size type - its called sockaddr_storage.
  addr_info is for return results from hostname-ip queries..
 
 
 struct addrinfo extends struct sockaddr_storage to handle multiple IPs,
 hostnames, and removes the need for many type-castings in user-space code.
 
 Amos
 
 
  Adrian
 
 
  On Fri, Feb 08, 2008, Amos Jeffries wrote:
   On Thu, Feb 07, 2008, Robert Collins wrote:
  
Still, this is one of those death of a thousand cuts method of
killing performance..
  
   Right, I haven't seen the commit; care to mail the diff?
  
   Which? I just looked at the places where Amos is calling GetAddrInfo()
   and FreeAddrInfo(); more then one are:
  
   * GetAddrInfo(temp, );
   * F-{something} = temp;
   * FreeAddrInfo(temp);
 
  Where?!
 
   I hate addrinfo enough that I only added addrinfo where it needed:
 
   GetAddrInfo(temp)
   syscall_needing_neutral_sockaddr_size_family(temp)
   FreeAddrInfo(temp)
 
  OR:
 
   GetAddrInfo(temp)
   syscall_altering_addrinfo(temp)
   F-something = temp;
   FreeAddrInfo(temp)
 
 
  Amos
 
 
  --
  - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
  Support -
  - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
 
 



Re: a simple formatter

2008-02-07 Thread Alex Rousskov
On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
 Do we need something like that? Any comments/suggestions? Any testers?

I believe we do and I appreciate you working on it! Please try to fix
the remaining problems Amos pointed out.

Also, can you apply it to the entire Squid tree, remove all whitespace,
and calculate md5, comparing that with the virgin whitespace-less tree?
The MD5s should be the same, right?

This is not a perfect check because spaces within strings/etc are
stripped and not checked, but it is a pretty good one.

Thank you,

Alex.




Re: a simple formatter

2008-02-07 Thread Amos Jeffries
 On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
 Do we need something like that? Any comments/suggestions? Any testers?

 I believe we do and I appreciate you working on it! Please try to fix
 the remaining problems Amos pointed out.

 Also, can you apply it to the entire Squid tree, remove all whitespace,
 and calculate md5, comparing that with the virgin whitespace-less tree?
 The MD5s should be the same, right?

Oooh. Nice test.


 This is not a perfect check because spaces within strings/etc are
 stripped and not checked, but it is a pretty good one.

It will also miss the #if 0 { block problem.
Though my unit-test for that will catch it any any other whitsspace-exact
case we care to write the test for.

We may need a compile test for any other similar ones that have not had
test-cases written.

Just have to figure out the best way to have the test file built into the
codebase for automatic testing.

Amos



Re: a simple formatter

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote:
  On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
  Do we need something like that? Any comments/suggestions? Any testers?
 
  I believe we do and I appreciate you working on it! Please try to fix
  the remaining problems Amos pointed out.
 
  Also, can you apply it to the entire Squid tree, remove all whitespace,
  and calculate md5, comparing that with the virgin whitespace-less tree?
  The MD5s should be the same, right?
 
 Oooh. Nice test.
 
 
  This is not a perfect check because spaces within strings/etc are
  stripped and not checked, but it is a pretty good one.
 
 It will also miss the #if 0 { block problem.

Are there free C++ source code obfuscation programs? If they are
guaranteed to generate the same source code regardless of formatting,
then applying them would catch even more bugs.

Too bad compilers produce different output for each execution due to
timestamps and such. Perhaps there is a way to avoid that and compare
md5s of squid executables?

Cheers,

Alex.
P.S. A basic test file would be good to have anyway, so thank you for
bootstrapping that.



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 On Fri, 2008-02-08 at 11:46 +1300, Amos Jeffries wrote:

 The basics:

   addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
 sockaddr_in6, sockaddr_storage} ) while adding the size values and
 moving the flags and protocol details.

 Hi Amos,

 I understand that we need to support IPv4 and IPv6 addresses. I do
 not understand why we are suddenly talking about pointers and trees?

 I can understand if wherever Squid2 was using an IPv4 address, Squid3
 would use an (ip4, ip6, whichOneFlag) class with appropriate methods
 that check the flag and return or set the right thing. I know that would
 not be the best way to implement it, but it would work and I would
 understand what is going on.

The protocol extensions defined to replace sockaddr_in and in_addr went
through three generations of 'migration-pathway' attempts.
The first;
  sockaddr* defines a union type that was nasty for casting and did not
include that flag you mention in the structure itself. Most of the
accept(), connect(), bind() etc still use this as their input type.

The second;
 sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
provide a better way of using sockaddr* so the sockaddr_in and
sockaddr_in6 bits could be read-written easily. But the big/litte endian
problems between OS screwed up the sockaddr_in* sa_family field locations
inside it so developers still can't portably use it for the v6/v6 flag
they wanted.

The third;
  addrinfo* defines a whole new type. Wrapping that old sockaddr* mess and
providing in a nice set fo bells and whistles for use. Most importantly
that flag we need to pas to the system calls.

It is in _exactly_ the class you mention. But in struct form so C people
can use it too.


 Can you explain AddrInfo in the above simple (ip4, ip6, whichOneFlag)
 class terms?

There are a few extra bits I'v snipped from this explanation.

The important bits for the squid comm code are:

struct addrinfo
{
... here is that flag you want to switch on if its been set properly.
... AF_INET (IPv4) or AF_INET6 (IPv6) or AF_UNSPEC (NULL, error, 'whatever')
  int ai_family;

... some bits needed only by the socket() creation calls
  int ai_socktype;
  int ai_protocol;

... and the actual data for the IP address:

... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
  struct sockaddr *ai_addr;

... (2) and the sizeof() for the (1) object.
... if the flag is not set this _might_ be magic'd instead.
  int ai_addrlen;


... along with some bits set when ia DNS lookup has been done
... _sometimes_ it's the host FQDN
... optional so we never new/free this ourselves in squid.
  char *ai_canonname;

... and because some system lookups in IPv6 can return _multiple_ addresses
... it has a linked-list of branchs for each of the alternatives
  struct addrinfo *ai_next; /* Pointer to next in list.  */
};


 Why do we need AddrInfo now

To kill dozens of castings, magic operations on object sizes, and cut down
dozens of lines of 'compatibility code'.

  and why are we talking about a
 tree with pointers and alloc/free operations?

I see a tree of pointers I call it a tree even if its a class-1 (dynamic
linked-list).

addrinfo - makes a tree node.
   ai_addr (a sockaddr pile of unfortunate structs)
   ai_canonname (a null-terminated string),
   ai_next (another addrinfo .. repeat ad inifinitum)

Together they make a pretty tree. But every used piece is eseentially
another new, memset, free.

HTH.


 Did Squid2 use an equivalent of AddrInfo? If yes, did it use it as often
 as Squid3 does now? Did Squid2 allocated and freed it?

What I've seen of Husni's final 2.6s13 patch he used addrinfo in many of
the same system-interface places and uses the naked system
getaddrinfo()/freeaddrinfo() calls where I use my GetAddrInfo() wrappers
to the system ones.
Then where he is passing around either sockaddr* or sockaddr_storage*
where I universally use IPAddress .


There are a few spots where a 'read-only' variant might be possible. So as
to pretend to allocate when no new/free is actually done, just a pointer
to the IPAddress field and a local addrinfo.
One of them is this connect() IFF that bug can be guaranteed not to
overwrite the internal IPAddress data and warp squid.

I may have some time to do this tonight. Bug me about it if I'm on.

Amos


 Thank you,

 Alex.


 I'm using it in three ways.

 First Usage: [simplest via InitAddrInfo() paired with an assignment
 somewhere ]
  - to create an empty addrinfo tree root and pass to the system to be
 filled out. The results are copied to the fde or xxData objects for
 later
 async use.


 Second usage: [ hiding complex setup via GetAddrInfo() ]
   - to create addrinfo tree containing the current IPAddress so system
 or
 squid can use the structure fields in some comm call.
   Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
 results in one package with fixed field names/locations for squid to
 send individually to a system call 

Re: a simple formatter

2008-02-07 Thread Amos Jeffries
 On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote:
  On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
  Do we need something like that? Any comments/suggestions? Any
 testers?
 
  I believe we do and I appreciate you working on it! Please try to fix
  the remaining problems Amos pointed out.
 
  Also, can you apply it to the entire Squid tree, remove all
 whitespace,
  and calculate md5, comparing that with the virgin whitespace-less
 tree?
  The MD5s should be the same, right?

 Oooh. Nice test.

 
  This is not a perfect check because spaces within strings/etc are
  stripped and not checked, but it is a pretty good one.

 It will also miss the #if 0 { block problem.

 Are there free C++ source code obfuscation programs? If they are
 guaranteed to generate the same source code regardless of formatting,
 then applying them would catch even more bugs.

 Too bad compilers produce different output for each execution due to
 timestamps and such.

What do you mean by this? A standard entry-level compiler test is that it
produces the same output from the same input every time.

Beginner students are taught that in the bootstrapping lessons: If it
compiles its own code and produces a different binary, do it again until
it stops or breaks. If it breaks you started with buggy code and still
have work to do.

 Perhaps there is a way to avoid that and compare
 md5s of squid executables?

 Cheers,

 Alex.
 P.S. A basic test file would be good to have anyway, so thank you for
 bootstrapping that.

I thought it would be helpful if you and Christos decided on an astyle
formatter. Turned out right.

Amos



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
 On Fri, Feb 08, 2008, Amos Jeffries wrote:

  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
 provide a better way of using sockaddr* so the sockaddr_in and
 sockaddr_in6 bits could be read-written easily. But the big/litte endian
 problems between OS screwed up the sockaddr_in* sa_family field
 locations
 inside it so developers still can't portably use it for the v6/v6 flag
 they wanted.

 How'd that screw things up? Hm, I'll have to read.

 ... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
   struct sockaddr *ai_addr;

 Which you have to allocate.

and do when needed.


   char *ai_canonname;

 Which we're not using, and you have to allocate.

and don't.


 ... and because some system lookups in IPv6 can return _multiple_
 addresses
 ... it has a linked-list of branchs for each of the alternatives
   struct addrinfo *ai_next;  /* Pointer to next in list.  */
 };


 Which is only useful in the context of providing lists of addresses for
 a given host or IP address. What getaddrinfo() returns.

and consider read-only.


  Why do we need AddrInfo now

 To kill dozens of castings, magic operations on object sizes, and cut
 down
 dozens of lines of 'compatibility code'.

 And add in a minimum of one, and a maximum of two allocations per socket
 operation.

 Together they make a pretty tree. But every used piece is eseentially
 another new, memset, free.

 Ah, and here you will have problems.

Agreed.


 The members of that struct should probably be malloc, free, and not
 new/delete. You're using new/delete which -should- map to a default
 new operator and head off to the malloc libraries, but -squids- idea
 of the malloc interface could differ from the -library- idea of
 the malloc interface.

I was thinking squid overiding the new/delete to its xmalloc/xfree at the
same time it overrode the general malloc/free.


 So you should probably drop the new/delete'ing of the addrinfo stuff
 and replace it with malloc/free.

If thats better, no problem.


 You're also memset()'ing the addrinfo struct whether you allocated
 it or not, which may be double-memset'ing the thing, and if someone
 passed in an addrinfo it may have structure members which have now
 been leaked.

In the current usage the call should be the allocation. Not external.
Allowing external allocation via another API call would be the only
optimisation I can think of that does not break anything anywhere.

memset is needed there because I could not tell that the new/delete
_guaranteed_ pre-NULL'd memory and a single set bit in any unused fields
might cause a crash later.
With your deeper knowledge of the memory allocation in squid-3, feel free
to alter the default memset()'s.

Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
On Fri, Feb 08, 2008, Amos Jeffries wrote:

  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
 provide a better way of using sockaddr* so the sockaddr_in and
 sockaddr_in6 bits could be read-written easily. But the big/litte endian
 problems between OS screwed up the sockaddr_in* sa_family field locations
 inside it so developers still can't portably use it for the v6/v6 flag
 they wanted.

How'd that screw things up? Hm, I'll have to read.

 ... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
   struct sockaddr *ai_addr;

Which you have to allocate.

   char *ai_canonname;

Which we're not using, and you have to allocate.

 ... and because some system lookups in IPv6 can return _multiple_ addresses
 ... it has a linked-list of branchs for each of the alternatives
   struct addrinfo *ai_next;   /* Pointer to next in list.  */
 };
 

Which is only useful in the context of providing lists of addresses for
a given host or IP address. What getaddrinfo() returns.

  Why do we need AddrInfo now
 
 To kill dozens of castings, magic operations on object sizes, and cut down
 dozens of lines of 'compatibility code'.

And add in a minimum of one, and a maximum of two allocations per socket
operation.

 Together they make a pretty tree. But every used piece is eseentially
 another new, memset, free.

Ah, and here you will have problems.

The members of that struct should probably be malloc, free, and not
new/delete. You're using new/delete which -should- map to a default
new operator and head off to the malloc libraries, but -squids- idea
of the malloc interface could differ from the -library- idea of
the malloc interface.

So you should probably drop the new/delete'ing of the addrinfo stuff
and replace it with malloc/free.

You're also memset()'ing the addrinfo struct whether you allocated
it or not, which may be double-memset'ing the thing, and if someone
passed in an addrinfo it may have structure members which have now
been leaked.




Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 15:52 +1300, Amos Jeffries wrote:

 The second;
  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
 provide a better way of using sockaddr* so the sockaddr_in and
 sockaddr_in6 bits could be read-written easily. But the big/litte endian
 problems between OS screwed up the sockaddr_in* sa_family field locations
 inside it so developers still can't portably use it for the v6/v6 flag
 they wanted.

Do you have a reference for that? I do not want to bug you with more
questions but I am surprised to learn that some kind of a
sockaddr_storage wrapper cannot work well for Squid... We may have to
fix Polygraph that is using that approach, IIRC.

 The third;
   addrinfo* defines a whole new type. Wrapping that old sockaddr* mess and
 providing in a nice set fo bells and whistles for use. Most importantly
 that flag we need to pas to the system calls.

I have to say that the nice set of bells and whistles in a basic
address structure used throughout a performance-sensitive program raises
red flags, but perhaps the actual performance implications are not as
bad.

 The important bits for the squid comm code are:
 
 struct addrinfo
 {
   int ai_family;
   int ai_socktype;
   int ai_protocol;
   struct sockaddr *ai_addr; // pointer to sockaddr_storage/*_in/*_in6
   int ai_addrlen;
   char *ai_canonname; // we never new/free this ourselves in squid.
   struct addrinfo *ai_next;   // Pointer to next in list.
 };

Can you replace IPAddress data members with the above, except not use
any pointers and forget about ai_next and ai_canonname? I think doing so
will eliminate temporary allocations and other things that look rather
scary to both code quality and performance folks.

When you do need a list of addresses or canonname, it is OK to use
addrinfo and convert from/to IPAddress as needed, of course. I am
guessing such uses will not affect performance or overall code quality.

Thank you,

Alex.