Re: [PATCH] Limit log field width

2011-10-13 Thread Tsantilas Christos

If there is not any objection I will commit this patch to trunk.


On 10/05/2011 07:35 PM, Tsantilas Christos wrote:

On 09/06/2011 09:45 PM, Alex Rousskov wrote:

On 04/16/2011 11:28 PM, Amos Jeffries wrote:

Clearing up my queue of waiting patches this old one pops up as approved
but not merged.

Alex, it is waiting on your commit and resolution of a TODO (separate
patch was fine). If it is still going to happen you my want to do it
soon before any more polishing changes alter the logging area.


Done as trunk r11709. A separate polishing patch to remove no longer
needed type castings will follow.


And this is the policing patch, which
- converts type of the Token::[width|precision] members
from "unsigned int" to "int"
- renames the Token::[width|precision] members to
Token::[widthMin/widthMax]
- removes unneeded typecastings






Thank you,

Alex.







Re: [PATCH] Limit log field width

2011-10-05 Thread Tsantilas Christos

On 10/05/2011 08:13 PM, Alex Rousskov wrote:

On 10/05/2011 10:35 AM, Tsantilas Christos wrote:

On 09/06/2011 09:45 PM, Alex Rousskov wrote:

On 04/16/2011 11:28 PM, Amos Jeffries wrote:

Clearing up my queue of waiting patches this old one pops up as approved
but not merged.

Alex, it is waiting on your commit and resolution of a TODO (separate
patch was fine). If it is still going to happen you my want to do it
soon before any more polishing changes alter the logging area.


Done as trunk r11709. A separate polishing patch to remove no longer
needed type castings will follow.


And this is the policing patch, which
- converts type of the Token::[width|precision]  members
   from "unsigned int" to "int"
- renames the Token::[width|precision] members to
   Token::[widthMin/widthMax]
- removes unneeded typecastings


Looks good to me.


  case LFT_TIME_SUBSECOND:
  divisor = 1000;

-if (precision) {
+if (widthMax>  0) {
  int i;
  divisor = 100;

-for (i = precision; i>  1; i--)
+for (i = widthMax; i>  1; i--)
  divisor /= 10;

  if (!divisor)
  divisor = 0;
  }
  break;



 ts  Seconds since epoch
 tu  subsecond time (milliseconds)


I do not think there is a problem with the above code, but it looks like
we could merge %ts.%tu into something like %.3t. That would be a minor
usability improvement and out of scope for this patch, of course.


True...




Thank you,

Alex.





Re: [PATCH] Limit log field width

2011-10-05 Thread Alex Rousskov
On 10/05/2011 10:35 AM, Tsantilas Christos wrote:
> On 09/06/2011 09:45 PM, Alex Rousskov wrote:
>> On 04/16/2011 11:28 PM, Amos Jeffries wrote:
>>> Clearing up my queue of waiting patches this old one pops up as approved
>>> but not merged.
>>>
>>> Alex, it is waiting on your commit and resolution of a TODO (separate
>>> patch was fine). If it is still going to happen you my want to do it
>>> soon before any more polishing changes alter the logging area.
>>
>> Done as trunk r11709. A separate polishing patch to remove no longer
>> needed type castings will follow.
> 
> And this is the policing patch, which
> - converts type of the Token::[width|precision]  members
>   from "unsigned int" to "int"
> - renames the Token::[width|precision] members to
>   Token::[widthMin/widthMax]
> - removes unneeded typecastings

Looks good to me.

>  case LFT_TIME_SUBSECOND:
>  divisor = 1000;
>  
> -if (precision) {
> +if (widthMax > 0) {
>  int i;
>  divisor = 100;
>  
> -for (i = precision; i > 1; i--)
> +for (i = widthMax; i > 1; i--)
>  divisor /= 10;
>  
>  if (!divisor)
>  divisor = 0;
>  }
>  break;

> ts  Seconds since epoch
> tu  subsecond time (milliseconds)

I do not think there is a problem with the above code, but it looks like
we could merge %ts.%tu into something like %.3t. That would be a minor
usability improvement and out of scope for this patch, of course.


Thank you,

Alex.


Re: [PATCH] Limit log field width

2011-10-05 Thread Tsantilas Christos

On 09/06/2011 09:45 PM, Alex Rousskov wrote:

On 04/16/2011 11:28 PM, Amos Jeffries wrote:

Clearing up my queue of waiting patches this old one pops up as approved
but not merged.

Alex, it is waiting on your commit and resolution of a TODO (separate
patch was fine). If it is still going to happen you my want to do it
soon before any more polishing changes alter the logging area.


Done as trunk r11709. A separate polishing patch to remove no longer
needed type castings will follow.


And this is the policing patch, which
- converts type of the Token::[width|precision]  members
  from "unsigned int" to "int"
- renames the Token::[width|precision] members to
  Token::[widthMin/widthMax]
- removes unneeded typecastings






Thank you,

Alex.



log field width polishing 

This patch:
 - converts type of the Token::[width|precision]  members from "unsigned int" to "int"
 - renames the Token::[width|precision] members to Token::[widthMin/widthMax]
 - removes unneeded typecastings

This is a Measurement Factory project

=== modified file 'src/format/Format.cc'
--- src/format/Format.cc	2011-09-12 00:31:13 +
+++ src/format/Format.cc	2011-10-05 15:29:09 +
@@ -202,45 +202,45 @@
 break;
 
 case LOG_QUOTE_URL:
 entry->append("#", 1);
 break;
 
 case LOG_QUOTE_RAW:
 entry->append("'", 1);
 break;
 
 case LOG_QUOTE_NONE:
 break;
 }
 
 if (t->left)
 entry->append("-", 1);
 
 if (t->zero)
 entry->append("0", 1);
 
-if (t->width)
-storeAppendPrintf(entry, "%d", (int) t->width);
+if (t->widthMin >= 0)
+storeAppendPrintf(entry, "%d", t->widthMin);
 
-if (t->precision)
-storeAppendPrintf(entry, ".%d", (int) t->precision);
+if (t->widthMax >= 0)
+storeAppendPrintf(entry, ".%d", t->widthMax);
 
 if (arg)
 storeAppendPrintf(entry, "{%s}", arg);
 
 storeAppendPrintf(entry, "%s", t->label);
 
 if (t->space)
 entry->append(" ", 1);
 }
 }
 
 entry->append("\n", 1);
 }
 
 }
 
 static void
 log_quoted_string(const char *str, char *out)
 {
 char *p = out;
@@ -982,45 +982,45 @@
 case LFT_EXT_LOG:
 if (al->request)
 out = al->request->extacl_log.termedBuf();
 
 quote = 1;
 
 break;
 
 case LFT_SEQUENCE_NUMBER:
 outoff = logSequenceNumber;
 dooff = 1;
 break;
 
 case LFT_PERCENT:
 out = "%";
 
 break;
 }
 
 if (dooff) {
-snprintf(tmp, sizeof(tmp), "%0*" PRId64, fmt->zero ? (int) fmt->width : 0, outoff);
+snprintf(tmp, sizeof(tmp), "%0*" PRId64, fmt->zero && fmt->widthMin >= 0 ? fmt->widthMin : 0, outoff);
 out = tmp;
 
 } else if (doint) {
-snprintf(tmp, sizeof(tmp), "%0*ld", fmt->zero ? (int) fmt->width : 0, outint);
+snprintf(tmp, sizeof(tmp), "%0*ld", fmt->zero && fmt->widthMin >= 0 ? fmt->widthMin : 0, outint);
 out = tmp;
 }
 
 if (out && *out) {
 if (quote || fmt->quote != LOG_QUOTE_NONE) {
 char *newout = NULL;
 int newfree = 0;
 
 switch (fmt->quote) {
 
 case LOG_QUOTE_NONE:
 newout = rfc1738_escape_unescaped(out);
 break;
 
 case LOG_QUOTE_QUOTES: {
 size_t out_len = static_cast(strlen(out)) * 2 + 1;
 if (out_len >= sizeof(tmp)) {
 newout = (char *)xmalloc(out_len);
 newfree = 1;
 } else
@@ -1036,46 +1036,46 @@
 
 case LOG_QUOTE_URL:
 newout = rfc1738_escape(out);
 break;
 
 case LOG_QUOTE_RAW:
 break;
 }
 
 if (newout) {
 if (dofree)
 safe_free(out);
 
 out = newout;
 
 dofree = newfree;
 }
 }
 
 // enforce width limits if configured
-const bool haveMaxWidth = fmt->precision && !doint && !dooff;
-if (haveMaxWidth || fmt->width) {
-const int minWidth = fmt->width ?
- static_cast(fmt->width) : 0;
+const bool haveMaxWidth = fmt->widthMax >=0 && !doint && !dooff;
+if (haveMaxWidth || fmt->widthMin) {
+const int minWidth = fmt->widthMin >= 0 ?
+   

Re: [PATCH] Limit log field width

2011-09-06 Thread Alex Rousskov
On 04/16/2011 11:28 PM, Amos Jeffries wrote:
> Clearing up my queue of waiting patches this old one pops up as approved
> but not merged.
> 
> Alex, it is waiting on your commit and resolution of a TODO (separate
> patch was fine). If it is still going to happen you my want to do it
> soon before any more polishing changes alter the logging area.

Done as trunk r11709. A separate polishing patch to remove no longer
needed type castings will follow.


Thank you,

Alex.


Re: [PATCH] Limit log field width

2011-04-16 Thread Amos Jeffries
Clearing up my queue of waiting patches this old one pops up as approved 
but not merged.


Alex, it is waiting on your commit and resolution of a TODO (separate 
patch was fine). If it is still going to happen you my want to do it 
soon before any more polishing changes alter the logging area.


Amos


Re: [PATCH] Limit log field width

2010-01-22 Thread Amos Jeffries

Alex Rousskov wrote:

On 01/21/2010 05:19 AM, Amos Jeffries wrote:

Alex Rousskov wrote:

Hello,



TODO: The name comes from the printf(3) "precision" format part. It may
be a good idea to rename our "precision" into max_width or similar,
especially if we do not support floating point precision logging.

" [width[.maximum]] " gets my vote. seems clear and concise.


The maximum width field does not require the width field. Should I
change to [width][.maximum] or [width_min].[width_max]? These are just
labels in the .pre -- there is no effect on the code.


Yes definitely needs to be clarified there. The BNF fooled me even with 
your example below it.

I'll agree with your latter naming with that independence.




TODO: Old code used chars to store user-configured field width and
precision. That does not work for URLs, headers, and other entries
longer than 256 characters. This patch changes the storage type to int.
The code should probably be polished further to remove unsigned->signed
conversions.
-

Please review.


+1. If the TODO can be removed.


You are talking about the "unsigned->signed" TODO? I am happy to make
those changes, but they will make the patch much bigger, there is higher
risk of compilation warnings, and they are not really about the feature
in question. Should I still do it? Perhaps as a separate commit?


Up to you.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE21
  Current Beta Squid 3.1.0.15


Re: [PATCH] Limit log field width

2010-01-22 Thread Alex Rousskov
On 01/21/2010 05:19 AM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> Hello,
>>
>> The attached patch was motivated by user complaints that “vi”,
>> “cat”, and even some more sophisticated log analysis tools have trouble
>> handling long access.log lines. Those long log lines result from long
>> URLs that do occur in the wild.
>>
>> Squid places a 8192 character limit on the URL length but that limit
>> exceeds (a) some of the tool limits and (b) Squid's access.log buffer
>> limit (if some other fields are logged).
>>
>> My solution was to honor the .precision setting in logformat field
>> specifications. You can use it with %ru or any other text field.
>>
>> For example, the format code below limits logged URI size to the first
>> 1000 characters.
>>
>>   logformat xsquid ... %rm %.1000ru %un ...
>>
>> Squid access log line buffer cannot exceed 8192 characters. If you want
>> to preserve fields logged after the URL, your logged URL width limit
>> should be smaller than 8192 to leave space for other fields on the log
>> line.
>>
>> There is no width limit by default.
>>
>> Here is a possible commit message:
>>
>> -
>> Support maximum field width for string access.log fields.
>>
>> Some standard command-line and some log processing tools have trouble
>> handling URLs or other logged fields exceeding 8KB in length. Moreover,
>> Squid violates its own log line format and truncates the entire log line
>> if, for example, the URL is 8KB long. By supporting .precision format
>> argument, we allow the administrator to specify logged URL size and
>> avoid these problems.
>>
>> Limiting logged field width has no effect on traffic on the wire.
> 
> Um, not quite true now that we log over UDP. anyways...

Also not quite true if you look at the log via ssh :-)

>> TODO: The name comes from the printf(3) "precision" format part. It may
>> be a good idea to rename our "precision" into max_width or similar,
>> especially if we do not support floating point precision logging.
> 
> " [width[.maximum]] " gets my vote. seems clear and concise.

The maximum width field does not require the width field. Should I
change to [width][.maximum] or [width_min].[width_max]? These are just
labels in the .pre -- there is no effect on the code.


>> TODO: Old code used chars to store user-configured field width and
>> precision. That does not work for URLs, headers, and other entries
>> longer than 256 characters. This patch changes the storage type to int.
>> The code should probably be polished further to remove unsigned->signed
>> conversions.
>> -
>>
>> Please review.
>>
> 
> +1. If the TODO can be removed.

You are talking about the "unsigned->signed" TODO? I am happy to make
those changes, but they will make the patch much bigger, there is higher
risk of compilation warnings, and they are not really about the feature
in question. Should I still do it? Perhaps as a separate commit?

Thank you,

Alex.



Re: [PATCH] Limit log field width

2010-01-21 Thread Amos Jeffries

Alex Rousskov wrote:

Hello,

The attached patch was motivated by user complaints that “vi”,
“cat”, and even some more sophisticated log analysis tools have trouble
handling long access.log lines. Those long log lines result from long
URLs that do occur in the wild.

Squid places a 8192 character limit on the URL length but that limit
exceeds (a) some of the tool limits and (b) Squid's access.log buffer
limit (if some other fields are logged).

My solution was to honor the .precision setting in logformat field
specifications. You can use it with %ru or any other text field.

For example, the format code below limits logged URI size to the first
1000 characters.

  logformat xsquid ... %rm %.1000ru %un ...

Squid access log line buffer cannot exceed 8192 characters. If you want
to preserve fields logged after the URL, your logged URL width limit
should be smaller than 8192 to leave space for other fields on the log
line.

There is no width limit by default.

Here is a possible commit message:

-
Support maximum field width for string access.log fields.

Some standard command-line and some log processing tools have trouble
handling URLs or other logged fields exceeding 8KB in length. Moreover,
Squid violates its own log line format and truncates the entire log line
if, for example, the URL is 8KB long. By supporting .precision format
argument, we allow the administrator to specify logged URL size and
avoid these problems.

Limiting logged field width has no effect on traffic on the wire.


Um, not quite true now that we log over UDP. anyways...



TODO: The name comes from the printf(3) "precision" format part. It may
be a good idea to rename our "precision" into max_width or similar,
especially if we do not support floating point precision logging.


" [width[.maximum]] " gets my vote. seems clear and concise.



TODO: Old code used chars to store user-configured field width and
precision. That does not work for URLs, headers, and other entries
longer than 256 characters. This patch changes the storage type to int.
The code should probably be polished further to remove unsigned->signed
conversions.
-

Please review.



+1. If the TODO can be removed.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE21
  Current Beta Squid 3.1.0.15


[PATCH] Limit log field width

2010-01-20 Thread Alex Rousskov
Hello,

The attached patch was motivated by user complaints that “vi”,
“cat”, and even some more sophisticated log analysis tools have trouble
handling long access.log lines. Those long log lines result from long
URLs that do occur in the wild.

Squid places a 8192 character limit on the URL length but that limit
exceeds (a) some of the tool limits and (b) Squid's access.log buffer
limit (if some other fields are logged).

My solution was to honor the .precision setting in logformat field
specifications. You can use it with %ru or any other text field.

For example, the format code below limits logged URI size to the first
1000 characters.

  logformat xsquid ... %rm %.1000ru %un ...

Squid access log line buffer cannot exceed 8192 characters. If you want
to preserve fields logged after the URL, your logged URL width limit
should be smaller than 8192 to leave space for other fields on the log
line.

There is no width limit by default.

Here is a possible commit message:

-
Support maximum field width for string access.log fields.

Some standard command-line and some log processing tools have trouble
handling URLs or other logged fields exceeding 8KB in length. Moreover,
Squid violates its own log line format and truncates the entire log line
if, for example, the URL is 8KB long. By supporting .precision format
argument, we allow the administrator to specify logged URL size and
avoid these problems.

Limiting logged field width has no effect on traffic on the wire.

TODO: The name comes from the printf(3) "precision" format part. It may
be a good idea to rename our "precision" into max_width or similar,
especially if we do not support floating point precision logging.

TODO: Old code used chars to store user-configured field width and
precision. That does not work for URLs, headers, and other entries
longer than 256 characters. This patch changes the storage type to int.
The code should probably be polished further to remove unsigned->signed
conversions.
-

Please review.

Thank you,

Alex.
Support maximum field width for string access.log fields.

Some standard command-line and some log processing tools have trouble handling
URLs or other logged fields exceeding 8KB in length. Moreover, Squid violates
its own log line format and truncates the entire log line if, for example, the
URL is 8KB long. By supporting .precision format argument, we allow the
administrator to specify logged URL size and avoid these problems.

Limiting logged field width has no effect on traffic on the wire.

TODO: The name comes from the printf(3) "precision" format part. It may be a
good idea to rename our "precision" into max_width or similar, especially if
we do not support floating point precision logging.

TODO: Old code used chars to store user-configured field width and precision.
That does not work for URLs, headers, and other entries longer than 256
characters. This patch changes the storage type to int. The code should
probably be polished further to remove unsigned->signed conversions.

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2010-01-02 04:32:46 +
+++ src/cf.data.pre	2010-01-20 17:15:59 +
@@ -2462,7 +2462,7 @@
 	modifiers are usually not needed, but can be specified if an explicit
 	output format is desired.
 
-		% ["|[|'|#] [-] [[0]width] [{argument}] formatcode
+		% ["|[|'|#] [-] [[0]width[.precision]] [{argument}] formatcode
 
 		"	output in quoted string format
 		[	output in squid text log format as used by log_mime_hdrs
@@ -2470,8 +2470,10 @@
 		'	output as-is
 
 		-	left aligned
-		width	field width. If starting with 0 the
-			output is zero padded
+		width	minimum field width. 
+			If starting with 0 the output is zero padded
+		precision	maximum field width for string values.
+			Longer string values are truncated.
 		{arg}	argument such as header name etc
 
 	Format codes:

=== modified file 'src/log/access_log.cc'
--- src/log/access_log.cc	2009-12-22 01:12:53 +
+++ src/log/access_log.cc	2010-01-20 17:15:59 +
@@ -466,8 +466,8 @@
 } header;
 char *timespec;
 } data;
-unsigned char width;
-unsigned char precision;
+unsigned int width;
+unsigned int precision;
 enum log_quote quote;
 unsigned int left:1;
 unsigned int space:1;
@@ -1213,13 +1213,15 @@
 }
 }
 
-if (fmt->width) {
-if (fmt->left)
-mb.Printf("%-*s", (int) fmt->width, out);
-else
-mb.Printf("%*s", (int) fmt->width, out);
-} else
-mb.append(out, strlen(out));
+// enforce width limits if configured
+const int minWidth = fmt->width ? static_cast(fmt->width) : 0;
+const int maxWidth = (fmt->precision && !doint && !dooff) ?
+static_cast(fmt->precision) : strlen(out);
+
+if (fmt->left)
+mb.Printf("%-*.*s", minW