Re: [PATCH] Limit log field width
If there is not any objection I will commit this patch to trunk. On 10/05/2011 07:35 PM, Tsantilas Christos wrote: On 09/06/2011 09:45 PM, Alex Rousskov wrote: On 04/16/2011 11:28 PM, Amos Jeffries wrote: Clearing up my queue of waiting patches this old one pops up as approved but not merged. Alex, it is waiting on your commit and resolution of a TODO (separate patch was fine). If it is still going to happen you my want to do it soon before any more polishing changes alter the logging area. Done as trunk r11709. A separate polishing patch to remove no longer needed type castings will follow. And this is the policing patch, which - converts type of the Token::[width|precision] members from "unsigned int" to "int" - renames the Token::[width|precision] members to Token::[widthMin/widthMax] - removes unneeded typecastings Thank you, Alex.
Re: [PATCH] Limit log field width
On 10/05/2011 08:13 PM, Alex Rousskov wrote: On 10/05/2011 10:35 AM, Tsantilas Christos wrote: On 09/06/2011 09:45 PM, Alex Rousskov wrote: On 04/16/2011 11:28 PM, Amos Jeffries wrote: Clearing up my queue of waiting patches this old one pops up as approved but not merged. Alex, it is waiting on your commit and resolution of a TODO (separate patch was fine). If it is still going to happen you my want to do it soon before any more polishing changes alter the logging area. Done as trunk r11709. A separate polishing patch to remove no longer needed type castings will follow. And this is the policing patch, which - converts type of the Token::[width|precision] members from "unsigned int" to "int" - renames the Token::[width|precision] members to Token::[widthMin/widthMax] - removes unneeded typecastings Looks good to me. case LFT_TIME_SUBSECOND: divisor = 1000; -if (precision) { +if (widthMax> 0) { int i; divisor = 100; -for (i = precision; i> 1; i--) +for (i = widthMax; i> 1; i--) divisor /= 10; if (!divisor) divisor = 0; } break; ts Seconds since epoch tu subsecond time (milliseconds) I do not think there is a problem with the above code, but it looks like we could merge %ts.%tu into something like %.3t. That would be a minor usability improvement and out of scope for this patch, of course. True... Thank you, Alex.
Re: [PATCH] Limit log field width
On 10/05/2011 10:35 AM, Tsantilas Christos wrote: > On 09/06/2011 09:45 PM, Alex Rousskov wrote: >> On 04/16/2011 11:28 PM, Amos Jeffries wrote: >>> Clearing up my queue of waiting patches this old one pops up as approved >>> but not merged. >>> >>> Alex, it is waiting on your commit and resolution of a TODO (separate >>> patch was fine). If it is still going to happen you my want to do it >>> soon before any more polishing changes alter the logging area. >> >> Done as trunk r11709. A separate polishing patch to remove no longer >> needed type castings will follow. > > And this is the policing patch, which > - converts type of the Token::[width|precision] members > from "unsigned int" to "int" > - renames the Token::[width|precision] members to > Token::[widthMin/widthMax] > - removes unneeded typecastings Looks good to me. > case LFT_TIME_SUBSECOND: > divisor = 1000; > > -if (precision) { > +if (widthMax > 0) { > int i; > divisor = 100; > > -for (i = precision; i > 1; i--) > +for (i = widthMax; i > 1; i--) > divisor /= 10; > > if (!divisor) > divisor = 0; > } > break; > ts Seconds since epoch > tu subsecond time (milliseconds) I do not think there is a problem with the above code, but it looks like we could merge %ts.%tu into something like %.3t. That would be a minor usability improvement and out of scope for this patch, of course. Thank you, Alex.
Re: [PATCH] Limit log field width
On 09/06/2011 09:45 PM, Alex Rousskov wrote: On 04/16/2011 11:28 PM, Amos Jeffries wrote: Clearing up my queue of waiting patches this old one pops up as approved but not merged. Alex, it is waiting on your commit and resolution of a TODO (separate patch was fine). If it is still going to happen you my want to do it soon before any more polishing changes alter the logging area. Done as trunk r11709. A separate polishing patch to remove no longer needed type castings will follow. And this is the policing patch, which - converts type of the Token::[width|precision] members from "unsigned int" to "int" - renames the Token::[width|precision] members to Token::[widthMin/widthMax] - removes unneeded typecastings Thank you, Alex. log field width polishing This patch: - converts type of the Token::[width|precision] members from "unsigned int" to "int" - renames the Token::[width|precision] members to Token::[widthMin/widthMax] - removes unneeded typecastings This is a Measurement Factory project === modified file 'src/format/Format.cc' --- src/format/Format.cc 2011-09-12 00:31:13 + +++ src/format/Format.cc 2011-10-05 15:29:09 + @@ -202,45 +202,45 @@ break; case LOG_QUOTE_URL: entry->append("#", 1); break; case LOG_QUOTE_RAW: entry->append("'", 1); break; case LOG_QUOTE_NONE: break; } if (t->left) entry->append("-", 1); if (t->zero) entry->append("0", 1); -if (t->width) -storeAppendPrintf(entry, "%d", (int) t->width); +if (t->widthMin >= 0) +storeAppendPrintf(entry, "%d", t->widthMin); -if (t->precision) -storeAppendPrintf(entry, ".%d", (int) t->precision); +if (t->widthMax >= 0) +storeAppendPrintf(entry, ".%d", t->widthMax); if (arg) storeAppendPrintf(entry, "{%s}", arg); storeAppendPrintf(entry, "%s", t->label); if (t->space) entry->append(" ", 1); } } entry->append("\n", 1); } } static void log_quoted_string(const char *str, char *out) { char *p = out; @@ -982,45 +982,45 @@ case LFT_EXT_LOG: if (al->request) out = al->request->extacl_log.termedBuf(); quote = 1; break; case LFT_SEQUENCE_NUMBER: outoff = logSequenceNumber; dooff = 1; break; case LFT_PERCENT: out = "%"; break; } if (dooff) { -snprintf(tmp, sizeof(tmp), "%0*" PRId64, fmt->zero ? (int) fmt->width : 0, outoff); +snprintf(tmp, sizeof(tmp), "%0*" PRId64, fmt->zero && fmt->widthMin >= 0 ? fmt->widthMin : 0, outoff); out = tmp; } else if (doint) { -snprintf(tmp, sizeof(tmp), "%0*ld", fmt->zero ? (int) fmt->width : 0, outint); +snprintf(tmp, sizeof(tmp), "%0*ld", fmt->zero && fmt->widthMin >= 0 ? fmt->widthMin : 0, outint); out = tmp; } if (out && *out) { if (quote || fmt->quote != LOG_QUOTE_NONE) { char *newout = NULL; int newfree = 0; switch (fmt->quote) { case LOG_QUOTE_NONE: newout = rfc1738_escape_unescaped(out); break; case LOG_QUOTE_QUOTES: { size_t out_len = static_cast(strlen(out)) * 2 + 1; if (out_len >= sizeof(tmp)) { newout = (char *)xmalloc(out_len); newfree = 1; } else @@ -1036,46 +1036,46 @@ case LOG_QUOTE_URL: newout = rfc1738_escape(out); break; case LOG_QUOTE_RAW: break; } if (newout) { if (dofree) safe_free(out); out = newout; dofree = newfree; } } // enforce width limits if configured -const bool haveMaxWidth = fmt->precision && !doint && !dooff; -if (haveMaxWidth || fmt->width) { -const int minWidth = fmt->width ? - static_cast(fmt->width) : 0; +const bool haveMaxWidth = fmt->widthMax >=0 && !doint && !dooff; +if (haveMaxWidth || fmt->widthMin) { +const int minWidth = fmt->widthMin >= 0 ? +
Re: [PATCH] Limit log field width
On 04/16/2011 11:28 PM, Amos Jeffries wrote: > Clearing up my queue of waiting patches this old one pops up as approved > but not merged. > > Alex, it is waiting on your commit and resolution of a TODO (separate > patch was fine). If it is still going to happen you my want to do it > soon before any more polishing changes alter the logging area. Done as trunk r11709. A separate polishing patch to remove no longer needed type castings will follow. Thank you, Alex.
Re: [PATCH] Limit log field width
Clearing up my queue of waiting patches this old one pops up as approved but not merged. Alex, it is waiting on your commit and resolution of a TODO (separate patch was fine). If it is still going to happen you my want to do it soon before any more polishing changes alter the logging area. Amos
Re: [PATCH] Limit log field width
Alex Rousskov wrote: On 01/21/2010 05:19 AM, Amos Jeffries wrote: Alex Rousskov wrote: Hello, TODO: The name comes from the printf(3) "precision" format part. It may be a good idea to rename our "precision" into max_width or similar, especially if we do not support floating point precision logging. " [width[.maximum]] " gets my vote. seems clear and concise. The maximum width field does not require the width field. Should I change to [width][.maximum] or [width_min].[width_max]? These are just labels in the .pre -- there is no effect on the code. Yes definitely needs to be clarified there. The BNF fooled me even with your example below it. I'll agree with your latter naming with that independence. TODO: Old code used chars to store user-configured field width and precision. That does not work for URLs, headers, and other entries longer than 256 characters. This patch changes the storage type to int. The code should probably be polished further to remove unsigned->signed conversions. - Please review. +1. If the TODO can be removed. You are talking about the "unsigned->signed" TODO? I am happy to make those changes, but they will make the patch much bigger, there is higher risk of compilation warnings, and they are not really about the feature in question. Should I still do it? Perhaps as a separate commit? Up to you. Amos -- Please be using Current Stable Squid 2.7.STABLE7 or 3.0.STABLE21 Current Beta Squid 3.1.0.15
Re: [PATCH] Limit log field width
On 01/21/2010 05:19 AM, Amos Jeffries wrote: > Alex Rousskov wrote: >> Hello, >> >> The attached patch was motivated by user complaints that “vi”, >> “cat”, and even some more sophisticated log analysis tools have trouble >> handling long access.log lines. Those long log lines result from long >> URLs that do occur in the wild. >> >> Squid places a 8192 character limit on the URL length but that limit >> exceeds (a) some of the tool limits and (b) Squid's access.log buffer >> limit (if some other fields are logged). >> >> My solution was to honor the .precision setting in logformat field >> specifications. You can use it with %ru or any other text field. >> >> For example, the format code below limits logged URI size to the first >> 1000 characters. >> >> logformat xsquid ... %rm %.1000ru %un ... >> >> Squid access log line buffer cannot exceed 8192 characters. If you want >> to preserve fields logged after the URL, your logged URL width limit >> should be smaller than 8192 to leave space for other fields on the log >> line. >> >> There is no width limit by default. >> >> Here is a possible commit message: >> >> - >> Support maximum field width for string access.log fields. >> >> Some standard command-line and some log processing tools have trouble >> handling URLs or other logged fields exceeding 8KB in length. Moreover, >> Squid violates its own log line format and truncates the entire log line >> if, for example, the URL is 8KB long. By supporting .precision format >> argument, we allow the administrator to specify logged URL size and >> avoid these problems. >> >> Limiting logged field width has no effect on traffic on the wire. > > Um, not quite true now that we log over UDP. anyways... Also not quite true if you look at the log via ssh :-) >> TODO: The name comes from the printf(3) "precision" format part. It may >> be a good idea to rename our "precision" into max_width or similar, >> especially if we do not support floating point precision logging. > > " [width[.maximum]] " gets my vote. seems clear and concise. The maximum width field does not require the width field. Should I change to [width][.maximum] or [width_min].[width_max]? These are just labels in the .pre -- there is no effect on the code. >> TODO: Old code used chars to store user-configured field width and >> precision. That does not work for URLs, headers, and other entries >> longer than 256 characters. This patch changes the storage type to int. >> The code should probably be polished further to remove unsigned->signed >> conversions. >> - >> >> Please review. >> > > +1. If the TODO can be removed. You are talking about the "unsigned->signed" TODO? I am happy to make those changes, but they will make the patch much bigger, there is higher risk of compilation warnings, and they are not really about the feature in question. Should I still do it? Perhaps as a separate commit? Thank you, Alex.
Re: [PATCH] Limit log field width
Alex Rousskov wrote: Hello, The attached patch was motivated by user complaints that “vi”, “cat”, and even some more sophisticated log analysis tools have trouble handling long access.log lines. Those long log lines result from long URLs that do occur in the wild. Squid places a 8192 character limit on the URL length but that limit exceeds (a) some of the tool limits and (b) Squid's access.log buffer limit (if some other fields are logged). My solution was to honor the .precision setting in logformat field specifications. You can use it with %ru or any other text field. For example, the format code below limits logged URI size to the first 1000 characters. logformat xsquid ... %rm %.1000ru %un ... Squid access log line buffer cannot exceed 8192 characters. If you want to preserve fields logged after the URL, your logged URL width limit should be smaller than 8192 to leave space for other fields on the log line. There is no width limit by default. Here is a possible commit message: - Support maximum field width for string access.log fields. Some standard command-line and some log processing tools have trouble handling URLs or other logged fields exceeding 8KB in length. Moreover, Squid violates its own log line format and truncates the entire log line if, for example, the URL is 8KB long. By supporting .precision format argument, we allow the administrator to specify logged URL size and avoid these problems. Limiting logged field width has no effect on traffic on the wire. Um, not quite true now that we log over UDP. anyways... TODO: The name comes from the printf(3) "precision" format part. It may be a good idea to rename our "precision" into max_width or similar, especially if we do not support floating point precision logging. " [width[.maximum]] " gets my vote. seems clear and concise. TODO: Old code used chars to store user-configured field width and precision. That does not work for URLs, headers, and other entries longer than 256 characters. This patch changes the storage type to int. The code should probably be polished further to remove unsigned->signed conversions. - Please review. +1. If the TODO can be removed. Amos -- Please be using Current Stable Squid 2.7.STABLE7 or 3.0.STABLE21 Current Beta Squid 3.1.0.15
[PATCH] Limit log field width
Hello, The attached patch was motivated by user complaints that “vi”, “cat”, and even some more sophisticated log analysis tools have trouble handling long access.log lines. Those long log lines result from long URLs that do occur in the wild. Squid places a 8192 character limit on the URL length but that limit exceeds (a) some of the tool limits and (b) Squid's access.log buffer limit (if some other fields are logged). My solution was to honor the .precision setting in logformat field specifications. You can use it with %ru or any other text field. For example, the format code below limits logged URI size to the first 1000 characters. logformat xsquid ... %rm %.1000ru %un ... Squid access log line buffer cannot exceed 8192 characters. If you want to preserve fields logged after the URL, your logged URL width limit should be smaller than 8192 to leave space for other fields on the log line. There is no width limit by default. Here is a possible commit message: - Support maximum field width for string access.log fields. Some standard command-line and some log processing tools have trouble handling URLs or other logged fields exceeding 8KB in length. Moreover, Squid violates its own log line format and truncates the entire log line if, for example, the URL is 8KB long. By supporting .precision format argument, we allow the administrator to specify logged URL size and avoid these problems. Limiting logged field width has no effect on traffic on the wire. TODO: The name comes from the printf(3) "precision" format part. It may be a good idea to rename our "precision" into max_width or similar, especially if we do not support floating point precision logging. TODO: Old code used chars to store user-configured field width and precision. That does not work for URLs, headers, and other entries longer than 256 characters. This patch changes the storage type to int. The code should probably be polished further to remove unsigned->signed conversions. - Please review. Thank you, Alex. Support maximum field width for string access.log fields. Some standard command-line and some log processing tools have trouble handling URLs or other logged fields exceeding 8KB in length. Moreover, Squid violates its own log line format and truncates the entire log line if, for example, the URL is 8KB long. By supporting .precision format argument, we allow the administrator to specify logged URL size and avoid these problems. Limiting logged field width has no effect on traffic on the wire. TODO: The name comes from the printf(3) "precision" format part. It may be a good idea to rename our "precision" into max_width or similar, especially if we do not support floating point precision logging. TODO: Old code used chars to store user-configured field width and precision. That does not work for URLs, headers, and other entries longer than 256 characters. This patch changes the storage type to int. The code should probably be polished further to remove unsigned->signed conversions. === modified file 'src/cf.data.pre' --- src/cf.data.pre 2010-01-02 04:32:46 + +++ src/cf.data.pre 2010-01-20 17:15:59 + @@ -2462,7 +2462,7 @@ modifiers are usually not needed, but can be specified if an explicit output format is desired. - % ["|[|'|#] [-] [[0]width] [{argument}] formatcode + % ["|[|'|#] [-] [[0]width[.precision]] [{argument}] formatcode " output in quoted string format [ output in squid text log format as used by log_mime_hdrs @@ -2470,8 +2470,10 @@ ' output as-is - left aligned - width field width. If starting with 0 the - output is zero padded + width minimum field width. + If starting with 0 the output is zero padded + precision maximum field width for string values. + Longer string values are truncated. {arg} argument such as header name etc Format codes: === modified file 'src/log/access_log.cc' --- src/log/access_log.cc 2009-12-22 01:12:53 + +++ src/log/access_log.cc 2010-01-20 17:15:59 + @@ -466,8 +466,8 @@ } header; char *timespec; } data; -unsigned char width; -unsigned char precision; +unsigned int width; +unsigned int precision; enum log_quote quote; unsigned int left:1; unsigned int space:1; @@ -1213,13 +1213,15 @@ } } -if (fmt->width) { -if (fmt->left) -mb.Printf("%-*s", (int) fmt->width, out); -else -mb.Printf("%*s", (int) fmt->width, out); -} else -mb.append(out, strlen(out)); +// enforce width limits if configured +const int minWidth = fmt->width ? static_cast(fmt->width) : 0; +const int maxWidth = (fmt->precision && !doint && !dooff) ? +static_cast(fmt->precision) : strlen(out); + +if (fmt->left) +mb.Printf("%-*.*s", minW