OK. I have supplied my patch. The affected files are htsearch/Display.cc
and htcommon/defaults.cc, and
they both come from the htdig-3.1.5 tarball.
It has only been tested under Debian Linux ( 2.2 release 3 "potato" ).
The main issue with compiling it will be file locking, so I have made
it easy to #ifdef that out of the way.
The biggest problem that I have with it as it stands is that I have no
"else" condition if I am unable to open the log file for writing - so
it would just silently fail to log the search results.
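
One possible shape for the missing "else" branch, as a minimal sketch only. The helper names append_log_line and log_with_fallback are hypothetical and are not part of the patch; the fallback target (stderr, which for a CGI typically ends up in the web server's error log) is an assumption, and syslog would be an equally reasonable choice.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not in the patch): append one line to the log file
// and report whether the write actually happened.
bool append_log_line(const std::string& path, const std::string& line)
{
    FILE* fp = std::fopen(path.c_str(), "a");
    if (!fp)
        return false;                       // caller decides how to report this
    bool ok = std::fputs(line.c_str(), fp) >= 0;
    ok = (std::fclose(fp) == 0) && ok;      // a failed flush also counts as failure
    return ok;
}

// Hypothetical wrapper: try the file first, otherwise complain on stderr
// instead of dropping the entry silently.
// Returns true if the line went to the file, false if it went to stderr.
bool log_with_fallback(const std::string& path, const std::string& line)
{
    if (append_log_line(path, line))
        return true;
    std::fprintf(stderr, "htsearch: cannot write to %s; entry was: %s",
                 path.c_str(), line.c_str());
    return false;
}
```

The point is only that the failure becomes visible somewhere; the search itself should still go ahead.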
In answer to some of Geoff's questions:
Q: Why not just check for ( page == 1 ) ? Why use an "init=Y" CGI variable?
A: I wanted to be able to tell the difference between someone's original
search and them then clicking on the "1" link at the base of the page.
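
The decision can be sketched in isolation like this. The function should_log is a hypothetical stand-in for the check the patch makes in Display.cc; it takes the value of the "init" CGI parameter directly (null when the parameter is absent) rather than going through the input object.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical stand-in for the check in Display.cc.
// init_value is the "init" CGI parameter, or nullptr when it is absent.
bool should_log(bool initonly, const char* init_value)
{
    if (!initonly)
        return true;                        // log every call to htsearch
    if (init_value == nullptr || *init_value == '\0')
        return false;                       // e.g. a paging link without init=Y
    // Same test as strncasecmp(value, "Y", 1) == 0: first character only,
    // case-insensitive.
    return std::toupper(static_cast<unsigned char>(*init_value)) == 'Y';
}
```

With logging_initonly set, a search submitted from a form carrying init=Y is logged, while the "1" link at the base of the results page (which does not carry it) is not.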
Q: Why use seconds since epoch(SSE) instead of the Standard Date format?
A: For me, the log files are predominantly designed to be read by
CGI scripts, so the SSE format makes a lot more sense for me.
The way it works - it is all controlled via some extra variables in the config
file. If you simply specify "logging : true", then it will behave in the
standard way and use "syslog". The extra config variables are:
* logging_file ( Default: none )
If this is set to "none", then it will log using syslog, otherwise
this will be assumed to be the path to the log file
* logging_initonly ( Default : false )
Boolean. If "true", it will only perform logging when the CGI
parameter "init=Y" is specified, otherwise it will log every usage
of htsearch
* logging_field_separator ( Default : none )
If this is "none", then the data is output in the "traditional" format of
TIME:REMOTE_ADDR [config] (match_method) [words] [logicalWords]
(matches/matches_per_page) - page, HTTP_REFERER
otherwise, it outputs the same fields separated by the specified
string. For example,
logging_field_separator : |
would output
TIME|REMOTE_ADDR|config|match_method|words|logicalWords|matches|matches_per_page|page|HTTP_REFERER
* logging_time_as_seconds ( Default : false )
Boolean. Only applicable when writing to a file.
If this is true, it will output the TIME field as the number of
seconds since 00:00 1/1/1970,
otherwise it will output it in a "Wed Jun 30 21:49:08 1993" type format.
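
To make the last two options concrete, here is a minimal standalone sketch of the two TIME renderings and of a separator-delimited line. The helpers format_time and join_fields are hypothetical; the patch itself rewrites a printf format string instead of joining fields.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical helper: render the TIME field in either of the two forms.
std::string format_time(std::time_t t, bool as_seconds)
{
    if (as_seconds)
        return std::to_string(static_cast<long long>(t));
    std::string s = std::ctime(&t);         // ctime() appends a trailing '\n'
    if (!s.empty() && s.back() == '\n')
        s.pop_back();
    return s;
}

// Hypothetical helper: join already-formatted fields with the separator
// and terminate the line.
std::string join_fields(const std::vector<std::string>& fields,
                        const std::string& sep)
{
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0)
            line += sep;
        line += fields[i];
    }
    line += '\n';
    return line;
}
```

Joining strings this way treats the separator as plain data. In the patch the separator is copied into a printf format, so a separator containing '%' would be read as a conversion specifier; it is probably best to avoid such separators.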
This patch allows for some very basic formatting. The single separator
format is still very flexible.
Picking out values using perl is basically two lines:
@fields=split(/\|/, $_ );
local( $date, $words, $hits ) = ( $fields[0], $fields[4], $fields[6] );
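
The same extraction in C++, as a rough sketch for anyone reading the log from compiled code. The function split_fields is a hypothetical helper, and the field positions assume the "|" layout shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split one log line on a single-character separator.
// Empty fields are preserved so that field positions stay stable.
std::vector<std::string> split_fields(const std::string& line, char sep)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));   // last field
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

As in the Perl version, fields[0] is the time, fields[4] the search words and fields[6] the number of matches.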
The Patches:
*** htdig-3.1.5/htcommon/defaults.cc Thu Feb 24 21:29:10 2000
--- htdig-3.1.5.new/htcommon/defaults.cc Thu May 10 14:23:56 2001
***************
*** 81,86 ****
--- 81,90 ----
{"local_urls_only", "false"},
{"local_user_urls", ""},
{"logging", "false"},
+ {"logging_file", "none"},
+ {"logging_initonly", "false"},
+ {"logging_field_separator", "none"},
+ {"logging_time_as_seconds", "false"},
{"maintainer", "[EMAIL PROTECTED]"},
{"match_method", "and"},
{"matches_per_page", "10"},
*** htdig-3.1.5/htsearch/Display.cc Thu Feb 24 21:29:11 2000
--- htdig-3.1.5.new/htsearch/Display.cc Thu May 10 16:04:15 2001
***************
*** 24,29 ****
--- 24,32 ----
#include "HtURLCodec.h"
#include "HtWordType.h"
+ #include <fcntl.h>
+ #include <sys/file.h>
+
//*****************************************************************************
//
Display::Display(char *indexFile, char *docFile)
***************
*** 96,113 ****
int currentMatch = 0;
int numberDisplayed = 0;
ResultMatch *match = 0;
int number = config.Value("matches_per_page");
if (number <= 0)
! number = 10;
int startAt = (pageNumber - 1) * number;
! if (config.Boolean("logging"))
{
! logSearch(pageNumber, matches);
}
setVariables(pageNumber, matches);
!
//
// The first match is guaranteed to have the highest score of
// all the matches. We use this to compute the number of stars
--- 99,127 ----
int currentMatch = 0;
int numberDisplayed = 0;
ResultMatch *match = 0;
+
int number = config.Value("matches_per_page");
if (number <= 0)
! number = 10;
!
int startAt = (pageNumber - 1) * number;
! if ( config.Boolean("logging") )
{
! int dolog = 1;
! if( config.Boolean("logging_initonly") )
! {
! const char* initstr = input->get("init");
! dolog = ( input->exists("init") &&
! strncasecmp( input->get("init"), "Y", 1) == 0 );
! }
!
!
! if ( dolog ) logSearch(pageNumber, matches);
}
setVariables(pageNumber, matches);
!
//
// The first match is guaranteed to have the highest score of
// all the matches. We use this to compute the number of stars
***************
*** 1331,1340 ****
void
Display::logSearch(int page, List *matches)
{
// Currently unused time_t t;
int nMatches = 0;
- int level = LOG_LEVEL;
- int facility = LOG_FACILITY;
char *host = getenv("REMOTE_HOST");
char *ref = getenv("HTTP_REFERER");
--- 1345,1356 ----
void
Display::logSearch(int page, List *matches)
{
+ const char* logfile = config["logging_file"];
+
// Currently unused time_t t;
+
+ time_t tnow = time(NULL);
int nMatches = 0;
char *host = getenv("REMOTE_HOST");
char *ref = getenv("HTTP_REFERER");
***************
*** 1349,1360 ****
if (matches)
nMatches = matches->Count();
! openlog("htsearch", LOG_PID, facility);
! syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
! host,
! input->exists("config") ? input->get("config") : "default",
! config["match_method"], input->get("words"), logicalWords.get(),
! nMatches, config["matches_per_page"],
! page, ref
);
}
--- 1365,1491 ----
if (matches)
nMatches = matches->Count();
! // Establish the output format. The supplied default value of the
! // output format is the traditional one written to the system log
! //
! // If the user has supplied a field separator for logging, then
! // the field separator string is reconstructed using that
!
! const char* basic_output_format = "%s%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n";
! const char* alt_output_format = "%s%s|%s|%s|%s|%s|%d|%s|%d|%s\n";
! const char* fieldsep = config[ "logging_field_separator" ];
! char* new_output_format = 0;
!
! const char* file_time_sep = ":";
!
! if ( strcmp( fieldsep, "none" ) != 0 )
! {
! // This is the separator between the time and rest of the output when
! // writing to a file
! file_time_sep = fieldsep;
!
! int nSepLen = strlen( fieldsep );
! new_output_format = (char*) malloc( sizeof(char)* ( 10*(nSepLen+2)+15 ));
!
! const char* pchAlt = alt_output_format;
! char* pchNew = new_output_format;
!
! while ( *pchAlt )
! {
! if ( *pchAlt != '|' )
! *pchNew++ = *pchAlt;
! else
! {
! strcpy( pchNew, fieldsep );
! pchNew += nSepLen;
! }
! pchAlt++;
! }
! *pchNew = 0;
! basic_output_format = new_output_format;
! }
!
!
! // If outputting to the system log file, it just gets output straight out
! //
!
! if ( strcmp( logfile, "none" ) == 0 )
! {
! int level = LOG_LEVEL;
! int facility = LOG_FACILITY;
!
! openlog("htsearch", LOG_PID, facility);
! syslog(level, basic_output_format ,
! "", // The first string format item is for a timestamp string - not required here
! host,
! input->exists("config") ? input->get("config") : "default",
! config["match_method"],
! input->get("words"),
! logicalWords.get(),
! nMatches,
! config["matches_per_page"],
! page,
! ref
);
+ }
+
+ // For a file it would need to be determined as to whether a date string
+ // or the time stamp format would be done
+
+ else
+ {
+ char timestring[100];
+
+ if ( config.Boolean("logging_time_as_seconds") )
+ sprintf( timestring, "%lu%s", (unsigned long)tnow, file_time_sep );
+ else
+ {
+ strcpy( timestring, ctime( &tnow ));
+
+ // the string returned by ctime() has a '\n' as the last character
+ strcpy( timestring + strlen(timestring) - 1, file_time_sep );
+ }
+
+
+ #define __FILE_LOCKING__
+
+ #ifdef __FILE_LOCKING__
+
+ int fd = open( logfile, O_WRONLY | O_APPEND , 00644 );
+ if (fd >= 0 )
+ {
+ flock( fd, LOCK_EX );
+ FILE* fp = fdopen( fd, "a");
+ #else
+ FILE* fp = fopen( logfile, "a");
+ #endif
+
+ if ( fp )
+ {
+ fprintf(fp, basic_output_format,
+ timestring,
+ host,
+ input->exists("config") ? input->get("config") : "default",
+ config["match_method"],
+ input->get("words"),
+ logicalWords.get(),
+ nMatches,
+ config["matches_per_page"],
+ page,
+ ref
+ );
+
+ fclose( fp );
+ }
+
+ #ifdef __FILE_LOCKING__
+ flock( fd, LOCK_UN );
+ }
+ #endif
+
+ }
+
+ // Free up allocated memory blocks
+ if ( new_output_format ) free( new_output_format);
+
}
At 21:00 5/05/2001 -0500, Geoff Hutchison wrote:
>At 8:39 AM +1000 4/11/01, Brian White wrote:
>>We selected ht://Dig to go on a client's internal site, and have been
>>happy enough with it so far, with the exception of logging - I ended up
>>modifying the code to get around what were for us limitations. I'd like
>>to describe the changes I made - I would be curious to get some feedback.
>
>I put in the current logging code as a horrible (but workable) hack. Since
>syslog handles locking for you, it seemed like a decent possibility.
>
>>2) The format that the logged information is written out as is a total
>>pain to pick information out of
>
>A lot of people have complained about this, so alternative suggestions are
>most welcome. If there's a way to do a user-configurable format that
>doesn't incur speed penalties, this might be nice too.
>
>>3) The logging information is written out every time htsearch is called.
>
>Yes, this is a pain.
>
>>I then added
>> <input type=hidden name=init value="Y" >
>
>This doesn't seem necessary. Shouldn't you just check whether the user has
>requested page 1?
>
>> Apr 9 09:38:43 myhost htsearch[11228]: 192.168.1.10 [myconfig] (and) [car] [(car or auto or automobile)] (98/10) - 1 -- http://mywebhost.com.au/search/search.html
>
>>number of seconds since epoch at the front:
>> 986844235|192.168.1.10|myconfig|and|car|(car or auto or automobile)|98|10|1|http://mywebhost.com.au/search/search.html
>
>I don't know about most people, but I would probably go with the standard
>date format rather than the seconds since epoch.
>
>Still, I think this is a great contribution--could you send the patch?
>
>Thanks,
>--
>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
-------------------------
Brian White
Step Two Designs Pty Ltd - SGML, XML & HTML Consultancy
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]