OK. I have supplied my patch. The affected files are htsearch/Display.cc
and htcommon/defaults.cc, and they both come from the htdig-3.1.5 tarball.

It has only been tested under Debian Linux (2.2 release 3, "potato").
The main issue with compiling it elsewhere will be file locking, so I
have made it easy to #ifdef that out of the way.

The biggest problem that I have with it as it stands is that I have no
"else" condition if I am unable to open the log file for writing - so
it would just silently fail to log the search results.
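
One way to fill that gap would be to fall back to syslog when the file
cannot be opened. This is only a minimal sketch and is not part of the
patch: appendLogLine is a hypothetical helper, and LOG_LOCAL5 / LOG_INFO
are assumed stand-ins for the LOG_FACILITY and LOG_LEVEL macros that
htsearch uses.

```cpp
#include <cstdio>
#include <string>
#include <syslog.h>

// Hypothetical helper (not in the patch): append one line to the log file.
// If the file cannot be opened, send the line to syslog instead so that the
// entry is not lost silently.
// Returns true if the line was written to the file, false otherwise.
bool appendLogLine(const std::string &path, const std::string &line)
{
    std::FILE *fp = std::fopen(path.c_str(), "a");
    if (!fp)
    {
        // The missing "else" branch: fall back to the system log.
        // LOG_LOCAL5 and LOG_INFO are assumptions made for this sketch.
        openlog("htsearch", LOG_PID, LOG_LOCAL5);
        syslog(LOG_INFO, "%s", line.c_str());
        closelog();
        return false;
    }

    bool ok = std::fputs(line.c_str(), fp) >= 0 && std::fputc('\n', fp) != EOF;
    // fclose is always called; a failed close also counts as a failed write.
    return std::fclose(fp) == 0 && ok;
}
```

The return value lets the caller tell which path was taken, for example
to warn the administrator once that the configured log file is not
writable.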

In answer to some of Geoff's questions:
   Q: Why not just check for ( page == 1 ) ? Why use an "init=Y" CGI variable?
   A: I wanted to be able to tell the difference between someone's original
      search and them later clicking on the "1" link at the base of the page.

   Q: Why use seconds since epoch (SSE) instead of the standard date format?
   A: For me, the log files are predominantly designed to be read by
      CGI scripts, so the SSE format makes a lot more sense for me.
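
To make the first answer concrete, here is a stand-alone sketch of the
decision taken when logging_initonly is switched on. The name shouldLog
is hypothetical; in the patch the check is done inline in Display.cc
with strncasecmp on the first character of the "init" parameter.

```cpp
#include <cctype>

// Hypothetical stand-alone version of the "init=Y" check.
// loggingInitOnly : value of the logging_initonly config option
// initParam       : value of the CGI parameter "init", or a null pointer
//                   if the parameter was not supplied
bool shouldLog(bool loggingInitOnly, const char *initParam)
{
    // Without logging_initonly, every call to htsearch is logged.
    if (!loggingInitOnly)
        return true;

    // With it, log only when "init" is present and starts with Y or y.
    if (!initParam)
        return false;
    return std::toupper(static_cast<unsigned char>(initParam[0])) == 'Y';
}
```

A page-number test alone could not make this distinction: the original
search and a later click on the "1" link both request page 1, but only
the search form carries init=Y.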

The way it works - it is all controlled via some extra variables in the config
file. If you simply specify "logging : true", then it will behave in the
standard way and use "syslog". The extra config variables are:

    * logging_file ( Default: none )

      If this is set to "none", then it will log using syslog, otherwise
      this will be assumed to be the path to the log file

    * logging_initonly ( Default : false )

      Boolean. If "true", it will only perform logging when the CGI
      parameter "init=Y" is specified; otherwise it will log every usage
      of htsearch.

    * logging_field_separator ( Default : none )

      If this is "none", then the data is output in the "traditional" format of

       TIME:REMOTE_ADDR [config] (match_method) [words] [logicalWords] (matches/matches_per_page) - page -- HTTP_REFERER

      otherwise, it outputs the same fields separated by the specified string.
      For example,

         logging_field_separator : |

      would output

         TIME|REMOTE_ADDR|config|match_method|words|logicalWords|matches|matches_per_page|page|HTTP_REFERER

    * logging_time_as_seconds ( Default : false )

      Boolean. Only applicable when writing to a file.

      If this is true, it will output the TIME field as the number of
      seconds since 00:00 1/1/1970; otherwise it will output it in a
      "Wed Jun 30 21:49:08 1993" type format.
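
As a rough illustration of how the last two options combine, the sketch
below formats the TIME field and joins a record with a separator. It
covers only the separator-based layout, not the traditional one, and
formatTime and joinFields are hypothetical helpers: the patch itself
rewrites a printf format string instead.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical helper: render the TIME field either as seconds since the
// epoch (logging_time_as_seconds : true) or in ctime() style.
std::string formatTime(std::time_t t, bool asSeconds)
{
    if (asSeconds)
        return std::to_string(static_cast<long long>(t));

    const char *raw = std::ctime(&t);   // e.g. "Wed Jun 30 21:49:08 1993\n"
    std::string s = raw ? raw : "";
    if (!s.empty() && s.back() == '\n')
        s.pop_back();                   // drop the trailing newline
    return s;
}

// Hypothetical helper: join the log fields with the configured separator
// (logging_field_separator). No escaping is done, as in the patch.
std::string joinFields(const std::vector<std::string> &fields,
                       const std::string &sep)
{
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        if (i > 0)
            line += sep;
        line += fields[i];
    }
    return line;
}
```

With a "|" separator and the seconds format, joining the ten fields in
the order listed above gives a record of the same shape as the example
line shown for logging_field_separator.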

This patch allows for some very basic formatting. The single-separator
format is still very flexible. Picking out values using perl is
basically two lines:

        @fields=split(/\|/, $_ );
        local( $date, $words, $hits ) = ( $fields[0], $fields[4], $fields[6] );
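
The same split can be sketched in C++ for anyone reading the log from
compiled code. splitFields is a hypothetical helper. It assumes the
separator never appears inside a field, which the patch does not
guarantee as far as I can see, since the search words are written out
without any escaping.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one log record on the configured separator.
// Empty fields are kept, so field positions stay stable.
std::vector<std::string> splitFields(const std::string &line,
                                     const std::string &sep)
{
    std::vector<std::string> fields;
    if (sep.empty())
    {
        // Nothing to split on: return the whole line as one field.
        fields.push_back(line);
        return fields;
    }

    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = line.find(sep, start)) != std::string::npos)
    {
        fields.push_back(line.substr(start, pos - start));
        start = pos + sep.size();
    }
    fields.push_back(line.substr(start));   // last field, after the final separator
    return fields;
}
```

As in the perl version, index 0 is the time, index 4 the search words
and index 6 the number of matches.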


The Patches:

*** htdig-3.1.5/htcommon/defaults.cc    Thu Feb 24 21:29:10 2000
--- htdig-3.1.5.new/htcommon/defaults.cc       Thu May 10 14:23:56 2001
***************
*** 81,86 ****
--- 81,90 ----
       {"local_urls_only",                       "false"},
       {"local_user_urls",                       ""},
       {"logging",                         "false"},
+     {"logging_file",                    "none"},
+     {"logging_initonly",                "false"},
+     {"logging_field_separator",         "none"},
+     {"logging_time_as_seconds",         "false"},
       {"maintainer",                    "[EMAIL PROTECTED]"},
       {"match_method",                  "and"},
       {"matches_per_page",              "10"},


*** htdig-3.1.5/htsearch/Display.cc     Thu Feb 24 21:29:11 2000
--- htdig-3.1.5.new/htsearch/Display.cc Thu May 10 16:04:15 2001
***************
*** 24,29 ****
--- 24,32 ----
   #include "HtURLCodec.h"
   #include "HtWordType.h"

+ #include <fcntl.h>
+ #include <sys/file.h>
+
   //*****************************************************************************
   //
   Display::Display(char *indexFile, char *docFile)
***************
*** 96,113 ****
       int                       currentMatch = 0;
       int                       numberDisplayed = 0;
       ResultMatch               *match = 0;
       int                       number = config.Value("matches_per_page");
       if (number <= 0)
!       number = 10;
       int                       startAt = (pageNumber - 1) * number;

!     if (config.Boolean("logging"))
       {
!         logSearch(pageNumber, matches);
       }

       setVariables(pageNumber, matches);
!
       //
       // The first match is guaranteed to have the highest score of
       // all the matches.  We use this to compute the number of stars
--- 99,127 ----
       int                       currentMatch = 0;
       int                       numberDisplayed = 0;
       ResultMatch               *match = 0;
+
       int                       number = config.Value("matches_per_page");
       if (number <= 0)
!          number = 10;
!
       int                       startAt = (pageNumber - 1) * number;

!     if ( config.Boolean("logging") )
       {
!               int dolog = 1;
!               if( config.Boolean("logging_initonly") )
!               {
!                       const char* initstr = input->get("init");
!                       dolog = ( input->exists("init") &&
!                                 strncasecmp( input->get("init"), "Y", 1) == 0 );
!           }
!
!
!         if ( dolog ) logSearch(pageNumber, matches);
       }

       setVariables(pageNumber, matches);
!
       //
       // The first match is guaranteed to have the highest score of
       // all the matches.  We use this to compute the number of stars
***************
*** 1331,1340 ****
   void
   Display::logSearch(int page, List *matches)
   {
       // Currently unused    time_t     t;
       int               nMatches = 0;
-     int         level = LOG_LEVEL;
-     int         facility = LOG_FACILITY;
       char        *host = getenv("REMOTE_HOST");
       char        *ref = getenv("HTTP_REFERER");

--- 1345,1356 ----
   void
   Display::logSearch(int page, List *matches)
   {
+     const char* logfile = config["logging_file"];
+
       // Currently unused    time_t     t;
+
+     time_t  tnow = time(NULL);
       int               nMatches = 0;
       char        *host = getenv("REMOTE_HOST");
       char        *ref = getenv("HTTP_REFERER");

***************
*** 1349,1360 ****
       if (matches)
         nMatches = matches->Count();

!     openlog("htsearch", LOG_PID, facility);
!     syslog(level, "%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n",
!          host,
!          input->exists("config") ? input->get("config") : "default",
!          config["match_method"], input->get("words"), logicalWords.get(),
!          nMatches, config["matches_per_page"],
!          page, ref
            );
   }
--- 1365,1491 ----
       if (matches)
         nMatches = matches->Count();

!     // Establish the output format. The supplied default value of the
!     // output format is the traditional one written to the system log
!     //
!     // If the user has supplied a field separator for logging, then
!     // the field separator string is reconstructed using that
!
!     const char* basic_output_format = "%s%s [%s] (%s) [%s] [%s] (%d/%s) - %d -- %s\n";
!     const char* alt_output_format   = "%s%s|%s|%s|%s|%s|%d|%s|%d|%s\n";
!     const char* fieldsep = config[ "logging_field_separator" ];
!     char* new_output_format = 0;
!
!     const char* file_time_sep = ":";
!
!     if ( strcmp( fieldsep, "none" ) != 0 )
!     {
!        // This is the separator between the time and rest of the output when
!        // writing to a file
!        file_time_sep = fieldsep;
!
!        int nSepLen = strlen( fieldsep );
!        new_output_format = (char*) malloc( sizeof(char)* ( 10*(nSepLen+2)+15 ));
!
!        const char* pchAlt = alt_output_format;
!        char* pchNew = new_output_format;
!
!        while ( *pchAlt )
!        {
!           if ( *pchAlt != '|' )
!              *pchNew++ = *pchAlt;
!           else
!           {
!              strcpy( pchNew, fieldsep );
!              pchNew += nSepLen;
!           }
!           pchAlt++;
!        }
!        *pchNew = 0;
!        basic_output_format = new_output_format;
!     }
!
!
!     // If outputting to the system log file, it just gets output straight out
!     //
!
!     if ( strcmp( logfile, "none" ) == 0 )
!     {
!        int       level = LOG_LEVEL;
!        int       facility = LOG_FACILITY;
!
!        openlog("htsearch", LOG_PID, facility);
!        syslog(level, basic_output_format ,
!           "", // The first string format item is for a timestamp string - not required here
!           host,
!           input->exists("config") ? input->get("config") : "default",
!             config["match_method"],
!             input->get("words"),
!             logicalWords.get(),
!             nMatches,
!             config["matches_per_page"],
!             page,
!             ref
            );
+     }
+
+     // For a file it would need to be determined as to whether a date string
+     // or the time stamp format would be done
+
+     else
+     {
+         char timestring[100];
+
+         if ( config.Boolean("logging_time_as_seconds") )
+             sprintf( timestring, "%lu%s", (unsigned long)tnow, file_time_sep );
+         else
+         {
+            strcpy( timestring, ctime( &tnow ));
+
+            // ctime() output ends with a '\n', which is overwritten here
+            strcpy( timestring + strlen(timestring) - 1, file_time_sep );
+         }
+
+
+ #define __FILE_LOCKING__
+
+ #ifdef __FILE_LOCKING__
+
+         int fd = open( logfile, O_WRONLY | O_APPEND , 00644 );
+         if (fd >= 0 )
+         {
+            flock( fd, LOCK_EX );
+                  FILE* fp = fdopen( fd, "a");
+ #else
+                  FILE* fp = fopen( logfile, "a");
+ #endif
+
+                  if ( fp )
+                  {
+               fprintf(fp, basic_output_format,
+                 timestring,
+                 host,
+                 input->exists("config") ? input->get("config") : "default",
+                 config["match_method"],
+                 input->get("words"),
+                 logicalWords.get(),
+                 nMatches,
+                 config["matches_per_page"],
+                 page,
+                 ref
+                );
+
+                     fclose( fp );
+              }
+
+ #ifdef __FILE_LOCKING__
+            flock( fd, LOCK_UN );
+         }
+ #endif
+
+     }
+
+     // Free up allocated memory blocks
+     if ( new_output_format ) free( new_output_format);
+
   }
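
One thing to watch in the locking branch above: fclose( fp ) also closes
fd, so the later flock( fd, LOCK_UN ) acts on a descriptor that is
already closed (the close releases the lock anyway), and open() without
O_CREAT fails if the log file does not exist yet. Below is a sketch of
an ordering that avoids both, assuming it is acceptable for htsearch to
create the file. appendLocked is a hypothetical helper, not code from
the patch.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper: append "line" to "path" while holding an exclusive
// flock() on the file. Returns true only if the data was written and flushed.
bool appendLocked(const char *path, const char *line)
{
    // O_CREAT (an assumption of this sketch) makes the 0644 mode meaningful.
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return false;

    if (flock(fd, LOCK_EX) != 0)
    {
        close(fd);
        return false;
    }

    FILE *fp = fdopen(fd, "a");
    if (!fp)
    {
        flock(fd, LOCK_UN);
        close(fd);
        return false;
    }

    // Flush while the lock is still held, so the whole record reaches the
    // file before another htsearch process can write.
    bool ok = std::fputs(line, fp) >= 0 && std::fflush(fp) == 0;

    flock(fd, LOCK_UN);             // unlock while fd is still open
    return std::fclose(fp) == 0 && ok;   // fclose() also closes fd
}
```

The caller passes a complete record, including its trailing newline, so
that each locked write corresponds to exactly one log line.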








At 21:00 5/05/2001 -0500, Geoff Hutchison wrote:
>At 8:39 AM +1000 4/11/01, Brian White wrote:
>>We selected ht://Dig to go on a clients internal site, and have been 
>>happy enough with it so far, with the exception of logging - I ended up 
>>modifying the code to get around what were for us limitations. I'd like 
>>to describe the changes I made - I would be curious to get some feedback.
>
>I put in the current logging code as a horrible (but workable) hack. Since 
>syslog handles locking for you, it seemed like a decent possibility.
>
>>2) The format that the logged information is written out as is a total 
>>pain to pick information out of
>
>A lot of people have complained about this, so alternative suggestions are 
>most welcome. If there's a way to do a user-configurable format that 
>doesn't incur speed penalties, this might be nice too.
>
>>3) The logging information is written out every time htsearch is called.
>
>Yes, this is a pain.
>
>>I then added
>>    <input type=hidden name=init value="Y" >
>
>This doesn't seem necessary. Shouldn't you just check whether the user has 
>requested page 1?
>
>>   Apr  9 09:38:43 myhost htsearch[11228]: 192.168.1.10 [myconfig] (and) [car]
>>     [(car or auto or automobile)] (98/10) - 1 -- http://mywebhost.com.au/search/search.html
>
>>number of seconds since epoch at the front:
>>   986844235|192.168.1.10|myconfig|and|car|(car or auto or automobile)|98|10|1|http://mywebhost.com.au/search/search.html
>
>I don't know about most people, but I would probably go with the standard 
>date format rather than the seconds since epoch.
>
>Still, I think this is a great contribution--could you send the patch?
>
>Thanks,
>--
>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/

-------------------------
Brian White
Step Two Designs Pty Ltd - SGML, XML & HTML Consultancy
Phone: +612-93197901
Web:   http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
