Hi,

Pursuant to mjw's request, I'm sending this here:

I'm writing an apache logfile parser (yes, yes, re-inventing the wheel and
all that, but this brings up a more general problem anyway), and ran into
the problem that its memory consumption was going through the roof despite
storing data to a circular buffer and thus eventually throwing out old data.

The core of the parsing had been something to the effect of

while((line = log.readLine()) != null)
        addDataToBuffer(parseLine(line));

where parseLine would pull apart sub-pieces of the String line, such as the
IP address, the referer, etc. In many cases integers were parsed out using
an Integer.parseInt(line.substring(start, end)) strategy.
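
A minimal sketch of that String-based strategy, assuming Apache's common log
format. The Entry class, the choice of the status field, and the sample line
are made up for illustration and are not the original parser:

```java
public class StringParseSketch {
    // Hypothetical record; the real parser's fields are not shown in the post.
    static final class Entry {
        final String ip;
        final int status;
        Entry(String ip, int status) { this.ip = ip; this.status = status; }
    }

    // Assumes the layout: ip ident user [date] "request" status bytes ...
    // and that at least one field follows the status code.
    static Entry parseLine(String line) {
        int ipEnd = line.indexOf(' ');
        String ip = line.substring(0, ipEnd);

        // The status code follows the closing quote of the request field.
        int quoteEnd = line.indexOf("\" ", line.indexOf('"') + 1);
        int statusStart = quoteEnd + 2;
        int statusEnd = line.indexOf(' ', statusStart);
        int status = Integer.parseInt(line.substring(statusStart, statusEnd));
        return new Entry(ip, status);
    }

    public static void main(String[] args) {
        String sample = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "
                      + "\"GET /index.html HTTP/1.0\" 200 2326";
        Entry e = parseLine(sample);
        System.out.println(e.ip + " " + e.status);
    }
}
```

Every field pulled out this way goes through at least one temporary String
before it is converted or stored.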

As I said, the memory usage of this kept growing (roughly) linearly with
input after the circular buffer wrapped. I re-implemented a readLine
function to return a character array, and re-implemented the line parsing
purely in terms of that character array (creating the String for referer by
using the String(char data[], int start, int len) constructor). Not only did
this substantially reduce memory usage initially, but memory usage also grew
much more slowly after the circular buffer wrapped. I'm talking about the
difference between failing after using about 300M of memory (oom killer)
versus succeeding after about 180M of memory, in both cases on a 330M log
file.
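
A minimal sketch of the char[]-based approach, for comparison. The helper
names and the sample line are assumptions for illustration; the parseInt
helper here handles only non-negative decimal numbers and does not check for
overflow:

```java
public class CharParseSketch {
    // Same search as String.indexOf, but over a char[]; returns -1 if absent.
    static int indexOf(char[] text, char c, int from) {
        for (int i = from; i < text.length; i++)
            if (text[i] == c) return i;
        return -1;
    }

    // Parses a non-negative decimal integer from text[start..end) without
    // creating any intermediate String. No overflow check (sketch only).
    static int parseInt(char[] text, int start, int end) {
        if (start >= end) throw new NumberFormatException("empty range");
        int value = 0;
        for (int i = start; i < end; i++) {
            int digit = text[i] - '0';
            if (digit < 0 || digit > 9)
                throw new NumberFormatException("not a digit: " + text[i]);
            value = value * 10 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        char[] line = "127.0.0.1 - - [date] \"GET / HTTP/1.0\" 200 2326".toCharArray();
        int ipEnd = indexOf(line, ' ', 0);
        // String(char[], int, int) copies just the requested range.
        String ip = new String(line, 0, ipEnd);
        int bytesStart = line.length - 4; // the sample's last field is 4 digits
        System.out.println(ip + " " + parseInt(line, bytesStart, line.length));
    }
}
```

Only the fields that must be kept as Strings (here the IP) are ever turned
into String objects; numeric fields are read straight out of the array.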

So, at mjw & robilad's request, I've written an example case which just
implements a rudimentary circular buffer for storing hashes which count the
number of times each IP made requests on that day. (Note: I fake days by
assuming that each one is a fixed number of requests long (dayApproximation,
10000 in the attached source), so as not to clutter the example up with
apache date parsing code.)

The example provides both core implementations, string and char[]. You can
tune the parameters in the source, and choose the implementation at runtime.
While running, it will indicate its progress in the circular buffer. This
is meant to be used in conjunction with another terminal monitoring memory
usage, so you can see the relationship between usage and when wrapping
happens. By default it uses /var/log/apache/access.log, but you can change
this by editing logFileName in the source.
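
If a second terminal is inconvenient, a rough in-process alternative is to
print the VM's own heap figures whenever the buffer advances. This is only a
sketch: the class and method names are made up, and the numbers come from
Runtime.totalMemory() and Runtime.freeMemory(), so they describe the Java
heap as the VM reports it, not the process size that the oom killer looks at:

```java
public class HeapUsageSketch {
    // Bytes currently in use on the Java heap, as reported by the VM itself.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Formats one progress line; pos is the circular buffer index.
    static String describe(int pos, long usedBytes) {
        return "buffer element " + pos + ": " + (usedBytes / 1024) + " KB heap in use";
    }

    public static void main(String[] args) {
        System.out.println(describe(0, usedHeapBytes()));
    }
}
```

A call like describe(pos, usedHeapBytes()) could be printed from advanceDay()
to line the heap figures up with the wrap points.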

I recommend running this with a log file of at least a few hundred meg or
tuning down kaffe's initial heap size as appropriate.

(Please note: I'm not here making any claims about how JVMs or class
libraries should be designed, or trying to disparage kaffe. First, I
actually discovered this problem with gcj, which was slightly worse about
memory usage (though much faster at execution), and second I think that both
gcj and kaffe are Really Cool. I'm just providing this for reference.)

Thanks,
-Chris Lansdown

-- 
[EMAIL PROTECTED] -> http://www.powerblogs.com/
"Let us endeavor so to live that when we come to die even the undertaker 
will be sorry."  -- Mark Twain, "Pudd'nhead Wilson's Calendar"
========== Evil Overlord Quote of the Day (www.eviloverlord.com) =========
141. As an alternative to not having children, I will have lots of children.
My sons will be too busy jockeying for position to ever be a real
threat, and the daughters will all sabotage each other's attempts to
win the hero. 
import java.io.*;
import java.util.*;

public class MemoryExample {
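	String logFileName = "/var/log/apache/access.log";
	int totalDays = 30; // number of slots in the circular buffer
	HashMap buffer[] = new HashMap[totalDays]; // one IP -> request count map per fake day
	int dayApproximation = 10000; // approximate # of lines in a day
	int pos = -1; // current slot in buffer; -1 until the first line is read
	int dayCount = -1; // lines seen so far in the current fake day
	int daysProcessed = -1; // becomes 0 when the first day starts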

	public static void main(String argv[]) throws IOException {
		if(argv.length == 0) {
			System.out.println("Usage: MemoryExample (string|char[])");
			System.exit(0);
		}

		if(argv[0].equals("string"))
			(new MemoryExample()).runString(argv);
		else if(argv[0].equals("char[]"))
			(new MemoryExample()).runChar(argv);
		else {
			System.out.println("method must be one of \"string\" or \"char[]\"");
			System.exit(1);
		}
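		System.out.println("Done. waiting to be killed.");
		// Idle without spinning the CPU so memory usage can still be inspected.
		while(true) {
			try {
				Thread.sleep(1000);
			} catch(InterruptedException e) {
				return;
			}
		}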
	}
	public void runString(String argv[]) throws IOException {
		BufferedReader reader = new BufferedReader(new FileReader(logFileName), 64*1024);
		String line;
		
		while((line = reader.readLine()) != null) {
			advanceDay();

			int p = line.indexOf(' ', 0);
			if(p < 0)
				continue; // blank or malformed line: no IP field to count
			String ip = line.substring(0, p);
			int j = 1;
			Integer i = (Integer)buffer[pos].get(ip);
			if(i != null)
				j += i.intValue();
			buffer[pos].put(ip, new Integer(j));
		}

	}

	public void runChar(String argv[]) throws IOException {
		LogReader reader = new LogReader(new BufferedReader(new FileReader(logFileName), 64*1024));
		char line[];

		while((line = reader.readLine()) != null) {
			advanceDay();

			int p = indexOf(line, ' ', 0);
			if(p < 0)
				continue; // blank or malformed line: no IP field to count
			String ip = new String(line, 0, p);
			int j = 1;
			Integer i = (Integer)buffer[pos].get(ip);
			if(i != null)
				j += i.intValue();
			buffer[pos].put(ip, new Integer(j));
		}
	}

	

	/************************
	 *** Helper Functions ***
	 ************************/
	public void advanceDay() {
		dayCount = (dayCount + 1) % dayApproximation;
		if(dayCount == 0) {
			pos = (pos + 1) % totalDays;
			buffer[pos] = new HashMap();
			daysProcessed++;
			System.out.println("Advanced in circular buffer to element " + pos + " (" + daysProcessed + " days processed)");
		}
	}

	public static int indexOf(char[] text, char c) {
		return indexOf(text, c, 0);
	}
	public static int indexOf(char[] text, char c, int offset) {
		for(int i = offset; i < text.length; i++)
			if(text[i] == c)
				return i;
		return -1;
	}


	/**********************
	 *** Helper Classes ***
	 **********************/
	static class LogReader {
		private final static ArrayList skewer = new ArrayList();
		private final static int MAX = 4096;
		private final static char buffer[] = new char[MAX];
		
		BufferedReader reader;
		boolean lastWasCR = false, overflow = false;
		public LogReader(BufferedReader reader) {
			this.reader = reader;
		}
		public char[] readLine() throws IOException {
			overflow = false;
			int ct;
			int pos = 0;
			ct = reader.read();
			if(ct == -1)
				return null;
			char c = (char)ct;
			if(lastWasCR && c == '\n') {
				ct = reader.read();
				if(ct == -1)
					return null;
				else
					c = (char)ct;
			}
			
			while(c != '\r' && c != '\n') {
				// Store the current character, then read the next one.
				buffer[pos] = c;
				pos++;
				if(pos == MAX) {
					// Buffer full: stash a copy and start again at the front.
					char t[] = new char[MAX];
					System.arraycopy(buffer, 0, t, 0, MAX);
					skewer.add(t);
					overflow = true;
					pos = 0;
				}
				ct = reader.read();
				c = (ct == -1) ? '\n' : (char)ct; // treat EOF as end of line
			}
			lastWasCR = (c == '\r');
			
			if(overflow) {
				// Line longer than MAX: join the stashed chunks and the tail
				// still sitting in buffer.
				char r[] = new char[skewer.size() * MAX + pos];
				int off = 0;
				for(int i = 0; i < skewer.size(); i++) {
					System.arraycopy((char[])skewer.get(i), 0, r, off, MAX);
					off += MAX;
				}
				System.arraycopy(buffer, 0, r, off, pos);
				skewer.clear();
				return r;
			} else {
				char r[] = new char[pos];
				System.arraycopy(buffer, 0, r, 0, pos);
				return r;
			}
		}
	}
}
_______________________________________________
kaffe mailing list
[email protected]
http://kaffe.org/cgi-bin/mailman/listinfo/kaffe
