Re: [OT] Re: Some problem of analyzing the tomcat logs

2010-09-18 Thread yang Yang
Thanks, everyone, for your replies.

Since we cannot use any third-party libraries, we have to do it ourselves.

So I am looking for a Java-based log analyzer to see how they work.

2010/9/17 Pid p...@pidster.com

 On 17/09/2010 08:19, Wesley Acheson wrote:
  On Fri, Sep 17, 2010 at 9:17 AM, André Warnier a...@ice-sa.com wrote:
  Hi.
 
  In short and in my opinion, I think that you are re-inventing the wheel.
  There exist already numerous open-source programs which analyse web logs,
  and generally produce nice-looking graphics etc. from them.  And they do
  the splitting-up work properly, as long as you feed them the correct log
  format.  Their documentation indicates how to do that.
  Look up webalizer, awstats etc.
  Also, these programs are open-source, so you can look inside at how they do
  things, if you really want to write your own code.
 
 
 
  +1 There is a lot of software out there that gives good logs. However
  I don't know if many of them distinguish the file extensions which
  seems to be his problem?

 Yes, they usually do.  They're mostly useless otherwise.

 I'm marking this as off-topic, because it's not a Tomcat problem.


 p

  -
  To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
  For additional commands, e-mail: users-h...@tomcat.apache.org
 




Some problem of analyzing the tomcat logs

2010-09-17 Thread yang Yang
Hi,
I am trying to develop a web-based tool to track page hit counts, user
session activity, etc. for our own sites.

I have run into some problems:

1) How can I tell whether a request target is a page or a resource?

For example, take the following three log entries (some parts removed):

#1- [17/Sep/2010:11:38:26 +0800] POST /test.jsp?name=test HTTP/1.1 200 test.jsp
#2- [17/Sep/2010:11:40:11 +0800] POST /example/test.jpg HTTP/1.1 200 /example/test.jpg
#3- [17/Sep/2010:11:44:26 +0800] POST /example/testServlet HTTP/1.1 200 test.jsp

The pattern used for the above log is: '%t %r %s %U'.

Log #1 shows a page request with a parameter; it can be used to calculate
the most frequently visited pages.

Log #2 shows a resource request (an image here); it can be used to
calculate the most frequently visited files.

Log #3 shows a request with no file extension (it is a servlet); in fact it
is a page.

That is to say, they are different request types, so how can I distinguish
them in my code?
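
One rough way to tell the three cases apart is to look at the file extension of the request path. The sketch below is only an illustration: the class name RequestClassifier, the Kind enum and the extension list are assumptions made for this example, not anything Tomcat provides. It treats any path without a known static-file extension (a servlet mapping, for instance) as a page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RequestClassifier {
    enum Kind { PAGE, RESOURCE }

    // Assumed list of static-resource extensions; adjust it for the actual site.
    private static final Set<String> RESOURCE_EXTENSIONS = new HashSet<>(Arrays.asList(
            "jpg", "jpeg", "png", "gif", "ico", "css", "js", "pdf", "zip"));

    static Kind classify(String requestUri) {
        // Cut off the query string ("?...") and any path parameters (";jsessionid=...").
        int end = requestUri.length();
        for (char c : new char[] {'?', ';'}) {
            int i = requestUri.indexOf(c);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        String path = requestUri.substring(0, end);

        // Only the last path segment can carry a file extension.
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        int dot = lastSegment.lastIndexOf('.');
        if (dot < 0) {
            return Kind.PAGE; // no extension: assume a servlet or other dynamic page
        }
        String extension = lastSegment.substring(dot + 1).toLowerCase(Locale.ROOT);
        return RESOURCE_EXTENSIONS.contains(extension) ? Kind.RESOURCE : Kind.PAGE;
    }

    public static void main(String[] args) {
        System.out.println(classify("/test.jsp?name=test"));  // PAGE
        System.out.println(classify("/example/test.jpg"));    // RESOURCE
        System.out.println(classify("/example/testServlet")); // PAGE
    }
}
```

An extension list like this is a heuristic; a servlet mapped to a URL ending in .jpg, for example, would be misclassified.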

2) Log parser.
I can read the log file line by line, but how do I extract the value of each
attribute?
They are all on one line. Should I split them using the String.split() method?
But what if a value itself contains the separator?

For example, if I use split(" ") to split a log entry, a value such as POST
/example/test.jpg HTTP/1.1 will be split as well, and this may be
inefficient, so I wonder if there is a tool that makes this easy?
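
Without third-party libraries, one option is a regular expression with one capture group per field instead of a blind split on spaces. The sketch below assumes lines written exactly with the pattern '%t %r %s %U' (no quotes around %r, and without the "#1- " labels used in this message); the class name AccessLogLine and its field names are made up for the example.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogLine {
    // [timestamp] METHOD URI PROTOCOL STATUS PATH, as produced by '%t %r %s %U'.
    private static final Pattern LINE = Pattern.compile(
            "^\\[([^\\]]+)\\] (\\S+) (\\S+) (\\S+) (\\d{3}) (\\S+)$");

    // Timestamp layout of %t, e.g. 17/Sep/2010:11:38:26 +0800.
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    final OffsetDateTime time;
    final String method;
    final String uri;      // request URI including the query string (from %r)
    final String protocol;
    final int status;
    final String path;     // requested URL path (from %U)

    private AccessLogLine(Matcher m) {
        time = OffsetDateTime.parse(m.group(1), TIME);
        method = m.group(2);
        uri = m.group(3);
        protocol = m.group(4);
        status = Integer.parseInt(m.group(5));
        path = m.group(6);
    }

    /** Returns the parsed line, or null if it does not match the expected layout. */
    static AccessLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? new AccessLogLine(m) : null;
    }

    public static void main(String[] args) {
        AccessLogLine l = parse(
                "[17/Sep/2010:11:38:26 +0800] POST /test.jsp?name=test HTTP/1.1 200 test.jsp");
        System.out.println(l.method + " " + l.uri + " -> " + l.status);
    }
}
```

Because the timestamp is matched as a whole between the brackets, the space inside it no longer matters, and the three parts of the request line come out as separate groups. Lines that do not fit the layout return null rather than throwing, so a reader loop can count and skip them.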


Re: Some problem of analyzing the tomcat logs

2010-09-17 Thread André Warnier

Hi.

In short and in my opinion, I think that you are re-inventing the wheel.
There already exist numerous open-source programs which analyse web logs, and generally
produce nice-looking graphics etc. from them.  And they do the splitting-up work
properly, as long as you feed them the correct log format.  Their documentation indicates
how to do that.

Look up webalizer, awstats etc.
Also, these programs are open-source, so you can look inside at how they do things, if you
really want to write your own code.







Re: Some problem of analyzing the tomcat logs

2010-09-17 Thread Wesley Acheson
On Fri, Sep 17, 2010 at 9:17 AM, André Warnier a...@ice-sa.com wrote:
 Hi.

 In short and in my opinion, I think that you are re-inventing the wheel.
 There exist already numerous open-source programs which analyse web logs,
 and generally produce nice-looking graphics etc.. from them.  And they do
 the splitting-up work properly, as long as you feed them the correct log
 format.  Their documentation indicates how to do that.
 Look up webalizer, awstats etc..
 Also, these programs are open-source, so you can look inside at how they do
 things, if you really want to write your own code.



+1 There is a lot of software out there that gives good logs. However
I don't know if many of them distinguish the file extensions which
seems to be his problem?




Re: Some problem of analyzing the tomcat logs

2010-09-17 Thread Mikolaj Rydzewski
On Fri, 17 Sep 2010 09:19:39 +0200, Wesley Acheson

 +1 There is a lot of software out there that gives good logs. However
 I don't know if many of them distinguish the file extensions which
 seems to be his problem?

One should divide the site structure to have separate folders for images,
resources (e.g. PDFs) and actual pages. Let's say:

/i/ - for images
/files/ - for ZIPs, PDFs, etc.
... - and the rest is for 'pages'

It's trivial to configure Webalizer to understand such site structure.
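
For a home-grown analyzer, the same folder layout makes classification a matter of prefix checks. This is only a sketch under the layout given above (/i/ and /files/); the class name PrefixClassifier and the Kind enum are invented for the example.

```java
class PrefixClassifier {
    enum Kind { IMAGE, FILE, PAGE }

    // Classifies a request path by the folder it lives in, using the
    // example layout: /i/ for images, /files/ for downloads, the rest pages.
    static Kind classify(String path) {
        if (path.startsWith("/i/")) {
            return Kind.IMAGE;
        }
        if (path.startsWith("/files/")) {
            return Kind.FILE;
        }
        return Kind.PAGE;
    }

    public static void main(String[] args) {
        System.out.println(classify("/i/logo.png"));       // IMAGE
        System.out.println(classify("/files/manual.pdf")); // FILE
        System.out.println(classify("/test.jsp"));         // PAGE
    }
}
```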

-- 
Mikolaj Rydzewski




[OT] Re: Some problem of analyzing the tomcat logs

2010-09-17 Thread Pid
On 17/09/2010 08:19, Wesley Acheson wrote:
 On Fri, Sep 17, 2010 at 9:17 AM, André Warnier a...@ice-sa.com wrote:
 Hi.

 In short and in my opinion, I think that you are re-inventing the wheel.
 There exist already numerous open-source programs which analyse web logs,
 and generally produce nice-looking graphics etc.. from them.  And they do
 the splitting-up work properly, as long as you feed them the correct log
 format.  Their documentation indicates how to do that.
 Look up webalizer, awstats etc..
 Also, these programs are open-source, so you can look inside at how they do
 things, if you really want to write your own code.


 
 +1 There is a lot of software out there that gives good logs. However
 I don't know if many of them distinguish the file extensions which
 seems to be his problem?

Yes, they usually do.  They're mostly useless otherwise.

I'm marking this as off-topic, because it's not a Tomcat problem.


p

 


