On Nov 12, 2007 4:49 AM, danil osipchuk <[EMAIL PROTECTED]> wrote:
> XML has a tree-like structure and I would say . Passing all necessary
> structures between the invocations of the tacit verb creates a lot of
> programming/runtime overhead.

Sure, so J programmers tend to reach for verbs which process as
much data in one bite as they can find.

> > How about
> > get=: ". bind ]
> > set=: 4 :'(x)=:y'
>
> Perfect, but isn't it slow? Am I right that each invocation of the explicit
> verb requires parsing?

It takes some time, but that's slow only if you are "taking tiny bites, and
not chewing your food".
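
A minimal sketch of what "one big bite" looks like with the two verbs quoted
above. The name 'counts' and the data are made up for illustration, and bind
is assumed to come from the standard library that the default profile loads.

```j
NB. Definitions as quoted above (bind is a standard library conjunction).
get=: ". bind ]
set=: 4 :'(x)=:y'

NB. Hypothetical data: store a whole list under a name with one call...
'counts' set 3 1 4 1 5

NB. ...then fetch it by name once and let a primitive process all of it.
total=: +/ get 'counts'
```

Here set and get each run once, so whatever the explicit verb costs per call
is paid once for the whole list rather than once per element.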

> I have huge files with log events (one event per line) that I would like to
> analyze in the following way:
> Get timestamp for each line. Determine the type of the message by looking
> at the message text. Group events by this type. Then comes the analysis part
> at which J excels (and I don't have questions in this area yet).

Ok.  (And I like how you have described this -- you did not say "break the log
into lines" as your first step.  That gets into micromanaging the process which
can really slow things down.)

I would build a verb which extracts time stamps for each line from the raw
text.  I would build another verb which extracts message type for each line
from the raw text.  Grouping probably becomes /. on line offset,length pairs,
with message text as a global variable.
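
A rough sketch of that plan, under loudly stated assumptions: the log format
is invented (an 8-character time stamp, a space, then the message), every line
ends with LF and has at least 13 characters, and the message type is taken to
be the 4 characters after the time stamp. None of that comes from the real
logs; it only shows the shape of the approach.

```j
NB. Hypothetical raw text, kept as one global character list.
log=: 0 : 0
10:00:01 disk full on /var
10:00:02 login ok user=ann
10:00:05 disk full on /tmp
)

NB. Line boundaries computed from the raw text, without cutting it up.
ends=: I. log = LF            NB. index of each line feed
starts=: 0 , }: >: ends       NB. offset of each line
lens=: ends - starts          NB. length of each line, excluding LF

NB. One row per line: assumed 8-character time stamp at the line start.
stamps=: log {~ starts +/ i. 8

NB. One row per line: assumed 4-character type key after the time stamp.
types=: log {~ (9 + starts) +/ i. 4

NB. Key (/.) groups the offset,length pairs by type, one box per type.
groups=: types </. starts ,. lens

NB. Number of lines in each group, in order of first appearance.
counts=: types #/. starts
```

With this sample, counts is 2 1, and ~. types lists the two distinct keys in
the same order as the boxes in groups.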

> Problem:
> I can not predict beforehand what will be the message string that identifies
> each message type and what is its place in the log string. Therefore I would
> like to build a dictionary that would hold already known types (from the
> beginning of the parsing). Each next string is compared against the
> dictionary. If there is a string containing the same substring as current -
> they are of the same class and this substring identifies the type. Length of
> substring is predefined. So my idea was to go line-by-line and maintain the
> corresponding structure.

Ok, you don't know what you are doing when you start log processing, and
plan to figure it out as you go along.  This plays away from J's strengths
(which tend to leverage the programmer's knowledge of the problem domain).

But you have to have some idea of how you would figure out what it is that
you want to do, and you have not described that.

That said, I would not hesitate to do multiple quick passes over the raw text.
In other words, whatever your rules are for finding message types, I would
simply apply them to the whole log, and then proceed from there.  [And, by
"simply apply" I do mean that I would focus on the simple aspects of these
rules.]
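
As an illustration only: if some pass over the whole log has already produced
one fixed-length key per line (an assumption, since the real rule for locating
that substring has not been described), then the "dictionary" need not be
maintained line by line. The table keys below is made-up data.

```j
NB. Hypothetical keys, one row per log line, already extracted in bulk.
keys=: 3 4 $ 'disklogidisk'

NB. Pass 1: the dictionary is the nub, i.e. each distinct key once.
dict=: ~. keys

NB. Pass 2: classify every line at once by looking its key up in dict.
ids=: dict i. keys

NB. Pass 3: line numbers grouped by class, one box per dictionary entry.
members=: ids </. i. # keys
```

For this data ids is 0 1 0. Each pass is a single primitive applied to all the
lines, which is the sense of "simply apply them to the whole log" above.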

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
