Re: Querying A Repository of XML Files...Or, How To Do Portal Objects

2004-04-23 Thread Joerg Heinicke
On 23.04.2004 02:18, David Swearingen wrote:

I picked Cocoon as my platform in part because of the elegance and
simplicity of keeping content in xml files in a directory(s) where I
can see them, and so I can have ad hoc document structures without
having to be tied down to an RDBMS schema that can never match all the
content types I'll be publishing.  So I think for simplicity's sake
here assume I have a directory with a thousand xml files of textual
content, say, news articles.

So any given portal object needs at some point to be able to query my
repository for a few titles that meet a few criteria.  That's easy in
SQL of course -- but how do I do something like that in the
XML/Cocoon world?
Do I index?  Do I scour through once and then cache for a few hours?
Do I have a separate procedural/Java process that creates
intermediate files that can be more rapidly transformed into headline
lists?  I can imagine different general approaches, but I don't know
how to implement with the Cocoon toolset, and I'm sure I'm not the
first person to have this requirement.

A hand-written solution using DirectoryGenerator might be too slow if
there are really thousands of files. You can cache its output, but
every non-cached access would probably take many seconds.
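
To make the "scan once, then cache for a few hours" idea concrete, here is a minimal sketch in plain Python rather than Cocoon. It is not DirectoryGenerator or any Cocoon API. The article layout (an `article` root with `category` and `date` attributes and a `title` child) and the names `scan_articles`, `cached_headlines` and `latest` are assumptions made up for illustration.

```python
import time
import xml.etree.ElementTree as ET
from pathlib import Path

# directory path -> (time of last scan, list of headline dicts)
_CACHE = {}

def scan_articles(directory):
    """Parse every *.xml file and keep only the fields a headline list needs.

    Assumed (hypothetical) layout:
    <article category="sports" date="2004-04-22"><title>...</title></article>
    """
    headlines = []
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        headlines.append({
            "file": path.name,
            "category": root.get("category", ""),
            "date": root.get("date", ""),  # ISO dates sort correctly as strings
            "title": root.findtext("title", default=""),
        })
    return headlines

def cached_headlines(directory, max_age_seconds=3 * 3600, now=None):
    """Return the scan result, rescanning only when the cached copy is too old."""
    now = time.time() if now is None else now
    key = str(directory)
    entry = _CACHE.get(key)
    if entry is None or now - entry[0] > max_age_seconds:
        entry = (now, scan_articles(directory))  # the slow, non-cached path
        _CACHE[key] = entry
    return entry[1]

def latest(directory, category, count=3):
    """Newest `count` headlines in one category, e.g. for a "Sports News" block."""
    matching = [h for h in cached_headlines(directory) if h["category"] == category]
    matching.sort(key=lambda h: h["date"], reverse=True)
    return matching[:count]
```

The trade-off is the one described above: every block on the page is served from memory, but the first request after the cache expires pays for parsing the whole directory, and new articles stay invisible until then.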

More appropriate seems to be indexing with Lucene, but I don't know how
flexible it is with regard to your needs (latest 3, first sentence,
etc.). And the more stuff you have to store, the more I would lean
toward an XML database like Xindice.

All of these components are delivered with a recent Cocoon.
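
As a rough illustration of the indexing idea, and of the "intermediate file" approach from the question, here is a small standard-library Python sketch. It is neither Lucene nor Xindice, only a stand-in for what a prebuilt index buys you. The article layout (`category` and `date` attributes, `title` and `body` children), the JSON index file, and the names `update_index`, `query` and `first_sentence` are all assumptions for illustration.

```python
import json
import os
import re
import xml.etree.ElementTree as ET
from pathlib import Path

def first_sentence(text):
    """Return the text up to the first '.', '!' or '?' (or all of it if none)."""
    text = " ".join((text or "").split())  # collapse newlines and extra spaces
    match = re.match(r"(.+?[.!?])(\s|$)", text)
    return match.group(1) if match else text

def update_index(directory, index_path):
    """Refresh a JSON index, re-parsing only files whose mtime has changed."""
    index_path = Path(index_path)
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    seen = set()
    for path in Path(directory).glob("*.xml"):
        seen.add(path.name)
        mtime = os.path.getmtime(path)
        entry = index.get(path.name)
        if entry and entry["mtime"] == mtime:
            continue  # unchanged since the last run, keep the stored entry
        root = ET.parse(str(path)).getroot()
        index[path.name] = {
            "mtime": mtime,
            "category": root.get("category", ""),
            "date": root.get("date", ""),  # ISO dates sort correctly as strings
            "title": root.findtext("title", default=""),
            "lead": first_sentence(root.findtext("body", default="")),
        }
    for name in set(index) - seen:
        del index[name]  # drop entries for articles that were deleted
    index_path.write_text(json.dumps(index, indent=2))
    return index

def query(index, category, count=3):
    """Newest `count` entries in a category, answered from the index alone."""
    rows = [dict(entry, file=name) for name, entry in index.items()
            if entry["category"] == category]
    rows.sort(key=lambda r: r["date"], reverse=True)
    return rows[:count]
```

Run `update_index` from a scheduled job or whenever content is published. A portal block then only needs `query(index, "sports", 3)`, and can show `lead` when the first sentence is wanted, without touching the thousand source files.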

Joerg



Querying A Repository of XML Files...Or, How To Do Portal Objects

2004-04-22 Thread David Swearingen
(Sorry for the length of this post, but I think it highlights some classic issues that others may encounter so I thought it was worth being specific.)

I'm building a web application -- not unlike a "portal" -- that contains mostly text content in files. Part of it poses a problem that is forcing me to bend my brain out of the SQL/Servlet paradigm and into the XML world, and I don't know the best approach.

I picked Cocoon as my platform in part because of the elegance and simplicity of keeping content in xml files in a directory(s) where I can see them, and so I can have ad hoc document structures without having to be tied down to an RDBMS schema that can never match all the content types I'll be publishing. So I think for simplicity's sake here assume I have a directory with a thousand xml files of textual content, say, news articles.

So now I'm running into the problem where I want to have objects on my site home page that are like portal objects/windows: they each contain clickable titles of a few pieces of content. Imagine a news portal home page where there's a "Sports News" block that lists the most recent three headlines and their dates, and a "Finance News" block that shows Finance headlines, etc. Maybe sometimes, in addition to the headline, the first sentence of the story is displayed, i.e. each portal block can have its own behaviors that are configurable.

So each of these objects on the page basically does the same function, but the object's parameters are slightly different based upon what category of news it grabs headlines for, how many headlines the editor or user wants displayed, etc.

So any given portal object needs at some point to be able to query my repository for a few titles that meet a few criteria. That's easy in SQL of course -- but how do I do something like that in the XML/Cocoon world?

Do I index? Do I scour through once and then cache for a few hours? Do I have a separate procedural/Java process that creates intermediate files that can be more rapidly transformed into headline lists? I can imagine different general approaches, but I don't know how to implement with the Cocoon toolset, and I'm sure I'm not the first person to have this requirement.

Thanks,
David