Upayavira wrote, On 31/03/2003 21.11:
Dear All,

Below is a summary of a brief exchange with Nicola Ken regarding CLI ideas I'd like to implement. He has encouraged me to 'go public', which I am now doing.

Hey, nobody wants to comment on the CLI changes? Or is it that we are doing it too well? ;-)

My aim below is twofold: to make the CLI useful to a project I am working on, and to make it something that people would prefer to use over a tool like wget. [Confession: I'm afraid I still use wget myself.]

:-))


<snip/>
[Nicola Ken - I didn't understand this bit of your reply:]

Actually even the former is managed by Cocoon. I don't remember where, but
IIRC the Environment has this information; it is just unimplemented in the
current CLI environments.

I mean that the hooks are already there, you just have to fill in the implementation.


In the Environment there is

    boolean isResponseModified(long lastModified);
    void setResponseIsNotModified();

But it's never implemented. In AbstractEnvironment:

    public boolean isResponseModified(long lastModified) {
        return true; // always modified
    }

    public void setResponseIsNotModified() {
        // does nothing
    }

So the above first has to be implemented, and then used when writing to disk.
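A minimal sketch of what filling in those two hooks could look like for a CLI that saves pages to disk. The ModificationAware interface is a hypothetical stand-in that only copies the two method signatures quoted above; FileCheckingEnvironment and shouldWrite() are illustrative names, not Cocoon classes.

```java
import java.io.File;

/** Hypothetical stand-in carrying only the two hooks quoted above. */
interface ModificationAware {
    boolean isResponseModified(long lastModified);
    void setResponseIsNotModified();
}

/** Sketch: compares the source timestamp with the copy already on disk. */
class FileCheckingEnvironment implements ModificationAware {
    private final File target;            // where the CLI would write the page
    private boolean notModified = false;  // set when the pipeline reports "unchanged"

    FileCheckingEnvironment(File target) {
        this.target = target;
    }

    @Override
    public boolean isResponseModified(long lastModified) {
        // Nothing on disk yet: the page has to be generated.
        if (!target.exists()) {
            return true;
        }
        // Regenerate only when the source is newer than the saved copy.
        return lastModified > target.lastModified();
    }

    @Override
    public void setResponseIsNotModified() {
        notModified = true;
    }

    /** Assumed helper: the code that writes to disk would skip the write when this is false. */
    boolean shouldWrite() {
        return !notModified;
    }
}
```

The comparison against the file's own timestamp is one possible choice; a real implementation might instead record the source timestamp separately.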

As Nicola Ken pointed out, the links of every page would need to be cached, because when a page is found to be already on disk and up to date, you still need its links for crawling. Hmm.

Yup.
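To make the link-caching point concrete, here is a rough sketch of a per-page link cache. LinkCache and its methods are hypothetical names for illustration; the cache is in-memory only, whereas a CLI run across separate invocations would presumably need to persist it next to the generated files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch: remembers the links found in each page so crawling can continue without regenerating. */
class LinkCache {
    private final Map<String, List<String>> linksByPage = new HashMap<>();

    /** Records the links found while generating a page. */
    void store(String pageUri, List<String> links) {
        linksByPage.put(pageUri, new ArrayList<>(links));
    }

    /**
     * Returns the links to crawl from a page. The page is regenerated
     * (via the supplied callback) only when it is stale or has no cached links.
     */
    List<String> linksToCrawl(String pageUri, boolean upToDate, Supplier<List<String>> regenerate) {
        List<String> cached = linksByPage.get(pageUri);
        if (upToDate && cached != null) {
            return Collections.unmodifiableList(cached);
        }
        store(pageUri, regenerate.get());
        return Collections.unmodifiableList(linksByPage.get(pageUri));
    }
}
```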


---Threading---
Threading needs reworking as the ThreadedDestination would become
deprecated.
...
There are two possible forms of threading: generation and dispatch
threading.
...
This kind of threading is important for the system I want to use the CLI for: the pages bear no relation to each other, and speed of delivery is important. (I don't plan to implement generation threading ATM.)
...
Final comments from Nicola Ken:

What about a publish-subscribe model, with complete decoupling of
the publishing from the handling?

Can you explain more what you mean by this?

I was thinking of a messaging system, like JMS for example, but it's overkill.
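For illustration only, a lightweight in-process take on the publish-subscribe idea, without JMS. PageListener and PagePublisher are hypothetical names, not existing Cocoon types; the point is just that the code generating a page knows nothing about who handles it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical subscriber: handles a generated page (write to disk, upload, etc.). */
interface PageListener {
    void pageGenerated(String uri, byte[] content);
}

/** Hypothetical publisher: the generating side only calls publish(). */
class PagePublisher {
    // CopyOnWriteArrayList allows subscribing while a publish is in progress.
    private final List<PageListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(PageListener listener) {
        listeners.add(listener);
    }

    void publish(String uri, byte[] content) {
        for (PageListener listener : listeners) {
            listener.pageGenerated(uri, content);
        }
    }
}
```

In this sketch delivery is synchronous, on the publishing thread; a handler that needs threading could queue the page internally.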


Go ahead with your needs.

As points that are important, I would say in order:

 1) make Cocoon *not* output the pages that have an error
 2) make Cocoon output xxxpagename.error.txt with the errors
    of the 'xxxpagename' page (configurable)
 3) make the report on broken links in XML so that it can be
    added to the site (where to put it configurable)
 4) make the content not regenerated if up to date (very important
    from a user POV)
 5) use ModifiableSource instead of Destination
 6) others

Feel free to do whatever in whatever order you prefer, this is just
what IMVHO is the priority. BTW, 1+2 are needed so that crawlers see
broken links correctly; otherwise the site looks OK even though the
broken links are still there.
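Points 1 and 2 together could be sketched roughly as follows: on failure, no page is written, and an error report goes to a separate file instead. PageWriter and PageGenerator are hypothetical names for illustration, and the ".error.txt" suffix is passed in to reflect the "configurable" remark.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical callback that produces a page's bytes or throws on error. */
interface PageGenerator {
    byte[] generate(String pageName) throws Exception;
}

/** Sketch: writes the page on success, or only an error report on failure. */
class PageWriter {
    private final Path outputDir;
    private final String errorSuffix; // configurable, e.g. ".error.txt"

    PageWriter(Path outputDir, String errorSuffix) {
        this.outputDir = outputDir;
        this.errorSuffix = errorSuffix;
    }

    /** Returns the path of the file that was actually written. */
    Path write(String pageName, PageGenerator generator) throws IOException {
        byte[] content;
        try {
            content = generator.generate(pageName);
        } catch (Exception e) {
            // Point 1: the broken page itself is not written.
            // Point 2: the error goes to <pagename><errorSuffix> instead.
            Path errorFile = outputDir.resolve(pageName + errorSuffix);
            Files.createDirectories(errorFile.getParent());
            Files.write(errorFile, e.toString().getBytes(StandardCharsets.UTF_8));
            return errorFile;
        }
        Path page = outputDir.resolve(pageName);
        Files.createDirectories(page.getParent());
        Files.write(page, content);
        return page;
    }
}
```

Because the page file is absent after a failure, a crawler or link checker run over the output would see the link as broken rather than finding an error page that looks like valid content.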


Do you have ideas as to how to do these (i.e. 1-4)? 5 is of greatest importance to me, but if I can understand what is involved in the others, then I can always have a go.

Leave 2 and 3 out then for now.


1 is about not writing out error pages. For one thing, IIUC it needs resourceUnavailable() to be configurable (write out or not), but I don't know whether there are other errors that write directly.

4 is quite important from a user perspective, but maybe it takes some time to do.

Feel really free in doing what you need/prefer, especially if other things take you too much time.

--
Nicola Ken Barozzi                   [EMAIL PROTECTED]
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------


