On 4/26/2010 1:33 PM, Justin J Stark wrote:

The way habari handles filters and formatters is very bad.

Formatting for Habari was never anticipated to be as extensive a proposition as what has been described here. Filters are certainly on my list of "Things Habari should have done differently." Nonetheless, here are some random additional thoughts...


A more concrete description of what a new system should accomplish (a real use case rather than anonymous "filter X" and "filter Y" placeholders) would be most helpful for producing a practical solution.


It may be useful to retain some of the existing functionality such that there are standard filter suffixes on which the theme can count. For example:

$post->foo  // raw content
$post->foo_filtered  // content with filters applied
$post->foo_safe  // raw content, filtered for XSS
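
A rough sketch of how suffix-based accessors like these could be resolved, written in Python rather than PHP purely for illustration. The Post class, its constructor, and the use of HTML escaping as a stand-in for XSS filtering are all assumptions, not Habari code:

```python
import html


class Post:
    """Hypothetical stand-in for a post object; not Habari's Post class."""

    def __init__(self, fields, filters=None):
        self._fields = dict(fields)          # raw values, e.g. {"foo": "..."}
        self._filters = list(filters or [])  # callables run in order for *_filtered

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name.endswith("_filtered"):
            value = self._raw(name[: -len("_filtered")])
            for apply_filter in self._filters:
                value = apply_filter(value)
            return value
        if name.endswith("_safe"):
            # Assumption: escaping stands in for real XSS filtering.
            return html.escape(self._raw(name[: -len("_safe")]))
        return self._raw(name)

    def _raw(self, field):
        try:
            return self._fields[field]
        except KeyError:
            raise AttributeError(field) from None


post = Post({"foo": "<b>hi</b>"}, filters=[str.upper])
print(post.foo)           # raw content
print(post.foo_filtered)  # content with filters applied
print(post.foo_safe)      # raw content, escaped
```

The point is only that the theme can rely on the suffix, whatever filters happen to be registered behind it.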


There is already a way to provide multiple filters via a single plugin, although it is currently broken because the FormatPlugin interface is missing. If your plugin class implements FormatPlugin, then the Format class will allow any of its methods to be used as formatters. Maybe this isn't ideal.
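
To make the idea concrete, here is a minimal sketch of a marker interface whose implementers expose their methods as formatters. It is in Python rather than PHP, and the Format registry, its load/apply methods, and ShoutPlugin are hypothetical names for illustration, not Habari's actual Format class:

```python
class FormatPlugin:
    """Marker base class; a hypothetical analogue of the FormatPlugin interface."""


class Format:
    """Hypothetical registry exposing a plugin's public methods as formatters."""

    def __init__(self):
        self._formatters = {}

    def load(self, plugin):
        # Only classes that opt in via the marker may contribute formatters.
        if not isinstance(plugin, FormatPlugin):
            raise TypeError("plugin must implement FormatPlugin")
        for name in dir(plugin):
            if name.startswith("_"):
                continue
            method = getattr(plugin, name)
            if callable(method):
                self._formatters[name] = method

    def apply(self, name, content):
        return self._formatters[name](content)


class ShoutPlugin(FormatPlugin):
    """One plugin supplying two formatters."""

    def shout(self, content):
        return content.upper()

    def whisper(self, content):
        return content.lower()


fmt = Format()
fmt.load(ShoutPlugin())
print(fmt.apply("shout", "Hello"))
```

Note that this exposes every public method, which is exactly the part that may not be ideal.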


Input and output filters should be applied to any filtered content so that an intermediate state is produced between them. The content entry system would apply the input filters, and the theme would apply the output filters. The result is that content could be re-purposed for a different target spec on output, and could be forced (somehow) to be valid when output in its target format, such as XHTML.


There could also be a preliminary intermediate state for cached pre-processing. For example, you might supply post content like this:

  My cat has caught [catch_count] mice.

The input filter would convert the content to an intermediate cached state and store it in the content_cached field (currently in our database and unused). During this input filtering phase, the formatting used by the user (Markdown, Textile, etc.) is converted into a machine-readable neutral format (one of: XML, HTML, serialized PHP array, etc.) and cached.

On the first output filter pass, the string [catch_count] would need to be replaced by an actual number. This simulates things like LaTeX, heading-to-image font replacements, and the like. The content resulting from this pass would still be in machine-readable neutral format.

On the second output filter pass, the theme would define what format is required for output (XML, HTML, etc.) and call formatters to convert the neutral format to that format. It is imperative (and different from other suggested implementations I've heard) that the *theme* define the ultimate output format, not the user; otherwise the content's format may not match what the theme actually outputs.
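
The three phases above can be sketched roughly as follows, in Python rather than PHP purely for illustration. The neutral format here is just a list of (kind, payload) tuples, and the function names and the "html"/"text" targets are assumptions made for the example:

```python
import html
import re

# Matches [placeholder] tokens; the capture group makes re.split keep them.
PLACEHOLDER = re.compile(r"(\[[a-z_]+\])")


def input_filter(source):
    """Input phase: convert user text into a neutral, cacheable node list."""
    nodes = []
    for part in PLACEHOLDER.split(source):
        if not part:
            continue
        if PLACEHOLDER.fullmatch(part):
            nodes.append(("placeholder", part[1:-1]))
        else:
            nodes.append(("text", part))
    return nodes


def resolve_placeholders(nodes, values):
    """First output pass: fill in placeholders; the result is still neutral."""
    return [
        ("text", str(values[payload])) if kind == "placeholder" else (kind, payload)
        for kind, payload in nodes
    ]


def render(nodes, target):
    """Second output pass: the theme, not the user, picks the target format."""
    text = "".join(payload for _, payload in nodes)
    if target == "html":
        return "<p>" + html.escape(text) + "</p>"
    if target == "text":
        return text
    raise ValueError("unknown target format: " + target)


cached = input_filter("My cat has caught [catch_count] mice.")  # would be stored
resolved = resolve_placeholders(cached, {"catch_count": 3})
print(render(resolved, "html"))
```

Only the last call knows anything about the output format, which is what lets the same cached content serve different themes.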

In each phase, a battery of converters could be applied to arrive at the intermediate or final states. These should not be weighted (like in Drupal or in Habari's priority model) but dependent, like Habari's stack system. It may even be possible to re-use the Stack system to define filters that should be applied.
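
As an illustration of dependency-based rather than weighted ordering, here is a small Python sketch using the standard library's topological sorter. This is not Habari's Stack API; the filter names are made up, and the only assumption is that each filter declares which filters must run before it:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_filters(dependencies):
    """Return filter names so each runs after everything it depends on.

    dependencies maps a filter name to the set of names that must run first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical filters: autolink needs markdown's output, and sanitize
# must see the output of both.
deps = {
    "markdown": set(),
    "autolink": {"markdown"},
    "sanitize": {"markdown", "autolink"},
}
print(order_filters(deps))
```

Unlike weights, a plugin here never has to guess a number that happens to sort correctly against plugins it has never heard of, and a circular declaration fails loudly (graphlib raises CycleError) instead of silently picking an order.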


One thing that we should try very hard to avoid is the Input Format system in Drupal. It sounds like a good idea, but in practice it's a piece of crap. Letting users choose what input format they want per node is asking for trouble. There are aggravations in there beyond what is readily apparent.

To explain a bit, a Drupal "format" consists of one or more "filters" applied in a specific order. The order is determined by "weight", a Drupal term synonymous with Habari's "priority".

Drupal allows you to configure Input Formats per role. So you can say that an administrator can have access to a Raw HTML format, a Filtered HTML format, and a WYSIWYG format. But an editor role could be configured only to see the WYSIWYG format when editing content. If you then create content as an administrator using the Raw HTML format, no editor can edit it, because they do not have permission to use that format. Debugging these issues is very tedious.
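
A toy model of the lockout just described, in Python. This is not Drupal's API; the table of formats and roles mirrors the example above, and running filters in ascending weight order is an assumption of the sketch:

```python
def apply_format(filters, content):
    """Apply (weight, callable) pairs in ascending weight order."""
    for _, fn in sorted(filters, key=lambda pair: pair[0]):
        content = fn(content)
    return content


# Which roles may use which format (names are illustrative).
FORMAT_ROLES = {
    "raw_html": {"administrator"},
    "filtered_html": {"administrator"},
    "wysiwyg": {"administrator", "editor"},
}


def can_edit(role, node_format):
    """A node is editable only by roles allowed to use its format."""
    return role in FORMAT_ROLES[node_format]


print(can_edit("administrator", "raw_html"))
print(can_edit("editor", "raw_html"))  # the editor is locked out of this node
```

The second check is the problem: nothing about the content itself changed, only who is allowed to touch the format it was saved with.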

I would personally prefer to make this as simple as possible. Note that having an intermediate format simplifies this process somewhat, since we would only store that intermediate format, and then pull and push the content from it. That is, if content is written and saved as Markdown, a user who edits in Textile would see Textile, because the filter would convert the intermediate format back to Textile for them.
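
A minimal sketch of that round trip, in Python and handling only bold text (written as `**bold**` in Markdown and `*bold*` in Textile). The node list and the to_neutral/from_neutral helpers are hypothetical, and a real converter would need a proper parser for each syntax:

```python
import re

# Per-syntax bold marker and the pattern that recognizes it.
SYNTAX = {
    "markdown": ("**", re.compile(r"\*\*(.+?)\*\*")),
    "textile": ("*", re.compile(r"\*(.+?)\*")),
}


def to_neutral(source, syntax):
    """Parse user markup into neutral (kind, payload) nodes for storage."""
    _, pattern = SYNTAX[syntax]
    nodes, pos = [], 0
    for match in pattern.finditer(source):
        if match.start() > pos:
            nodes.append(("text", source[pos:match.start()]))
        nodes.append(("strong", match.group(1)))
        pos = match.end()
    if pos < len(source):
        nodes.append(("text", source[pos:]))
    return nodes


def from_neutral(nodes, syntax):
    """Serialize stored nodes back into whichever syntax the editor prefers."""
    marker, _ = SYNTAX[syntax]
    return "".join(
        marker + payload + marker if kind == "strong" else payload
        for kind, payload in nodes
    )


stored = to_neutral("Cats are **great** hunters.", "markdown")
print(from_neutral(stored, "textile"))
```

Since only the neutral nodes are stored, no per-post format choice needs to be recorded or permission-checked.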

Anyway, choosing a format on the publishing page should be avoided, IMO.


Alright, I'm out of the time I allotted for this email.  :-\

Owen
