Recent revs in the js-importer branch provide a basic fix for #1481 <https://github.com/leo-editor/leo-editor/issues/1481>. This long Engineering Notebook post will discuss this fix, its limitations, further problems, and possible fixes for those problems.

As always, feel free to ignore. However, this post describes important aspects of the code and its design. It may be of more than usual interest to Leo's devs.

*The immediate problem*

Perfect import failed for reveal.js because *i.gen_lines* failed to allocate lines properly to nodes. This caused lines to appear out of order.

i.gen_lines is part of the *pipeline*, defined in the base *Importer class*. At present, the *JS_Importer* class, like most other importers, is a subclass of the Importer class. Iirc, all importers use i.gen_lines unchanged. The JS importer overrides i.starts_block, one of i.gen_lines's helpers.

i.gen_lines uses a *parse state* to determine the start and end of *blocks*. For Javascript, these blocks correspond (roughly) to classes and functions. Alas, there are many ways to define a class or function in JS. JS is, by far, the most difficult language to parse of all the languages handled by Leo's importers.

Perfect import failed for reveal.js because the importer mistook a line like

    if('function'){

as the start of a function. The present fix will usually work (Leo can now import reveal.js), but not always, as I'll now explain...

*Tokenizing Javascript*

You ask, how hard can it be to recognize strings like 'function' in JS? The answer is "very, very hard". Tokenizing JS depends on context. In particular, it is difficult to determine whether a '/' character is the "div" arithmetic operator or the start of a regular expression! JS is, by far, the most difficult language to scan (tokenize) of all the languages handled by Leo's importers.

*i.scan_line* updates the parse state after *carefully* tokenizing each line. The JS importer method *js_i.scan_line* overrides i.scan_line to handle '/' properly. As you can see, it's not pretty.

The bug happened because *js_i.starts_block* contained regexes that didn't distinguish the "function" keyword from a string containing "function". The partial fix is mostly a hack.
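
The failure mode described above can be reproduced in miniature. What follows is only a sketch, not Leo's code: the pattern and the names naive_starts_block, in_string_at and quote_aware_starts_block are hypothetical, invented for illustration. It contrasts a regex that ignores strings with a simple quote-tracking check, and shows that quote tracking alone is still fooled by a regex literal containing a quote.

```python
import re

# Hypothetical pattern for illustration; not the regex used by Leo's JS_Importer.
NAIVE_FUNCTION = re.compile(r'\bfunction\b')

# Characters that open a JS string or template literal.
QUOTES = ("'", '"', "`")

def naive_starts_block(line):
    """Report a block start whenever the word function appears anywhere."""
    return bool(NAIVE_FUNCTION.search(line))

def in_string_at(line, index):
    """Guess whether line[index] lies inside a quoted string.

    This is faux tokenizing: it only counts quotes and knows nothing
    about regex literals or comments.
    """
    quote = None
    i = 0
    while i < index:
        ch = line[i]
        if quote:
            if ch == '\\':
                i += 1  # Skip the escaped character.
            elif ch == quote:
                quote = None  # The string ends here.
        elif ch in QUOTES:
            quote = ch  # A string starts here.
        i += 1
    return quote is not None

def quote_aware_starts_block(line):
    """Report a block start only if some function keyword looks unquoted."""
    return any(
        not in_string_at(line, m.start())
        for m in NAIVE_FUNCTION.finditer(line)
    )
```

Under these assumptions, naive_starts_block("if('function'){") returns True, which is the original bug, while quote_aware_starts_block returns False for that line and True for "el.addEventListener( 'play', function() {". However, quote_aware_starts_block("a = /'/; f = function() {") returns False, even though the line really does start a function: the quote inside the regex literal is mistaken for the start of a string.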
js_i.starts_block now keeps track of whether "function" *looks* like it is in a string. But this faux tokenizing could fail if quotes (or "function") appear in a regex, as they very well might.

A proper fix would involve fully tokenizing each line, in *both* js_i.scan_line and js_i.starts_block. Therefore, we want a stand-alone Javascript tokenizer, written in Python. Present revs include a copy of JsLex.py <https://bitbucket.org/ned/jslex/src/default/jslex.py>, but it isn't hooked up yet. The JsLex code contains a note that it doesn't handle non-ascii characters properly, so it may need substantial revision. Happily, JsLex does contain a suite of unit tests.

*Other problems*

The lines of reveal.js that caused the perfect import to fail are not handled very well even after perfect import succeeds. Here are the complications...

The JS importer must be completely immune to indentation, and it is. As a direct consequence, perfect import tests ignore leading whitespace.

However, once lines are allocated to nodes, the importer tries to *adjust* nodes to make them as pleasing as possible. In particular, the importer removes *common leading whitespace* from all nodes. In addition, under special circumstances, the importer may try to move one or more lines from the end of one node to the start of the previous or following sibling node.

The *post pass* part of the pipeline handles all these adjustments. It would be difficult/impossible to handle them in gen_lines.

At present, the JS importer only generates @others directives, never section references. This could be changed, but imo using @others is much better. However, @others does impose additional constraints on what the post pass can do.

It's time for an extended example.
Here is the gist of the code that caused perfect import to fail:

    function startEmbeddedContent( element ) {
        toArray( element.querySelectorAll( 'video, audio' ) ).forEach( function( el ) {
            if('function') { // The culprit
                promise.catch( function() {
                    el.addEventListener( 'play', function() {
                        el.controls = false;
                    } );
                } );
            }
        }
    }

To have any chance of understanding what is going on, the post pass must be disabled. Here are the results, without the post pass. Headlines are preceded by ===:

    === function startEmbeddedContent
    function startEmbeddedContent( element ) {
        @others
    }
    === toArray( element.querySelectorAll('video, audio')).forEach function
    toArray( element.querySelectorAll('video, audio')).forEach( function(el) {
        if('function') {
            @others
    }
    === promise.catch function
    promise.catch(function() {
        @others
    } );
    } // <=====
    === el.addEventListener('play', function
    el.addEventListener('play', function() {
        el.controls = false;
    } );

Now you can see why a post pass is desirable. It's "obvious" that the last line of the node "promise.catch function" belongs in the *previous* node. However, *such a move can not be done in general!*

Only the post pass has any chance of having enough data to make this adjustment. This adjustment can only be done because the node "promise.catch function" is the *last* node under the range of the @others in the node "toArray...". The adjustment would be invalid if the node "promise.catch function" had any following siblings.

*Summary*

The present code now imports reveal.js without error. However, the latest code is a hack.

A proper fix entails carefully tokenizing lines in two places. I plan to use JsLex to do this.

JsLex will need work to handle non-ascii characters properly. I'll do that as part of the fix for #1481.

The post pass attempts to reallocate lines to make the result more palatable. The JS importer uses @others instead of section references. This imposes constraints on possible adjustments.
I'll attempt to improve the post pass as another part of #1481. This may involve a rewrite/rethink. This post has been part of the rethinking process.

The present problems with Leo's JS importer arise from well-known infelicities in JS itself. The fixes to #1481 actually show the strengths of Leo's importer architecture. Fixes are straightforward and will be confined to the JS_Importer class.

Edward