Re: [Chicken-users] Parsing Simple Markup
Wow...that is just perfect...suddenly I'm in love with Scheme :)

On Mon, Sep 22, 2014 at 8:01 AM, Andy Bennett andy...@ashurst.eu.org wrote:

> Hi,
>
>> Actually due to the possible presence of nested commands, it should probably be something more generic, since in the last example:
>>
>> (bold (smallcap (size 2 text)))
>>
>> what the procedure 'bold' would be taking in is not a string text, but rather an expression...so this is where I guess things would need to be recursive.
>
> The evaluation rules will evaluate things in the correct order. So (size 2 text) will be evaluated first, then (smallcap ) and then (bold ).
>
> It's deliberately unspecified in which order 2 or text will be evaluated in.
>
> Regards,
> @ndy
>
> --
> andy...@ashurst.eu.org
> http://www.ashurst.eu.org/
> 0x7EBA75FF

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Parsing Simple Markup
Did I already recommend this? Sorry if that's a duplication.

One more example of SXML together with SRFI 110 sweet-expressions (indentation-sensitive Lisp syntax). Those two blend well together. I'm using them embedded in XML (XSLT) here:

http://ball.askemos.org/Aa176138e655369f8c01c3044ced70cfc

(Be sure to view this as source, not in the browser!)

Remarks:

a) this is served from Chicken
b) it's a rather simple but complete payment system, docs coming up here: http://ball.askemos.org/A0cd6168e9408c9c095f700d7c6ec3224/?_v=search_id=1856_go=2

Wilma, Fred and Bamm-Bamm are each running the script above.

Best

/Jörg
Re: [Chicken-users] Parsing Simple Markup
Hi,

> Actually due to the possible presence of nested commands, it should probably be something more generic, since in the last example:
>
> (bold (smallcap (size 2 text)))
>
> what the procedure 'bold' would be taking in is not a string text, but rather an expression...so this is where I guess things would need to be recursive.

The evaluation rules will evaluate things in the correct order. So (size 2 text) will be evaluated first, then (smallcap ) and then (bold ).

It's deliberately unspecified in which order 2 or text will be evaluated in.

Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
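
To make the evaluation order concrete, here is a minimal sketch, not anything the posters wrote. The names bold, smallcap and size come from the thread; the helper wrap and the angle-bracket output format are assumptions invented for this example. It also assumes the text is given as a quoted string and that the code runs in the csi interpreter, where a one-argument eval can see top-level definitions.

```scheme
;; Hypothetical markup procedures. Each one RETURNS a string instead of
;; printing, so the result of an inner call can be passed to an outer one.
(define (wrap tag body)
  (string-append "<" tag ">" body "</" tag ">"))

(define (bold body) (wrap "bold" body))
(define (smallcap body) (wrap "smallcap" body))

;; size takes a value as well as the text it applies to.
(define (size n body)
  (string-append "<size " (number->string n) ">" body "</size>"))

;; The document as one quoted s-expression.
(define doc '(bold (smallcap (size 2 "text"))))

;; Arguments are evaluated before the procedure is applied, so size runs
;; first, then smallcap, then bold.
(display (eval doc))
(newline)
;; prints <bold><smallcap><size 2>text</size></smallcap></bold>
```

Because every procedure returns a value rather than printing, no explicit recursion is needed here: the nesting of the expression does the work.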
Re: [Chicken-users] Parsing Simple Markup
On Sat, Sep 20, 2014 at 11:19:08AM -0400, Yves Cloutier wrote:

> Hello,
>
> I am a new user to Scheme in general and to Chicken in particular, nice to meet you all.

Hello Yves, and welcome to the CHICKEN community!

> I came to scheme looking for an alternative to Perl for doing a personal project which involves parsing an input file, identifying html-like commands and converting those to Groff code.

That should be pretty doable. We already have several eggs for parsing various markup languages; you may want to take a look at their implementations for inspiration:

- html-parser
- htmlprag
- lowdown
- mistie
- svnwiki-sxml

> Scheme is a totally different paradigm than I'm used to, so while I wait for my books to arrive I will need some hand-holding...hope that's ok.

No problem! We always help out newbies. If you have specific questions, you might also like to try our IRC channel. There's usually someone around to answer your question.

> 1) Is the Chicken Scheme manual available for purchase? Online docs are great but I like to have a hardcopy so that I can read offline.

I'm afraid not. You're the first person to ask for a hardcopy. Of course you can always print it... The manual is in svnwiki syntax, which can be translated to sxml/html or markdown. It's also human-readable so you could print out the sources. There's a copy of the manual with every tarball, which gets installed as HTML in your system's doc directory, so it's always available when you're offline.

> 2) The best way to learn is to get your hands dirty so I was looking at doing everything from scratch, but then I saw input-parse ( http://wiki.call-cc.org/eggref/4/input-parse) which seems pretty much like what I need. But I can't seem to find this in the Eggs. It says that page does not exist yet.

Which page are you talking about? http://wiki.call-cc.org/eggref/4/input-parse looks fine to me.
> In Perl I am able to do most of this with regular expressions, but I'm hitting my head against the wall when it comes to multiple formatting commands within a group ...,...,...

There's a famous quote by Jamie Zawinski about regular expressions, which seems to apply in this case:

  Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

Having said that, I think the SRE notation for regular expressions makes them a lot more readable. However, parsing complex languages using regular expressions is a bad idea...

> Also to note: I am NOT a programmer or developer - I am a hobbyist and doing this for fun!

...since you're not a programmer, you may not be familiar with formal language theory. The idea there is that there are several classes of languages (or grammars), and only so-called regular grammars can strictly be parsed with regular expressions. A regular grammar is basically one which requires no extra information to parse it, aside from the current rule in the parser. It also means that no backtracking is needed when parsing it. Irregex (like Perl) can do backtracking, which muddies the waters quite a bit, and I think this is one of the reasons people get confused about the abilities of regexes.

A good rule of thumb to remember is: if your syntax allows things to be nested, regular expressions alone cannot parse it. For instance, in HTML you can arbitrarily nest markup instructions like <b><i>..</i></b>, but also <div><div>...</div></div>. This is why people will tell you that HTML/XML cannot be parsed with regular expressions. If you try anyway, you set yourself up for failure.

Many security issues have historically been due to poor parsing choices. If you're interested in this stuff, see also http://www.langsec.org, which is a group of people who are using a language-theoretical approach to fighting insecurity.
You may be able to do partial parsing steps of a complex grammar using regular expressions combined with some code to drive it. This is the typical PHP/Perl approach to parsing languages, with the reference implementations of Markdown and Textile being prime examples. However, this quickly becomes intractable, and inevitably leads to the aforementioned security issues.

Instead, I'd advise you to use one of the parsing eggs, or roll your own recursive descent parser. If performance is not much of a consideration, that's pretty easy to do in Scheme, and you don't need any dependencies.

> My idea was that I could read a line of text from a file at a time. My understanding is that the input would be read into an s-expression (which I understand to basically be a list).

That sounds problematic, because it will limit your ability to have modifiers that span multiple lines. Of course, it's still possible with additional bookkeeping, but you may find it easier to just parse from a character stream, handling newline symbols in the grammar instead of making them fundamental to the way your syntax must be parsed.

> This is my first attempt at functional
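
A hand-rolled recursive descent parser of the kind suggested above could look roughly like the following sketch. The input syntax is an assumption made up for illustration: it uses {name body...} groups instead of the real TML notation, and the procedure names parse-items, parse-group and parse-markup are likewise hypothetical. It works on a list of characters, so groups may span lines, and it needs no eggs.

```scheme
;; Turn a reversed list of characters into a symbol, e.g. (#\d #\l #\o #\b) -> bold.
(define (chars->symbol rev-chars)
  (string->symbol (list->string (reverse rev-chars))))

;; items := (text | group)*
;; Returns two values: the parsed items and the unread characters.
(define (parse-items chars)
  (let loop ((chars chars) (text '()) (items '()))
    ;; Push any pending plain text onto the item list.
    (define (flush)
      (if (null? text)
          items
          (cons (list->string (reverse text)) items)))
    (cond
     ((or (null? chars) (char=? (car chars) #\}))
      (values (reverse (flush)) chars))
     ((char=? (car chars) #\{)
      (call-with-values
          (lambda () (parse-group (cdr chars)))
        (lambda (group rest)
          (loop rest '() (cons group (flush))))))
     (else
      (loop (cdr chars) (cons (car chars) text) items)))))

;; group := "{" name [" " items] "}"  (the "{" is already consumed)
;; Nested groups are handled by calling back into parse-items.
(define (parse-group chars)
  (let loop ((chars chars) (name '()))
    (cond
     ((null? chars) (error "unterminated group"))
     ((char=? (car chars) #\})
      (values (list (chars->symbol name)) (cdr chars)))
     ((char=? (car chars) #\space)
      (call-with-values
          (lambda () (parse-items (cdr chars)))
        (lambda (items rest)
          (if (null? rest)
              (error "missing closing brace")
              (values (cons (chars->symbol name) items) (cdr rest))))))
     (else (loop (cdr chars) (cons (car chars) name))))))

;; Entry point: parse a whole string into a list of strings and groups.
(define (parse-markup str)
  (call-with-values
      (lambda () (parse-items (string->list str)))
    (lambda (items rest)
      (if (null? rest)
          items
          (error "unexpected closing brace")))))
```

For example, (parse-markup "a {bold {smallcap hi} there} b") yields ("a " (bold (smallcap "hi") " there") " b"). The mutual recursion between parse-items and parse-group is what lets the parser follow arbitrarily deep nesting, which is the part a regular expression alone cannot do.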
Re: [Chicken-users] Parsing Simple Markup
Hi,

> I am a new user to Scheme in general and to Chicken in particular, nice to meet you all.

Welcome!

> A few examples of what I am trying to parse:
>
> 1. Tags that identify structural elements of a document:
>
> [chapter] Chapter Title
> [heading1] Heading Title
> [list] ... [end]
> [quote] ... [end]
>
> 2. Tags that identify formatting of text:
>
> boldtext ;single formatting command with no value
> indent 5text ; formatting command with a value
> dropcapOnce upon a time
> bold, smallcap, size +2text ;a command group which has multiple formatting commands enclosed within
>
> A command group can be singular: ... or have multiple commands separated by commas: ...,...,..., the closing signalling the end of the command group.

This is not entirely dissimilar to Markdown so I'd echo Peter's advice to check out lowdown, the CHICKEN Markdown implementation, and comparse, the parser library lowdown is implemented in.

I'll also point you to the eMail address parsing egg:

http://api.call-cc.org/doc/email-address

which is another example of a parser written with comparse. It's interesting because, unlike lowdown, it implements a parser for just a small number of things: eMail addresses and lists of eMail addresses.

comparse is a parser combinator library. This means that you specify parts of your grammar / language and a procedure which can parse that thing is returned. You then combine these parsers to produce other parsers that, for example, can parse X then Y, X or Y, X then Y then Y, etc. It takes a couple of hours to wrap your head around it but it's very powerful.

The email-address parser is built up starting from sets of characters and resulting in two procedures: one that parses an eMail address and one that parses a sequence of eMail addresses.

> The idea is to make typesetting with Groff very simple and intuitive for any user - not just programmers and hackers. The markup we are working on is called Typesetting Markup Language (TML).
> So it would convert html-like commands and generate a Groff document from it.

comparse allows you to take your results and give them as arguments to other procedures. In the eMail address egg I use this to populate an internal data type that represents an eMail address. You could use an intermediate data type like this or you could try to write a number of different procedures which immediately output the parsed thing in the required format.

> Right now I am trying to do a prototype which generates Groff in the backend, but the idea is to have a general purpose markup that could also be used to generate LaTeX/ConTeXt, HTML, XML etc.

...it's probably best to generate an intermediate format then. The lowdown egg generates SXML which can easily be rendered down to HTML. SXML is an s-expression representation of the tree structure of XML.

See here for an illustration of SXML:

http://www.more-magic.net/posts/lispy-dsl-sxml.html

> In Perl I am able to do most of this with regular expressions, but I'm hitting my head against the wall when it comes to multiple formatting commands within a group ...,...,...

In comparse something like X,Y,Z would be something like (off the top of my head, without testing anything):

  ; fIorz's separated-by parser :
  (define (separated-by sep-parser field-parser)
    (sequence* ((head field-parser)
                (tail (zero-or-more (preceded-by sep-parser field-parser))))
      (result (cons head tail))))

  (define the-parser
    (sequence-of
      (char )
      (separated-by ,
        (maybe ; support null elements
          (any-of X Y Z))
      (char )))

The email-address egg additionally has the delimited-by parser to support white space around the commas. Above I've used the maybe parser to show how you'd support X,,Y,Z as well as X,Y,Z.

> Also to note: I am NOT a programmer or developer - I am a hobbyist and doing this for fun!

It looks like you're on the right tracks.

> My idea was that I could read a line of text from a file at a time.
> My understanding is that the input would be read into an s-expression (which I understand to basically be a list). Then could car the first item of the list and match it against my tags or formatting commands (which would be defined as something like below)
>
> (define chapter "[chapter]")
> (define list:digit "[list:digit]")
> (define list:alpha "[list:alpha]")
> (define end-list "[end]")
> (define close-command-group )
> (define command-group-begin )
> (define command-group-end )
> (define bold "bold")
> (define smallcap "smallcap")
> (define dropcap "dropcap")

Don't worry about reading the input: let comparse do that for you. Other than that, it looks like the rules you have defined there aren't a million miles from the way comparse would let you specify things.

The additional complexity is that comparse returns a procedure that you apply to the string or port you want
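
To illustrate the parser combinator idea itself (procedures that return parsing procedures, which are then combined), here is a small hand-rolled sketch in plain Scheme. It deliberately does NOT use comparse and does not reproduce its API; every name below (char-p, word-p, any-of-p, separated-by-p, parse-commands) is hypothetical. A parser here is a procedure that takes a list of characters and returns a pair (value . remaining-input), or #f on failure.

```scheme
;; Parser that matches exactly one given character.
(define (char-p c)
  (lambda (input)
    (and (pair? input)
         (char=? (car input) c)
         (cons c (cdr input)))))

;; Parser that matches a literal word and yields it as a string.
(define (word-p str)
  (lambda (input)
    (let loop ((want (string->list str)) (in input))
      (cond ((null? want) (cons str in))
            ((and (pair? in) (char=? (car in) (car want)))
             (loop (cdr want) (cdr in)))
            (else #f)))))

;; "X or Y": try each parser in turn, return the first success.
(define (any-of-p . parsers)
  (lambda (input)
    (let loop ((ps parsers))
      (and (pair? ps)
           (or ((car ps) input)
               (loop (cdr ps)))))))

;; "item, item, item": one item followed by zero or more (sep item).
(define (separated-by-p sep item)
  (lambda (input)
    (let ((head (item input)))
      (and head
           (let loop ((acc (list (car head))) (in (cdr head)))
             (let* ((s (sep in))
                    (next (and s (item (cdr s)))))
               (if next
                   (loop (cons (car next) acc) (cdr next))
                   (cons (reverse acc) in))))))))

;; Combine the small parsers into one for a comma-separated command list.
(define command
  (any-of-p (word-p "bold") (word-p "smallcap") (word-p "dropcap")))

(define command-list
  (separated-by-p (char-p #\,) command))

;; Succeeds only if the whole string is consumed.
(define (parse-commands str)
  (let ((r (command-list (string->list str))))
    (and r (null? (cdr r)) (car r))))
```

With these definitions, (parse-commands "bold,smallcap,dropcap") returns ("bold" "smallcap" "dropcap"), while (parse-commands "bold,,dropcap") returns #f because this sketch does not accept empty elements. A real library such as comparse adds error reporting, port input and many more combinators on top of the same basic idea.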
Re: [Chicken-users] Parsing Simple Markup
Peter/Andy/Richard,

Thanks so much for your replies.

Richard: Thanks for showing me how to install input-parse from the command line...I had no idea! Also thanks for the link showing how things are done using Python as a comparison. Got me reading a file in 30 seconds!

Andy/Peter: Thank you for suggesting I look at comparse and lowdown. I will certainly do that.

Andy, you put a link to Peter's page for SXML...strangely enough I had already visited this page while trying to learn more about Scheme/Lisp and how I would approach my project.

Actually the markup notation I have devised would translate itself even better to S-expressions.

Using the example from Peter's page:

<div>
  <span>Hello, <strong>dear</strong> friends.</span>
  <span>This is a &lt;simple&gt; example.</span>
</div>

Converting this HTML fragment to an S-expression is straightforward:

'(div (span "Hello, " (strong "dear") " friends.")
      (span "This is a <simple> example."))

It's a bit more cumbersome to type because you have to break up the strings for the strong element.

If I were to write this in TML, it would look something like this:

div spanHello, strongdear friends spanthis is a lt;simplegt; example.

which to me looks exactly like:

'(div (span "Hello, " (strong "dear") " friends.")
      (span "This is a <simple> example."))

In TML markup, the symbol denotes the closing of a tag or tag group, as opposed to XML/HTML where a corresponding </tag> must exist for each opening tag.

So whenever that symbol is encountered, it is a cue to close the current tag (</tag>) or group of tags (</tag1></tag2></tag3>).

Well, it looks like Scheme will be a very good choice for this project...

Cheers!
Re: [Chicken-users] Parsing Simple Markup
Dear Yves,

with SXML you could write transformation rules as Peter has shown in www.more-magic.net/docs/scheme/sxslt.pdf. I'm not experienced with SXML, but AFAIK they would generate a similar effect as the procedures in your example below.

Best wishes,
Arthur

2014-09-21 17:06 GMT-03:00 Yves Cloutier yves.clout...@gmail.com:

> Hello Oleg,
>
> Thank you for your recommendations too. I actually just came back from the local library where I picked up The Scheme Programming Language.
>
> You know, reading through your reply, it was the last part that made me think about something.
>
> If I can convert my input to the format:
>
> (bold text)
> (indent 5 text)
> (bold (smallcap (size 2 text)))
>
> Could I not define each of these as functions (or procedures), and then just call an (eval ' ) procedure to do my output?
>
> For example (keeping in mind I'm only just getting familiar with Scheme syntax!):
>
> (define (bold (text)
>   (print the opening tag for the command 'bold')
>   (print the string 'text')
>   (print the closing tag for the command 'bold'))
>
> (define (indent (indent-value text)
>   (print the opening tag for the command 'indent' with value of 'indent-value')
>   (print the string 'text')
>   (print the closing tag for the command 'indent'))
>
> Actually due to the possible presence of nested commands, it should probably be something more generic, since in the last example:
>
> (bold (smallcap (size 2 text)))
>
> what the procedure 'bold' would be taking in is not a string text, but rather an expression...so this is where I guess things would need to be recursive.
>
> Once my document has been converted into one big s-expression, and procedures defined accordingly, then I could just (eval ) it..couldn't I?
>
> (eval '(bold text) (indent 5 text) (bold (smallcap (size 2 text
>
> Or something along those lines?
>
> If this is the case...brilliant!
Re: [Chicken-users] Parsing Simple Markup
On Sat, 20 Sep 2014 11:19:08 -0400 Yves Cloutier yves.clout...@gmail.com wrote:

> Hello,
>
> I am a new user to Scheme in general and to Chicken in particular, nice to meet you all.
>
> I came to scheme looking for an alternative to Perl for doing a personal project which involves parsing an input file, identifying html-like commands and converting those to Groff code.
>
> I was doing well up to a certain point but things started getting messy and thought perhaps there is a language out there better suited for this - which led me to scheme.
>
> Scheme is a totally different paradigm than I'm used to, so while I wait for my books to arrive I will need some hand-holding...hope that's ok.
>
> 1) Is the Chicken Scheme manual available for purchase? Online docs are great but I like to have a hardcopy so that I can read offline.
>
> 2) The best way to learn is to get your hands dirty so I was looking at doing everything from scratch, but then I saw input-parse ( http://wiki.call-cc.org/eggref/4/input-parse) which seems pretty much like what I need. But I can't seem to find this in the Eggs. It says that page does not exist yet.
>
> For the most part, a lot of what I want to do is search and replace, except for special cases where additional processing would be required to extract command:value pairs.
>
> A few examples of what I am trying to parse:
>
> 1. Tags that identify structural elements of a document:
>
> [chapter] Chapter Title
> [heading1] Heading Title
> [list] ... [end]
> [quote] ... [end]
>
> 2. Tags that identify formatting of text:
>
> boldtext ;single formatting command with no value
> indent 5text ; formatting command with a value
> dropcapOnce upon a time
> bold, smallcap, size +2text ;a command group which has multiple formatting commands enclosed within
>
> A command group can be singular: ... or have multiple commands separated by commas: ...,...,..., the closing signalling the end of the command group.
> The idea is to make typesetting with Groff very simple and intuitive for any user - not just programmers and hackers. The markup we are working on is called Typesetting Markup Language (TML). So it would convert html-like commands and generate a Groff document from it.
>
> Right now I am trying to do a prototype which generates Groff in the backend, but the idea is to have a general purpose markup that could also be used to generate LaTeX/ConTeXt, HTML, XML etc.
>
> In Perl I am able to do most of this with regular expressions, but I'm hitting my head against the wall when it comes to multiple formatting commands within a group ...,...,...
>
> Also to note: I am NOT a programmer or developer - I am a hobbyist and doing this for fun!
>
> So there is my introduction! If any of you have any words of wisdom on where to begin I would love to hear from you. I literally started playing with Scheme last night while I wait for my book order (come on amazon...send me my books!)
>
> My idea was that I could read a line of text from a file at a time. My understanding is that the input would be read into an s-expression (which I understand to basically be a list).
>
> Then could car the first item of the list and match it against my tags or formatting commands (which would be defined as something like below)
>
> (define chapter "[chapter]")
> (define list:digit "[list:digit]")
> (define list:alpha "[list:alpha]")
> (define end-list "[end]")
> (define close-command-group )
> (define command-group-begin )
> (define command-group-end )
> (define bold "bold")
> (define smallcap "smallcap")
> (define dropcap "dropcap")
>
> And then do something based on what token is encountered.
>
> This is my first attempt at functional programming so I realize I may not be approaching this in the best way.
>
> Regards, and looking forward to playing with Scheme!
>
> yves

Hello Yves,

Welcome to Chicken. I can give you a more in-depth answer tomorrow when I have more time. In the meantime: input-parse is working.
I do not understand what you mean by not being able to find it in the Eggs. You install it by typing at the command line:

  chicken-install -s input-parse

(use -s if you need root privileges). Then, to use it, include

  (use input-parse)

at the top of your source code.

You say you have been using regexps before but got stuck, so may I point you to:

http://wiki.call-cc.org/man/4/Unit%20irregex

IMO the extended SRE syntax is a lot saner than that of Perl.

Maybe this is of some help too. It is an intro written for Python programmers but you might find it useful nonetheless:

http://wiki.call-cc.org/chicken-for-python-programmers

Good luck,
Richard
Re: [Chicken-users] Parsing Simple Markup
On 09/20/14 19:19, Yves Cloutier wrote:

> Hello,
>
> I am a new user to Scheme in general and to Chicken in particular, nice to meet you all.

Welcome!

> Scheme is a totally different paradigm than I'm used to, so while I wait for my books to arrive I will need some hand-holding...hope that's ok.

I was in a similar situation a few months ago: with experience only in classic languages, Scheme looked completely foreign. But it's actually very simple once you get the basic concepts.

For learning I personally recommend The Scheme Programming Language (http://www.scheme.com/tspl4/) - it contains very nice exercises. The book is somewhat tied to Chez Scheme but many extensions are available in Chicken as well.

Also, not often recommended but my favourite is: An Introduction to Scheme and its Implementation (ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_toc.html) - although it's unfinished there are some gems scattered around, especially useful if you are familiar with the C language.

> 1) Is the Chicken Scheme manual available for purchase? Online docs are great but I like to have a hardcopy so that I can read offline.

There is http://wiki.call-cc.org/eggref/4/chicken-doc - you can install it for offline use. It will look like http://api.call-cc.org/doc/chicken.

> For the most part, a lot of what I want to do is search and replace, except for special cases where additional processing would be required to extract command:value pairs.
>
> The idea is to make typesetting with Groff very simple and intuitive for any user - not just programmers and hackers. The markup we are working on is called Typesetting Markup Language (TML). So it would convert html-like commands and generate a Groff document from it.

See also http://en.wikipedia.org/wiki/SXML, http://wiki.call-cc.org/man/4/Unit%20irregex and http://wiki.call-cc.org/eggref/4/fmt for ideas.
> In Perl I am able to do most of this with regular expressions, but I'm hitting my head against the wall when it comes to multiple formatting commands within a group ...,...,...
>
> My idea was that I could read a line of text from a file at a time.
>
> My understanding is that the input would be read into an s-expression
>
> And then do something based on what token is encountered.

You can try to first convert this to simple s-expressions like:

(bold text)
(indent 5 text)
(bold (smallcap (size 2 text)))

and then use the http://wiki.call-cc.org/eggref/4/matchable egg to generate output. See http://ceaude.twoticketsplease.de/articles/an-introduction-to-lispy-pattern-matching.html for an introduction.

I've written a very simple recursive s-exp parser using matchable some time ago. I will clean it up and post the link here in a few days for reference.

--
Regards, Oleg
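
The output step described above can also be sketched without the matchable egg, using a plain case dispatch. This is only an illustration, not Oleg's parser. It assumes the text has already been converted to s-expressions with string leaves, and it maps bold and size to the groff escapes \fB...\fP and \s+N...\s0. The procedure names render and render-all are hypothetical, and tags the sketch does not know (such as smallcap) simply pass their contents through unchanged.

```scheme
;; Render a list of nodes and concatenate the results.
(define (render-all nodes)
  (apply string-append (map render nodes)))

;; Render one node: either a string leaf or a (tag child ...) list.
(define (render node)
  (cond
   ((string? node) node)
   ((pair? node)
    (case (car node)
      ;; (bold child ...): switch to the bold font, then back.
      ((bold)
       (string-append "\\fB" (render-all (cdr node)) "\\fP"))
      ;; (size n child ...): bump the point size by n, then restore it.
      ((size)
       (string-append "\\s+" (number->string (cadr node))
                      (render-all (cddr node))
                      "\\s0"))
      ;; Unknown tag: emit the children only (an assumption of this sketch).
      (else (render-all (cdr node)))))
   (else (error "cannot render" node))))

(display (render '(bold (smallcap (size 2 "text")))))
(newline)
;; prints \fB\s+2text\s0\fP
```

Because render calls itself on every child, nested commands need no special handling. Adding another output format would mean writing a second procedure like render over the same tree, which is the benefit of an intermediate representation.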