Re: official orgmode parser
I'm no expert in parsing but I would expect org's parser to be quite similar to the multitude of markdown or CommonMark [1] parsers. There isn't that much difference in syntax, except maybe org is more versatile and has more syntax elements, like drawers.

Searching for "EBNF Markdown" I stumbled upon [2].

[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/

On 10/26/20 10:00 PM, Tom Gillespie wrote:

Here is an attempt to clarify my own confusion around the nested structures in org. In short: each node in the headline tree and the plain list tree can be parsed using the EBNF, but the nesting level cannot, which means that certain useful operations, such as folding, require additional rules beyond the grammar. More inline.

Best!
Tom

Do you need to? This is valid as an entire Org file, I think:

    *** foo
    * bar
    * baz

And that can be represented in EBNF. I'm not aware of places where behavior is indent-level specific, except inline tasks, and that edge case can be represented.

You are correct, and as long as the heading depth doesn't change some interpretation then this is a non-issue. The reason I mentioned this though is because it means that you cannot determine how to correctly fold an org file from the grammar alone.

To make sure I understand. It is possible to determine the number of leading stars (and thus the level), but I think that it is not possible to identify the end of a section. For example

    * a
    *** b
    ** c
    * d

You can parse out a 1, b 3, c 2, d 1, but if you want to be able to nest b and c inside a but not nest d inside a, then you need a stack in there somewhere. You can't have a rule such as

    section : headline content
    content : text | section

because the parse would incorrectly nest sections at the same level, you would have to write

    section-level-1 : headline-1 content-1
    content-1 : text | section-level-2-n

but since we have an arbitrary number of levels the grammar would have to be infinite.

This is only if you want your grammar to be able to encode that the content of sections can include other more deeply nested sections, which in this context we almost certainly do not (as you point out).

There is a similar issue with the indentation level in order to correctly interpret plain lists.

    list ::= ('+' string newline)+ sublist?
    sublist ::= (indent list)+

I think this captures lists?

Ah yes, I see my mistake here. In order for this to work the parser has to implement significant whitespace, so whitespace cannot be parsed into a single token. I think everything works out after that.

Definitely not able to be represented in EBNF, unless as you say {name} is a limited vocabulary.

Darn those pesky open sets!
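The "you need a stack in there somewhere" point can be illustrated with a short Python sketch. It is not how Org itself does it; the names `Node`, `nest_headlines` and `outline` are made up for this example, and body text between headlines is simply skipped. The grammar-level step only yields a flat sequence of (level, title) pairs; the stack is the extra rule that decides where each section ends.

```python
import re
from dataclasses import dataclass, field

# One or more leading stars, at least one space, then the title.
HEADLINE = re.compile(r"^(\*+) +(.*)$")


@dataclass
class Node:
    level: int
    title: str
    children: list = field(default_factory=list)


def nest_headlines(text):
    """Build a tree from flat headlines using an explicit stack."""
    root = Node(0, "<root>")  # hypothetical level-0 container for the file
    stack = [root]
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match is None:
            continue  # body text is ignored in this sketch
        node = Node(len(match.group(1)), match.group(2))
        # A headline closes every open section at the same or a deeper level.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Render the tree as indented 'title (level)' lines."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.title} ({child.level})")
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(nest_headlines("* a\n*** b\n** c\n* d"))))
```

For the `* a / *** b / ** c / * d` example, b and c end up as children of a, and d becomes a sibling of a, even though b skips a level.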
Re: official orgmode parser
On 9/23/20 10:09 AM, Bastien wrote:

Hi Przemysław,

Przemysław Kamiński writes:

I oftentimes find myself needing to parse org files with some external tools (to generate reports for customers or sum up clock times for given month, etc). Looking through the list https://orgmode.org/worg/org-tools/

Can you help on making the above page more useful to anyone? Perhaps we can have a separate worg page just for parsers, reporting the ones that seem to fully work.

I disagree that a parser is too difficult to maintain because Org is a moving target. Org core syntax is not moving anymore, a parser can reasonably target it. That's what is done with the Ruby parser, in use in this small project called github.com :)

So I'd say:

- let's enhance Worg's documentation
- yes, please go for enhancing parsing tools

I don't think we need official tools. The official Org parser exists, it is Org itself.

Thanks,

Hello Bastien,

Thank you for your remarks. I updated the README, hopefully it's more usable now.

Przemek
Re: official orgmode parser
On 9/17/20 3:18 AM, Ihor Radchenko wrote:

So basically this is what this thread is about. One needs a working Emacs instance and work in "push" mode to export any Org data. This requires dealing with temporary files, as described above, and some ad-hoc formats to keep whatever data I need to pull from org.

"Pull" mode would be preferred. I could then, say, write a script in Guile, execute 'emacs -batch' to export org data (I'm ok with that), then parse the S-expressions to get what I need.

My choice to use "push" mode is just for performance reasons. Nothing prevents you from writing a function called from emacs --batch that converts parsed org data into whatever format your Guile script prefers. That function may be either on Emacs side or on Guile side. Probably, Emacs has more capabilities when dealing with s-expressions though.

You can even directly push the information from Emacs to API server. You may find https://github.com/tkf/emacs-request useful for this task.

Finally, you may also consider clock tables to create clock summaries using existing org-mode functionality. The tables can be named and accessed using any programming language via babel.

Best,
Ihor

[...]

OK so this is what I got so far https://gitlab.c
Re: official orgmode parser
On 9/16/20 2:02 PM, Ihor Radchenko wrote:

However what Ihor presented is interesting. Do you use similar approach with shellout and 'emacs -batch' to show currently running task or you 'push' data from emacs to show it in the taskbar?

I prefer to avoid querying emacs too often for performance reasons. Instead, I only update the clocking info when I clock in/out in emacs. Then, the clocked in time is dynamically updated by independent bash script.

The scheme is the following:

1. org clock in/out in Emacs trigger writing clocking info into ~/.org-clock-in status file
2. bash script periodically monitors the file and calculates the clocked in time according to the contents and time from last modification
3. the script updates simple textbox widget using awesome-client
4. the script also warns me (notify-send) when the weighted clocked in time is negative (meaning that I should switch to some more productive activity)

Best,
Ihor

[...]

So basically this is what this thread is about. One needs a working Emacs instance and work in "push" mode to export any Org data. This requires dealing with temporary files, as described above, and some ad-hoc formats to keep whatever data I need to pull from org.

"Pull" mode would be preferred. I could then, say, write a script in Guile, execute 'emacs -batch' to export org data (I'm ok with that), then parse the S-expressions to get what I need.

P.
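Step 2 of Ihor's scheme (derive the clocked-in time from the status file's contents plus the time since its last modification) might look roughly like the Python sketch below. The thread does not show the real file format or the bash script, so everything concrete here is an assumption: the file is taken to hold a single line `MINUTES<TAB>TASK`, where MINUTES is the time already clocked when Emacs wrote the file, and `clocked_in_minutes` is a hypothetical helper name.

```python
import os
import time


def clocked_in_minutes(path, now=None):
    """Return (task, minutes) for a status file in the ASSUMED format.

    Assumed format (not taken from the thread): one line,
    'MINUTES<TAB>TASK', written by Emacs on every clock in/out.
    """
    now = time.time() if now is None else now
    with open(path, encoding="utf-8") as handle:
        first_line = handle.readline().rstrip("\n")
    minutes_field, _, task = first_line.partition("\t")
    already_clocked = float(minutes_field)
    # Time elapsed since Emacs last touched the file, in minutes.
    since_write = max(0.0, now - os.path.getmtime(path)) / 60.0
    return task, already_clocked + since_write
```

A polling script would call this every few seconds and hand the result to the widget; Emacs is only involved at clock in/out, which is the performance point made above.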
Re: official orgmode parser
On 9/15/20 2:37 PM, to...@tuxteam.de wrote:

On Tue, Sep 15, 2020 at 01:15:56PM +0200, Przemysław Kamiński wrote:

[...]

There's the org-json (or ox-json) package but for some reason I wasn't able to run it successfully. I guess export to S-exps would be best here. But yes I'll check that out.

If that's your route, perhaps the "Org element API" [1] might be helpful. Especially `org-element-parse-buffer' gives you a Lisp data structure which is supposed to be a parse of your Org buffer. From there to S-expression can be trivial (e.g. `print' or `pp'), depending on what you want to do. Walking the structure should be nice in Lisp, too.

The topic of (non-Emacs) parsing of Org comes up regularly, and there is a good (but AFAIK not-quite-complete) Org syntax spec in Worg [2], but there are a couple of difficulties to be mastered before such a thing can become really enjoyable and useful.

The loose specification of Org's format (arguably its second or third strongest asset, the first two being its incredible community and Emacs itself) is something which makes this problem "interesting". People have invented lots of usages which might be broken should Org change to a strict formal spec. You don't want to break those people.

But yes, perhaps some day someone nails it. Perhaps it's you :)

Cheers

[1] https://orgmode.org/worg/dev/org-element-api.html
[2] https://orgmode.org/worg/dev/org-syntax.html

- t

So I looked at (pp (org-element-parse-buffer)) however it does print out recursive stuff which other schemes have trouble parsing.

My code looks more or less like this:

    (defun org-parse (f)
      (with-temp-buffer
        (find-file f)
        (let* ((parsed (org-element-parse-buffer))
               (all (append org-element-all-elements org-element-all-objects))
               (mapped (org-element-map parsed all
                         (lambda (item)
                           (strip-parent item)))))
          (pp mapped))))

strip-parent is basically (plist-put props :parent nil) for an element's properties.

However it turns out there are more recursive objects, like

    :title #("Headline 1" 0 10 (:parent (headline #2 (section

So I'm wondering do I have to do it by hand for all cases or is there some way to output only a simple AST without those nested objects?

Best,
Przemek
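For the consuming side of this "pull" approach, here is a small Python sketch of a reader for the printed S-expressions. It assumes the Emacs side has already reduced the tree to plain lists, symbols, numbers and ordinary strings; it deliberately rejects the `#(...)` propertized strings and `#N` back-references shown above, since those are exactly the parts a simple reader cannot follow. `Symbol`, `tokenize`, `atom` and `read_sexp` are names invented for this example.

```python
import re

# A token is an opening paren, a closing paren, a double-quoted string,
# or a run of characters that is none of those.
TOKEN = re.compile(r'\s*(\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+)', re.DOTALL)
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?\d+\.\d+")


class Symbol(str):
    """A Lisp symbol such as headline, :title or nil."""


def tokenize(text):
    text = text.rstrip()
    position, tokens = 0, []
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def atom(token):
    if token.startswith('"'):
        # Only undoes backslash escapes such as \" and \\ (assumes
        # newlines were printed literally, not as \n).
        return re.sub(r"\\(.)", r"\1", token[1:-1])
    if token.startswith("#"):
        raise ValueError(f"unsupported reader syntax: {token}")
    if INTEGER.fullmatch(token):
        return int(token)
    if DECIMAL.fullmatch(token):
        return float(token)
    return Symbol(token)


def read_sexp(text):
    """Return the list of top-level forms found in text."""
    stack = [[]]
    for token in tokenize(text):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(atom(token))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

Lisp lists become Python lists, symbols become `Symbol` instances (so `nil` stays distinguishable from the string "nil"), and any leftover `#` syntax raises `ValueError` instead of being silently misread.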
Re: official orgmode parser
On 9/16/20 9:56 AM, Ihor Radchenko wrote:

Wow, another awesomewm user here; could you share your code?

Are you interested in something particular about awesome WM integration?

I am using simple textbox widgets to show currently clocked in task and weighted summary of clocked time. See the attachments.

Best,
Ihor

Marcin Borkowski writes:

On 2020-09-15, at 11:17, Przemysław Kamiński wrote:

So, I keep clock times for work in org mode, this is very handy. However, my customers require that I use their service to provide the times. They do offer API. So basically I'm using elisp to parse org, make API calls, and at the same time generate CSV reports with a Python interop with org babel (because my elisp is just too bad to do that). If I had access to some org parser, I'd pick a language that would be more comfortable for me to get the job done. I guess it can all be done in elisp, however this is just a tool for me alone and I have limited time resources on hacking things for myself :)

I was in the exact same situation - I use Org-mode clocking, and we use Toggl at our company, so I wrote a simple tool to fire API requests to Toggl on clock start/cancel/end: https://github.com/mbork/org-toggl

It's a bit more than 200 lines of Elisp, so you might try to look into it and adapt it to whatever tool your employer is using.

Another one is generating total hours report for day/week/month to put into my awesomewm toolbar. I ended up using orgstat https://github.com/volhovM/orgstat however the author is creating his own DSL in YAML and I guess things were much better off if it all stayed in some Scheme :)

Wow, another awesomewm user here; could you share your code?

Best,

-- 
Marcin Borkowski
http://mbork.pl

I don't have interesting code, just standard awesomewm setup. I run periodic script to output data computed by orgstat and show it in the taskbar (uses the shellout_widget).

However what Ihor presented is interesting. Do you use similar approach with shellout and 'emacs -batch' to show currently running task or you 'push' data from emacs to show it in the taskbar?

P.
Re: official orgmode parser
On 9/15/20 11:55 AM, Russell Adams wrote:

On Tue, Sep 15, 2020 at 11:17:57AM +0200, Przemysław Kamiński wrote:

Org mode IS an elisp application. This is the main goal. The reason it works so well is because elisp is largely a DSL that focuses on text manipulation and is therefore ideally suited for a text based organiser.

So, I keep clock times for work in org mode, this is very handy. However, my customers require that I use their service to provide the times. They do offer API. So basically I'm using elisp to parse org, make API calls, and at the same time generate CSV reports with a Python interop with org babel (because my elisp is just too bad to do that).

Please consider this is a very specialized use case.

If I had access to some org parser, I'd pick a language that would be more comfortable for me to get the job done. I guess it can all be done in elisp, however this is just a tool for me alone and I have limited time resources on hacking things for myself :)

Maintainer time is limited too. Maintaining a parser library outside of Emacs would be difficult for the reasons already given. I'd encourage you to pick up some more Elisp, which I am also trying to do.

Anyways, my parser needs aren't that sophisticated: just parse the file, return headings with clock drawers. I tried the common lisp library but got frustrated after fiddling with it for couple of hours.

If it's that small you could always do that in Python with regexps for your usage if you're more comfortable in Python. Org's plain text format means you can read it with anything. I suspect grep might even pull headlines and clocks successfully.

I haven't looked at the elisp parser much, but I do wonder if someone couldn't write an exporter that exports a programmatic version of your org file data (ie: to xml). Then other tools could ingest those xml files. That'd certainly be a contrib module and not in the core, but might be worth your while to explore the idea if you really want to work with Org data outside of Emacs.

-- 
Russell Adams                            rlad...@adamsinfoserv.com
PGP Key ID: 0x1160DCB3                   http://www.adamsinfoserv.com/
Fingerprint: 1723 D8CA 4280 1EC9 557F 66E8 1154 E018 1160 DCB3

There's the org-json (or ox-json) package but for some reason I wasn't able to run it successfully. I guess export to S-exps would be best here. But yes I'll check that out.

Przemek
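Russell's "Python with regexps" suggestion could be sketched as follows for the stated need (headings plus their clock entries). This is a rough illustration, not a real Org parser: it assumes closed clock lines in the usual `CLOCK: [start]--[end] =>  H:MM` form, trusts the duration Org wrote rather than recomputing it, skips running clocks that have no `=>` part, treats TODO keywords and tags as part of the title, and merges headings that share a title. The function name `clock_totals` is made up for the example.

```python
import re

HEADLINE = re.compile(r"^\*+ +(.*)$")
# Closed clock entry; the duration after "=>" is hours:minutes.
CLOCK = re.compile(r"^\s*CLOCK: \[[^\]]+\]--\[[^\]]+\] =>\s+(\d+):(\d\d)\s*$")


def clock_totals(text):
    """Map each headline title to its directly clocked minutes."""
    totals = {}
    current = None
    for line in text.splitlines():
        headline = HEADLINE.match(line)
        if headline:
            current = headline.group(1).strip()
            totals.setdefault(current, 0)
            continue
        clock = CLOCK.match(line)
        if clock and current is not None:
            totals[current] += int(clock.group(1)) * 60 + int(clock.group(2))
    return totals


if __name__ == "__main__":
    sample = """\
* Project A
:LOGBOOK:
CLOCK: [2020-09-15 Tue 09:00]--[2020-09-15 Tue 11:30] =>  2:30
:END:
** Subtask
CLOCK: [2020-09-15 Tue 13:00]--[2020-09-15 Tue 14:00] =>  1:00
"""
    for title, minutes in clock_totals(sample).items():
        print(f"{title}: {minutes} min")
```

Time clocked under a subheading is not rolled up into its parent here; doing that needs the headline level as well, which is where a stack comes back in.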
Re: official orgmode parser
On 9/15/20 11:03 AM, Tim Cross wrote:

Przemysław Kamiński writes:

Hello,

I oftentimes find myself needing to parse org files with some external tools (to generate reports for customers or sum up clock times for given month, etc). Looking through the list https://orgmode.org/worg/org-tools/ and having tested some of these, I must say they are lacking. The Haskell ones seem to be done best, but then the compile overhead of Haskell and difficulty in embedding this into other languages is a drawback.

I think it might benefit the community when such an official parser would exist (and maybe could be hooked into org mode directly). I was thinking picking some scheme like chicken or guile, which could be later easily embedded into C or whatever. Then use that parser in org mode itself. This way some important part of org mode would be outside of the small world of elisp.

This is just an idea, what do you think? :)

The problem with this idea is maintenance. It is also partly why external tools are not terribly reliable/good. Org mode is constantly being enhanced and improved. It is very hard for external tools to keep pace with org-mode development, so they soon get out of date or stop working correctly.

Org mode IS an elisp application. This is the main goal. The reason it works so well is because elisp is largely a DSL that focuses on text manipulation and is therefore ideally suited for a text based organiser. This means if you want to implement parsing of org files in any other language, there is a lot of fundamental functionality which will need to be implemented that is not necessary when using elisp as it is already built-in. Not only that, it is also 'battle hardened' and well tested.

The other problem would be in selecting another language which behaves consistently across all the platforms Emacs and org-mode is supported on. As org-mode is a standard part of Emacs, it also needs to be implemented in something which is also available on all the platforms emacs is on without needing the user to install additional software.

The other issue is that you would need another skill in order to maintain/extend org-mode. In addition to elisp, you will also need to know whatever the parser implementation language is.

A third negative is that if the parser was in a different language to elisp, the interface between the rest of org mode (in elisp) and the parser would become an issue. At the moment, there are far fewer barriers as it is all elisp. However, if part of the system is in another language, you are now restricted to whatever defined interface exists. This would likely also have performance issues and overheads associated with translating from one format to another etc.

So, in short, the chances of org mode using a parser written in something other than elisp is pretty close to 0. This leaves you with 2 options -

1. Implement another external tool which can parse org-files. As mentioned above, this is a non-trivial task and will likely be difficult to maintain. Probably not the best first choice.

2. Provide some details about your workflow where you believe you need to use external tools to process the org-files. It is very likely there are alternative approaches to give you the result you want, but without the need to do external parsing of org-files. There isn't sufficient details in the examples you mention to provide any specific details. However, I have used org-mode for reporting, invoicing, time tracking, documentation, issue/request tracking, project planning and project management and never needed to parse my org files with an external tool. I have exported the data in different formats which have then been processed by other tools and I have tweaked my setup to support various enterprise/corporate standards or requirements (logos, corporate colours, report formats, etc). Sometimes these tweaks are trivial and others require more extensive effort. Often, others have had to do something the same or similar and have working examples etc.

So my recommendation is post some messages to this list with details on what you need to try and do and see what others can suggest. I would keep each post to a single item rather than one long post with multiple requests. From watching this list, I've often seen someone post a "How can I ..." question only to get the answer "Oh, that is already built-in, just do .". Org is a large application with lots of sophisticated power that isn't always obvious from just reading the manual.

So, I keep clock times for work in org mode, this is very handy. However, my customers require that I use their service to provide the times. They do offer API. So basically I'm using elisp to parse org, make API calls, and at the same time generate CSV reports with a Python interop with org babel (because my elisp is just too bad to do that). If I had access to some org parser, I'd pick a languag
official orgmode parser
Hello, I oftentimes find myself needing to parse org files with some external tools (to generate reports for customers or sum up clock times for given month, etc). Looking through the list https://orgmode.org/worg/org-tools/ and having tested some of these, I must say they are lacking. The Haskell ones seem to be done best, but then the compile overhead of Haskell and difficulty in embedding this into other languages is a drawback. I think it might benefit the community when such an official parser would exist (and maybe could be hooked into org mode directly). I was thinking picking some scheme like chicken or guile, which could be later easily embedded into C or whatever. Then use that parser in org mode itself. This way some important part of org mode would be outside of the small world of elisp. This is just an idea, what do you think? :) Best, Przemek