Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-21 Thread Rasmus Pank Roulund

Dear Tom,

> I suppose this depends on what is meant by "reproducible."
>
> My goal is to produce a compendium as defined by Gentleman and Lang
> (see Gentleman R, Lang DT (2004). "Statistical Analyses and Reproducible
> Research." Technical report, Bioconductor Project. URL
> http://www.bepress.com/bioconductor/paper2).  
>
> I keep the init.el file as a babel source block with the reproducible
> document, so it can be tangled. I also have an editing setup in a babel
> source block that activates many of the same features handled by the
> init.el file, but also configures the new exporter to look for init.el
> (which might have a different name). The filters are all part of the Org
> document, too, and get pulled into the init.el file with noweb
> references.

My issue here is that this approach might lead to copy-paste
"preambles" which may or may not be desirable.  I can certainly see
the attraction in being able to just tangle the setup.  In fact for my
thesis I also had a preamble.tex block in my file.  Your proposed setup
here is perhaps better in that it uses emacs-lisp.

Still, say I'm working on two files A and B.  If I fix a bug in
"preamble" A I would have to manually copy it over to B.  

Thus, the main question is how to distribute updates?  I guess one
could keep a separate file, but then we are back at square one in a
way. . .

One possibility might be a file structure like this

setup.org
A/project-A.org
A/setup-A.org
B/project-B.org
B/setup-B.org

where A and B both have a block like
#+BEGIN_SRC org
* Preamble :noexport:
#+INCLUDE: "../setup.org"
#+INCLUDE: "setup-A.org"
#+END_SRC

To ship it off one would only have to write a command replacing
#+INCLUDE with its content.  The exporter could likely be used for
this and one could produce an archive version when signing off a
project.
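
A minimal sketch of such a command, written in Python for
illustration rather than as an exporter feature.  It assumes only the
plain form #+INCLUDE: "file" (any further arguments after the file
name are ignored) and resolves paths relative to the including file.
The function name inline_includes is made up for this example.

```python
import re
from pathlib import Path

# Matches lines such as:  #+INCLUDE: "../setup.org"
# Only the quoted file name is read; anything after it is ignored.
INCLUDE_RE = re.compile(r'^\s*#\+INCLUDE:\s*"([^"]+)"', re.IGNORECASE)


def inline_includes(path, _seen=None):
    """Return the text of `path` with every #+INCLUDE line replaced by
    the content of the file it names (recursively)."""
    path = Path(path).resolve()
    seen = _seen or set()
    if path in seen:
        raise ValueError(f"circular #+INCLUDE involving {path}")
    seen = seen | {path}
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Included paths are relative to the including file.
            target = path.parent / match.group(1)
            out.append(inline_includes(target, seen))
        else:
            out.append(line)
    return "\n".join(out)
```

Writing the result of inline_includes("A/project-A.org") to a new
file would give the self-contained archive version.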

Even more robust: #+INCLUDE: could look for files in org-directory (it
might already do so; I didn't check).

Am I missing something obvious (probably?) in the above stream of
random thoughts?  It's kind of a LaTeX-ish way of dealing with it, I
guess.

> I am able to distribute the compendium, typically as a single
> document (sometimes with associated data files produced by an
> on-line service that can't be used programmatically), which I
> believe is a good step toward reproducibility.

Agreed.

–Rasmus

-- 
Send from my Emacs



Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-18 Thread Achim Gratz
Aaron Ecay writes:
> If your external org configuration file were kept under version control
> (I’ll discuss git but the principle is general), then reproducibility
> would be possible.

There's a lot more to reproducibility than just this, but yes, the
configuration files would have to be part of it.

> There are ways of embedding git hashes in LaTeX
> documents (for one example:
> http://thorehusfeldt.net/2011/05/13/including-git-revision-identifiers-in-latex/),
> and of course org could help automate this.  Including the git hash of
> the document itself, the config file, and org-mode’s own code (assuming
> these are kept in 3 separate repos) should allow perfect reproducibility
> (modulo incompatible changes in emacs, I guess).

This is confused thinking and doesn't help anyway with the problem at
hand.  The purpose of Git is to record (and later re-create) the
complete state of your work tree, so monkeying around with hashes
embedded in document sources isn't making progress and recording several
hashes is either superfluous or a sign of incomplete control over the
work tree (you'd maybe want to use submodules).

What you can and should do, however, is put a Git hash into the final
document so that this can be linked back to some state of the worktree
(like Org does for its manual and installed sources).
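
As a rough illustration of that last point (not how Org's own build
does it), a small Python helper could ask Git for the current commit
and emit an Org macro definition for the exported document to use.
The macro name git-rev and both function names are made up for this
sketch, and it does not detect uncommitted changes in the work tree.

```python
import subprocess


def current_commit(repo="."):
    """Return the abbreviated hash of HEAD for the work tree at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def revision_macro(commit):
    """Format an Org macro definition that records `commit`.

    The macro name `git-rev` is a hypothetical choice for this example.
    """
    return f"#+MACRO: git-rev {commit}"
```

The line returned by revision_macro(current_commit()) can be written
to a setup file, after which {{{git-rev}}} in the document expands to
the hash on export.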

> It would be interesting for org to have an ability to reference files
> not just by name, but by git revision.  So that you could do something
> like (where 123456 is some git hash):
> #+include: [[gitbare:/path/to/repo::123456:my-org-setup-file.org]]
> and have org take care of checking out the proper revision and loading
> the file in the usual way.  This syntax is already implemented, for
> plain links, in contrib/lisp/org-git-link.el, so it is just a matter
> of making #+include and friends understand links in addition to
> filenames.

Git revisions are for the whole tree (or more precisely the commit that
references the tree), not single files.  If you access a file blob by
its SHA-1 rather than name, then Git lets you do that, but you bypass
most of Git that way (much like you'd bypass the file system if you
started to access files by block numbers).


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada




Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-18 Thread Thomas S. Dye
Aloha Rasmus,

Rasmus  writes:

> The following message is a courtesy copy of an article
> that has been posted to gmane.emacs.orgmode as well.
>
>
> Thomas,
>
>>> Tom, do tell us more about what these habits are.
>>
>> The new exporter is really your friend.  Where before I might choose to
>> generate a LaTeX block, now I look to generate Org output and then count
>> on the exporter to do the right thing on the way to pdf.  
>>
>> The exporter's attribute system is very easy to use.  The attributes you
>> need to access are always right there.
>>
>> I've also come to rely on filters quite a bit. I use them for
>> non-breaking spaces, the plus/minus symbol, and for the multiple
>> citation commands used by biblatex (e.g., \parencites). There seems to
>> be a move afoot to collect filters so they can be widely distributed.
>> I'd like to see the filters go to the Library of Babel, but for
>> reproducible research it is probably best to keep them with the source
>> document so there is no doubt about the fidelity of filter code.
>
> I too rely heavily on filters and customizations.  I haven't been able
> to fully appreciate the asynchronous exporter yet.
>
> For instance I set some defaults for tables, pictures, add lots of
> entities etc. in my init file, and I went as far as writing a separate
> init file just loading just the org stuff.  Now, this is clearly /not/
> a very reproducible way of doing this.

I suppose this depends on what is meant by "reproducible."

My goal is to produce a compendium as defined by Gentleman and Lang
(see Gentleman R, Lang DT (2004). "Statistical Analyses and Reproducible
Research." Technical report, Bioconductor Project. URL
http://www.bepress.com/bioconductor/paper2).  

I keep the init.el file as a babel source block with the reproducible
document, so it can be tangled. I also have an editing setup in a babel
source block that activates many of the same features handled by the
init.el file, but also configures the new exporter to look for init.el
(which might have a different name). The filters are all part of the Org
document, too, and get pulled into the init.el file with noweb
references.

A compendium with this structure gets past the problem, often aired on
the ML, that there is "something in my setup" that causes unexpected
behavior.  The Org setup is completely contained in the compendium.

I am able to distribute the compendium, typically as a single document
(sometimes with associated data files produced by an on-line service
that can't be used programmatically), which I believe is a good step
toward reproducibility.

Of course, it leaves open the question of changes in the underlying
software. This is a real problem. Org 8.0, with its new (and sweet)
exporter has broken my first two compendia. Conceivably, changes in
Emacs might break a compendium, as could changes in all the other
software referenced by babel code blocks.  Aaron Ecay seems to be on to
a possible mechanism to take care of at least some of this.  AFAICT,
however, his solution doesn't change the utility of the compendium,
which seems to me an integral part of the reproducibility equation.

What do you think?  

>
> So I'm really interested in hearing or seeing implementation where the
> goal is reproducibility.  On one hand I can appreciate keeping Org
> close to a vanilla state.  On the other hand, I'd have to overwrite
> defaults every time (e.g. I /always/ want booktab tables).  I guess I
> could keep an emacs-lisp block in the top of the file specifying
> stuff, but it also seems kind of tedious (copy-pasting every time).
> (Perhaps this could be resolved by loading external files hosted
> somewhere accessible).

Some journals specify which LaTeX packages can or cannot be used.
PLOS-One, for instance, doesn't use booktab tables, so a reproducible
research document sent to them couldn't include your default setting.
My sense of the publishing world is that it is sufficiently variable
eventually to break almost any default you might hope to establish.   

Incidentally, I think this is an area ripe for growth within Org
mode--additions to the Library of Babel that configure a compendium to
produce LaTeX code that meets the requirements of particular publishing
venues. It would be ideal to do something like <> and
then, when the journal sends back your paper with a digital pink slip,
change that to something like <> and send it off again.

All the best,
Tom 

-- 
T.S. Dye & Colleagues, Archaeologists
735 Bishop St, Suite 315, Honolulu, HI 96813
Tel: 808-529-0866, Fax: 808-529-0884
http://www.tsdye.com



Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-18 Thread Rasmus
Aaron Ecay  writes:

> If your external org configuration file were kept under version control
> (I’ll discuss git but the principle is general), then reproducibility
> would be possible.  There are ways of embedding git hashes in LaTeX
> documents (for one example:
> http://thorehusfeldt.net/2011/05/13/including-git-revision-identifiers-in-latex/),
> and of course org could help automate this.  Including the git hash of
> the document itself, the config file, and org-mode’s own code (assuming
> these are kept in 3 separate repos) should allow perfect reproducibility
> (modulo incompatible changes in emacs, I guess).

Sounds interesting.  I'll check it out. 


> It would be interesting for org to have an ability to reference files
> not just by name, but by git revision.  So that you could do something
> like (where 123456 is some git hash):
> #+include: [[gitbare:/path/to/repo::123456:my-org-setup-file.org]]
> and have org take care of checking out the proper revision and loading
> the file in the usual way.  This syntax is already implemented, for
> plain links, in contrib/lisp/org-git-link.el, so it is just a matter
> of making #+include and friends understand links in addition to
> filenames.

Now that is a great idea that allows for both incremental
improvements while still retaining compatibility for old files.

–Rasmus

-- 
And let me remind you also that moderation in the pursuit of justice
is no virtue




Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-18 Thread Aaron Ecay
Hi Rasmus,

On 18 April 2013, Rasmus wrote:
> I too rely heavily on filters and customizations.  I haven't been able
> to fully appreciate the asynchronous exporter yet.
> 
> For instance I set some defaults for tables, pictures, add lots of
> entities etc. in my init file, and I went as far as writing a separate
> init file just loading just the org stuff.  Now, this is clearly /not/
> a very reproducible way of doing this.
> 
> So I'm really interested in hearing or seeing implementation where the
> goal is reproducibility.  On one hand I can appreciate keeping Org
> close to a vanilla state.  On the other hand, I'd have to overwrite
> defaults every time (e.g. I /always/ want booktab tables).  I guess I
> could keep an emacs-lisp block in the top of the file specifying
> stuff, but it also seems kind of tedious (copy-pasting every time).
> (Perhaps this could be resolved by loading external files hosted
> somewhere accessible).

If your external org configuration file were kept under version control
(I’ll discuss git but the principle is general), then reproducibility
would be possible.  There are ways of embedding git hashes in LaTeX
documents (for one example:
http://thorehusfeldt.net/2011/05/13/including-git-revision-identifiers-in-latex/),
and of course org could help automate this.  Including the git hash of
the document itself, the config file, and org-mode’s own code (assuming
these are kept in 3 separate repos) should allow perfect reproducibility
(modulo incompatible changes in emacs, I guess).

It would be interesting for org to have an ability to reference files
not just by name, but by git revision.  So that you could do something
like (where 123456 is some git hash):
#+include: [[gitbare:/path/to/repo::123456:my-org-setup-file.org]]
and have org take care of checking out the proper revision and loading
the file in the usual way.  This syntax is already implemented, for
plain links, in contrib/lisp/org-git-link.el, so it is just a matter
of making #+include and friends understand links in addition to
filenames.
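
Outside of Org, the lookup itself is a single Git command: git show
REV:PATH prints a file as recorded in a given commit.  A minimal
Python sketch, with made-up function names, of what "taking care of
checking out the proper revision" might amount to; the path is taken
relative to the top of the repository.

```python
import subprocess


def show_args(repo, revision, path):
    """Build the git command that prints `path` as of `revision`."""
    return ["git", "-C", repo, "show", f"{revision}:{path}"]


def file_at_revision(repo, revision, path):
    """Return the content of `path` as recorded in commit `revision`."""
    result = subprocess.run(
        show_args(repo, revision, path),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For the example above this would be
file_at_revision("/path/to/repo", "123456", "my-org-setup-file.org").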

-- 
Aaron Ecay



Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-18 Thread Rasmus

Thomas,

>> Tom, do tell us more about what these habits are.
>
> The new exporter is really your friend.  Where before I might choose to
> generate a LaTeX block, now I look to generate Org output and then count
> on the exporter to do the right thing on the way to pdf.  
>
> The exporter's attribute system is very easy to use.  The attributes you
> need to access are always right there.
>
> I've also come to rely on filters quite a bit. I use them for
> non-breaking spaces, the plus/minus symbol, and for the multiple
> citation commands used by biblatex (e.g., \parencites). There seems to
> be a move afoot to collect filters so they can be widely distributed.
> I'd like to see the filters go to the Library of Babel, but for
> reproducible research it is probably best to keep them with the source
> document so there is no doubt about the fidelity of filter code.

I too rely heavily on filters and customizations.  I haven't been able
to fully appreciate the asynchronous exporter yet.

For instance I set some defaults for tables, pictures, add lots of
entities etc. in my init file, and I went as far as writing a separate
init file just loading just the org stuff.  Now, this is clearly /not/
a very reproducible way of doing this.

So I'm really interested in hearing or seeing implementation where the
goal is reproducibility.  On one hand I can appreciate keeping Org
close to a vanilla state.  On the other hand, I'd have to overwrite
defaults every time (e.g. I /always/ want booktab tables).  I guess I
could keep an emacs-lisp block in the top of the file specifying
stuff, but it also seems kind of tedious (copy-pasting every time).
(Perhaps this could be resolved by loading external files hosted
somewhere accessible).

–Rasmus

-- 
. . . The proofs are technical in nature and provides no real understanding.




Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-17 Thread Suvayu Ali
Hi Rainer,

On Wed, Apr 17, 2013 at 11:55:50AM +0200, Rainer M. Krug wrote:
> 
> I did not follow the initial thread, but the new header caught my
> attention, as I am doing something similar with papers. Nothing against
> org for writing papers, but I prefer LyX [1]. But for doing the analysis,
> nothing beats org. So in my org file I have the
> analysis which creates graphs on export (and a basic report of the
> analysis, including all the source code necessary, which I can then use
> as an appendix for the paper).
> 
> These graphs are then inserted in the lyx file. I assume, you used
> something similar, only that the output can then be used in the org
> file (thesis) - correct?

Yes something like that; usually for me analysis code is so complicated
that doing it inside Org would be madness :-p, I have dedicated software
projects for that.  I only use Org for simple spreadsheet operations in
tables and eventually plotting them.  These then get included in the
final "thesis" file.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.



Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-17 Thread Rainer M. Krug
Suvayu Ali  writes:

> Hi Vikas,
>
> On Wed, Apr 17, 2013 at 03:40:22AM +0530, Vikas Rawal wrote:
>> 
>> > At one point I realised the problem and made the decision to
>> > split things into two kinds of files: static content (document
>> > structuring, text, plots, etc), and dynamic content (babel, TikZ blocks
>> > that generate tables, plots, figures, etc used by the static content
>> > files).  It is still reproducible research, but modular and less hacky
>> > (hence more stable).
>> 
>> This is indeed a very neat approach. Would you kindly elaborate?
>> 
>> Would it be too much work for you to get some illustrations from your
>> work?
>
> Well ... it was a couple of years back, the Org version was quite
> different, e.g. babel was rapidly evolving.  It might be a fair bit of
> work to get it working again.  That said, last year I gave a talk in an
> internal workshop, I made the plots with the attached file.  I didn't
> spend time to make sure everything is pretty, so the legend and titles
> might be a little wonky.  Just evaluating the two main source blocks
> should give you two plots in pdf files.
>
>> In your scheme of things, how do you finally combine the static and
>> the dynamic content?
>> 
>> Any chance that you could release the source of something like a
> chapter of your thesis for people to see? Or maybe create something
>> with dummy content?
>
> The idea is to keep the dynamic content on separate org files which you
> export less frequently during the course of your writing, e.g. any
> tables that are inputs for source blocks.  Evaluating these blocks, or
> exporting these dynamic files (whichever is your preference) generates
> the graphic which is then used in the static file.  This is not limited
> to plots, you could write org/LaTeX tables to separate files.  You can
> then easily include those in your static files.
>
> My main motivation for this was to make the export process simpler.  And
> since the complicated interacting bits are all isolated and modularised,
> there are fewer things that go wrong and many files are updated only
> when required, hence faster too!
>
> Anyway, this is all probably very vague without working examples.  I'll
> try to come up with something, but I have been rather busy for the last
> year or so and do not see any sign of respite in the near future :-/.
> I'll get this fleshed out at some point, just don't know how soon.
>
> Hope this was helpful in some way,
>
> :)
I did not follow the initial thread, but the new header caught my
attention, as I am doing something similar with papers. Nothing against
org for writing papers, but I prefer LyX [1]. But for doing the analysis,
nothing beats org. So in my org file I have the
analysis which creates graphs on export (and a basic report of the
analysis, including all the source code necessary, which I can then use
as an appendix for the paper).

These graphs are then inserted in the lyx file. I assume, you used
something similar, only that the output can then be used in the org
file (thesis) - correct?

Cheers,

Rainer

Footnotes: 
[1]  http://www.lyx.org - very nice LaTeX frontend.

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, 
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug



Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-16 Thread Suvayu Ali
Hi Vikas,

On Wed, Apr 17, 2013 at 03:40:22AM +0530, Vikas Rawal wrote:
> 
> > At one point I realised the problem and made the decision to
> > split things into two kinds of files: static content (document
> > structuring, text, plots, etc), and dynamic content (babel, TikZ blocks
> > that generate tables, plots, figures, etc used by the static content
> > files).  It is still reproducible research, but modular and less hacky
> > (hence more stable).
> 
> This is indeed a very neat approach. Would you kindly elaborate?
> 
> Would it be too much work for you to get some illustrations from your
> work?

Well ... it was a couple of years back, the Org version was quite
different, e.g. babel was rapidly evolving.  It might be a fair bit of
work to get it working again.  That said, last year I gave a talk in an
internal workshop, I made the plots with the attached file.  I didn't
spend time to make sure everything is pretty, so the legend and titles
might be a little wonky.  Just evaluating the two main source blocks
should give you two plots in pdf files.

> In your scheme of things, how do you finally combine the static and
> the dynamic content?
> 
> Any chance that you could release the source of something like a
> chapter of your thesis for people to see? Or maybe create something
> with dummy content?

The idea is to keep the dynamic content on separate org files which you
export less frequently during the course of your writing, e.g. any
tables that are inputs for source blocks.  Evaluating these blocks, or
exporting these dynamic files (whichever is your preference) generates
the graphic which is then used in the static file.  This is not limited
to plots, you could write org/LaTeX tables to separate files.  You can
then easily include those in your static files.

My main motivation for this was to make the export process simpler.  And
since the complicated interacting bits are all isolated and modularised,
there are fewer things that go wrong and many files are updated only
when required, hence faster too!
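
The "updated only when required" part can be approximated with a
make-style timestamp check.  This is only a sketch of one way to
decide which dynamic files to re-export, not a description of the
setup above; the function names are invented for the example.

```python
from pathlib import Path


def needs_rebuild(source, product):
    """True if `product` is missing or older than `source`."""
    source, product = Path(source), Path(product)
    if not product.exists():
        return True
    return product.stat().st_mtime < source.stat().st_mtime


def stale_sources(pairs):
    """Given (dynamic_org_file, generated_file) pairs, return the
    dynamic files whose output has to be regenerated."""
    return [src for src, out in pairs if needs_rebuild(src, out)]
```

Only the files returned by stale_sources would then be evaluated or
exported before building the static document.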

Anyway, this is all probably very vague without working examples.  I'll
try to come up with something, but I have been rather busy for the last
year or so and do not see any sign of respite in the near future :-/.
I'll get this fleshed out at some point, just don't know how soon.

Hope this was helpful in some way,

:)

-- 
Suvayu

Open source is the future. It sets us free.
#+STARTUP: overview
#+PROPERTY: noweb yes
#+PROPERTY: results silent
#+BIND: org-confirm-babel-evaluate nil


* Gnuplot source preamble   :src:
  :PROPERTIES:
  :VISIBILITY: folded
  :END:

#+name: gnuplot-preamble
#+begin_src gnuplot
  reset
  set terminal pdfcairo color size 21cm,14.8cm
  set termoption enhanced
  set encoding utf8
  set termoption font "DejaVuSerif,8"
  # set output '|display png:-'
  set grid back
  set style line 1 linewidth 9 pointtype 1  linecolor rgb 'orange'
  set style line 2 pointsize 1 pointtype 5  linecolor rgb 'forest-green'
  set style line 3 pointsize 1 pointtype 7  linecolor rgb 'red'
  set style line 4 pointsize 1 pointtype 9  linecolor rgb 'blue'
  set style line 5 pointsize 1 pointtype 11 linecolor rgb 'dark-gray'
  set style line 6 pointsize 1 pointtype 13 linecolor rgb 'brown'
  set style line 7 linewidth 7 pointtype 19 linecolor rgb 'black'
  set style line 10 linewidth 2 linecolor rgb 'black'
  set style line 11 linewidth 5 linecolor rgb 'red'
  set key outside
  set key box linestyle 10
#+end_src


* BF Upper Limit summary plots
** Gnuplot source   :src:
#+name: limits-preamble
#+begin_src gnuplot
  set log y
  set format y "10^{%L}"
  set ylabel 'BF Upper Limit'
  set xtics nomirror rotate by 90 offset character 0,-3
#+end_src

*** B⁺ → h⁻l⁺l⁺ / D⁻l⁺l⁺  :Bplus:
#+begin_src gnuplot :noweb yes :var limits=Bpluslimits
  <>
  <>
  set xrange [0:8]
  set yrange [1E-14:1E-5]
  set label 'BF Upper Limits:' at graph 1.02,0.55 font ',10'
  set label ' B⁺ → h⁻l⁺l⁺' at graph 1.02,0.5
  set label ' B⁺ → D⁽*⁾⁻l⁺l⁺' at graph 1.02,0.45
  set label 'LHCb limits \@ 95% C.L.' at graph 1.02,0.37 font ',7'
  set label 'Other limits \@ 90% C.L.' at graph 1.02,0.33 font ',7'
  set xtics ("K⁻e⁺e⁺" 1, "K⁻μ⁺μ⁺" 2, "π⁻e⁺e⁺" 3, "π⁻μ⁺μ⁺" 4, "D⁻e⁺e⁺" 5, "D⁻μ⁺μ⁺" 6, "D*⁻μ⁺μ⁺" 7)
  set output "Bpluslimits.pdf"
  plot "$limits" using 1:2 title 'Theory' linestyle 1, \
   "$limits" using 1:3 title 'BaBar' linestyle 2, \
   "$limits" using 1:4 title 'Belle' linestyle 3, \
   "$limits" using 1:5 title 'LHCb' linestyle 4, \
   "$limits" using 1:6 title 'LHCb year-end' linestyle 5, \
   "$limits" using 1:7 title 'LHCb upgrade' linestyle 6
   # 1E-10 with lines linestyle 10 title ''
   # 3.1E-9 with lines linestyle 11
  set output
#+end_src

*** D⁺ → h⁻l⁺l⁺ / Dₛ⁺ → h⁻l⁺l⁺  :Dplus:
#+begin_src gnuplot :noweb yes :var limit

Re: [O] Best practices for literate programming [was: Latex export of tables]

2013-04-16 Thread Thomas S. Dye
Aloha Vikas,

Vikas Rawal  writes:

>> I've been down it too many times myself. The habits I've developed
>> over time have helped, but I think they are less systematic than
>> what you've devised.
>
> Tom, do tell us more about what these habits are.

The new exporter is really your friend.  Where before I might choose to
generate a LaTeX block, now I look to generate Org output and then count
on the exporter to do the right thing on the way to pdf.  

The exporter's attribute system is very easy to use.  The attributes you
need to access are always right there.

I've also come to rely on filters quite a bit. I use them for
non-breaking spaces, the plus/minus symbol, and for the multiple
citation commands used by biblatex (e.g., \parencites). There seems to
be a move afoot to collect filters so they can be widely distributed.
I'd like to see the filters go to the Library of Babel, but for
reproducible research it is probably best to keep them with the source
document so there is no doubt about the fidelity of filter code.

All the best,
Tom

-- 
T.S. Dye & Colleagues, Archaeologists
735 Bishop St, Suite 315, Honolulu, HI 96813
Tel: 808-529-0866, Fax: 808-529-0884
http://www.tsdye.com