Re: [O] Babel: communicating irregular data to R source-code block
Michael Hannon <jm_han...@yahoo.com> writes:

> On Wednesday, April 25, 2012 at 4:52 PM Thomas S. Dye wrote:
>> Michael Hannon <jm_han...@yahoo.com> writes:
>>> On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:
>>>> . . .
>>>> The documentation of read.table has this:
>>>>
>>>>   The number of data columns is determined by looking at the first
>>>>   five lines of input (or the whole file if it has less than five
>>>>   lines), or from the length of col.names if it is specified and is
>>>>   longer. This could conceivably be wrong if fill or
>>>>   blank.lines.skip are true, so specify col.names if necessary (as
>>>>   in the ‘Examples’).
>>>>
>>>> The example is this:
>>>>
>>>>   read.csv(tf, fill = TRUE, header = FALSE,
>>>>            col.names = paste("V", seq_len(ncol), sep = ""))
>>>>
>>>> where read.csv is a synonym of read.table with preset arguments.
>>>>
>>>> This explains why the sixth line wraps.
>>>> . . .
>>>
>>> Thanks, Tom. I had just run across this myself. I guess I need to
>>> walk a mile in somebody's moccasins before complaining, but this
>>> behavior on the part of R seems totally stupid to me.
>>>
>>> I'm going to have to mull this over some more.
>>>
>>> -- Mike
>>
>> Aloha Mike,
>>
>> Eric Schulte has pushed up some patches designed to make R source
>> block variables accept irregular data. So, with pascals-triangle(8),
>> for instance, one gets a potentially useful dataframe in R:
>>
>> #+NAME: sanity-check
>> #+HEADER: :var sc_input=pascals-triangle
>> #+BEGIN_SRC R
>> sc_input
>> #+END_SRC
>>
>> #+RESULTS: sanity-check
>> | 1 | nil | nil | nil | nil | nil | nil | nil | nil |
>> | 1 | 1 | nil | nil | nil | nil | nil | nil | nil |
>> | 1 | 2 | 1 | nil | nil | nil | nil | nil | nil |
>> | 1 | 3 | 3 | 1 | nil | nil | nil | nil | nil |
>> | 1 | 4 | 6 | 4 | 1 | nil | nil | nil | nil |
>> | 1 | 5 | 10 | 10 | 5 | 1 | nil | nil | nil |
>> | 1 | 6 | 15 | 20 | 15 | 6 | 1 | nil | nil |
>> | 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1 | nil |
>> | 1 | 8 | 28 | 56 | 70 | 56 | 28 | 8 | 1 |
>>
>> Could you pull the development version of Org mode and see if this
>> solves your problem?
>
> Well, NOW you've done it! Just when I thought I could beg off on this
> talk, it all seems to be working ;-)
>
> Thanks, Tom and Eric. I've appended a sample output, just FYI. I also
> tried it for n=7 and got the correct results. Magic!
>
> As an aside, the rows of the Pascal Triangle should sum to 2^n, which
> they do in my test cases. I haven't yet implemented Eric's (much
> sexier) sum(sub-diagonal-elements) == Fibonacci nos. test, but I'll
> look into it.
>
> Thanks again,
>
> -- Mike

Good news. Please consider sharing your seminar talk on Worg, if you
think it might be appropriate.

All the best,
Tom

> # Org-mode version 7.8.09 (release_7.8.09-390-gfb7ebd @ /usr/local/emacs.d/org-mode/org-devel/org-mode/lisp/org-install.el)
> -
> #+PROPERTY: session *R*
>
> * verify PT
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :results value
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> pascals_triangle(n)
> #+end_src
>
> #+RESULTS: pascals_triangle
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
> #+NAME: sanity-check(sc_input=pascals_triangle)
> #+BEGIN_SRC R :fill yes :results output
> pt <- sc_input
> pt[is.na(pt)] <- 0
> rowSums(pt)
> #+END_SRC
>
> #+RESULTS: sanity-check
> : [1]  1  2  4  8 16 32

--
Thomas S. Dye
http://www.tsdye.com
Re: [O] Babel: communicating irregular data to R source-code block
Michael Hannon <jm_han...@yahoo.com> writes:

> On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:
>> . . .
>> The documentation of read.table has this:
>>
>>   The number of data columns is determined by looking at the first
>>   five lines of input (or the whole file if it has less than five
>>   lines), or from the length of col.names if it is specified and is
>>   longer. This could conceivably be wrong if fill or blank.lines.skip
>>   are true, so specify col.names if necessary (as in the ‘Examples’).
>>
>> The example is this:
>>
>>   read.csv(tf, fill = TRUE, header = FALSE,
>>            col.names = paste("V", seq_len(ncol), sep = ""))
>>
>> where read.csv is a synonym of read.table with preset arguments.
>>
>> This explains why the sixth line wraps.
>> . . .
>
> Thanks, Tom. I had just run across this myself. I guess I need to walk
> a mile in somebody's moccasins before complaining, but this behavior on
> the part of R seems totally stupid to me.
>
> I'm going to have to mull this over some more.
>
> -- Mike

Aloha Mike,

Eric Schulte has pushed up some patches designed to make R source block
variables accept irregular data. So, with pascals-triangle(8), for
instance, one gets a potentially useful dataframe in R:

#+NAME: sanity-check
#+HEADER: :var sc_input=pascals-triangle
#+BEGIN_SRC R
sc_input
#+END_SRC

#+RESULTS: sanity-check
| 1 | nil | nil | nil | nil | nil | nil | nil | nil |
| 1 | 1 | nil | nil | nil | nil | nil | nil | nil |
| 1 | 2 | 1 | nil | nil | nil | nil | nil | nil |
| 1 | 3 | 3 | 1 | nil | nil | nil | nil | nil |
| 1 | 4 | 6 | 4 | 1 | nil | nil | nil | nil |
| 1 | 5 | 10 | 10 | 5 | 1 | nil | nil | nil |
| 1 | 6 | 15 | 20 | 15 | 6 | 1 | nil | nil |
| 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1 | nil |
| 1 | 8 | 28 | 56 | 70 | 56 | 28 | 8 | 1 |

Could you pull the development version of Org mode and see if this
solves your problem?

All the best,
Tom

--
Thomas S. Dye
http://www.tsdye.com
Re: [O] Babel: communicating irregular data to R source-code block
On Wednesday, April 25, 2012 at 4:52 PM Thomas S. Dye wrote:

> Michael Hannon <jm_han...@yahoo.com> writes:
>> On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:
>>> . . .
>>> The documentation of read.table has this:
>>>
>>>   The number of data columns is determined by looking at the first
>>>   five lines of input (or the whole file if it has less than five
>>>   lines), or from the length of col.names if it is specified and is
>>>   longer. This could conceivably be wrong if fill or blank.lines.skip
>>>   are true, so specify col.names if necessary (as in the ‘Examples’).
>>>
>>> The example is this:
>>>
>>>   read.csv(tf, fill = TRUE, header = FALSE,
>>>            col.names = paste("V", seq_len(ncol), sep = ""))
>>>
>>> where read.csv is a synonym of read.table with preset arguments.
>>>
>>> This explains why the sixth line wraps.
>>> . . .
>>
>> Thanks, Tom. I had just run across this myself. I guess I need to
>> walk a mile in somebody's moccasins before complaining, but this
>> behavior on the part of R seems totally stupid to me.
>>
>> I'm going to have to mull this over some more.
>>
>> -- Mike
>
> Aloha Mike,
>
> Eric Schulte has pushed up some patches designed to make R source
> block variables accept irregular data. So, with pascals-triangle(8),
> for instance, one gets a potentially useful dataframe in R:
>
> #+NAME: sanity-check
> #+HEADER: :var sc_input=pascals-triangle
> #+BEGIN_SRC R
> sc_input
> #+END_SRC
>
> #+RESULTS: sanity-check
> | 1 | nil | nil | nil | nil | nil | nil | nil | nil |
> | 1 | 1 | nil | nil | nil | nil | nil | nil | nil |
> | 1 | 2 | 1 | nil | nil | nil | nil | nil | nil |
> | 1 | 3 | 3 | 1 | nil | nil | nil | nil | nil |
> | 1 | 4 | 6 | 4 | 1 | nil | nil | nil | nil |
> | 1 | 5 | 10 | 10 | 5 | 1 | nil | nil | nil |
> | 1 | 6 | 15 | 20 | 15 | 6 | 1 | nil | nil |
> | 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1 | nil |
> | 1 | 8 | 28 | 56 | 70 | 56 | 28 | 8 | 1 |
>
> Could you pull the development version of Org mode and see if this
> solves your problem?

Well, NOW you've done it! Just when I thought I could beg off on this
talk, it all seems to be working ;-)

Thanks, Tom and Eric. I've appended a sample output, just FYI. I also
tried it for n=7 and got the correct results. Magic!

As an aside, the rows of the Pascal Triangle should sum to 2^n, which
they do in my test cases. I haven't yet implemented Eric's (much sexier)
sum(sub-diagonal-elements) == Fibonacci nos. test, but I'll look into it.

Thanks again,

-- Mike

# Org-mode version 7.8.09 (release_7.8.09-390-gfb7ebd @ /usr/local/emacs.d/org-mode/org-devel/org-mode/lisp/org-install.el)
-
#+PROPERTY: session *R*

* verify PT

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :results value
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]

pascals_triangle(n)
#+end_src

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+NAME: sanity-check(sc_input=pascals_triangle)
#+BEGIN_SRC R :fill yes :results output
pt <- sc_input
pt[is.na(pt)] <- 0
rowSums(pt)
#+END_SRC

#+RESULTS: sanity-check
: [1]  1  2  4  8 16 32
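For anyone who wants to run both checks outside Org, here is a minimal Python 3 sketch. It is not the code from the paper or from Eric's talk: the generator is rewritten iteratively because map() no longer returns a list in Python 3, and diagonal_sums is only one reading of the "sub-diagonal" Fibonacci test (the function name is hypothetical).

```python
def pascals_triangle(n):
    """Rows 0..n of Pascal's triangle as a ragged list of lists."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each entry is the sum of the two entries above it.
        rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows


def row_sums(rows):
    """Row i should sum to 2**i."""
    return [sum(row) for row in rows]


def diagonal_sums(rows):
    """Sum along each shallow diagonal: rows[i-k][k] for k = 0, 1, ..."""
    return [
        sum(rows[i - k][k] for k in range(i // 2 + 1))
        for i in range(len(rows))
    ]


rows = pascals_triangle(5)
print(row_sums(rows))       # compare against 2**i for i = 0..5
print(diagonal_sums(rows))  # compare against the Fibonacci numbers
```

The row sums reproduce the 1 2 4 8 16 32 line from the R block above, so the two languages agree on the same table.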
Re: [O] Babel: communicating irregular data to R source-code block
On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:

> . . .
> The documentation of read.table has this:
>
>   The number of data columns is determined by looking at the first
>   five lines of input (or the whole file if it has less than five
>   lines), or from the length of col.names if it is specified and is
>   longer. This could conceivably be wrong if fill or blank.lines.skip
>   are true, so specify col.names if necessary (as in the ‘Examples’).
>
> The example is this:
>
>   read.csv(tf, fill = TRUE, header = FALSE,
>            col.names = paste("V", seq_len(ncol), sep = ""))
>
> where read.csv is a synonym of read.table with preset arguments.
>
> This explains why the sixth line wraps.
> . . .

Thanks, Tom. I had just run across this myself. I guess I need to walk a
mile in somebody's moccasins before complaining, but this behavior on
the part of R seems totally stupid to me.

I'm going to have to mull this over some more.

-- Mike
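The column rule quoted above can be mimicked in a few lines of Python. This is only a toy model written for this thread, not R's implementation: read_table_sketch and its ncol parameter (standing in for the length of col.names) are hypothetical names, and the wrapping of overlong rows is inferred from the output reported in this thread.

```python
def read_table_sketch(lines, ncol=None, sniff=5):
    """Toy model of the quoted rule: the width comes from the first
    `sniff` lines, unless a larger `ncol` is supplied."""
    rows = [line.split() for line in lines if line.strip()]
    sniffed = max(len(fields) for fields in rows[:sniff])
    width = max(sniffed, ncol or 0)
    records = []
    for fields in rows:
        # Short rows are padded (fill=TRUE); overlong rows spill
        # into an extra record, which is the "wrap" seen above.
        for start in range(0, len(fields), width):
            chunk = fields[start:start + width]
            records.append(chunk + [None] * (width - len(chunk)))
    return records


triangle = ["1", "1 1", "1 2 1", "1 3 3 1", "1 4 6 4 1", "1 5 10 10 5 1"]

wrapped = read_table_sketch(triangle)        # width sniffed as 5
fixed = read_table_sketch(triangle, ncol=6)  # width forced to 6

for record in wrapped:
    print(record)
```

With six input rows the sniffed width is 5, so the sixth row loses its last field to a seventh record. Supplying the true width up front, as col.names does in the read.csv example, keeps the table at six rows.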
Re: [O] Babel: communicating irregular data to R source-code block
Michael Hannon <jm_han...@yahoo.com> writes:

> On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:
>> . . .
>> The documentation of read.table has this:
>>
>>   The number of data columns is determined by looking at the first
>>   five lines of input (or the whole file if it has less than five
>>   lines), or from the length of col.names if it is specified and is
>>   longer. This could conceivably be wrong if fill or blank.lines.skip
>>   are true, so specify col.names if necessary (as in the ‘Examples’).
>>
>> The example is this:
>>
>>   read.csv(tf, fill = TRUE, header = FALSE,
>>            col.names = paste("V", seq_len(ncol), sep = ""))
>>
>> where read.csv is a synonym of read.table with preset arguments.
>>
>> This explains why the sixth line wraps.
>> . . .
>
> Thanks, Tom. I had just run across this myself. I guess I need to walk
> a mile in somebody's moccasins before complaining, but this behavior on
> the part of R seems totally stupid to me.
>
> I'm going to have to mull this over some more.
>
> -- Mike

Yes, please do. I'm not a programmer, and often get things wrong, but I
trust you'll help rein me in if I get off on a tangent.

It would be good if this limitation in ob-R were eliminated.

The way I see it, ob-R is designed to handle a subset of the expected
input. It coerces a variable into a tsv table, then reads it into R,
expecting all cells are filled. At the same time, other babel modules
are free to export structures (in the Pascal's triangle example, a list
of lists) that orgtbl-to-tsv interprets as a table with empty cells. It
would be nice if ob-R could be made to read all the tables that
orgtbl-to-tsv is able to create.

All the best,
Tom

--
Thomas S. Dye
http://www.tsdye.com
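To make the mismatch concrete, the sketch below pads a ragged list of lists to a rectangle before writing tab-separated text. It is an illustration only and does not reproduce orgtbl-to-tsv or anything in ob-R; to_tsv and its fill parameter are hypothetical.

```python
def to_tsv(rows, fill=""):
    """Pad every row to the widest row, then join cells with tabs."""
    width = max(len(row) for row in rows)
    padded = [list(row) + [fill] * (width - len(row)) for row in rows]
    return "\n".join("\t".join(str(cell) for cell in row) for row in padded) + "\n"


ragged = [[1], [1, 1], [1, 2, 1]]

# Every line now has the same number of fields, so a reader that
# expects all cells to be present has nothing to trip over.
print(to_tsv(ragged, fill="NA"), end="")
```

Padding on the exporting side is one way to get a table with every cell filled; the other is to have the reader tolerate short rows.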
Re: [O] Babel: communicating irregular data to R source-code block
t...@tsdye.com (Thomas S. Dye) writes:

> Michael Hannon <jm_han...@yahoo.com> writes:
>> On Monday, April 23, 2012 at 11:44 PM Thomas S. Dye wrote:
>>> . . .
>>> The documentation of read.table has this:
>>>
>>>   The number of data columns is determined by looking at the first
>>>   five lines of input (or the whole file if it has less than five
>>>   lines), or from the length of col.names if it is specified and is
>>>   longer. This could conceivably be wrong if fill or blank.lines.skip
>>>   are true, so specify col.names if necessary (as in the ‘Examples’).
>>>
>>> The example is this:
>>>
>>>   read.csv(tf, fill = TRUE, header = FALSE,
>>>            col.names = paste("V", seq_len(ncol), sep = ""))
>>>
>>> where read.csv is a synonym of read.table with preset arguments.
>>>
>>> This explains why the sixth line wraps.
>>> . . .
>>
>> Thanks, Tom. I had just run across this myself. I guess I need to
>> walk a mile in somebody's moccasins before complaining, but this
>> behavior on the part of R seems totally stupid to me.
>>
>> I'm going to have to mull this over some more.
>>
>> -- Mike
>
> Yes, please do. I'm not a programmer, and often get things wrong, but
> I trust you'll help rein me in if I get off on a tangent.
>
> It would be good if this limitation in ob-R were eliminated.
>
> The way I see it, ob-R is designed to handle a subset of the expected
> input. It coerces a variable into a tsv table, then reads it into R,
> expecting all cells are filled. At the same time, other babel modules
> are free to export structures (in the Pascal's triangle example, a
> list of lists) that orgtbl-to-tsv interprets as a table with empty
> cells. It would be nice if ob-R could be made to read all the tables
> that orgtbl-to-tsv is able to create.
>
> All the best,
> Tom

Here is about as far as I can go with this. It appears to work for
tables with or without column heads. The 6 is still hard-coded. I can't
find a way to determine the number of columns in VALUE, but assume there
is one. If the number of columns in VALUE were to replace the hard-coded
6, this might work.

(format "%s <- read.table(\"%s\", header=%s, row.names=%s, sep=\"\\t\", as.is=TRUE, fill=TRUE%s)"
        name (org-babel-process-file-name transition-file 'noquote)
        (if (or (eq (nth 1 value) 'hline) colnames-p) "TRUE" "FALSE")
        (if rownames-p "1" "NULL")
        (if (eq (nth 1 value) 'hline)
            ""
          ", col.names = paste(\"V\", seq_len(6), sep = \"\")"))

All the best,
Tom

--
Thomas S. Dye
http://www.tsdye.com
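The missing piece, the number of columns in VALUE, is just the length of its longest row. The Python sketch below shows that calculation and builds the corresponding col.names fragment. It is not elisp and not part of ob-R: col_names_arg is a hypothetical helper, and treating the string "hline" as a separator row is an assumption made for the illustration.

```python
def col_names_arg(value):
    """Build a col.names fragment sized to the widest row of `value`.

    `value` is a list of rows; the marker "hline" (an assumption here,
    standing in for a table rule) is skipped when measuring width.
    """
    rows = [row for row in value if row != "hline"]
    ncol = max(len(row) for row in rows)
    return ', col.names = paste("V", seq_len(%d), sep = "")' % ncol


triangle = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1],
            [1, 5, 10, 10, 5, 1]]
print(col_names_arg(triangle))
```

For the six-row triangle this yields the same fragment as the hard-coded version, but it would follow the data for pascals-triangle(8) or any other ragged input.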
Re: [O] Babel: communicating irregular data to R source-code block
Hi Eric, Eric Schulte eric.schu...@gmx.com writes: t...@tsdye.com (Thomas S. Dye) writes: Aloha Michael, Michael Hannon jm_han...@yahoo.com writes: Greetings. I'm sitting in on a weekly, informal, brown-bag seminar on data technologies in statistics. There are more people attending the seminar than there are weeks in which to give talks, so I may get by with being my usual, passive-slug self. But I thought it might be useful to have a contingency plan and decided that giving a brief talk about Babel might be useful/instructive. I thought (and think) that mushing together (with attribution) some of the content of the paper [1] by The Gang of Four and the content of Eric's talk [2] might be a good approach. (BTW, if this isn't legal, desirable, permissible, etc., this would be a good time to tell me.) I would be happy for you to re-use these materials. I liked the Pascal's Triangle example (which morphed from elisp to Python, or vice versa, in the two references), but I was afraid that the elisp routine pst-check, used as a check on the correctness of the previously-generated Pascal's triangle, might be too esoteric for this audience, not to mention me. (The recursive Fibonacci function is virtually identical in all languages, but the second part is more obscure.) I was giving a presentation to a local lisp/scheme user group, so I figured I'd spare them the pain of trying to read python code :). I thought it should be possible to use R to do the same sanity check, as R would be much more-familiar to this audience (and its use would still demonstrate the meta-language feature of Babel). Unfortunately, I haven't been able to find a way to communicate the output of the Pascal's Triangle example to an R source-code block. The gist of the problem seems to be that regardless of how I try to grab the data (scan, readLines, etc.) 
Babel always ends up trying to read a data frame (table) and I get an error similar to: I present some options below specific to Tom's discussion, but another option may be to use the :results output option on a python code block which prints the table to STDOUT, and then use something line readLines to read from the resulting string into R. I didn't have any luck with :results output, but didn't spend much time trying to figure it out. Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 5 elements Enter a frame number, or 0 to exit 1: read.table(/tmp/babel-3780tje/R-import-3780Akj, header = FALSE, row.names = NULL, sep = If I construct a table by hand with all of the cells occupied, everything goes OK. For instance: #+TBLNAME: some-junk | 1 | 0 | 0 | 0 | | 1 | 1 | 0 | 0 | | 1 | 2 | 1 | 0 | | 1 | 3 | 3 | 1 | #+NAME: read-some-junk(sj_input=some-junk) #+BEGIN_SRC R rowSums(sj_input) #+END_SRC #+RESULTS: read-some-junk | 1 | | 2 | | 4 | | 8 | But the following gives the kind of error I described above: #+name: pascals_triangle #+begin_src python :var n=5 :exports none :return pascals_triangle(5) def pascals_triangle(n): if n == 0: return [[1]] prev_triangle = pascals_triangle(n-1) prev_row = prev_triangle[n-1] this_row = map(sum, zip([0] + prev_row, prev_row + [0])) return prev_triangle + [this_row] pascals_triangle(n) #+end_src A few things are wrong at this point. It seems the JSS article has an error in the header of the pascals_triangle source block. AFAIK there is no header argument :return. I don't know how :return pascals_triangle(5) got there, but am fairly certain it shouldn't be. The :return header argument *is* a supported header argument of python code blocks and is not an error. The python code block should run w/o error and without the extra return pascals_triangle(n) at the bottom. The following works for me. 
#+name: pascals_triangle #+begin_src python :var n=5 :exports none :return pascals_triangle(5) def pascals_triangle(n): if n == 0: return [[1]] prev_triangle = pascals_triangle(n-1) prev_row = prev_triangle[n-1] this_row = map(sum, zip([0] + prev_row, prev_row + [0])) return prev_triangle + [this_row] #+end_src #+RESULTS: pascals_triangle | 1 | ||| | | | 1 | 1 ||| | | | 1 | 2 | 1 || | | | 1 | 3 | 3 | 1 | | | | 1 | 4 | 6 | 4 | 1 | | | 1 | 5 | 10 | 10 | 5 | 1 | [...] I'm beginning to see why you have strong feelings about python. In the code above, the blank line before #+end_src is necessary and must not contain any spaces, and :var n can be set to anything, since it is declared for initialization only. The code in the JSS article doesn't run for me with a recent Org-mode unless I add a blank line before #+end_src, or remove the :return header argument. If I remove the :return header argument, then the need for the blank line goes away. The following code
Re: [O] Babel: communicating irregular data to R source-code block
[...]

> I'm beginning to see why you have strong feelings about python.

Semantically meaningful whitespace is a bad idea for a programming
language.

> In the code above, the blank line before #+end_src is necessary and
> must not contain any spaces, and :var n can be set to anything, since
> it is declared for initialization only.
>
> The code in the JSS article doesn't run for me with a recent Org-mode
> unless I add a blank line before #+end_src, or remove the :return
> header argument. If I remove the :return header argument, then the
> need for the blank line goes away. The following code block seems to
> work:
>
> #+name: pascals-triangle
> #+begin_src python :var n=2 :exports none
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
> return pascals_triangle(n)
> #+end_src
>
> #+RESULTS: pascals-triangle
> | 1 |   |   |
> | 1 | 1 |   |
> | 1 | 2 | 1 |
>
> I'm guessing that the need for a blank line when using :results has
> arisen since the JSS article was published, because the article was
> generated from source code and didn't show any errors.

I believe that we used to pad code blocks with newlines when they were
extracted from the buffer, which had the effect of automatically adding
this extra line. This behavior, however, caused problems in some cases
where the extra line was not desired.

> If I have this right (a big if), then might it be possible to
> re-establish the old behavior so the JSS code works?

I've just pushed up a patch in which the addition of the return value
in python is careful to add this newline itself. This should restore
the functionality of the python code from the paper (specifically the
following now works [1]). This is applied to the maint branch so
hopefully it will sync with Emacs before the release of Emacs24.

Best,

Footnotes:
[1]
#+name: pascals-triangle
#+begin_src python :var n=2 :exports none :return pascals_triangle(n)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
#+end_src

#+RESULTS: pascals-triangle
| 1 |   |   |
| 1 | 1 |   |
| 1 | 2 | 1 |

--
Eric Schulte
http://cs.unm.edu/~eschulte/
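Why a missing newline matters can be shown with a toy model in Python. This is a guess at the mechanism, not the ob-python source: wrap() and its add_newline flag are hypothetical, and the assumption is simply that a "return <expr>" line gets appended to the block body before the whole thing is wrapped in a function.

```python
import textwrap

# A block body that does NOT end in a newline, like a source block
# with no blank line before #+end_src.
BODY = "def double(x):\n    return 2 * x"


def wrap(body, return_expr, add_newline):
    """Append 'return <expr>' to body and wrap it all in main()."""
    separator = "\n" if add_newline else ""
    inner = body + separator + "return " + return_expr
    return "def main():\n" + textwrap.indent(inner, "    ") + "\n"


def compiles(source):
    """True if the generated source is syntactically valid Python."""
    try:
        compile(source, "<babel-sketch>", "exec")
        return True
    except SyntaxError:
        return False


broken = wrap(BODY, "double(21)", add_newline=False)
working = wrap(BODY, "double(21)", add_newline=True)
print(compiles(broken), compiles(working))
```

Without the separator the appended return is glued onto the last line of the body and the result no longer parses; with it, the wrapper is valid. That matches the symptom described above, where a blank line before #+end_src made the block run.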
Re: [O] Babel: communicating irregular data to R source-code block
Eric Schulte <eric.schu...@gmx.com> writes:

> [...]
>> I'm beginning to see why you have strong feelings about python.
>
> Semantically meaningful whitespace is a bad idea for a programming
> language.

Yes, this makes sense to me. I suppose I should wean myself from python
now that I use babel as a glue language.

>> In the code above, the blank line before #+end_src is necessary and
>> must not contain any spaces, and :var n can be set to anything, since
>> it is declared for initialization only.
>>
>> The code in the JSS article doesn't run for me with a recent Org-mode
>> unless I add a blank line before #+end_src, or remove the :return
>> header argument. If I remove the :return header argument, then the
>> need for the blank line goes away. The following code block seems to
>> work:
>>
>> #+name: pascals-triangle
>> #+begin_src python :var n=2 :exports none
>> def pascals_triangle(n):
>>     if n == 0:
>>         return [[1]]
>>     prev_triangle = pascals_triangle(n-1)
>>     prev_row = prev_triangle[n-1]
>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>     return prev_triangle + [this_row]
>> return pascals_triangle(n)
>> #+end_src
>>
>> #+RESULTS: pascals-triangle
>> | 1 |   |   |
>> | 1 | 1 |   |
>> | 1 | 2 | 1 |
>>
>> I'm guessing that the need for a blank line when using :results has
>> arisen since the JSS article was published, because the article was
>> generated from source code and didn't show any errors.
>
> I believe that we used to pad code blocks with newlines when they were
> extracted from the buffer, which had the effect of automatically
> adding this extra line. This behavior, however, caused problems in
> some cases where the extra line was not desired.
>
>> If I have this right (a big if), then might it be possible to
>> re-establish the old behavior so the JSS code works?
>
> I've just pushed up a patch in which the addition of the return value
> in python is careful to add this newline itself. This should restore
> the functionality of the python code from the paper (specifically the
> following now works [1]). This is applied to the maint branch so
> hopefully it will sync with Emacs before the release of Emacs24.

Thanks Eric. The source block in the paper returns the correct result
with the code in the maint branch.

All the best,
Tom

> Best,
>
> Footnotes:
> [1]
> #+name: pascals-triangle
> #+begin_src python :var n=2 :exports none :return pascals_triangle(n)
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
> #+end_src
>
> #+RESULTS: pascals-triangle
> | 1 |   |   |
> | 1 | 1 |   |
> | 1 | 2 | 1 |

--
Thomas S. Dye
http://www.tsdye.com
Re: [O] Babel: communicating irregular data to R source-code block
Greetings. I'm sorry to belabor this, but I thought I had found a
relatively clean way to pass a ragged table to an R source-code block.
Simple answer: add the fill=TRUE option to the read.table function.
Please see the appended for the log of an R session that does what I
want.

I then tried to do the same thing in an R source-code block:

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+NAME: sanity-check(sc_input=pascals_triangle)
#+BEGIN_SRC R
pt <- read.table(sc_input, fill=TRUE)
rowSums(pt)
#+END_SRC

Unfortunately, this still results in the error that the first line did
not contain five elements:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 5 elements

Enter a frame number, or 0 to exit

1: read.table("/tmp/babel-3780tje/R-import-37801if", header = FALSE, row.names = NULL, sep =
2: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, nmax = nrows, skip = 0,

I.e., it seems that Org is going to do its own read.table before even
looking at the code in the source block.

Is there some way to get Org to use the fill=TRUE option on a
case-by-case basis?

Thanks.

-- Mike

Appendix: R code that correctly reads and processes a Pascal's triangle
===

> system("cat pascal.dat")
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
> x <- read.table("pascal.dat", fill=TRUE)
> x
  V1 V2 V3 V4 V5
1  1 NA NA NA NA
2  1  1 NA NA NA
3  1  2  1 NA NA
4  1  3  3  1 NA
5  1  4  6  4  1
> y <- as.matrix(x)
> y
     V1 V2 V3 V4 V5
[1,]  1 NA NA NA NA
[2,]  1  1 NA NA NA
[3,]  1  2  1 NA NA
[4,]  1  3  3  1 NA
[5,]  1  4  6  4  1
> y[is.na(y)] <- 0
> y
     V1 V2 V3 V4 V5
[1,]  1  0  0  0  0
[2,]  1  1  0  0  0
[3,]  1  2  1  0  0
[4,]  1  3  3  1  0
[5,]  1  4  6  4  1
> dimnames(y)[[2]]=NULL  # cosmetic change
> y
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    1    1    0    0    0
[3,]    1    2    1    0    0
[4,]    1    3    3    1    0
[5,]    1    4    6    4    1
Re: [O] Babel: communicating irregular data to R source-code block
[...] I.e.,it seems that Org is going to do its own read.table before even looking at the code in the source block. Yes, this is true, Org will use read.table to read in tabular data. See the code in lisp/ob-R.el for specifics. Is there some way to get Org to use the fill=TRUE option on a case-by-case basis? Yes, The attached patch allows the :fill header argument to be specified adding fill=TRUE to the read.table function call. Please try it out and let me know if it works for you. From 45240d367eb981a93f3c694946d4f2a99044cda5 Mon Sep 17 00:00:00 2001 From: Eric Schulte eric.schu...@gmx.com Date: Mon, 23 Apr 2012 17:04:37 -0400 Subject: [PATCH] add :fill header argument to R code blocks * lisp/ob-R.el (org-babel-header-args:R): List this as a viable R header argument. (org-babel-variable-assignments:R): Check the value of this new :fill header argument. (org-babel-R-assign-elisp): Set fill=TRUE if the :fill header argument has been used. --- lisp/ob-R.el | 33 + 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/lisp/ob-R.el b/lisp/ob-R.el index 9538dc4..1427641 100644 --- a/lisp/ob-R.el +++ b/lisp/ob-R.el @@ -61,6 +61,7 @@ (colormodel . :any) (useDingbats . :any) (horizontal . :any) +(fill. ((yes no))) (results . ((file list vector table scalar verbatim) (raw org html latex code pp wrap) (replace silent append prepend) @@ -148,7 +149,8 @@ This function is called by `org-babel-execute-src-block'. (org-babel-R-assign-elisp (car pair) (cdr pair) (equal yes (cdr (assoc :colnames params))) - (equal yes (cdr (assoc :rownames params) + (equal yes (cdr (assoc :rownames params))) + (equal yes (cdr (assoc :fill params) (mapcar (lambda (i) (cons (car (nth i vars)) @@ -164,19 +166,26 @@ This function is called by `org-babel-execute-src-block'. 
(concat \ (mapconcat 'identity (split-string s \) \\) \) (format %S s))) -(defun org-babel-R-assign-elisp (name value colnames-p rownames-p) +(defun org-babel-R-assign-elisp (name value colnames-p rownames-p fill-p) Construct R code assigning the elisp VALUE to a variable named NAME. (if (listp value) - (let ((transition-file (org-babel-temp-file R-import-))) -;; ensure VALUE has an orgtbl structure (depth of at least 2) -(unless (listp (car value)) (setq value (list value))) -(with-temp-file transition-file - (insert (orgtbl-to-tsv value '(:fmt org-babel-R-quote-tsv-field))) - (insert \n)) -(format %s - read.table(\%s\, header=%s, row.names=%s, sep=\\\t\, as.is=TRUE) -name (org-babel-process-file-name transition-file 'noquote) - (if (or (eq (nth 1 value) 'hline) colnames-p) TRUE FALSE) - (if rownames-p 1 NULL))) + (flet ((R-bool (bool) (if bool TRUE FALSE))) + (let ((transition-file (org-babel-temp-file R-import-))) + ;; ensure VALUE has an orgtbl structure (depth of at least 2) + (unless (listp (car value)) (setq value (list value))) + (with-temp-file transition-file + (insert (orgtbl-to-tsv value '(:fmt org-babel-R-quote-tsv-field))) + (insert \n)) + (format %s - read.table(\%s\, %s, as.is=TRUE) + name (org-babel-process-file-name transition-file 'noquote) + (mapconcat (lambda (pair) (concat (car pair) = (cdr pair))) + `((header. ,(R-bool (or (eq (nth 1 value) + 'hline) + colnames-p))) + (row.names . ,(if rownames-p 1 NULL)) + (sep . \\\t\) + (fill . ,(R-bool fill-p))) + , (format %s - %s name (org-babel-R-quote-tsv-field value (defvar ess-ask-for-ess-directory nil) -- 1.7.10 Best, -- Eric Schulte http://cs.unm.edu/~eschulte/
Re: [O] Babel: communicating irregular data to R source-code block
Michael Hannon jm_han...@yahoo.com writes: Greetings. I'm sorry to belabor this, but I thought I had found a relatively clean way to pass a ragged table to an R source-code block. Simple answer: add the fill=TRUE option to the read.table function. Please see the appended for the log of an R session that does what I want. I then tried to do the same thing in an R source-code block: #+RESULTS: pascals_triangle | 1 | | | | | | | 1 | 1 | | | | | | 1 | 2 | 1 | | | | | 1 | 3 | 3 | 1 | | | | 1 | 4 | 6 | 4 | 1 | | | 1 | 5 | 10 | 10 | 5 | 1 | #+NAME: sanity-check(sc_input=pascals_triangle) #+BEGIN_SRC R pt - read.table(sc_input, fill=TRUE) rowSums(pt) #+END_SRC Unfortunately, this still results in the error that the first line did not contain five elements: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 5 elements Enter a frame number, or 0 to exit 1: read.table(/tmp/babel-3780tje/R-import-37801if, header = FALSE, row.names = NULL, sep = 2: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, nmax = nrows, skip = 0, I.e.,it seems that Org is going to do its own read.table before even looking at the code in the source block. Yes, I believe this happens when Org assigns values to R variables. Is there some way to get Org to use the fill=TRUE option on a case-by-case basis? I don't think so. 
The call to read.table in org-babel-R-assign-elisp doesn't use the fill option: (format %s - read.table(\%s\, header=%s, row.names=%s, sep=\\\t\, as.is=TRUE) If I add fill=TRUE to that (on a git branch), then I get this: #+RESULTS: pascals-triangle | 1 | ||| | | | 1 | 1 ||| | | | 1 | 2 | 1 || | | | 1 | 3 | 3 | 1 | | | | 1 | 4 | 6 | 4 | 1 | | | 1 | 5 | 10 | 10 | 5 | 1 | #+NAME: sanity-check #+HEADER: :var sc_input=pascals-triangle #+BEGIN_SRC R sc_input #+END_SRC #+RESULTS: sanity-check | 1 | nil | nil | nil | nil | | 1 | 1 | nil | nil | nil | | 1 | 2 | 1 | nil | nil | | 1 | 3 | 3 | 1 | nil | | 1 | 4 | 6 | 4 | 1 | | 1 | 5 | 10 | 10 | 5 | | 1 | nil | nil | nil | nil | which isn't correct, but gets past the scan error. I'm in over my head here, but hope that my curiosity hasn't been too noisy. All the best, Tom Thanks. -- Mike Appendix: R code that correctly reads and processes a Pascal's triangle === system(cat pascal.dat) 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 x - read.table(pascal.dat, fill=TRUE) x V1 V2 V3 V4 V5 1 1 NA NA NA NA 2 1 1 NA NA NA 3 1 2 1 NA NA 4 1 3 3 1 NA 5 1 4 6 4 1 y - as.matrix(x) y V1 V2 V3 V4 V5 [1,] 1 NA NA NA NA [2,] 1 1 NA NA NA [3,] 1 2 1 NA NA [4,] 1 3 3 1 NA [5,] 1 4 6 4 1 y[is.na(y)] - 0 y V1 V2 V3 V4 V5 [1,] 1 0 0 0 0 [2,] 1 1 0 0 0 [3,] 1 2 1 0 0 [4,] 1 3 3 1 0 [5,] 1 4 6 4 1 dimnames(y)[[2]]=NULL cosmetic change y [,1] [,2] [,3] [,4] [,5] [1,] 1 0 0 0 0 [2,] 1 1 0 0 0 [3,] 1 2 1 0 0 [4,] 1 3 3 1 0 [5,] 1 4 6 4 1 -- Thomas S. Dye http://www.tsdye.com
Re: [O] Babel: communicating irregular data to R source-code block
> If I add fill=TRUE to that (on a git branch), then I get this:
>
> #+RESULTS: pascals-triangle
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
> #+NAME: sanity-check
> #+HEADER: :var sc_input=pascals-triangle
> #+BEGIN_SRC R
> sc_input
> #+END_SRC
>
> #+RESULTS: sanity-check
> | 1 | nil | nil | nil | nil |
> | 1 | 1 | nil | nil | nil |
> | 1 | 2 | 1 | nil | nil |
> | 1 | 3 | 3 | 1 | nil |
> | 1 | 4 | 6 | 4 | 1 |
> | 1 | 5 | 10 | 10 | 5 |
> | 1 | nil | nil | nil | nil |
>
> which isn't correct, but gets past the scan error.

Hmm, this happens with my patch applied as well. It seems to me this
*must* be an R error. The raw textual data pre-import has no such wrap.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

Why would R intentionally wrap a table at an arbitrary column?

> I'm in over my head here, but hope that my curiosity hasn't been too
> noisy.

Me too. Unless someone who is familiar with the motivations and design
decisions behind R's read.table function weighs in, I'm inclined to
leave the current Org-mode code as is.

Thanks,

--
Eric Schulte
http://cs.unm.edu/~eschulte/
Re: [O] Babel: communicating irregular data to R source-code block
t...@tsdye.com (Thomas S. Dye) writes:

Aloha Michael,

Michael Hannon jm_han...@yahoo.com writes:

Greetings. I'm sitting in on a weekly, informal, brown-bag seminar on data technologies in statistics. There are more people attending the seminar than there are weeks in which to give talks, so I may get by with being my usual, passive-slug self. But I thought it might be useful to have a contingency plan and decided that giving a brief talk about Babel might be useful/instructive.

I thought (and think) that mushing together (with attribution) some of the content of the paper [1] by The Gang of Four and the content of Eric's talk [2] might be a good approach. (BTW, if this isn't legal, desirable, permissible, etc., this would be a good time to tell me.)

I would be happy for you to re-use these materials.

I liked the Pascal's Triangle example (which morphed from elisp to Python, or vice versa, in the two references), but I was afraid that the elisp routine pst-check, used as a check on the correctness of the previously-generated Pascal's triangle, might be too esoteric for this audience, not to mention me. (The recursive Fibonacci function is virtually identical in all languages, but the second part is more obscure.)

I was giving a presentation to a local lisp/scheme user group, so I figured I'd spare them the pain of trying to read python code :).

I thought it should be possible to use R to do the same sanity check, as R would be much more-familiar to this audience (and its use would still demonstrate the meta-language feature of Babel).

Unfortunately, I haven't been able to find a way to communicate the output of the Pascal's Triangle example to an R source-code block. The gist of the problem seems to be that regardless of how I try to grab the data (scan, readLines, etc.)
Babel always ends up trying to read a data frame (table) and I get an error similar to:

I present some options below specific to Tom's discussion, but another option may be to use the :results output option on a python code block which prints the table to STDOUT, and then use something like readLines to read from the resulting string into R.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 5 elements

Enter a frame number, or 0 to exit

1: read.table(/tmp/babel-3780tje/R-import-3780Akj, header = FALSE, row.names = NULL, sep =

If I construct a table by hand with all of the cells occupied, everything goes OK. For instance:

#+TBLNAME: some-junk
| 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 2 | 1 | 0 |
| 1 | 3 | 3 | 1 |

#+NAME: read-some-junk(sj_input=some-junk)
#+BEGIN_SRC R
rowSums(sj_input)
#+END_SRC

#+RESULTS: read-some-junk
| 1 |
| 2 |
| 4 |
| 8 |

But the following gives the kind of error I described above:

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]

pascals_triangle(n)
#+end_src

A few things are wrong at this point. It seems the JSS article has an error in the header of the pascals_triangle source block. AFAIK there is no header argument :return. I don't know how :return pascals_triangle(5) got there, but am fairly certain it shouldn't be.

The :return header argument *is* a supported header argument of python code blocks and is not an error. The python code block should run w/o error and without the extra return pascals_triangle(n) at the bottom. The following works for me.
#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
#+end_src

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

[...]

I vaguely remember that it once was possible to pass variables in through the name line, but I couldn't find this syntax in some fairly recent documentation.

This style of passing arguments is still supported, but not necessarily encouraged by the documentation.

It does appear to work still using a recent Org-mode. If I rename the results and then pass that to the source code block, all is well.

#+RESULTS: pascals-tri
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+name: pst-checkR(p=pascals-tri)
#+BEGIN_SRC R
p
#+END_SRC

#+RESULTS:
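Since a hand-built table with every cell occupied imports into R without trouble, one possible workaround is to pad the triangle on the Python side before Babel ever sees it. This is only a sketch under that assumption and was not proposed in the thread; pad_rows is a hypothetical helper, and the triangle is rebuilt here with a plain loop so the example runs on its own under Python 3.

```python
def pascals_triangle(n):
    """Rows 0..n of Pascal's triangle as ragged lists."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each entry is the sum of the two entries above it.
        rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows


def pad_rows(rows, fill=0):
    """Right-pad every row to the width of the longest row (hypothetical helper)."""
    width = max(len(r) for r in rows)
    return [r + [fill] * (width - len(r)) for r in rows]


table = pad_rows(pascals_triangle(5))
row_sums = [sum(r) for r in table]
print(row_sums)
```

Padding with zeros leaves the row sums unchanged, so the powers-of-two sanity check still applies to the rectangular table. In an Org file the padded table, rather than the ragged one, would be what the python block returns.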
Re: [O] Babel: communicating irregular data to R source-code block
Aloha Michael,

Michael Hannon jm_han...@yahoo.com writes:

Greetings. I'm sitting in on a weekly, informal, brown-bag seminar on data technologies in statistics. There are more people attending the seminar than there are weeks in which to give talks, so I may get by with being my usual, passive-slug self. But I thought it might be useful to have a contingency plan and decided that giving a brief talk about Babel might be useful/instructive.

I thought (and think) that mushing together (with attribution) some of the content of the paper [1] by The Gang of Four and the content of Eric's talk [2] might be a good approach. (BTW, if this isn't legal, desirable, permissible, etc., this would be a good time to tell me.)

I liked the Pascal's Triangle example (which morphed from elisp to Python, or vice versa, in the two references), but I was afraid that the elisp routine pst-check, used as a check on the correctness of the previously-generated Pascal's triangle, might be too esoteric for this audience, not to mention me. (The recursive Fibonacci function is virtually identical in all languages, but the second part is more obscure.)

I thought it should be possible to use R to do the same sanity check, as R would be much more-familiar to this audience (and its use would still demonstrate the meta-language feature of Babel).

Unfortunately, I haven't been able to find a way to communicate the output of the Pascal's Triangle example to an R source-code block. The gist of the problem seems to be that regardless of how I try to grab the data (scan, readLines, etc.) Babel always ends up trying to read a data frame (table) and I get an error similar to:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 5 elements

Enter a frame number, or 0 to exit

1: read.table(/tmp/babel-3780tje/R-import-3780Akj, header = FALSE, row.names = NULL, sep =

If I construct a table by hand with all of the cells occupied, everything goes OK.
For instance:

#+TBLNAME: some-junk
| 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 2 | 1 | 0 |
| 1 | 3 | 3 | 1 |

#+NAME: read-some-junk(sj_input=some-junk)
#+BEGIN_SRC R
rowSums(sj_input)
#+END_SRC

#+RESULTS: read-some-junk
| 1 |
| 2 |
| 4 |
| 8 |

But the following gives the kind of error I described above:

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]

pascals_triangle(n)
#+end_src

A few things are wrong at this point. It seems the JSS article has an error in the header of the pascals_triangle source block. AFAIK there is no header argument :return. I don't know how :return pascals_triangle(5) got there, but am fairly certain it shouldn't be.

Second is the last line of the source block. It should read:

return pascals_triangle(n)

Third, in the JSS article the name of the source code block is pascals-triangle, to distinguish it from the name of the python function pascals_triangle (note the underscore in place of the hyphen).
So, with these changes made, I have this, which works for me:

#+name: pascals-triangle
#+begin_src python :var n=5 :exports none
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
return pascals_triangle(n)
#+end_src

#+RESULTS: pascals-triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+CALL: pascals-triangle(5)

#+RESULTS: pascals-triangle(5)
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+name: pst-checkR(pas_inputs=pascals_triangle)
#+BEGIN_SRC R
rowSums(pas_inputs)
#+END_SRC

I vaguely remember that it once was possible to pass variables in through the name line, but I couldn't find this syntax in some fairly recent documentation.

It does appear to work still using a recent Org-mode. If I rename the results and then pass that to the source code block, all is well.

#+RESULTS: pascals-tri
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

#+name: pst-checkR(p=pascals-tri)
#+BEGIN_SRC R
p