Re: strings in gnumeric / awk / etc.

Leonard Mada Tue, 16 Jan 2007 10:16:29 -0800

Hello,

Prof J C Nash wrote:
> Some of the issues being raised suggest that a spreadsheet is not the 
> right analytic tool. How about a data frame in R?


Well, this is difficult, too. When a bunch of diagnoses (or symptoms) 
are lumped together in one single column, that won't be easy to work 
with in R either.

A much more difficult problem arises when a patient stays for longer 
than one day (that is usually the case) and I need a specific string 
(say a diagnosis, a symptom, ...), which may occur on any of the days, 
BUT I need either the first occurrence, or the number of days with this 
diagnosis, or some more complex search. I do work extensively with R 
(that is why I posted this OOo issue, 
http://qa.openoffice.org/issues/show_bug.cgi?id=66589), but R is NO 
substitute for a spreadsheet.
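
To make the kind of search concrete, here is a minimal sketch in Python. 
It assumes a layout that is not specified above: one row per patient and 
day, with the lumped diagnoses as free text. The function name 
summarize and the sample rows are purely illustrative.

```python
def summarize(rows, needle):
    """Find the first day and the number of days on which `needle` occurs.

    rows: iterable of (patient_id, day, diagnoses_text) tuples
          (an assumed layout, one row per patient and day).
    Returns {patient_id: (first_day, number_of_days)}.
    """
    found = {}
    for patient, day, text in rows:
        # Case-insensitive substring search in the lumped text.
        if needle.lower() not in text.lower():
            continue
        first, days = found.get(patient, (day, set()))
        days.add(day)
        found[patient] = (min(first, day), days)
    return {p: (first, len(days)) for p, (first, days) in found.items()}


# Made-up sample data, only to show the shape of the input.
rows = [
    ("p1", 1, "fever; cough"),
    ("p1", 2, "pneumonia; fever"),
    ("p1", 3, "Pneumonia"),
    ("p2", 1, "headache"),
]
print(summarize(rows, "pneumonia"))  # {'p1': (2, 2)}
```

Patients without any matching day are simply absent from the result.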

Actually, spreadsheets are still the most used application in the 
life sciences. I find even Epi-Info NOT as good (though it has better 
analysis possibilities than a spreadsheet, BUT - of course - it cannot 
compete with R). Almost every doctor uses Excel, and it is the de 
facto standard when doctors perform research (I refuse to use it, 
and some epidemiologists use Epi-Info, but I believe these are mere 
exceptions).

I posted another use for gawk; see OOo issue 
http://qa.openoffice.org/issues/show_bug.cgi?id=66816, where I wanted to 
create some dummy variables for the medical department:

GAWK SCRIPT
# $1 contains the input (the hospital unit abbreviation)

# Runs for every record: normalize the input and reset the dummies
{
    $1 = tolower($1)
    $2 = 0 # neurosurgery vs non-neurosurgery
    $3 = 0 # neurology vs non-neurology
    $4 = 0 # general surgery vs non-surgery
    $5 = 0 # internal medicine vs non-im
    $6 = 1 # ERROR var, stays 1 if the abbreviation is unknown
}

# NEUROSURGERY
$1 ~ /nch/ { $2 = 1; $6 = 0 }

# Neurology
$1 ~ /^n[ \t]*$|^ne/ { $3 = 1; $6 = 0 }

# General Surgery
$1 ~ /^ch/ { $4 = 1; $6 = 0 }

# INTERNAL MEDICINE
$1 ~ /mi|end|nut/ { $5 = 1; $6 = 0 }

# Append the record, now including the dummy variables, to the output
{ print $0 >> "out-file" }

### END SCRIPT

Try to do this with spreadsheet functions, and it will turn into a 
nightmare.

gawk has many advantages, and I would point out another two:
- it is easy and simple, and very very fast (both to write and execute - 
even on huge datasets)
- the code is structured and visible, so it is easy to understand what 
it does (this is NOT always the case when you write complex formulas in 
the spreadsheet)

I hope these are enough reasons to implement a simple menu-entry in 
gnumeric that runs awk/gawk scripts.

Specifically:
- the user selects some cells
- chooses Menu-Entry: RUN gawk-script (a dialog box opens allowing the 
user to select the proper script)
- gnumeric should then open a bidirectional pipeline to gawk
- it should provide default values for the FieldSeparator (FS) and 
RecordSeparator (RS), which should also be used to join (split) the 
Cells and Rows in the worksheet when piping the data stream into gawk
- gawk's output should be split back into cells (using the same FS and 
RS) (probably into a new sheet, like ANOVA)
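
The steps above can be sketched roughly as follows. This is a sketch in 
Python under stated assumptions, not gnumeric code: run_filter is a 
hypothetical helper, the selection is assumed to arrive as a list of 
rows, and the filter is assumed to print its result to standard output. 
So that it runs without gawk installed, the demo uses the Python 
interpreter itself as a stand-in filter.

```python
import subprocess
import sys


def run_filter(cells, command, fs="\t", rs="\n"):
    """Pipe a block of cells through an external filter and split the result.

    cells:   list of rows, each row a list of cell values (the selection).
    command: argument list of the filter process.
    fs, rs:  field and record separators used both to join and to split.
    """
    # Join cells with FS and rows with RS, as the filter expects.
    text = rs.join(fs.join(str(c) for c in row) for row in cells) + rs
    proc = subprocess.run(command, input=text, capture_output=True,
                          text=True, check=True)
    out = proc.stdout
    # Drop one trailing record separator so no empty last row appears.
    if out.endswith(rs):
        out = out[:-len(rs)]
    # Split the output back into rows and cells with the same separators.
    return [record.split(fs) for record in out.split(rs)] if out else []


# Stand-in filter (NOT gawk): upper-cases everything it reads.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = run_filter([["nch", "a"], ["ne", "b"]], upper)
print(result)  # [['NCH', 'A'], ['NE', 'B']]
```

With gawk, the command could be something like 
["gawk", "-F", "\t", "-v", "OFS=\t", "-f", "units.awk"], where units.awk 
is a placeholder name for the user's script. The returned rows would 
then be written into a new sheet.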

I believe this is easy to code and quite useful.

Many thanks in advance,

Leonard Mada
_______________________________________________
gnumeric-list mailing list
gnumeric-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gnumeric-list

Re: strings in gnumeric / awk / etc.
