Flat files are fine for large amounts of data, up to maybe a few hundred MB,
if the data is uniform and not too complex, say lots of equity prices.
Databases are more suitable as the data becomes more complex, say
information about various companies: their financial instruments (bonds
and different equity classes) and indicative data from balance sheets,
cashflow statements, and such over time.
Flat files fall down when you need to keep track of relations between
various data items.
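
To make "relations between data items" concrete, here is a minimal sketch
using Python's built-in sqlite3 module. The table names, column names, and
the sample company are hypothetical and only for illustration; the point is
that a join answers "which equity classes belong to which company" without
hand-matching rows across separate flat files.

```python
import sqlite3

# In-memory database; the schema and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE instrument (
        id         INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES company(id),
        kind       TEXT NOT NULL,   -- 'bond' or 'equity'
        label      TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO company (id, name) VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO instrument (company_id, kind, label) VALUES (?, ?, ?)",
    [(1, "equity", "Class A"), (1, "equity", "Class B"), (1, "bond", "5Y note")],
)

# The relation company -> instrument is followed by the join,
# not by hand-matching keys across files.
rows = conn.execute("""
    SELECT c.name, i.kind, i.label
    FROM company AS c
    JOIN instrument AS i ON i.company_id = c.id
    WHERE i.kind = 'equity'
    ORDER BY i.label
""").fetchall()

for row in rows:
    print(row)
```

With two TSV files you would have to write and maintain that matching logic
yourself, which is where flat files start to hurt.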

On Thu, Jan 7, 2021 at 5:13 PM Hauke Rehr <hauke.r...@uni-jena.de> wrote:

> Why only moderate?
> csv/tsv is among the best in scalability,
> way more reliable than spreadsheets
> (afaik)
> Of course, customized databases can be better.
>
> On 07.01.21 at 23:07, Devon McCormick wrote:
> > To be clear, I was expressing caution about spreadsheets with embedded
> > formulas and code.  Keeping data in flat files, like TSV files, is fine
> > for moderate amounts of data.
> >
> > On Thu, Jan 7, 2021 at 4:08 PM 'Bo Jacoby' via Programming <
> > programm...@jsoftware.com> wrote:
> >
> >>  "I am looking for a way to better organise my research. If not
> >> spreadsheets, do you have some advice on how to coordinate all this
> >> separate data in one place?"
> >> I have used ordinal fractions for structuring data since 1980.
> >>
> >> ORDINAL FRACTIONS - the algebra of data
> >>
> >> This paper was submitted to the 10th World Computer Congress, IFIP 1986
> >> conference, but rejected by the referee....
> >>
> >> I wrote software for processing this kind of data in Fortran, BASIC, and
> >> Pascal, but not (yet) in J.
> >> A BASIC program for browsing the database is this:
> >> 1 INPUT;C$: IF C$="" THEN END
> >> 2 OPEN"CREDO" FOR INPUT AS 1: PRINT":";
> >> 3 IF EOF(1) THEN CLOSE:PRINT:GOTO 1
> >> 4 LINE INPUT#1,A$: B$=C$
> >> 5 IF A$=""THEN A%=-1 ELSE A%=ASC(A$)-48:A$=MID$(A$,2)
> >> 6 IF B$=""THEN B%=-1 ELSE B%=ASC(B$)-48:B$=MID$(B$,2)
> >> 7 IF A%<0 THEN PRINT" ";A$;:GOTO 3
> >> 8 IF A%=0 OR B%=0 OR A%=B% THEN 5 ELSE 3
> >>
> >> The test database for illustrating the possibilities is this:
> >> 1 CREDO
> >> 11 IN
> >> 111 UNUM
> >> 11 DEUM
> >> 112 PATREM
> >> 1121 OMNIPOTENTEM
> >> 113 FACTOREM
> >> 1131 CÆLI
> >> 1139 ET
> >> 1132 TERRÆ
> >> 11331 VISIBILIUM
> >> 1133 OMNIUM
> >> 11339 ET
> >> 11332 INVISIBILIUM
> >> 19 ET
> >> 12 IN
> >> 1211 UNUM
> >> 1211 DOMINUM
> >> 12 JESUM
> >> 1211 CHRISTUM
> >> 1212 FILIUM
> >> 1212 DEI
> >> 12121 UNIGENITUM
> >> 1219 ET
> >> 1213 EX
> >> 1213 PATRE
> >> 1213 NATUM
> >> 12131 ANTE
> >> 121311 OMNIA
> >> 12131 SÆCULA
> >> 1221 DEUM
> >> 12211 DE
> >> 12211 DEO
> >> 1222 LUMEN
> >> 12221 DE
> >> 12221 LUMINE
> >> 1223 DEUM
> >> 12231 VERUM
> >> 12232 DE
> >> 12232 DEO
> >> 122321 VERO
> >> 1231 GENITUM
> >> 12311 NON
> >> 12311 FACTUM
> >> 1232 CONSUBSTANTIALEM
> >> 1232 PATRI
> >> 12321 PER
> >> 12321 QUEM
> >> 12321 OMNIA
> >> 12321 FACTA
> >> 12321 SUNT
> >> 124 QUI
> >> 124101 PROPTER
> >> 124101 NOS
> >> 12410101 HOMINES
> >> 124109 ET
> >> 124102 PROPTER
> >> 12410201 NOSTRAM
> >> 124102 SALUTEM
> >> 12411 DESCENDIT
> >> 1241101 DE
> >> 1241101 CÆLIS
> >> 12419 ET
> >> 12412 INCARNATUS EST
> >> 1241201 DE
> >> 1241201 SPIRITU
> >> 124120101 SANCTO
> >> 1241202 EX
> >> 1241202 MARIA
> >> 124120201 VIRGINE
> >> 12419 ET
> >> 1241301 HOMO
> >> 12413 FACTUS EST
> >> 124211 CRUCIFIXUS
> >> 1242101 ETIAM
> >> 1242101 PRO
> >> 1242101 NOBIS
> >> 1242102 SUB
> >> 1242102 PONTIO
> >> 1242102 PILATO
> >> 124212 PASSUS
> >> 124219 ET
> >> 124213 SEPULTUS
> >> 12421 EST
> >> 12429 ET
> >> 12422 RESURREXIT
> >> 124221 TERTIA
> >> 124221 DIE
> >> 124222 SECUNDUM
> >> 124222 SCRIPTURAS
> >> 12429 ET
> >> 12423 ASCENDIT
> >> 124231 IN
> >> 124231 CÆLUM
> >> 12424 SEDET
> >> 124241 AD
> >> 124241 DEXTERAM
> >> 124241 PATRIS
> >> 12429 ET
> >> 124251 ITERUM
> >> 12425 VENTURUS EST
> >> 124252 CUM
> >> 124252 GLORIA
> >> 124253 JUDICARE
> >> 1242531 VIVOS
> >> 1242539 ET
> >> 1242532 MORTUOS
> >> 125 CUJUS
> >> 125 REGNI
> >> 125 NON ERIT
> >> 125 FINIS
> >> 19 ET
> >> 13 IN
> >> 13 SPIRITUM
> >> 131 SANCTUM
> >> 132 DOMINUM
> >> 139 ET
> >> 133 VIVIFICANTEM
> >> 134 QUI
> >> 134 EX
> >> 1341 PATRE
> >> 1342 FILIO
> >> 1349 QUE
> >> 134 PROCEDIT
> >> 135 QUI
> >> 135 CUM
> >> 13501 PATRE
> >> 13509 ET
> >> 13502 FILIO
> >> 13509 SIMUL
> >> 1351 ADORATUR
> >> 1359 ET
> >> 1352 GLORIFICATUR
> >> 136 QUI
> >> 136 LOCUTUS EST
> >> 1361 PER
> >> 1361 PROPHETAS
> >> 19 ET
> >> 141 UNAM
> >> 142 SANCTAM
> >> 143 CATHOLICAM
> >> 149 ET
> >> 144 APOSTOLICAM
> >> 14 ECCLESIAM
> >> 2 CONFITEOR
> >> 211 UNUM
> >> 21 BAPTISMA
> >> 212 IN
> >> 212 REMISSIONEM
> >> 2121 PECCATORUM
> >> 9 ET
> >> 3 EXPECTO
> >> 31 RESURRECTIONEM
> >> 311 MORTUORUM
> >> 39 ET
> >> 32 VITAM
> >> 3211 VENTURI
> >> 321 SÆCULI
> >>  AMEN
> >>
> >> Some test runs of the program look like this.
> >> 13510: CREDO IN SPIRITUM QUI CUM PATRE ET FILIO SIMUL ADORATUR AMEN
> >> 13520: CREDO IN SPIRITUM QUI CUM PATRE ET FILIO SIMUL GLORIFICATUR AMEN
> >> 13501: CREDO IN SPIRITUM QUI CUM PATRE ADORATUR ET GLORIFICATUR AMEN
> >> 13502: CREDO IN SPIRITUM QUI CUM FILIO ADORATUR ET GLORIFICATUR AMEN
> >> 13511: CREDO IN SPIRITUM QUI CUM PATRE ADORATUR AMEN
> >> 13512: CREDO IN SPIRITUM QUI CUM FILIO ADORATUR AMEN
> >> 13521: CREDO IN SPIRITUM QUI CUM PATRE GLORIFICATUR AMEN
> >> 13522: CREDO IN SPIRITUM QUI CUM FILIO GLORIFICATUR AMEN
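
For readers without a BASIC interpreter, here is a minimal Python sketch of
what the quoted program appears to do. It is my reading of the BASIC, not
Bo's own code: a record is printed when each digit of its key either equals
the corresponding digit of the query or one of the two digits is 0, and a key
longer than the query matches only if its extra digits are all 0. The names
`matches`, `browse`, and `RECORDS` are mine, and `RECORDS` is just an excerpt
of the test data quoted above.

```python
# Excerpt of the quoted test data: "<key> <text>", key made of digits.
RECORDS = [
    "1 CREDO",
    "13 IN",
    "13 SPIRITUM",
    "131 SANCTUM",
    "135 QUI",
    "135 CUM",
    "13501 PATRE",
    "13509 ET",
    "13502 FILIO",
    "13509 SIMUL",
    "1351 ADORATUR",
    "1359 ET",
    "1352 GLORIFICATUR",
    " AMEN",  # empty key: matches every query
]


def matches(key: str, query: str) -> bool:
    """Compare key and query digit by digit; 0 on either side is a wildcard."""
    for i, k in enumerate(key):
        q = query[i] if i < len(query) else None  # None: query is exhausted
        if k == "0" or q == "0" or k == q:
            continue
        return False
    return True


def browse(records, query: str) -> str:
    """Join the text of every record whose key is compatible with the query."""
    words = []
    for line in records:
        key, _, text = line.partition(" ")  # key ends at the first space
        if matches(key, query):
            words.append(text)
    return " ".join(words)


for query in ("13510", "13502"):
    print(f"{query}: {browse(RECORDS, query)}")
```

On this excerpt the two queries give the same sentences as the quoted runs
for 13510 and 13502, which suggests the reading of the matching rule is right.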
> >>
> >> I realize that this is not easy to understand, but I know that it is
> >> worthwhile.
> >> Good luck!
> >> Bo.
> >>
> >> On Thursday, 7 January 2021 at 21:35:12 CET, Justin
> >> Paston-Cooper <paston.coo...@gmail.com> wrote:
> >>
> >>  Thanks. I have been meaning to look at that.
> >>
> >> On Thu, 7 Jan 2021 at 23:33, Joe Bogner <joebog...@gmail.com> wrote:
> >>>
> >>> Jupyter notebooks may help you with organizing your research -
> >>> https://code.jsoftware.com/wiki/Guides/Jupyter
> >>>
> >>> This has been my preferred tool - far above Excel.
> >>>
> >>> On Thu, Jan 7, 2021 at 2:39 PM Justin Paston-Cooper <
> >>> paston.coo...@gmail.com> wrote:
> >>>
> >>>> I am open to suggestions. Right now I'm researching a lot of related
> >>>> things concurrently. I'm storing some of the results in TSV files.
> >>>> Some of the scripts are Python, some are curl | jq | awk. Some of the
> >>>> results I am storing as variables in J scripts. I am constantly going
> >>>> back and forth between differing representations, differing
> >>>> environments, recalculating things needlessly, and so on.
> >>>>
> >>>> I am looking for a way to better organise my research. If not
> >>>> spreadsheets, do you have some advice on how to coordinate all this
> >>>> separate data in one place? A Make file could be a start, but this
> >>>> doesn't satisfy the requirement of having a nice editable GUI to
> >>>> arrange and display all the separate sources of data. Maybe wd would
> >>>> be a start in that direction. I haven't researched the alternatives.
> >>>>
> >>>> How do you organise your research?
> >>>>
> >>>> Application: Researching interactions between prices of a set of
> >>>> things in each of a set of places. There are many different analyses
> >>>> that can be made. I am finding it hard to keep track of all the angles
> >>>> I have looked at. These angles all reside in separate directories,
> >>>> which is not ideal. I have hand-written notes, but those need to be
> >>>> updated by hand.
> >>>>
> >>>> By the way, I wasn't envisioning doing any calculation in the
> >>>> spreadsheet. The idea of the spreadsheet was simply to coordinate
> >>>> communication and (re)calculation between various calculation
> >>>> processes, display the results, and allow the display of the results
> >>>> to be edited.
> >>>>
> >>>> Imagine an actor system with the spreadsheet being the coordinator.
> >>>>
> >>>> On Thu, 7 Jan 2021 at 20:23, Devon McCormick <devon...@gmail.com> wrote:
> >>>>>
> >>>>> It would be remiss of me not to mention that you really ought to
> >>>>> reconsider making a spreadsheet an integral part of your design, not
> >>>>> least because of the historically high error rates that have been
> >>>>> measured in spreadsheets - 1 to 5%:
> >>>>> https://arxiv.org/ftp/arxiv/papers/1602/1602.02601.pdf .  It seems
> >>>>> incongruous to worry about the sixth decimal place in numbers with many
> >>>>> digits before the decimal point while ignoring error rates that dwarf
> >>>>> this imprecision.
> >>>>>
> >>>>> By way of comparison, in most code-bases where people measure errors,
> >>>>> an error rate of 10 bad lines per 1000 lines of code (1%) would be
> >>>>> considered unacceptably high.
> >>>>>
> >>>>> ----------------------------------------------------------------------
> >>>>> For information about J forums see http://www.jsoftware.com/forums.htm
> >
> >
>
> --
> ----------------------
> mail written using NEO
> neo-layout.org
>
>


-- 

Devon McCormick, CFA

Quantitative Consultant