On 06/30/10 06:37, H.Merijn Brand wrote:
On Tue, 29 Jun 2010 19:12:19 +0000, Jens Rehsack
<rehs...@googlemail.com>  wrote:

Hi,

Because of a bogus implementation of the PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in the DBD::File meta data:
'table_defs'. It is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing the column
names and some more information - whatever can be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File-supported way to store
and load this meta data (serialized, of course)?

I'm all in favor of saving data-dictionary info in some persistent
way, but as Tim said, not by default.

It should be a user option, and the interface should be configurable.
DBD::File should support what is installed, but only *if* it is
installed and available. I personally would prefer JSON, with YAML
in second position; both fit the bill very well for DBD dict storage.
YAML is available from CPAN.

   my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
       f_dir      =>  "data",
       f_ext      =>  ".csv/r",
       f_encoding =>  "utf8",
       f_schema   =>  undef,
       f_dict     =>  $dict,
       }) or die DBI->errstr;

where $dict is
1. A hash           see below
2. A hashref        ref to 1.
3. A filename       filename
4. A listref        [ filename, storage type ]

The hash/ref from the DDD can be read from or written to a file in
case of 3 or 4. This way we are backward compatible and support more
than ever before. The content of the hash should be very well
documented, and all errors in it should optionally be ignored.

"storage type" can be any means of persistence: "JSON", "YAML",
"Storable", "Freeze", where "Storable" is the default as it is
available in CORE since ages.
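
For illustration, calls for cases 3 and 4 could look like this
(a sketch only: f_dict is a proposal, nothing of it is implemented
yet, and the file names are made up):

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir  => "data",
        # case 3 would be just:  f_dict => "data/ddd.stor",
        # (storage type then defaults to Storable)
        f_dict => [ "data/ddd.json", "JSON" ],   # case 4
        }) or die DBI->errstr;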

The hash could be something like ...

my $hash = {
     foo  =>  [
         [ "bar",    [ "numeric", 4 ], [ "not null", "primary key" ] ],
         [ "baz",    [ "integer"    ],                               ],
         [ "drz",    [ "char",   60 ],                               ],
         [ "fbg",    [ "numeric", 2, 2 ], [ ], [ "default", 0.0 ],   ],
         [ "c_base", [ "numeric", 4 ], [ "not null" ]                ],
         ],
     base =>  [
         [ "c_base", [ "numeric", 4 ], [ "not null", "primary key" ] ],
         [ ...
         ],
     :
     "\cA:links" =>  [
         [ "foo.bar" =>  "base.c_base" ],
         :
         ],
     };

That was just a braindump.
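
To make the "serialized, of course" part concrete, here is a minimal
sketch of how such a hash could be written to and read back from a
JSON file (using the JSON module from CPAN; the function names are
made up, and a real implementation would also need locking and better
error handling):

    use JSON;

    sub save_dict {
        my ($file, $dict) = @_;
        open my $fh, ">", $file or die "$file: $!";
        print {$fh} JSON->new->canonical->pretty->encode ($dict);
        close $fh;
        } # save_dict

    sub load_dict {
        my ($file) = @_;
        open my $fh, "<", $file or die "$file: $!";
        local $/;   # slurp the whole file
        return JSON->new->decode (<$fh>);
        } # load_dict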

I would really like to do this - this would bring us a big step
in the right direction.

DBD::DBM could store it in its meta data (instead of saving just the
column names it could save the entire table_defs structure), but what
should DBD::CSV do?

Best regards,
Jens

We talked in IRC about the above ...

Summary: The main intention points to dictionary support for databases.
This will also require SQL::Statement 2 and an extended feature set in
DBI::DBD::SqlEngine. Currently I'm unsure how we deal with DBI::SQL::Nano
in that regard (separate mail). A sketch of the dictionary layout the
backlog converges on follows directly below; a sketch of the resulting
f_dict handling follows the backlog.
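
As a reference point, the dictionary layout discussed in the backlog
might look roughly like this (a sketch only; the keys and attributes
are taken from the discussion, nothing of it is implemented):

    my $ddd = {
        tables  => {
            foo => {
                file   => "foo.csv",
                fields => {
                    bar => { type => "numeric", prec => 4,
                             attr => [ "not null", "primary key" ] },
                    baz => { type => "integer" },
                    },
                },
            },
        links   => { "foo.bar" => "base.c_base" },
        indices => {},
        };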

IRC backlog:
<@[Tux]> I'm all in favor of saving data-dictionary info in some persistent way.
<@[Tux]> But as timbo said, not by default
<@[Tux]> it should be a user option, and the interface should be configurable
<@[Tux]> DBD::File should support what is installed, but only *if* it is installed
<@[Tux]> and available
<@[Tux]> I personally would prefer JSON, with YAML on second position
<@[Tux]> both very well fit the bill for DBD dict storage
<@[Tux]> YAML is available for CPAN
<@[Tux]> DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "data", f_dict => ... });
<@[Tux]> where ... is
<@[Tux]> 1. A filename
<@[Tux]> 2. A hash
<@[Tux]> 3. A hashref
<@[Tux]> The hash/ref from the DDD can be read from or written to file in case of 1.
<@[Tux]> this way we are backward compatible and support more than ever before
* You are now known as Sno|
<@Sno|> timbo? Do you mean Darren?
<@Sno|> [Tux]: your f_dict => parameter list doesn't fit requirements of DBD::DBM
<@[Tux]> it was just a basic brain dump
<@[Tux]> all "\cA:..." entries are non-tabel names in that example, but it might be better to change that to <@[Tux]> { tables => { foo => [ ... ], ... }, links => { ... }, indices => { ... }};
<@[Tux]> extend to fit requirement of whatevah
<@Sno|> well, what's with the idea to add an optional second header line to a csv file?
<@[Tux]> no
<@Sno|> ok, this was clear :)
<@[Tux]> that would mean it is not a CSV file anymore
<@[Tux]> CSV's are MOSTLY from exports and for imports
<@[Tux]> if you alter the content just for DBD, it would defeat all other uses
<@Sno|> ok
<@Sno|> than DBD::File should have at least 2 parameters: f_serializer and f_dict
<@[Tux]> why serializer?
<@Sno|> to choose JSON, YAML etc.
<@[Tux]> see option 4
<@[Tux]> f_dict => [ "ddd.json", "JSON" ],
<@Sno|> I'd prefer to handle those separate
<@[Tux]> I don't
<@Sno|> because of, DBD::DBM would not store that in a separate file
<@[Tux]> f_dict => [ undef, "JSON" ],
<@Sno|> so DBD::DBM users would be forced to use f_dict => [ undef, "JSON" ]
<@[Tux]> WHICH IS *VERY* CLEAR!
<@Sno|> which is very error-prone
<@[Tux]> I don't think so
<@[Tux]> but the whole idea of f_dict is error-prone to start with
<@Sno|> try to reduce your intelligence to 20% (being normal project coder)
<@[Tux]> if the files (dbm or csv) are not in agreement with the dict, things will break
<@[Tux]> I was actually
<@[Tux]> reading just about f_dict in the doc focusses on what I actually want
<@Sno|> DBD::DBM will store it's stuff in it's special "metadata \0" key
<@[Tux]> having to read two entries will make me forget
<@[Tux]> and it would even make sense to enable DBM users to store the metadata in a SEPARATE file
<@Sno|> in principle yes
<@[Tux]> which can then be used for another type if data space (like CSV)
<@Sno|> would make DBM files of DBD::DBM more portable
<@[Tux]> so one single entry makes more and more sense the more I think about it
<@Sno|> I don't like it (you still haven't convinced me) - at least it breaks DSN lines (f_dict cannot specified in DSN and not added by datasources()) <@[Tux]> as you see, my mind had to settle on your quest before I came up with a reasonable answer :) <@Sno|> having 2 parameters, datasources() could scan for those files, too, and add them to the returned dsn's <@[Tux]> DBI_DSN="dbi:CSV:f_ext=.csv/r;f_dir=.;f_encoding=utf8;f_dict=csv.ddd/json"
<@Sno|> cheater :)
<@[Tux]> No, completely in line with f_ext!
<@[Tux]> and as / is invalid for filenames, it fits the bill
<@Sno|> is it reasonable to have the dictionary in a separate directory (not by default, but possible)?
* [Tux] must have been clairvoyant when choosing / as sep :P
<@Sno|> f_dict => [ "dict/file", "JSON" ]
<@[Tux]> possibly yes, but we can say NO
<@Sno|> I tend to say yes, but the first argument in f_dict could be a directory in that case, too
<@[Tux]> if people do want that, let them make a (sym)link
<@Sno|> win32 can't do symlinks
<@[Tux]> lets restrict ourselves to a file
<@[Tux]> win32 can do a copy
<@Sno|> copy is bad - you must copy in any direction
<@[Tux]> if users realy want it in another dir, not supporting it from the DSN sounds as a reasonable restriction <@Sno|> from special location to data file location before and copy back (ensure no modification) after
<@[Tux]> otoh if we restrict ourselves to json/storable/yaml,
<@Sno|> I can argue that the support for a separate dir is easy when we remove the "restriction" to one argument
<@[Tux]> $dict =~ s{/(json|yaml|storable)$}{}i and $fmt = uc $1;
<ribasushi> why simply not f_dict_fmt and f_dict_file
<ribasushi> clear, verbose, nice
<@[Tux]> and if you are to say, what if the folder is json? then use a trailing /
* ribasushi has no idea what the above does, just saw the interface bickering
<@[Tux]> and support both?
<@[Tux]> why not :)
<@Sno|> ribasushi: do you read dbi-...@perl.org?
<ribasushi> no, never used the meta-DBDs either
<@Sno|> [Tux]: supporting both could be a way out ...
<@[Tux]> the fact that we are no argueing about the invocation means that we actually agree on the basic back-bone
<@[Tux]> which is GOOD
<@Sno|> and we internally separate f_dict => (f_dict_fmt and f_dict_file)
<@[Tux]> yes
<@[Tux]> with methods
<ribasushi> and iff f_dict breaks down into a _fmt or _file, and you have that already specified you die a horrible death
<ribasushi> (pols)
<@Sno|> well, f_dict_ext and f_dict_dir could be reasonable, too
* [Tux] charges his fire-bolt gun
<@Sno|> table2file could expand properly
<@Sno|> so table "foo" uses foo.csv as data file, foo.csv.tdf as table-def meta data file
<@[Tux]> FWIW the / sep method should *only* be supported from DSN
<@[Tux]> not from attribs
<@[Tux]> Snol, no
<@Sno|> why not?
<@[Tux]> there should be no default dict name/ext
<@[Tux]> and the def for all tables should exist in a single def file
<@Sno|> no, no default - only if f_dict_ext is specified
<ribasushi> +1 on [Tux]
<@Sno|> one single file means, we create a database
<ribasushi> defaults are bad, and perl can and should be verbose
<ribasushi> DSNs are a compromise as they are a bitch already anyway
<@[Tux]> f_dict_ext is for the storage type, not for the file name extension
<@Sno|> f_dict_fmt is for storage type o.O
<@[Tux]> then drop the idea of f_dict_ext completely
<@Sno|> we do not support entire databases in DBD::File
* [Tux] overread that
<@[Tux]> why not
<@[Tux]> f_dir=data;f_dict=.ddd/json
<@Sno|> because it conflicts (without much more effort)
<@[Tux]> data/.ddd is a json file describing all csv's in data/
<@Sno|> 2 $dbh's - both same dir, accessing different files -> conflict in dictionary <@Sno|> each table has it's own dictionary (currently), there is no global, big meta-dict
<@[Tux]> they can both pass a different f_dict file
<@[Tux]> but I would like a big meta-global ddd !
<@[Tux]> that is what a database is
<@[Tux]> that is what DBI is all about
<@[Tux]> databases do not store ddd's for a single table
<@[Tux]> databases are about integrity and references
<@Sno|> I know - a big global dict would be better
<@[Tux]> ok
<@[Tux]> pheeuw :)
<@Sno|> but currently SQL::Statement lacks support for that (so it's not trivially possible) <@Sno|> and the available data structures would make more complicated to have a global dictionary
<@[Tux]> don't try to implement new things with current restrictions in mind
<@[Tux]> we're moving ahead. fast.
<ribasushi> an interface should anticipate what is to come, not limit itself with what can be done today
<@[Tux]> ribasushi++
<@Sno|> ribasushi++
<@Sno|> but ... :)
<@Sno|> it'll a long way to add support for dictionaries to SQL::Statement (or DBI::DBD::SqlEngine)
<@[Tux]> ribasushi worded it better than I did, but meant exactly the same
<@[Tux]> long way is fine
<@[Tux]> it is NEW
<@[Tux]> mark it as experimental to start with
<@[Tux]> but do it right from the start
<@Sno|> what about support of tables in another directories?
<@[Tux]> does that matter?
<@Sno|> probably
<@[Tux]> maybe the content of a table should not be an array, as I initially suggested <@Sno|> let's state two things first: I like the idea of global dictionary and I like to make new things right from beginning
<@[Tux]> tables => { foo => { fields => { ... }, file => "..." } }
<@[Tux]> or
<@[Tux]> tables => { foo => { fields => { ... }, meta => { file => "..." } }  }
<@Sno|> but I cannot imagine how I can implement it
<@[Tux]> implementation problems are to be solved later
<@Sno|> I could imagine to have a dictionary per file/table and when we're able to support databases, than we might provide a migration way
<@[Tux]> first try to create an ideal picture of what we want
<@Sno|> the ideal picture is "create database ..."; "do ... in database"
* daxim (~da...@81.16.153.112) has joined #dbi
<@Sno|> but I see no way to implement it clean with the current abilities
<ribasushi> Sno|: you are missing the point
<@Sno|> so if we agree (or decide) that we don't want compromises for current abilities, we need to add the idea to the roadmap and go on <ribasushi> the point is to design your API in such a way, that when someone else (possibly you) comes around and says "oh I can do this now"
<ribasushi> he doesn't follow it by "shit the API doesn't let me do it!"
<@[Tux]> current phase is "design", not "implement"
<ribasushi> Sno|: if the hard parts are never written (or perl dies as netcraft confirms) - it's fine too <@Sno|> ribasushi: I want a new DBI release, soon - for several reasons, so I will not start developing a cripple thing which would prevent that
<@[Tux]> then do not include f_dict in the current DBI
<@Sno|> well, sad :(
<ribasushi> Sno|: it's not, a non inclusion doesn't mean "fuck it we are not doing this" <ribasushi> it merely means "it seems we need to think about this a tad more, so we don't get fucked later"
<ribasushi> this is not the last release of DBI afaik
<@Sno|> I hoped we can find a way to do it now with a migration plan for the perfect solution ;) <@[Tux]> Sno|, the good thing in DBI-trunk is that there is NOW a better DBD::DBM <@[Tux]> reading above discussion, I'd say "veto to f_dict for next DBI", but add a big TODO in README/POD <@Sno|> would it be ok to post that discussion to dbi-dev@ and ask there (e.g. Tim/Martin)?
<ribasushi> Sno|: s/ok/strongly recommended/
<@[Tux]> yes
<@Sno|> [Tux]: if you have the time, maybe you could throw an eye to it and think about
<@Sno|> [Tux]: maybe you find a way how to implement it *now*
<@Sno|> with all the possible conflicts in mind ;)
<@[Tux]> I can *think* about it, but you should know me enough by now that *now* is not an option for implementation
<@Sno|> ok
