Re: DBD::CSV and skip_first_line
On 11/26/2012 11:56 AM, H.Merijn Brand wrote: >> I still think it would be easier if skip_first_line were not presumed >> > (forced to) false if col_names is set, but rather presumed false only if >> > not set explicitly true. > > We agree, investigating what is actually required (and should be > documented) Just curious whether any progress had been made on this front since we last spoke about it. -- (please respond to the list as opposed to my email box directly, unless you are supplying private information you don't want public on the list)
Re: DBD::CSV and skip_first_line
On 11/26/2012 11:56 AM, H.Merijn Brand wrote: > Can you attach the first 4 lines of your csv datafile? Here is some randomized data that closely resembles the data in the csv if this is any help in working with variations on $dbh->prepare(q{Select key, PHM_ID, DAW_CD, GENBRND_CD from clms limit 10}); (bearing in mind the csv contains 44, not 4 columns and this is just a sample) key|PHM_ID|DAW_CD|GENBRND_CD 667291120KNM4728|P1951532|2|0 858525298EEA3248|P8697017|5|0 286424010HTG2644|P8607393|3|1 344987842DYH2950|P8662248|3|0 225509049XEU3393|P1222508|1|0 061473729SFZ1183|P2785408|6|0 370501125YPF2594|P1534462|2|0 620354050CRF3119|P4438944|3|1 901228431AUF5822|P5315769|1|0 969358370QPO9757|P1523687|8|0 543692286WTA5861|P5993819|1|0 591327753QVR5452|P1013462|4|0 159204117LXL0308|P5358769|8|1 352853355KYT5615|P2810873|3|1 195099617GNE7056|P1306424|6|0 -- Scott R. Godin, Senior Programmer MAD House Graphics 302.468.6230 - main office 302.722.5623 - home office
Re: DBD::CSV and skip_first_line
On 11/26/2012 11:56 AM, H.Merijn Brand wrote: > On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" > wrote: > > On 11/25/2012 04:16 AM, Jens Rehsack wrote: >>> On 25.11.12 10:00, H.Merijn Brand wrote: On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" wrote: > I've run into an issue where I need both col_names set and > skip_first_line still set to TRUE, because of malformed colnames in the > original dumpfiles that conflict with SQL Reserved Words (such as > 'key') > that I am unable to find any other acceptable workaround short of Why not automate the hacking using Text::CSV_XS and rewrite the header before using DBD::CSV? >>> Or simply quote the column names in your SQL statement? >> I tried various quoting mechanisms up to and including attempting to use >> backticks, but all result in errors of one kind or another > Can you attach the first 4 lines of your csv datafile? Unfortunately, no, as I am under HIPAA restrictions. key consists of seemingly random alphanumeric [A-Z0-9] sequences that may or may not contain one dash (at about position 11), of 16-char length PHM_ID consists of P\d{7} and may repeat across records of the other two fields DAW_CD is numeric(1) and GENBRND_CD is boolean all records are pipe-delimited The actual csv contains 44 columns; in the interest of brevity I limited the sample to the below four. :) >> $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit >> 10}) >> results in every record having the literal value "key" for the column `key` >> same if I try select 'key' as PKEY >> >> if I switch to double-quotes rather than single quotes around "key" in >> the above, I get the following error: >> Execution ERROR: No such column '"key"' called from clms_test.pl at 23. >> >> I'll look into playing with Text::CSV_XS, and see what I can come up with. >> >> I still think it would be easier if skip_first_line were not presumed >> (forced to) false if col_names is set, but rather presumed false only if >> not set explicitly true. > We agree, investigating what is actually required (and should be > documented) > -- Scott R. Godin, Senior Programmer MAD House Graphics 302.468.6230 - main office 302.722.5623 - home office
Re: DBD::CSV and skip_first_line
On 11/25/2012 04:16 AM, Jens Rehsack wrote: > On 25.11.12 10:00, H.Merijn Brand wrote: >> On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" >> wrote: >> >>> I've run into an issue where I need both col_names set and >>> skip_first_line still set to TRUE, because of malformed colnames in the >>> original dumpfiles that conflict with SQL Reserved Words (such as >>> 'key') >>> that I am unable to find any other acceptable workaround short of >> >> Why not automate the hacking using Text::CSV_XS and rewrite the header >> before using DBD::CSV? > > Or simply quote the column names in your SQL statement? I tried various quoting mechanisms up to and including attempting to use backticks, but all result in errors of one kind or another $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit 10}) results in every record having the literal value "key" for the column `key` same if I try select 'key' as PKEY if I switch to double-quotes rather than single quotes around "key" in the above, I get the following error: Execution ERROR: No such column '"key"' called from clms_test.pl at 23. I'll look into playing with Text::CSV_XS, and see what I can come up with. I still think it would be easier if skip_first_line were not presumed (forced to) false if col_names is set, but rather presumed false only if not set explicitly true. -- Scott R. Godin, Senior Programmer MAD House Graphics 302.468.6230 - main office 302.722.5623 - home office
Re: DBD::CSV and skip_first_line
On Mon, 26 Nov 2012 12:29:25 -0500, "Scott R. Godin" wrote: > On 11/26/2012 11:56 AM, H.Merijn Brand wrote: > > On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" > > wrote: > > > > On 11/25/2012 04:16 AM, Jens Rehsack wrote: > >>> On 25.11.12 10:00, H.Merijn Brand wrote: > On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" > wrote: > > I've run into an issue where I need both col_names set and > > skip_first_line still set to TRUE, because of malformed colnames in the > > original dumpfiles that conflict with SQL Reserved Words (such as > > 'key') > > that I am unable to find any other acceptable workaround short of > Why not automate the hacking using Text::CSV_XS and rewrite the header > before using DBD::CSV? > >>> Or simply quote the column names in your SQL statement? > >> I tried various quoting mechanisms up to and including attempting to use > >> backticks, but all result in errors of one kind or another > > Can you attach the first 4 lines of your csv datafile? > Unfortunately, no, as I am under HIPAA restrictions. > > key consists of seemingly random alphanumeric [A-Z0-9] sequences that > may or may not contain one dash (at about position 11), of 16-char length > PHM_ID consists of P\d{7} and may repeat across records > of the other two fields DAW_CD is numeric(1) and GENBRND_CD is boolean > all records are pipe-delimited Well, in that case, I'd use Text::CSV_XS to rewrite the data before using DBD::CSV to use it. I do sorta the same on a regular basis, as my customers mak typos in the headers and change them all the time. In any case, "key" quoted/unquoted has no magical value at all. The key-name is what you pass in in column_names or in the first line. That the first line should be skippable when specifying your own column names is already agreed on. > The actual csv contains 44 columns; in the interest of brevity I limited > the sample to the below four. :) > >> $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit > >> 10}) > >> results in every record having the literal value "key" for the column `key` > >> same if I try select 'key' as PKEY --8<--- example of CSV rewrites written without checking from top of my head my $csv_o = Text::CSV_XS->new ({ binary=> 1, auto_diag => 1, eol => "\n", }); my $csv_i = Text::CSV->new ({ binary=> 1, auto_diag => 1, sep_char => "|", # | often requires allow_white_space => 1 }); my %hdr; while () { m/^#/ and next; my ($fld, $text) = m/^(\S+)\s+(.*)/ or next; $hdr{$text} = $fld; } my @hdr; for (@{$csv_i->getline ($fh)}) { if (exists $hdr{$_}) { push @hdr, $hdr{$_}; next; } # special hardcoded fields if (m/^[A-Za-z]\w{0,7}-?\w{1,12}$/) { push @hdr, "key"; # This is a key? next; } die "I do not know how to translate '$_' to something useful"; } my %rec; $csv_i->bind_columns (\@rec{@hdr}); $csv_o->print ($out, \@hdr); while ($csv_i->getline ($fh)) { $csv_o->print (\@rec{@hdr}); } __END__ # Wanted Seen in their shit c_foo code foo c_foo kode foo c_foo kode-foo c_foo kode_foo foodescription of foo fooomschrijving foo fooomschr. foo -->8--- -- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.17 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Re: DBD::CSV and skip_first_line
On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" wrote: > > On 11/25/2012 04:16 AM, Jens Rehsack wrote: > > On 25.11.12 10:00, H.Merijn Brand wrote: > >> On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" > >> wrote: > >> > >>> I've run into an issue where I need both col_names set and > >>> skip_first_line still set to TRUE, because of malformed colnames in the > >>> original dumpfiles that conflict with SQL Reserved Words (such as > >>> 'key') > >>> that I am unable to find any other acceptable workaround short of > >> > >> Why not automate the hacking using Text::CSV_XS and rewrite the header > >> before using DBD::CSV? > > > > Or simply quote the column names in your SQL statement? > > I tried various quoting mechanisms up to and including attempting to use > backticks, but all result in errors of one kind or another Can you attach the first 4 lines of your csv datafile? > $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit > 10}) > results in every record having the literal value "key" for the column `key` > same if I try select 'key' as PKEY > > if I switch to double-quotes rather than single quotes around "key" in > the above, I get the following error: > Execution ERROR: No such column '"key"' called from clms_test.pl at 23. > > I'll look into playing with Text::CSV_XS, and see what I can come up with. > > I still think it would be easier if skip_first_line were not presumed > (forced to) false if col_names is set, but rather presumed false only if > not set explicitly true. We agree, investigating what is actually required (and should be documented) -- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.17 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Re: DBD::CSV and skip_first_line
On 25.11.12 10:00, H.Merijn Brand wrote: On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" wrote: I've run into an issue where I need both col_names set and skip_first_line still set to TRUE, because of malformed colnames in the original dumpfiles that conflict with SQL Reserved Words (such as 'key') that I am unable to find any other acceptable workaround short of Why not automate the hacking using Text::CSV_XS and rewrite the header before using DBD::CSV? Or simply quote the column names in your SQL statement? hacking the dumpfiles prior to import for unit-testing and validation prior to splitting the dumpfiles out into a normalized sql database. (I'll eed to hand this off to someone else for future dump-and-import work, so it's just got to WORK with these ACCESS database dump files as-is, plus HIPAA rules about not changing data complicates matters) Is there any way to ensure that despite col_names being set, I can still force skip_first_line => 1 ? or should I report this as a possible edge-case bug There should be, and it is likely that this case is already fixed in the current development state of DBD::CSV by the valuable work of Jens Rehsack, but that state is not (yet) releasable as it depends on changes in SQL::Statement and DBI (DBD::File). Well, since we're both busy - we need to find an hour or so to integrate. I do not expect a general issue with DBI-1.622, I expect some edge cases we need to tidy up. to sum up, col names present in csv but bad (sql reserved words) can use col_names => [ @ary ], but this sets skip_first_line to FALSE as it *assumes* that colnames are NOT present in original dump what do ? :) Rewrite the headers with Text::CSV_XS before using DBD::CSV Or simply quote the reserved cols :) Cheers -- Jens Rehsack
Re: DBD::CSV and skip_first_line
On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" wrote: > I've run into an issue where I need both col_names set and > skip_first_line still set to TRUE, because of malformed colnames in the > original dumpfiles that conflict with SQL Reserved Words (such as 'key') > that I am unable to find any other acceptable workaround short of Why not automate the hacking using Text::CSV_XS and rewrite the header before using DBD::CSV? > hacking the dumpfiles prior to import for unit-testing and validation > prior to splitting the dumpfiles out into a normalized sql database. > (I'll eed to hand this off to someone else for future dump-and-import > work, so it's just got to WORK with these ACCESS database dump files > as-is, plus HIPAA rules about not changing data complicates matters) > > Is there any way to ensure that despite col_names being set, I can still > force skip_first_line => 1 ? or should I report this as a possible > edge-case bug There should be, and it is likely that this case is already fixed in the current development state of DBD::CSV by the valuable work of Jens Rehsack, but that state is not (yet) releasable as it depends on changes in SQL::Statement and DBI (DBD::File). > to sum up, > > col names present in csv but bad (sql reserved words) > can use col_names => [ @ary ], but this sets skip_first_line to FALSE as > it *assumes* that colnames are NOT present in original dump > > what do ? :) Rewrite the headers with Text::CSV_XS before using DBD::CSV -- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.17 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/