Re: DBD::CSV and skip_first_line

2013-01-29 Thread Scott R. Godin
On 11/26/2012 11:56 AM, H.Merijn Brand wrote:
>> I still think it would be easier if skip_first_line were not presumed
>> > (forced to) false if col_names is set, but rather presumed false only if
>> > not set explicitly true.
>
> We agree, investigating what is actually required (and should be
> documented)

Just curious whether any progress had been made on this front since we
last spoke about it.

-- 
(please respond to the list as opposed to my email box directly,
unless you are supplying private information you don't want public
on the list)


Re: DBD::CSV and skip_first_line

2012-11-26 Thread Scott R. Godin

On 11/26/2012 11:56 AM, H.Merijn Brand wrote:
> Can you attach the first 4 lines of your csv datafile?
Here is some randomized data that closely resembles the data in the csv
if this is any help in working with variations on

$dbh->prepare(q{Select key, PHM_ID, DAW_CD, GENBRND_CD from clms limit 10});

(bearing in mind the csv contains 44, not 4 columns and this is just a sample)


key|PHM_ID|DAW_CD|GENBRND_CD
667291120KNM4728|P1951532|2|0
858525298EEA3248|P8697017|5|0
286424010HTG2644|P8607393|3|1
344987842DYH2950|P8662248|3|0
225509049XEU3393|P1222508|1|0
061473729SFZ1183|P2785408|6|0
370501125YPF2594|P1534462|2|0
620354050CRF3119|P4438944|3|1
901228431AUF5822|P5315769|1|0
969358370QPO9757|P1523687|8|0
543692286WTA5861|P5993819|1|0
591327753QVR5452|P1013462|4|0
159204117LXL0308|P5358769|8|1
352853355KYT5615|P2810873|3|1
195099617GNE7056|P1306424|6|0


-- 
Scott R. Godin, Senior Programmer
MAD House Graphics 
 302.468.6230 - main office
 302.722.5623 - home office



Re: DBD::CSV and skip_first_line

2012-11-26 Thread Scott R. Godin
On 11/26/2012 11:56 AM, H.Merijn Brand wrote:
> On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" 
> wrote:
>
> On 11/25/2012 04:16 AM, Jens Rehsack wrote:
>>> On 25.11.12 10:00, H.Merijn Brand wrote:
 On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
 wrote:
> I've run into an issue where I need both col_names set and
> skip_first_line still set to TRUE, because of malformed colnames in the
> original dumpfiles that conflict with SQL Reserved Words (such as
> 'key')
> that I am unable to find any other acceptable workaround short of
 Why not automate the hacking using Text::CSV_XS and rewrite the header
 before using DBD::CSV?
>>> Or simply quote the column names in your SQL statement?
>> I tried various quoting mechanisms up to and including attempting to use
>> backticks, but all result in errors of one kind or another
> Can you attach the first 4 lines of your csv datafile?
Unfortunately, no, as I am under HIPAA restrictions.

key consists of seemingly random alphanumeric [A-Z0-9] sequences that
may or may not contain one dash (at about position 11), of 16-char length
PHM_ID consists of P\d{7} and may repeat across records
 of the other two fields DAW_CD is numeric(1) and GENBRND_CD is boolean
all records are pipe-delimited

The actual csv contains 44 columns; in the interest of brevity I limited
the sample to the below four. :)
>> $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit
>> 10})
>> results in every record having the literal value "key" for the column `key`
>> same if I try select 'key' as PKEY
>>
>> if I switch to double-quotes rather than single quotes around "key" in
>> the above, I get the following error:
>> Execution ERROR: No such column '"key"' called from clms_test.pl at 23.
>>
>> I'll look into playing with Text::CSV_XS, and see what I can come up with.
>>
>> I still think it would be easier if skip_first_line were not presumed
>> (forced to) false if col_names is set, but rather presumed false only if
>> not set explicitly true.
> We agree, investigating what is actually required (and should be
> documented)
>

-- 
Scott R. Godin, Senior Programmer
MAD House Graphics 
 302.468.6230 - main office
 302.722.5623 - home office



Re: DBD::CSV and skip_first_line

2012-11-26 Thread Scott R. Godin

On 11/25/2012 04:16 AM, Jens Rehsack wrote:
> On 25.11.12 10:00, H.Merijn Brand wrote:
>> On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
>> wrote:
>>
>>> I've run into an issue where I need both col_names set and
>>> skip_first_line still set to TRUE, because of malformed colnames in the
>>> original dumpfiles that conflict with SQL Reserved Words (such as
>>> 'key')
>>> that I am unable to find any other acceptable workaround short of
>>
>> Why not automate the hacking using Text::CSV_XS and rewrite the header
>> before using DBD::CSV?
>
> Or simply quote the column names in your SQL statement?

I tried various quoting mechanisms up to and including attempting to use
backticks, but all result in errors of one kind or another

$dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit
10})
results in every record having the literal value "key" for the column `key`
same if I try select 'key' as PKEY

if I switch to double-quotes rather than single quotes around "key" in
the above, I get the following error:
Execution ERROR: No such column '"key"' called from clms_test.pl at 23.

I'll look into playing with Text::CSV_XS, and see what I can come up with.

I still think it would be easier if skip_first_line were not presumed
(forced to) false if col_names is set, but rather presumed false only if
not set explicitly true.

-- 
Scott R. Godin, Senior Programmer
MAD House Graphics 
 302.468.6230 - main office
 302.722.5623 - home office



Re: DBD::CSV and skip_first_line

2012-11-26 Thread H.Merijn Brand
On Mon, 26 Nov 2012 12:29:25 -0500, "Scott R. Godin" 
wrote:

> On 11/26/2012 11:56 AM, H.Merijn Brand wrote:
> > On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" 
> > wrote:
> >
> > On 11/25/2012 04:16 AM, Jens Rehsack wrote:
> >>> On 25.11.12 10:00, H.Merijn Brand wrote:
>  On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
>  wrote:
> > I've run into an issue where I need both col_names set and
> > skip_first_line still set to TRUE, because of malformed colnames in the
> > original dumpfiles that conflict with SQL Reserved Words (such as
> > 'key')
> > that I am unable to find any other acceptable workaround short of
>  Why not automate the hacking using Text::CSV_XS and rewrite the header
>  before using DBD::CSV?
> >>> Or simply quote the column names in your SQL statement?
> >> I tried various quoting mechanisms up to and including attempting to use
> >> backticks, but all result in errors of one kind or another
> > Can you attach the first 4 lines of your csv datafile?
> Unfortunately, no, as I am under HIPAA restrictions.
> 
> key consists of seemingly random alphanumeric [A-Z0-9] sequences that
> may or may not contain one dash (at about position 11), of 16-char length
> PHM_ID consists of P\d{7} and may repeat across records
>  of the other two fields DAW_CD is numeric(1) and GENBRND_CD is boolean
> all records are pipe-delimited

Well, in that case, I'd use Text::CSV_XS to rewrite the data before
using DBD::CSV to use it. I do sorta the same on a regular basis, as my
customers mak typos in the headers and change them all the time.

In any case, "key" quoted/unquoted has no magical value at all. The
key-name is what you pass in in column_names or in the first line. That
the first line should be skippable when specifying your own column
names is already agreed on.

> The actual csv contains 44 columns; in the interest of brevity I limited
> the sample to the below four. :)
> >> $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit
> >> 10})
> >> results in every record having the literal value "key" for the column `key`
> >> same if I try select 'key' as PKEY

--8<--- example of CSV rewrites written without checking from top of my head
my $csv_o = Text::CSV_XS->new ({
binary=> 1,
auto_diag => 1,
eol   => "\n",
});
my $csv_i = Text::CSV->new ({
binary=> 1,
auto_diag => 1,
sep_char  => "|", # | often requires allow_white_space => 1
});

my %hdr;
while () {
m/^#/ and next;
my ($fld, $text) = m/^(\S+)\s+(.*)/ or next;
$hdr{$text} = $fld;
}

my @hdr;
for (@{$csv_i->getline ($fh)}) {
if (exists $hdr{$_}) {
push @hdr, $hdr{$_};
next;
}
# special hardcoded fields
if (m/^[A-Za-z]\w{0,7}-?\w{1,12}$/) {
push @hdr, "key"; # This is a key?
next;
}
die "I do not know how to translate '$_' to something useful";
}
my %rec;
$csv_i->bind_columns (\@rec{@hdr});
$csv_o->print ($out, \@hdr);
while ($csv_i->getline ($fh)) {
$csv_o->print (\@rec{@hdr});
}

__END__
# Wanted   Seen in their shit
c_foo  code foo
c_foo  kode foo
c_foo  kode-foo
c_foo  kode_foo
foodescription of foo
fooomschrijving foo
fooomschr. foo
-->8---

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/


Re: DBD::CSV and skip_first_line

2012-11-26 Thread H.Merijn Brand
On Mon, 26 Nov 2012 11:49:49 -0500, "Scott R. Godin" 
wrote:

> 
> On 11/25/2012 04:16 AM, Jens Rehsack wrote:
> > On 25.11.12 10:00, H.Merijn Brand wrote:
> >> On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
> >> wrote:
> >>
> >>> I've run into an issue where I need both col_names set and
> >>> skip_first_line still set to TRUE, because of malformed colnames in the
> >>> original dumpfiles that conflict with SQL Reserved Words (such as
> >>> 'key')
> >>> that I am unable to find any other acceptable workaround short of
> >>
> >> Why not automate the hacking using Text::CSV_XS and rewrite the header
> >> before using DBD::CSV?
> >
> > Or simply quote the column names in your SQL statement?
> 
> I tried various quoting mechanisms up to and including attempting to use
> backticks, but all result in errors of one kind or another

Can you attach the first 4 lines of your csv datafile?

> $dbh->prepare(q{Select 'key', PHM_ID, DAW_CD, GENBRND_CD from clms limit
> 10})
> results in every record having the literal value "key" for the column `key`
> same if I try select 'key' as PKEY
> 
> if I switch to double-quotes rather than single quotes around "key" in
> the above, I get the following error:
> Execution ERROR: No such column '"key"' called from clms_test.pl at 23.
> 
> I'll look into playing with Text::CSV_XS, and see what I can come up with.
> 
> I still think it would be easier if skip_first_line were not presumed
> (forced to) false if col_names is set, but rather presumed false only if
> not set explicitly true.

We agree, investigating what is actually required (and should be
documented)

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/


Re: DBD::CSV and skip_first_line

2012-11-25 Thread Jens Rehsack

On 25.11.12 10:00, H.Merijn Brand wrote:

On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
wrote:


I've run into an issue where I need both col_names set and
skip_first_line still set to TRUE, because of malformed colnames in the
original dumpfiles that conflict with SQL Reserved Words (such as 'key')
that I am unable to find any other acceptable workaround short of


Why not automate the hacking using Text::CSV_XS and rewrite the header
before using DBD::CSV?


Or simply quote the column names in your SQL statement?


hacking the dumpfiles prior to import for unit-testing and validation
prior to splitting the dumpfiles out into a normalized sql database.
(I'll eed to hand this off to someone else for future dump-and-import
work, so it's just got to WORK with these ACCESS database dump files
as-is, plus HIPAA rules about not changing data complicates matters)

Is there any way to ensure that despite col_names being set, I can still
force skip_first_line => 1 ? or should I report this as a possible
edge-case bug


There should be, and it is likely that this case is already fixed in
the current development state of DBD::CSV by the valuable work of Jens
Rehsack, but that state is not (yet) releasable as it depends on
changes in SQL::Statement and DBI (DBD::File).


Well, since we're both busy - we need to find an hour or so to
integrate. I do not expect a general issue with DBI-1.622, I expect
some edge cases we need to tidy up.


to sum up,

col names present in csv but bad (sql reserved words)
can use col_names => [ @ary ], but this sets skip_first_line to FALSE as
it *assumes* that colnames are NOT present in original dump

what do ? :)


Rewrite the headers with Text::CSV_XS before using DBD::CSV


Or simply quote the reserved cols :)

Cheers
--
Jens Rehsack


Re: DBD::CSV and skip_first_line

2012-11-25 Thread H.Merijn Brand
On Fri, 23 Nov 2012 17:43:50 -0500, "Scott R. Godin" 
wrote:

> I've run into an issue where I need both col_names set and
> skip_first_line still set to TRUE, because of malformed colnames in the
> original dumpfiles that conflict with SQL Reserved Words (such as 'key')
> that I am unable to find any other acceptable workaround short of

Why not automate the hacking using Text::CSV_XS and rewrite the header
before using DBD::CSV?

> hacking the dumpfiles prior to import for unit-testing and validation
> prior to splitting the dumpfiles out into a normalized sql database.
> (I'll eed to hand this off to someone else for future dump-and-import
> work, so it's just got to WORK with these ACCESS database dump files
> as-is, plus HIPAA rules about not changing data complicates matters)
> 
> Is there any way to ensure that despite col_names being set, I can still
> force skip_first_line => 1 ? or should I report this as a possible
> edge-case bug

There should be, and it is likely that this case is already fixed in
the current development state of DBD::CSV by the valuable work of Jens
Rehsack, but that state is not (yet) releasable as it depends on
changes in SQL::Statement and DBI (DBD::File).

> to sum up,
> 
> col names present in csv but bad (sql reserved words)
> can use col_names => [ @ary ], but this sets skip_first_line to FALSE as
> it *assumes* that colnames are NOT present in original dump
> 
> what do ? :)

Rewrite the headers with Text::CSV_XS before using DBD::CSV

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/