Thanx Wayne and Pavel, I will try to make this work with cursors. Theoretically there can be more than 4 billion records in the result, but only 58 million were returned, which took 3 days. Possibly I will have to create a temporary table, store the results from %patterns into it after each 500k records, and group them at the end.
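A minimal sketch of that staging idea in SQL (the staging table name `patterns_tmp` and its layout are assumptions, not from the thread; `patterns` is the final table used in the functions below):

```sql
-- Staging table: flush the contents of %patterns here after every 500k-row window
CREATE TEMPORARY TABLE patterns_tmp (pat varchar, cnt bigint);

-- After all windows are processed, collapse the partial counts once
INSERT INTO patterns
SELECT pat, sum(cnt)
FROM patterns_tmp
GROUP BY pat;
```

This keeps the Perl hash small (it is emptied after each window) and moves the final aggregation into a single GROUP BY, which PostgreSQL can do without holding everything in the function's memory.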
I didn't manage to find an example where plpgsql uses hash arrays or where plperl uses cursors. Links to examples of either method would be helpful to me. Thanx in advance

On Fri, Jul 8, 2011 at 9:49 PM, <lists-pg...@useunix.net> wrote:
> Hi Viktor,
>
> I'm not sure what your requirements are in terms of performance and
> stability of your result set. See Pavel's response. A cursor issues
> a single query and renders a single result set. The result set is
> static; the cursor just gives you finer control/performance when
> retrieving rows from the set. Using a transaction will also render better
> performance when %patterns contains a large number of keys/values:
> insert all of them in one transaction, the same one you opened for the
> cursor.
>
> Your method issues many queries, and each successive query takes longer.
> The number of queries will also increase as the table grows. It could
> also return duplicate rows and/or miss rows when other transactions
> complete between your select queries.
>
> If you can tolerate the above issues then so be it; if not, you really
> should look at cursors.
>
> Also, there might be a bug in your code if you delete entries from
> 'entry'. You're depending on $rowCountAll remaining static, which is not
> the case if you ever delete entries. You can fix this by skipping the
> "select count(1)" step and just breaking your loop when fewer than
> $windowSize entries are returned from the "select sequence..." query.
>
> Wayne
>
> On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojović wrote:
> > Thanx Wayne,
> > at the end I did it that way and it works.
> > The code is below.
> >
> > CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character varying
> >     LANGUAGE plperl
> >     AS $_X$
> > my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> > my $row = $rvCnt->{rows}[0];
> > my $rowCountAll = $row->{cnt};
> > my $windowSize = 500000;
> > my %patterns = ();
> > for (my $p = 0; $p < $rowCountAll; $p += $windowSize) {
> >     my $sql = "select sequence from entry limit $windowSize offset $p";
> >     my $rv = spi_exec_query($sql);
> >     my $rowCount = $rv->{processed};
> >     my $patLen = $_[0];
> >     my $patt = '';
> >
> >     foreach my $rn (0 .. $rowCount - 1) {
> >         my $row = $rv->{rows}[$rn];
> >         my $seq = $row->{sequence};
> >         for (my $x = 1; $x <= length($seq) - $patLen; $x++) {
> >             $patt = substr($seq, $x, $patLen);
> >             if (!defined $patterns{$patt}) {
> >                 $patterns{$patt} = 1;
> >             } else {
> >                 $patterns{$patt}++;
> >             }
> >         }
> >     }
> > }
> >
> > foreach $patt (keys %patterns) {
> >     my $sql = "insert into patterns values('" . $patt . "'," . $patterns{$patt} . ")";
> >     spi_exec_query($sql);
> > }
> > return '';
> > $_X$;
> >
> >
> > On Fri, Jul 8, 2011 at 8:50 PM, <lists-pg...@useunix.net> wrote:
> >
> > > I have the same situation with large tables. Take a look at using a
> > > cursor to fetch several thousand rows at a time. I presume what's
> > > happening is that perl is attempting to create a massive list/array in
> > > memory. If you use a cursor the list should only contain X rows,
> > > where X is the number specified at each fetch execution. You'll
> > > need to define the cursor inside a transaction block.
> > >
> > > - begin transaction
> > > - define the cursor
> > > - fetch rows from cursor
> > > - while row count from previous step > 0, execute previous step
> > > - terminate transaction
> > >
> > > Or you could use plpgsql instead of plperl; FOR loops over result sets
> > > in plpgsql implicitly use cursors... it's just a little less code.
> > >
> > > Hope that helps,
> > > Wayne
> > >
> > > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojović wrote:
> > > > Hi,
> > > > while reading a 20GB table through a PL/Perl function, it constantly
> > > > grows in RAM.
> > > > I wanted to ask you which is the best way to read the table inside that
> > > > function without such memory consumption.
> > > > Thanks in advance
> > > >
> > > > Code is here:
> > > >
> > > > CREATE FUNCTION pattern_counter("patLength" integer)
> > > >   RETURNS varchar AS
> > > > $BODY$
> > > > my $rv = spi_exec_query("select sequence from entry");
> > > > my $rowCount = $rv->{processed};
> > > > my $patLen = $_[0];
> > > > my $patt = '';
> > > > my %patterns = ();
> > > > foreach my $rn (0 .. $rowCount - 1) {
> > > >     my $row = $rv->{rows}[$rn];
> > > >     my $seq = $row->{sequence};
> > > >     for (my $x = 1; $x <= length($seq) - $patLen; $x++) {
> > > >         $patt = substr($seq, $x, $patLen);
> > > >         if (!defined $patterns{$patt}) {
> > > >             $patterns{$patt} = 1;
> > > >         } else {
> > > >             $patterns{$patt}++;
> > > >         }
> > > >     }
> > > > }
> > > > foreach $patt (keys %patterns) {
> > > >     my $sql = "insert into patterns values('" . $patt . "'," . $patterns{$patt} . ")";
> > > >     spi_exec_query($sql);
> > > > }
> > > > return '';
> > > > $BODY$
> > > >   LANGUAGE plperl VOLATILE
> > > >   COST 100;
> > > >
> > > > --
> > > > ---------------------------------------
> > > > Viktor Bojović
> > > > ---------------------------------------
> > > > Wherever I go, Murphy goes with me
> > >
> >
> > --
> > ---------------------------------------
> > Viktor Bojović
> > ---------------------------------------
> > Wherever I go, Murphy goes with me
>

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
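Regarding the plperl cursor example asked about earlier in the thread: PL/Perl exposes cursor-style row streaming through spi_query/spi_fetchrow, so the result set is never materialized in memory at once. A minimal sketch against the same entry/patterns tables (the function name is made up; quote_literal inside PL/Perl requires PostgreSQL 9.1+, on older servers escape the value manually):

```perl
CREATE FUNCTION pattern_counter_cursor("patLength" integer) RETURNS varchar
LANGUAGE plperl AS $_X$
my $patLen = $_[0];
my %patterns;

# spi_query opens a cursor behind the scenes; spi_fetchrow streams
# one row at a time, so memory use is bounded by %patterns alone
my $sth = spi_query("select sequence from entry");
while (defined(my $row = spi_fetchrow($sth))) {
    my $seq = $row->{sequence};
    for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
        $patterns{substr($seq, $x, $patLen)}++;
    }
}

foreach my $patt (keys %patterns) {
    spi_exec_query("insert into patterns values("
        . quote_literal($patt) . "," . $patterns{$patt} . ")");
}
return '';
$_X$;
```

The plpgsql equivalent of the read loop is simply `FOR r IN SELECT sequence FROM entry LOOP ... END LOOP;`, which uses an implicit cursor as Wayne notes; plpgsql has no hash type, though, so the counts would have to accumulate in a (temporary) table rather than a variable.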