Hello,
  I just wanted to provide some closure on this "issue", we found out that
there was a PIX firewall between the Perl app server and the Oracle
database, and once we removed this, the problem did go away.  The network
guys were unable to find any reason why the firewall might be causing this
hang or the various workaround we were trying (selecting a constant number,
tweaking the row_cache) were causing it to work.  So rather than hire a
consultant, we are removing the firewall and should be case closed.

Thank you very much for your help,
Dan

On Mon, Jul 12, 2010 at 7:31 AM, John Scoles <byter...@hotmail.com> wrote:

>
> Unfortunetly network stuff is way out of my expertese and without exact
> duplication of you network I would just be shooting into the dark to give
> you an exact answer.  All I know is 1500 mtu is the max I think and packets
> that large may be giving you problems with your SQLnet layer if it is very
> old.
>
>
>
> From my very limited knowledge on such things I know tht a large packet
> will increase your lag time and lantancy on slower connections.
>
>
>
> Do not know why the constant runs corretly. I would need to see a
> dbdverbose trace of both runs.
>
>
>
> I would guess changeding the size and or type of the constant my throw the
> bug as well.  Might just be one of those majic munbers that pad the results
> just enough to make the query run fine.
>
>
>
> Hope this helps a little
>
>
>
> John Scoles
>
> > Date: Fri, 9 Jul 2010 17:47:24 -0400
> > Subject: Re: "sql*net message from client" hang
> > From: dkele...@gmail.com
> > To: byter...@hotmail.com
> > CC: sco...@pythian.com; martin.h...@oracle.com; dbi-users@perl.org
> >
> > Some auxiliary info that I just tried, right now the query is of the form
> > "select id from table where column = x order by foo". If I change this to
> > "select id, '123' from table where column = x order by foo", then it
> > completes every time... Does this provide any clue into what is going on?
> > Just adding a numeric constant into the select fixes it?! Does not seem
> to
> > hang up at all on any blocks when we do this.
> >
> > Thanks,
> > Dan
> >
> > On Fri, Jul 9, 2010 at 5:29 PM, Dan <dkele...@gmail.com> wrote:
> >
> > > Hello,
> > > Thank you very much for your observations, this definitely seems to be
> > > narrowing in on the problem. I have tried increasing (decreasing
> actually)
> > > the RowCacheSize and it seems to start to flake out around -4425. At
> -4424
> > > it hangs up for 10 seconds or so on that block, but seems to complete
> every
> > > time. At -4425 it hangs up on that block the first two times I ran it,
> but
> > > then the next 5 times it only hung up there for ~10 seconds then
> finishes
> > > up. Then I increased to -4430 and also had intermittent results. -4500
> > > fails every time on that same block (for 5 runs so far).
> > >
> > > Can you suggest anything I can look into on the network side or the OS
> > > side as a possible workaround for this without modifying that parameter
> in
> > > the code? The MTU on the boxes is currently 1500, not sure if that is
> > > related in any way.
> > >
> > >
> > > Thanks,
> > > Dan
> > >
> > >
> > > On Fri, Jul 9, 2010 at 3:30 PM, John Scoles <byter...@hotmail.com>
> wrote:
> > >
> > >>
> > >> ok we are getting some place now
> > >>
> > >>
> > >>
> > >> in DBD::Oracle 1.19 the way it set the rowcache is much different that
> > >> later versions
> > >>
> > >>
> > >>
> > >> in this case -1 would mean that the following is applied
> > >>
> > >>
> > >>
> > >> cache_mem = (cache_mem_iv <= 0) ? 10 * 1460 : cache_mem_iv;
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> so you cache_mem is now set to 10*1460 bytes for any describes that
> you
> > >> do (selects)
> > >>
> > >>
> > >>
> > >> later on this will be changed when you do a fetch 100,000 rows as your
> > >> size is -1.
> > >>
> > >>
> > >>
> > >> So you might want to try a larger - number for the value of
> RowCacheSize
> > >> to see at what point the querry dies again
> > >>
> > >>
> > >>
> > >> so far as being a bug I am on the fence right now.
> > >>
> > >>
> > >>
> > >> The ban-width of you network may have changed or the size of the bit
> > >> stream (16 vs 32 bit bus) so the default rowcachesize may no longer be
> large
> > >> engough
> > >>
> > >>
> > >>
> > >> The value of rowcachesize when there is no value given for rowcache
> size
> > >> is calulated like this
> > >>
> > >>
> > >>
> > >> (from the code)
> > >>
> > >>
> > >>
> > >> /* Oracle packets on ethernet have max size of around 1460. */
> > >> /* We'll aim to fill our row cache with around 10 per go. */
> > >> /* Using 10 means any 'runt' packets will have less impact. */
> > >>
> > >>
> > >> int txfr_size = 10 * 1460; /* desired transfer/cache size */
> > >>
> > >>
> > >>
> > >> /* Use guessed average on-the-wire row width calculated above & */
> > >> /* add in overhead of 5 bytes per field plus 8 bytes per row. */
> > >> /* The n*5+8 was determined by studying SQL*Net v2 packets. */
> > >> /* It could probably benefit from a more detailed analysis. */
> > >>
> > >>
> > >> est_width += num_fields*5 + 8;
> > >>
> > >>
> > >>
> > >> cache_rows = txfr_size / est_width; /* (maybe 1 or 0) */
> > >>
> > >>
> > >>
> > >> /* To ensure good performance with large rows (near or larger */
> > >> /* than our target transfer size) we set a minimum cache size. */
> > >>
> > >>
> > >> if (cache_rows < 6) /* is cache a 'useful' size? */
> > >> cache_rows = (cache_rows > 0) ? 6 : 4;
> > >> }
> > >> if (cache_rows > 32767) /* keep within Oracle's limits */
> > >> cache_rows = 32767;
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> so the above might be buggering you up if your packet size is no
> longer
> > >> 1460?
> > >>
> > >>
> > >>
> > >> This might be it.
> > >>
> > >>
> > >>
> > >> Cheers
> > >>
> > >> John Scoles
> > >> > Date: Fri, 9 Jul 2010 10:15:03 -0400
> > >> > Subject: Re: "sql*net message from client" hang
> > >> > From: dkele...@gmail.com
> > >> > To: sco...@pythian.com
> > >> > CC: martin.h...@oracle.com; dbi-users@perl.org
> > >> >
> > >> > Hello,
> > >> >
> > >> > Apologies, that was DBI 1.607, not .67!
> > >> >
> > >> > with RowCacheSize '0', it hangs on the same block, with '1', it
> seems to
> > >> > go through ok as with '-1'.
> > >> >
> > >> > As far as changes go, that is part of the mystery, haven't been able
> to
> > >> > find any changes around the time when this started to happen. Yes,
> we
> > >> are
> > >> > running DBI 1.607, DBD::Oracle 1.19, Oracle Client 10.2.0.1.0, all
> > >> pretty
> > >> > old, but I have confirmed that nothing changed in them anywhere, all
> the
> > >> > timestamps are years old, and the issue happens across all of our
> > >> > application servers. General hunch is that it is something with the
> > >> > network, but I have tried it on boxes with no firewall, where a
> > >> traceroute
> > >> > to the DB server has no hops at all, and the issue still occurs.
> > >> >
> > >> > Thanks,
> > >> > Dan
> > >> >
> > >> > On Thu, Jul 8, 2010 at 7:24 PM, John Scoles <sco...@pythian.com>
> wrote:
> > >> >
> > >> > > The $dbh->{RowCacheSize} = -1; is funny as you are telling the box
> to
> > >> only
> > >> > > use 1 byte for RowCach
> > >> > > Try it with '0' and see what happens?
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Jul 8, 2010 at 7:20 PM, John Scoles <sco...@pythian.com>
> > >> wrote:
> > >> > >
> > >> > >> Hmm DBI 0.67 that is old and the fact you get the same error with
> the
> > >> > >> latest DBI and DBD::Oracle would lead me to think there may be
> > >> something
> > >> > >> wrong internally in your DB or its storage medium, corrupt block
> > >> perhaps or
> > >> > >> bad sector??
> > >> > >>
> > >> > >> The code differences between 1.61 and 0.67 are vast as would be
> the
> > >> > >> changes in DBD::Oracle.
> > >> > >>
> > >> > >> Did you change your Oracle client lately or anything else with
> the DB
> > >> or
> > >> > >> network?
> > >> > >>
> > >> > >> Would lke to know the version of DBD:Oracle you are using with
> DBI
> > >> 0.67
> > >> > >> and what version of the ORACLE client you where using and what
> > >> version of
> > >> > >> the DB you are trying to select from
> > >> > >>
> > >> > >> that will help narrow things down a bit
> > >> > >>
> > >> > >> cheers
> > >> > >> John Scoles
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Jul 8, 2010 at 5:28 PM, Dan <dkele...@gmail.com> wrote:
> > >> > >>
> > >> > >>> Hello,
> > >> > >>> We do not have any quotas implemented on our Oracle instance.
> The
> > >> > >>> simple
> > >> > >>> select works fine for that hanging block on sqlplus, or if i
> only
> > >> execute
> > >> > >>> it
> > >> > >>> for that block in the perl script instead of iterating through
> them
> > >> all.
> > >> > >>> I
> > >> > >>> just tested running for that same block 185 times instead of all
> 185
> > >> > >>> blocks,
> > >> > >>> and it seems to hang on the second iteration (three times in a
> row
> > >> so
> > >> > >>> far)
> > >> > >>>
> > >> > >>> Thanks,
> > >> > >>> Dan
> > >> > >>>
> > >> > >>> On Thu, Jul 8, 2010 at 12:29 PM, Martin Hall <
> > >> martin.h...@oracle.com>
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>> > there are various quotas in Oracle that could stop a query.
> > >> However,
> > >> > >>> they
> > >> > >>> > do tend to kill the guilty session rather than just hang like
> > >> that. If
> > >> > >>> your
> > >> > >>> > simple select ran multiple times for the same 10k block, would
> it
> > >> still
> > >> > >>> > hang?
> > >> > >>> >
> > >> > >>> > Cheers
> > >> > >>> >
> > >> > >>> > Martin
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > On 08/07/2010 17:07, Howard, Chris wrote:
> > >> > >>> >
> > >> > >>> > If it always stops at the same place, it makes me think
> > >> > >>> > of a resource problem, something like a quota?
> > >> > >>> >
> > >> > >>> > I don't remember of there are select quotas.
> > >> > >>> >
> > >> > >>> > Does it do the same if run as sysdba or some other
> > >> > >>> > well-endowed database user?
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > -----Original Message-----
> > >> > >>> > From: Dan [mailto:dkele...@gmail.com <dkele...@gmail.com>]
> > >> > >>> > Sent: Thursday, July 08, 2010 10:02 AM
> > >> > >>> > To: dbi-users@perl.org
> > >> > >>> > Subject: "sql*net message from client" hang
> > >> > >>> >
> > >> > >>> > Hello,
> > >> > >>> > I have some code that has been running happily for many years,
> but
> > >> > >>> > suddenly started hanging on a simple select query to our
> Oracle
> > >> DB.
> > >> > >>> > The DB
> > >> > >>> > shows it as an inactive session with "SQL*NET message from
> > >> client", so
> > >> > >>> > the
> > >> > >>> > DB thinks it is waiting for the perl script, but the perl
> script
> > >> is
> > >> > >>> > hung
> > >> > >>> > reading from the DB according to truss. I boiled it down to a
> 20
> > >> line
> > >> > >>> > test
> > >> > >>> > script that prepares a query using DBI and then executes it
> ~190
> > >> times
> > >> > >>> > for
> > >> > >>> > different 10k blocks of the table. It seems to consistently
> hang
> > >> on
> > >> > >>> > the
> > >> > >>> > 97th iteration, and times out after a few hours hung in
> "SQL*NET
> > >> > >>> > message
> > >> > >>> > from client". Has anyone ever come across an issue like this?
> Even
> > >> > >>> > stranger,
> > >> > >>> > when I inserted "$dbh->{RowCacheSize} = -1;" , it seems to
> > >> complete
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > and
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > never hang... Our hunch is that this is some issue with the
> > >> network,
> > >> > >>> > but
> > >> > >>> > don't have the expertise to pinpoint what would cause this
> strange
> > >> > >>> > behaviour
> > >> > >>> >
> > >> > >>> > We are running old DBI .67, but i tried the same script with a
> > >> local
> > >> > >>> > version
> > >> > >>> > of the latest and greatest DBI and DBD::Oracle with the same
> > >> results.
> > >> > >>> > Here
> > >> > >>> > is a sample output from setting $dbh->trace(15); :
> > >> > >>> >
> > >> > >>> > 96 : 1793727
> > >> > >>> > -> execute for DBD::Oracle::st
> (DBI::st=HASH(0x40ad7c)~0x40ad4c
> > >> > >>> > '1793727') thr#12d600
> > >> > >>> > bind :p1 <== '1793727' (type 0)
> > >> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> > >> > >>> > bind :p1 <== '1793727' (size 7/8/0, ptype 4, otype 1)
> > >> > >>> > bind :p1 <== '1793727' (size 7/7, otype 1, indp 0, at_exec 1)
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >>
> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > ,0,2)=SUCCESS
> > >> > >>> >
> > >> > >>> >
> > >> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> > >> > >>> > bind :p1 <== '1793727' (in, not-utf8, csid 31->0->31, ftype 1,
> > >> > >>> > csform
> > >> > >>> > 0->0, maxlen 7, maxdata_size 0)
> > >> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> > >> > >>> > dbd_st_execute SELECT (out0, lob0)...
> > >> > >>> > in ':p1' [0,0]: len 7, ind 0
> > >> > >>> > OCIStmtExecute(38f4b8,3960a0,38f52c,0,0,0,0,0)=SUCCESS
> > >> > >>> > OCIAttrGet(3960a0,OCI_HTYPE_STMT,ffbff40a,0,10,38f52c)=SUCCESS
> > >> > >>> > dbd_st_execute SELECT returned (SUCCESS, rpc0, fn4, out0)
> > >> > >>> > <- execute= '0E0' at simple.pl line 18
> > >> > >>> > 97 : 1793738
> > >> > >>> > -> execute for DBD::Oracle::st
> (DBI::st=HASH(0x40ad7c)~0x40ad4c
> > >> > >>> > '1793738') thr#12d600
> > >> > >>> > bind :p1 <== '1793738' (type 0)
> > >> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> > >> > >>> > bind :p1 <== '1793738' (size 7/8/0, ptype 4, otype 1)
> > >> > >>> > bind :p1 <== '1793738' (size 7/7, otype 1, indp 0, at_exec 1)
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >>
> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > ,0,2)=SUCCESS
> > >> > >>> >
> > >> > >>> >
> > >> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> > >> > >>> > bind :p1 <== '1793738' (in, not-utf8, csid 31->0->31, ftype 1,
> > >> > >>> > csform
> > >> > >>> > 0->0, maxlen 7, maxdata_size 0)
> > >> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> > >> > >>> > dbd_st_execute SELECT (out0, lob0)...
> > >> > >>> > in ':p1' [0,0]: len 7, ind 0
> > >> > >>> > ...
> > >> > >>> > ### HANGS forever here :(
> > >> > >>> > ...
> > >> > >>> >
> > >> > >>> > Any help is much appreciated!
> > >> > >>> >
> > >> > >>> > Thanks,
> > >> > >>> > Dan
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>> >
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > > --
> > >> > > New! Learn why & how to love your data with Pythian's new webinar
> > >> series.
> > >> > > Topics, details & register: http://www.pythian.com/webinars
> > >> > >
> > >> > >
> > >>
> > >> _________________________________________________________________
> > >> Turn down-time into play-time with Messenger games
> > >> http://go.microsoft.com/?linkid=9734385
> > >
> > >
> > >
>
> _________________________________________________________________
> Game on: Challenge friends to great games on Messenger
> http://go.microsoft.com/?linkid=9734387

Reply via email to