Hello,
  Thank you very much for your observations, this definitely seems to be
narrowing in on the problem.  I have tried increasing (decreasing actually)
the RowCacheSize and it seems to start to flake out around -4425.  At -4424
it hangs up for 10 seconds or so on that block, but seems to complete every
time.  At -4425  it hangs up on that block the first two times I ran it, but
then the next 5 times it only hung up there for ~10 seconds then finishes
up.  Then I increased to -4430 and also had intermittent results.  -4500
fails every time on that same block (for 5 runs so far).

  Can you suggest anything I can look into on the network side or the OS
side as a possible workaround for this without modifying that parameter in
the code?  The MTU on the boxes is currently 1500, not sure if that is
related in any way.


Thanks,
Dan

On Fri, Jul 9, 2010 at 3:30 PM, John Scoles <byter...@hotmail.com> wrote:

>
> ok we are getting some place now
>
>
>
> in DBD::Oracle 1.19 the way it set the rowcache is much different that
> later versions
>
>
>
> in this case -1 would mean that the following is applied
>
>
>
> cache_mem = (cache_mem_iv <= 0) ? 10 * 1460 : cache_mem_iv;
>
>
>
>
>
> so you cache_mem is now set to 10*1460 bytes  for any describes that you do
> (selects)
>
>
>
> later on this will be changed when you do a fetch 100,000 rows as your size
> is -1.
>
>
>
> So you might want to try a larger - number for the value of RowCacheSize
>  to see at what point the querry dies again
>
>
>
> so far as being a bug I am on the fence right now.
>
>
>
> The ban-width of you network may have changed or the size of the bit stream
> (16 vs 32 bit bus) so the default rowcachesize may no longer be large
> engough
>
>
>
> The value of rowcachesize when there is no value given for rowcache size is
> calulated like this
>
>
>
> (from the code)
>
>
>
> /* Oracle packets on ethernet have max size of around 1460. */
>  /* We'll aim to fill our row cache with around 10 per go. */
>  /* Using 10 means any 'runt' packets will have less impact. */
>
>
>  int txfr_size  = 10 * 1460; /* desired transfer/cache size */
>
>
>
>  /* Use guessed average on-the-wire row width calculated above & */
>  /* add in overhead of 5 bytes per field plus 8 bytes per row. */
>  /* The n*5+8 was determined by studying SQL*Net v2 packets. */
>  /* It could probably benefit from a more detailed analysis. */
>
>
>  est_width += num_fields*5 + 8;
>
>
>
>  cache_rows = txfr_size / est_width;       /* (maybe 1 or 0) */
>
>
>
>  /* To ensure good performance with large rows (near or larger */
>  /* than our target transfer size) we set a minimum cache size. */
>
>
>  if (cache_rows < 6) /* is cache a 'useful' size? */
>     cache_rows = (cache_rows > 0) ? 6 : 4;
>    }
>    if (cache_rows > 32767) /* keep within Oracle's limits  */
>  cache_rows = 32767;
>
>
>
>
>
> so the above might be buggering you up if your packet size is no longer
> 1460?
>
>
>
> This might be it.
>
>
>
> Cheers
>
> John Scoles
> > Date: Fri, 9 Jul 2010 10:15:03 -0400
> > Subject: Re: "sql*net message from client" hang
> > From: dkele...@gmail.com
> > To: sco...@pythian.com
> > CC: martin.h...@oracle.com; dbi-users@perl.org
> >
> > Hello,
> >
> > Apologies, that was DBI 1.607, not .67!
> >
> > with RowCacheSize '0', it hangs on the same block, with '1', it seems to
> > go through ok as with '-1'.
> >
> > As far as changes go, that is part of the mystery, haven't been able to
> > find any changes around the time when this started to happen. Yes, we are
> > running DBI 1.607, DBD::Oracle 1.19, Oracle Client 10.2.0.1.0, all pretty
> > old, but I have confirmed that nothing changed in them anywhere, all the
> > timestamps are years old, and the issue happens across all of our
> > application servers. General hunch is that it is something with the
> > network, but I have tried it on boxes with no firewall, where a
> traceroute
> > to the DB server has no hops at all, and the issue still occurs.
> >
> > Thanks,
> > Dan
> >
> > On Thu, Jul 8, 2010 at 7:24 PM, John Scoles <sco...@pythian.com> wrote:
> >
> > > The $dbh->{RowCacheSize} = -1; is funny as you are telling the box to
> only
> > > use 1 byte for RowCach
> > > Try it with '0' and see what happens?
> > >
> > >
> > >
> > > On Thu, Jul 8, 2010 at 7:20 PM, John Scoles <sco...@pythian.com>
> wrote:
> > >
> > >> Hmm DBI 0.67 that is old and the fact you get the same error with the
> > >> latest DBI and DBD::Oracle would lead me to think there may be
> something
> > >> wrong internally in your DB or its storage medium, corrupt block
> perhaps or
> > >> bad sector??
> > >>
> > >> The code differences between 1.61 and 0.67 are vast as would be the
> > >> changes in DBD::Oracle.
> > >>
> > >> Did you change your Oracle client lately or anything else with the DB
> or
> > >> network?
> > >>
> > >> Would lke to know the version of DBD:Oracle you are using with DBI
> 0.67
> > >> and what version of the ORACLE client you where using and what version
> of
> > >> the DB you are trying to select from
> > >>
> > >> that will help narrow things down a bit
> > >>
> > >> cheers
> > >> John Scoles
> > >>
> > >>
> > >> On Thu, Jul 8, 2010 at 5:28 PM, Dan <dkele...@gmail.com> wrote:
> > >>
> > >>> Hello,
> > >>> We do not have any quotas implemented on our Oracle instance. The
> > >>> simple
> > >>> select works fine for that hanging block on sqlplus, or if i only
> execute
> > >>> it
> > >>> for that block in the perl script instead of iterating through them
> all.
> > >>> I
> > >>> just tested running for that same block 185 times instead of all 185
> > >>> blocks,
> > >>> and it seems to hang on the second iteration (three times in a row so
> > >>> far)
> > >>>
> > >>> Thanks,
> > >>> Dan
> > >>>
> > >>> On Thu, Jul 8, 2010 at 12:29 PM, Martin Hall <martin.h...@oracle.com
> >
> > >>> wrote:
> > >>>
> > >>> > there are various quotas in Oracle that could stop a query.
> However,
> > >>> they
> > >>> > do tend to kill the guilty session rather than just hang like that.
> If
> > >>> your
> > >>> > simple select ran multiple times for the same 10k block, would it
> still
> > >>> > hang?
> > >>> >
> > >>> > Cheers
> > >>> >
> > >>> > Martin
> > >>> >
> > >>> >
> > >>> > On 08/07/2010 17:07, Howard, Chris wrote:
> > >>> >
> > >>> > If it always stops at the same place, it makes me think
> > >>> > of a resource problem, something like a quota?
> > >>> >
> > >>> > I don't remember of there are select quotas.
> > >>> >
> > >>> > Does it do the same if run as sysdba or some other
> > >>> > well-endowed database user?
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > -----Original Message-----
> > >>> > From: Dan [mailto:dkele...@gmail.com <dkele...@gmail.com>]
> > >>> > Sent: Thursday, July 08, 2010 10:02 AM
> > >>> > To: dbi-users@perl.org
> > >>> > Subject: "sql*net message from client" hang
> > >>> >
> > >>> > Hello,
> > >>> > I have some code that has been running happily for many years, but
> > >>> > suddenly started hanging on a simple select query to our Oracle DB.
> > >>> > The DB
> > >>> > shows it as an inactive session with "SQL*NET message from client",
> so
> > >>> > the
> > >>> > DB thinks it is waiting for the perl script, but the perl script is
> > >>> > hung
> > >>> > reading from the DB according to truss. I boiled it down to a 20
> line
> > >>> > test
> > >>> > script that prepares a query using DBI and then executes it ~190
> times
> > >>> > for
> > >>> > different 10k blocks of the table. It seems to consistently hang on
> > >>> > the
> > >>> > 97th iteration, and times out after a few hours hung in "SQL*NET
> > >>> > message
> > >>> > from client". Has anyone ever come across an issue like this? Even
> > >>> > stranger,
> > >>> > when I inserted "$dbh->{RowCacheSize} = -1;" , it seems to complete
> > >>> >
> > >>> >
> > >>> > and
> > >>> >
> > >>> >
> > >>> > never hang... Our hunch is that this is some issue with the
> network,
> > >>> > but
> > >>> > don't have the expertise to pinpoint what would cause this strange
> > >>> > behaviour
> > >>> >
> > >>> > We are running old DBI .67, but i tried the same script with a
> local
> > >>> > version
> > >>> > of the latest and greatest DBI and DBD::Oracle with the same
> results.
> > >>> > Here
> > >>> > is a sample output from setting $dbh->trace(15); :
> > >>> >
> > >>> > 96 : 1793727
> > >>> > -> execute for DBD::Oracle::st (DBI::st=HASH(0x40ad7c)~0x40ad4c
> > >>> > '1793727') thr#12d600
> > >>> > bind :p1 <== '1793727' (type 0)
> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> > >>> > bind :p1 <== '1793727' (size 7/8/0, ptype 4, otype 1)
> > >>> > bind :p1 <== '1793727' (size 7/7, otype 1, indp 0, at_exec 1)
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> > >>> >
> > >>> >
> > >>> > ,0,2)=SUCCESS
> > >>> >
> > >>> >
> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> > >>> > bind :p1 <== '1793727' (in, not-utf8, csid 31->0->31, ftype 1,
> > >>> > csform
> > >>> > 0->0, maxlen 7, maxdata_size 0)
> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> > >>> > dbd_st_execute SELECT (out0, lob0)...
> > >>> > in ':p1' [0,0]: len 7, ind 0
> > >>> > OCIStmtExecute(38f4b8,3960a0,38f52c,0,0,0,0,0)=SUCCESS
> > >>> > OCIAttrGet(3960a0,OCI_HTYPE_STMT,ffbff40a,0,10,38f52c)=SUCCESS
> > >>> > dbd_st_execute SELECT returned (SUCCESS, rpc0, fn4, out0)
> > >>> > <- execute= '0E0' at simple.pl line 18
> > >>> > 97 : 1793738
> > >>> > -> execute for DBD::Oracle::st (DBI::st=HASH(0x40ad7c)~0x40ad4c
> > >>> > '1793738') thr#12d600
> > >>> > bind :p1 <== '1793738' (type 0)
> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> > >>> > bind :p1 <== '1793738' (size 7/8/0, ptype 4, otype 1)
> > >>> > bind :p1 <== '1793738' (size 7/7, otype 1, indp 0, at_exec 1)
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> > >>> >
> > >>> >
> > >>> > ,0,2)=SUCCESS
> > >>> >
> > >>> >
> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> > >>> > bind :p1 <== '1793738' (in, not-utf8, csid 31->0->31, ftype 1,
> > >>> > csform
> > >>> > 0->0, maxlen 7, maxdata_size 0)
> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> > >>> > dbd_st_execute SELECT (out0, lob0)...
> > >>> > in ':p1' [0,0]: len 7, ind 0
> > >>> > ...
> > >>> > ### HANGS forever here :(
> > >>> > ...
> > >>> >
> > >>> > Any help is much appreciated!
> > >>> >
> > >>> > Thanks,
> > >>> > Dan
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > > --
> > > New! Learn why & how to love your data with Pythian's new webinar
> series.
> > > Topics, details & register: http://www.pythian.com/webinars
> > >
> > >
>
> _________________________________________________________________
> Turn down-time into play-time with Messenger games
> http://go.microsoft.com/?linkid=9734385

Reply via email to