RE: "sql*net message from client" hang

John Scoles Mon, 12 Jul 2010 04:31:26 -0700

Unfortunetly network stuff is way out of my expertese and without exact 
duplication of you network I would just be shooting into the dark to give you 
an exact answer.  All I know is 1500 mtu is the max I think and packets that 
large may be giving you problems with your SQLnet layer if it is very old.


 

>From my very limited knowledge on such things I know tht a large packet will 
>increase your lag time and lantancy on slower connections.

 

Do not know why the constant runs corretly. I would need to see a dbdverbose 
trace of both runs.

 

I would guess changeding the size and or type of the constant my throw the bug 
as well.  Might just be one of those majic munbers that pad the results just 
enough to make the query run fine.

 

Hope this helps a little

 

John Scoles
 
> Date: Fri, 9 Jul 2010 17:47:24 -0400
> Subject: Re: "sql*net message from client" hang
> From: dkele...@gmail.com
> To: byter...@hotmail.com
> CC: sco...@pythian.com; martin.h...@oracle.com; dbi-users@perl.org
> 
> Some auxiliary info that I just tried, right now the query is of the form
> "select id from table where column = x order by foo". If I change this to
> "select id, '123' from table where column = x order by foo", then it
> completes every time... Does this provide any clue into what is going on?
> Just adding a numeric constant into the select fixes it?! Does not seem to
> hang up at all on any blocks when we do this.
> 
> Thanks,
> Dan
> 
> On Fri, Jul 9, 2010 at 5:29 PM, Dan <dkele...@gmail.com> wrote:
> 
> > Hello,
> > Thank you very much for your observations, this definitely seems to be
> > narrowing in on the problem. I have tried increasing (decreasing actually)
> > the RowCacheSize and it seems to start to flake out around -4425. At -4424
> > it hangs up for 10 seconds or so on that block, but seems to complete every
> > time. At -4425 it hangs up on that block the first two times I ran it, but
> > then the next 5 times it only hung up there for ~10 seconds then finishes
> > up. Then I increased to -4430 and also had intermittent results. -4500
> > fails every time on that same block (for 5 runs so far).
> >
> > Can you suggest anything I can look into on the network side or the OS
> > side as a possible workaround for this without modifying that parameter in
> > the code? The MTU on the boxes is currently 1500, not sure if that is
> > related in any way.
> >
> >
> > Thanks,
> > Dan
> >
> >
> > On Fri, Jul 9, 2010 at 3:30 PM, John Scoles <byter...@hotmail.com> wrote:
> >
> >>
> >> ok we are getting some place now
> >>
> >>
> >>
> >> in DBD::Oracle 1.19 the way it set the rowcache is much different that
> >> later versions
> >>
> >>
> >>
> >> in this case -1 would mean that the following is applied
> >>
> >>
> >>
> >> cache_mem = (cache_mem_iv <= 0) ? 10 * 1460 : cache_mem_iv;
> >>
> >>
> >>
> >>
> >>
> >> so you cache_mem is now set to 10*1460 bytes for any describes that you
> >> do (selects)
> >>
> >>
> >>
> >> later on this will be changed when you do a fetch 100,000 rows as your
> >> size is -1.
> >>
> >>
> >>
> >> So you might want to try a larger - number for the value of RowCacheSize
> >> to see at what point the querry dies again
> >>
> >>
> >>
> >> so far as being a bug I am on the fence right now.
> >>
> >>
> >>
> >> The ban-width of you network may have changed or the size of the bit
> >> stream (16 vs 32 bit bus) so the default rowcachesize may no longer be 
> >> large
> >> engough
> >>
> >>
> >>
> >> The value of rowcachesize when there is no value given for rowcache size
> >> is calulated like this
> >>
> >>
> >>
> >> (from the code)
> >>
> >>
> >>
> >> /* Oracle packets on ethernet have max size of around 1460. */
> >> /* We'll aim to fill our row cache with around 10 per go. */
> >> /* Using 10 means any 'runt' packets will have less impact. */
> >>
> >>
> >> int txfr_size = 10 * 1460; /* desired transfer/cache size */
> >>
> >>
> >>
> >> /* Use guessed average on-the-wire row width calculated above & */
> >> /* add in overhead of 5 bytes per field plus 8 bytes per row. */
> >> /* The n*5+8 was determined by studying SQL*Net v2 packets. */
> >> /* It could probably benefit from a more detailed analysis. */
> >>
> >>
> >> est_width += num_fields*5 + 8;
> >>
> >>
> >>
> >> cache_rows = txfr_size / est_width; /* (maybe 1 or 0) */
> >>
> >>
> >>
> >> /* To ensure good performance with large rows (near or larger */
> >> /* than our target transfer size) we set a minimum cache size. */
> >>
> >>
> >> if (cache_rows < 6) /* is cache a 'useful' size? */
> >> cache_rows = (cache_rows > 0) ? 6 : 4;
> >> }
> >> if (cache_rows > 32767) /* keep within Oracle's limits */
> >> cache_rows = 32767;
> >>
> >>
> >>
> >>
> >>
> >> so the above might be buggering you up if your packet size is no longer
> >> 1460?
> >>
> >>
> >>
> >> This might be it.
> >>
> >>
> >>
> >> Cheers
> >>
> >> John Scoles
> >> > Date: Fri, 9 Jul 2010 10:15:03 -0400
> >> > Subject: Re: "sql*net message from client" hang
> >> > From: dkele...@gmail.com
> >> > To: sco...@pythian.com
> >> > CC: martin.h...@oracle.com; dbi-users@perl.org
> >> >
> >> > Hello,
> >> >
> >> > Apologies, that was DBI 1.607, not .67!
> >> >
> >> > with RowCacheSize '0', it hangs on the same block, with '1', it seems to
> >> > go through ok as with '-1'.
> >> >
> >> > As far as changes go, that is part of the mystery, haven't been able to
> >> > find any changes around the time when this started to happen. Yes, we
> >> are
> >> > running DBI 1.607, DBD::Oracle 1.19, Oracle Client 10.2.0.1.0, all
> >> pretty
> >> > old, but I have confirmed that nothing changed in them anywhere, all the
> >> > timestamps are years old, and the issue happens across all of our
> >> > application servers. General hunch is that it is something with the
> >> > network, but I have tried it on boxes with no firewall, where a
> >> traceroute
> >> > to the DB server has no hops at all, and the issue still occurs.
> >> >
> >> > Thanks,
> >> > Dan
> >> >
> >> > On Thu, Jul 8, 2010 at 7:24 PM, John Scoles <sco...@pythian.com> wrote:
> >> >
> >> > > The $dbh->{RowCacheSize} = -1; is funny as you are telling the box to
> >> only
> >> > > use 1 byte for RowCach
> >> > > Try it with '0' and see what happens?
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Jul 8, 2010 at 7:20 PM, John Scoles <sco...@pythian.com>
> >> wrote:
> >> > >
> >> > >> Hmm DBI 0.67 that is old and the fact you get the same error with the
> >> > >> latest DBI and DBD::Oracle would lead me to think there may be
> >> something
> >> > >> wrong internally in your DB or its storage medium, corrupt block
> >> perhaps or
> >> > >> bad sector??
> >> > >>
> >> > >> The code differences between 1.61 and 0.67 are vast as would be the
> >> > >> changes in DBD::Oracle.
> >> > >>
> >> > >> Did you change your Oracle client lately or anything else with the DB
> >> or
> >> > >> network?
> >> > >>
> >> > >> Would lke to know the version of DBD:Oracle you are using with DBI
> >> 0.67
> >> > >> and what version of the ORACLE client you where using and what
> >> version of
> >> > >> the DB you are trying to select from
> >> > >>
> >> > >> that will help narrow things down a bit
> >> > >>
> >> > >> cheers
> >> > >> John Scoles
> >> > >>
> >> > >>
> >> > >> On Thu, Jul 8, 2010 at 5:28 PM, Dan <dkele...@gmail.com> wrote:
> >> > >>
> >> > >>> Hello,
> >> > >>> We do not have any quotas implemented on our Oracle instance. The
> >> > >>> simple
> >> > >>> select works fine for that hanging block on sqlplus, or if i only
> >> execute
> >> > >>> it
> >> > >>> for that block in the perl script instead of iterating through them
> >> all.
> >> > >>> I
> >> > >>> just tested running for that same block 185 times instead of all 185
> >> > >>> blocks,
> >> > >>> and it seems to hang on the second iteration (three times in a row
> >> so
> >> > >>> far)
> >> > >>>
> >> > >>> Thanks,
> >> > >>> Dan
> >> > >>>
> >> > >>> On Thu, Jul 8, 2010 at 12:29 PM, Martin Hall <
> >> martin.h...@oracle.com>
> >> > >>> wrote:
> >> > >>>
> >> > >>> > there are various quotas in Oracle that could stop a query.
> >> However,
> >> > >>> they
> >> > >>> > do tend to kill the guilty session rather than just hang like
> >> that. If
> >> > >>> your
> >> > >>> > simple select ran multiple times for the same 10k block, would it
> >> still
> >> > >>> > hang?
> >> > >>> >
> >> > >>> > Cheers
> >> > >>> >
> >> > >>> > Martin
> >> > >>> >
> >> > >>> >
> >> > >>> > On 08/07/2010 17:07, Howard, Chris wrote:
> >> > >>> >
> >> > >>> > If it always stops at the same place, it makes me think
> >> > >>> > of a resource problem, something like a quota?
> >> > >>> >
> >> > >>> > I don't remember of there are select quotas.
> >> > >>> >
> >> > >>> > Does it do the same if run as sysdba or some other
> >> > >>> > well-endowed database user?
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> > -----Original Message-----
> >> > >>> > From: Dan [mailto:dkele...@gmail.com <dkele...@gmail.com>]
> >> > >>> > Sent: Thursday, July 08, 2010 10:02 AM
> >> > >>> > To: dbi-users@perl.org
> >> > >>> > Subject: "sql*net message from client" hang
> >> > >>> >
> >> > >>> > Hello,
> >> > >>> > I have some code that has been running happily for many years, but
> >> > >>> > suddenly started hanging on a simple select query to our Oracle
> >> DB.
> >> > >>> > The DB
> >> > >>> > shows it as an inactive session with "SQL*NET message from
> >> client", so
> >> > >>> > the
> >> > >>> > DB thinks it is waiting for the perl script, but the perl script
> >> is
> >> > >>> > hung
> >> > >>> > reading from the DB according to truss. I boiled it down to a 20
> >> line
> >> > >>> > test
> >> > >>> > script that prepares a query using DBI and then executes it ~190
> >> times
> >> > >>> > for
> >> > >>> > different 10k blocks of the table. It seems to consistently hang
> >> on
> >> > >>> > the
> >> > >>> > 97th iteration, and times out after a few hours hung in "SQL*NET
> >> > >>> > message
> >> > >>> > from client". Has anyone ever come across an issue like this? Even
> >> > >>> > stranger,
> >> > >>> > when I inserted "$dbh->{RowCacheSize} = -1;" , it seems to
> >> complete
> >> > >>> >
> >> > >>> >
> >> > >>> > and
> >> > >>> >
> >> > >>> >
> >> > >>> > never hang... Our hunch is that this is some issue with the
> >> network,
> >> > >>> > but
> >> > >>> > don't have the expertise to pinpoint what would cause this strange
> >> > >>> > behaviour
> >> > >>> >
> >> > >>> > We are running old DBI .67, but i tried the same script with a
> >> local
> >> > >>> > version
> >> > >>> > of the latest and greatest DBI and DBD::Oracle with the same
> >> results.
> >> > >>> > Here
> >> > >>> > is a sample output from setting $dbh->trace(15); :
> >> > >>> >
> >> > >>> > 96 : 1793727
> >> > >>> > -> execute for DBD::Oracle::st (DBI::st=HASH(0x40ad7c)~0x40ad4c
> >> > >>> > '1793727') thr#12d600
> >> > >>> > bind :p1 <== '1793727' (type 0)
> >> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> >> > >>> > bind :p1 <== '1793727' (size 7/8/0, ptype 4, otype 1)
> >> > >>> > bind :p1 <== '1793727' (size 7/7, otype 1, indp 0, at_exec 1)
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> >> > >>> >
> >> > >>> >
> >> > >>> > ,0,2)=SUCCESS
> >> > >>> >
> >> > >>> >
> >> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> >> > >>> > bind :p1 <== '1793727' (in, not-utf8, csid 31->0->31, ftype 1,
> >> > >>> > csform
> >> > >>> > 0->0, maxlen 7, maxdata_size 0)
> >> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> >> > >>> > dbd_st_execute SELECT (out0, lob0)...
> >> > >>> > in ':p1' [0,0]: len 7, ind 0
> >> > >>> > OCIStmtExecute(38f4b8,3960a0,38f52c,0,0,0,0,0)=SUCCESS
> >> > >>> > OCIAttrGet(3960a0,OCI_HTYPE_STMT,ffbff40a,0,10,38f52c)=SUCCESS
> >> > >>> > dbd_st_execute SELECT returned (SUCCESS, rpc0, fn4, out0)
> >> > >>> > <- execute= '0E0' at simple.pl line 18
> >> > >>> > 97 : 1793738
> >> > >>> > -> execute for DBD::Oracle::st (DBI::st=HASH(0x40ad7c)~0x40ad4c
> >> > >>> > '1793738') thr#12d600
> >> > >>> > bind :p1 <== '1793738' (type 0)
> >> > >>> > rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> >> > >>> > bind :p1 <== '1793738' (size 7/8/0, ptype 4, otype 1)
> >> > >>> > bind :p1 <== '1793738' (size 7/7, otype 1, indp 0, at_exec 1)
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> OCIBindByName(3960a0,404734,38f52c,":p1",3,3a8438,7,1,40474c,0,404744,0
> >> > >>> >
> >> > >>> >
> >> > >>> > ,0,2)=SUCCESS
> >> > >>> >
> >> > >>> >
> >> OCIBindDynamic(395e34,38f52c,404710,fefec350,404710,fefec644)=SUCCESS
> >> > >>> > bind :p1 <== '1793738' (in, not-utf8, csid 31->0->31, ftype 1,
> >> > >>> > csform
> >> > >>> > 0->0, maxlen 7, maxdata_size 0)
> >> > >>> > OCIAttrSet(395e34,OCI_HTYPE_BIND,ffbff290,0,31,38f52c)=SUCCESS
> >> > >>> > dbd_st_execute SELECT (out0, lob0)...
> >> > >>> > in ':p1' [0,0]: len 7, ind 0
> >> > >>> > ...
> >> > >>> > ### HANGS forever here :(
> >> > >>> > ...
> >> > >>> >
> >> > >>> > Any help is much appreciated!
> >> > >>> >
> >> > >>> > Thanks,
> >> > >>> > Dan
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>>
> >> > >>
> >> > >>
> >> > > --
> >> > > New! Learn why & how to love your data with Pythian's new webinar
> >> series.
> >> > > Topics, details & register: http://www.pythian.com/webinars
> >> > >
> >> > >
> >>
> >> _________________________________________________________________
> >> Turn down-time into play-time with Messenger games
> >> http://go.microsoft.com/?linkid=9734385
> >
> >
> >
                                          
_________________________________________________________________
Game on: Challenge friends to great games on Messenger
http://go.microsoft.com/?linkid=9734387

RE: "sql*net message from client" hang

Reply via email to