from:"John Plevyak \(JIRA\)"

[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.

2015-12-03 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-4053:
-
Priority: Minor  (was: Major)
 Summary: Add hit rate and memory usage regressions for RAM cache, tune 
CLFUS.  (was: Add hit rate and memory usage regressions for RAM cache.)

> Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>Priority: Minor
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.

2015-12-03 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-4053:
-
Description: It would be nice to have hit rate and memory usage 
regression tests for the RAM cache, in particular comparing LRU and CLFUS.  
Once we have this, we can tune the CLFUS implementation.  (was: It would be nice 
to have hit rate and memory usage regression tests for the RAM cache, in 
particular comparing LRU and CLFUS.)

> Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>Priority: Minor
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.  Once we have this, we can 
> tune the CLFUS implementation.




[jira] [Assigned] (TS-4053) Add hit rate and memory usage regressions for RAM cache.

2015-12-03 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak reassigned TS-4053:


Assignee: John Plevyak

> Add hit rate and memory usage regressions for RAM cache.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.




[jira] [Created] (TS-4053) Add hit rate and memory usage regressions for RAM cache.

2015-12-03 Thread John Plevyak (JIRA)

John Plevyak created TS-4053:


 Summary: Add hit rate and memory usage regressions for RAM cache.
 Key: TS-4053
 URL: https://issues.apache.org/jira/browse/TS-4053
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak


It would be nice to have hit rate and memory usage regression tests for the 
RAM cache, in particular comparing LRU and CLFUS.
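A hit-rate regression of this kind could replay a fixed request trace through each cache policy and record the fraction of hits. Below is a minimal sketch of that idea. It makes several assumptions: a count-based LRU, a hand-written trace, and a hypothetical helper `lru_hit_rate`, which is not an ATS API. A real test would also need a CLFUS implementation, byte-based sizing, and a more realistic (for example, Zipf-distributed) trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical helper: replays `trace` through an LRU cache holding at most
// `capacity` objects and returns the fraction of requests that were hits.
double lru_hit_rate(const std::vector<uint64_t>& trace, size_t capacity) {
  assert(capacity > 0);
  std::list<uint64_t> order;  // front = most recently used key
  std::unordered_map<uint64_t, std::list<uint64_t>::iterator> index;
  size_t hits = 0;
  for (uint64_t key : trace) {
    auto it = index.find(key);
    if (it != index.end()) {
      ++hits;
      order.splice(order.begin(), order, it->second);  // promote to MRU
      continue;
    }
    if (index.size() == capacity) {  // miss on a full cache: evict the LRU key
      index.erase(order.back());
      order.pop_back();
    }
    order.push_front(key);
    index[key] = order.begin();
  }
  return trace.empty() ? 0.0 : static_cast<double>(hits) / trace.size();
}
```

A regression test would then assert that each policy's hit rate on a fixed trace stays at or above a recorded baseline. That way, tuning CLFUS cannot silently make it worse.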




[jira] [Created] (TS-3786) Use a consensus algorithm to elect the cluster master

2015-07-21 Thread John Plevyak (JIRA)

John Plevyak created TS-3786:


 Summary: Use a consensus algorithm to elect the cluster master
 Key: TS-3786
 URL: https://issues.apache.org/jira/browse/TS-3786
 Project: Traffic Server
  Issue Type: Improvement
  Components: Manager
Reporter: John Plevyak


We should use a consensus algorithm to elect the cluster master and to update 
the configurations so that there is no single point of failure and machines 
entering or restarting can be brought to a consistent state.




[jira] [Assigned] (TS-3786) Use a consensus algorithm to elect the cluster master

2015-07-21 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak reassigned TS-3786:


Assignee: John Plevyak

> Use a consensus algorithm to elect the cluster master
> -
>
> Key: TS-3786
> URL: https://issues.apache.org/jira/browse/TS-3786
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Manager
>Reporter: John Plevyak
>Assignee: John Plevyak
>
> We should use a consensus algorithm to elect the cluster master and to update 
> the configurations so that there is no single point of failure and machines 
> entering or restarting can be brought to a consistent state.




[jira] [Created] (TS-3508) use accept4 on linux systems where available to reduce system calls

2015-04-08 Thread John Plevyak (JIRA)

John Plevyak created TS-3508:


 Summary: use accept4 on linux systems where available to reduce 
system calls
 Key: TS-3508
 URL: https://issues.apache.org/jira/browse/TS-3508
 Project: Traffic Server
  Issue Type: Improvement
  Components: Network
Reporter: John Plevyak


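The accept4() syscall can set flags (such as SOCK_NONBLOCK and SOCK_CLOEXEC) on the accepted socket in the same call, saving the separate fcntl() calls otherwise needed after accept().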
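A minimal sketch of the change is shown below. It assumes the caller wants a non-blocking, close-on-exec socket. `accept_nonblocking` is a hypothetical helper and is not the ATS code. The non-Linux branch shows the accept() plus fcntl() sequence that accept4() replaces.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // accept4() is a GNU/Linux extension
#endif
#include <cassert>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: accept a connection whose socket is already
// non-blocking and close-on-exec. Returns the new fd, or -1 with errno set.
int accept_nonblocking(int listen_fd) {
  assert(listen_fd >= 0);
#if defined(__linux__)
  // One system call: the flags are applied atomically when the fd is created.
  return accept4(listen_fd, nullptr, nullptr, SOCK_NONBLOCK | SOCK_CLOEXEC);
#else
  // Fallback: accept() plus three fcntl() calls. FD_CLOEXEC is not atomic
  // here, so a concurrent fork()+exec() could leak the descriptor.
  int fd = accept(listen_fd, nullptr, nullptr);
  if (fd < 0) return -1;
  int fl = fcntl(fd, F_GETFL);
  if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ||
      fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
    close(fd);
    return -1;
  }
  return fd;
#endif
}
```

On old kernels that lack the syscall, accept4() fails with ENOSYS, so a production version would also need a runtime fallback. This sketch only switches at compile time.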




[jira] [Commented] (TS-3401) AIO blocks under lock contention

2015-02-23 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334450#comment-14334450
 ] 

John Plevyak commented on TS-3401:
--

I generally agree, but it is true that aio_thread_main() uses 
ink_atomiclist_popall() to grab the entire atomic queue associated with an 
AIO_Req for a single file descriptor/disk.  This means that a bunch of reads 
could be blocked behind the disk operation (as well as behind acquiring the mutex for 
write callbacks, but that is probably less important).  We could switch to 
using ink_atomiclist_pop() in aio_move(), which would cause only a single op to be 
moved to the local queue.  

That said, we should probably re-examine using Linux native AIO now that the 
eventfd code has landed.  I think it will be more efficient: with the new Linux 
multi-queue support for SSDs we can do millions of ops/sec, so we want to be 
able to load up that queue, and native AIO with eventfd looks like a good way to 
do it.

We should also consider changing all the delay periods (e.g. AIO_PERIOD) to be 
100 milliseconds or more if we have eventfd, as we don't need to busy-poll 
anything... we will be woken if anything appears in a queue or on a file 
descriptor.
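A simple lock-free LIFO built on std::atomic can illustrate the difference between draining the whole atomic queue and taking a single entry. This is a sketch, not ATS's ink_atomiclist. `AtomicStack`, `pop_all`, and `pop_one` are hypothetical names. `pop_one` is only safe with a single consumer, because a compare-and-swap pop with several concurrent poppers is exposed to the ABA problem. Production lists typically add a version tag to the head pointer for that reason.

```cpp
#include <atomic>
#include <cassert>

struct Node {
  int op;
  Node* next;
};

class AtomicStack {
 public:
  // Multi-producer push (in the spirit of ink_atomiclist_push).
  void push(Node* n) {
    n->next = head_.load(std::memory_order_relaxed);
    while (!head_.compare_exchange_weak(n->next, n, std::memory_order_release,
                                        std::memory_order_relaxed)) {
      // On failure n->next was reloaded with the current head; retry.
    }
  }

  // Detach every queued node in one step (like popall): the caller now owns
  // the whole list, newest first.
  Node* pop_all() { return head_.exchange(nullptr, std::memory_order_acquire); }

  // Detach only the newest node (like pop). Single consumer only.
  Node* pop_one() {
    Node* n = head_.load(std::memory_order_acquire);
    while (n != nullptr &&
           !head_.compare_exchange_weak(n, n->next, std::memory_order_acquire,
                                        std::memory_order_acquire)) {
      // n was reloaded with the current head; retry.
    }
    return n;
  }

 private:
  std::atomic<Node*> head_{nullptr};
};
```

Moving ops one at a time costs an atomic operation per op. In return, a single slow op does not hold up everything that arrived with it, which is the trade-off discussed in this comment.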

> AIO blocks under lock contention
> 
>
> Key: TS-3401
> URL: https://issues.apache.org/jira/browse/TS-3401
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Brian Geffon
>Assignee: Brian Geffon
> Attachments: aio.patch
>
>
> In {{aio_thread_main()}} while trying to process AIO ops the AIO thread will 
> wait on the mutex for the op which obviously blocks other AIO ops from 
> processing. We should use a try lock instead and reschedule the ops that we 
> couldn't immediately process. Patch attached. Waiting for reviews.




[jira] [Updated] (TS-1264) LRU RAM cache not accounting for overhead

2015-01-31 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1264:
-
Attachment: ram_cache.patch

> LRU RAM cache not accounting for overhead
> -
>
> Key: TS-1264
> URL: https://issues.apache.org/jira/browse/TS-1264
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.3
>Reporter: John Plevyak
>Assignee: Leif Hedstrom
>Priority: Minor
> Fix For: 6.0.0
>
> Attachments: ram_cache.patch
>
>
> The CLFUS RAM cache takes its overhead into account when determining how many 
> bytes it is using.  The LRU cache does not, which makes it hard to compare 
> performance between the two and hard to correctly size the LRU RAM cache.
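To make the two caches comparable, the LRU cache would charge each entry for its bookkeeping as well as its payload. This sketch assumes the per-entry overhead is a fixed constant supplied by the caller. A real implementation might derive it from the size of its entry structure plus allocator slack. `ByteLru` is a hypothetical class and is not the attached ram_cache.patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Byte-bounded LRU cache that counts a fixed per-entry overhead toward its
// budget, so bytes_used() reflects bookkeeping as well as payload.
class ByteLru {
 public:
  ByteLru(size_t capacity_bytes, size_t per_entry_overhead)
      : capacity_(capacity_bytes), overhead_(per_entry_overhead) {}

  void put(uint64_t key, std::string data) {
    erase(key);
    used_ += data.size() + overhead_;  // charge payload plus overhead
    order_.push_front(Entry{key, std::move(data)});
    index_[key] = order_.begin();
    while (used_ > capacity_ && !order_.empty()) {
      erase(order_.back().key);  // evict least recently used
    }
  }

  bool get(uint64_t key, std::string* out) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    order_.splice(order_.begin(), order_, it->second);  // mark most recently used
    *out = it->second->data;
    return true;
  }

  size_t bytes_used() const { return used_; }
  size_t entries() const { return index_.size(); }

 private:
  struct Entry {
    uint64_t key;
    std::string data;
  };

  void erase(uint64_t key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    used_ -= it->second->data.size() + overhead_;
    order_.erase(it->second);
    index_.erase(it);
  }

  size_t capacity_;
  size_t overhead_;
  size_t used_ = 0;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

With a 16-byte overhead, three 30-byte objects no longer fit in a 100-byte budget. Counting only payload would report 90 bytes used and understate the real footprint.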




[jira] [Commented] (TS-60) support writing large buffers via zero-copy

2014-11-04 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196323#comment-14196323
 ] 

John Plevyak commented on TS-60:


This would be a win on systems which write large files frequently (as opposed 
to just serving them).  Is that a common enough workload to justify the 
complexity?  Perhaps only when Linux native AIO is enabled; otherwise the benefit 
would likely be swamped by context-switch overhead.

> support writing large buffers via zero-copy
> ---
>
> Key: TS-60
> URL: https://issues.apache.org/jira/browse/TS-60
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Affects Versions: 3.0.0
> Environment: all
>Reporter: John Plevyak
>Assignee: Alan M. Carroll
> Fix For: sometime
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently all write data is written from the aggregation buffer.  In order to 
> support large buffer writes efficiently, it would be nice to be able to write 
> directly from page-aligned memory.  This would be both more efficient and 
> would help support large objects.
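Writing directly from caller memory generally requires buffers that start on a page boundary. With O_DIRECT, they must also be aligned to the device's logical block size. The sketch below allocates such a buffer with posix_memalign(). `alloc_page_aligned` and `round_up` are hypothetical helpers, and rounding the size up to whole pages is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free
#include <unistd.h>  // sysconf

// Rounds n up to the next multiple of align; align must be a power of two.
size_t round_up(size_t n, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

// Hypothetical helper: allocates at least n bytes starting on a page
// boundary, with the size rounded up to whole pages. Stores the rounded size
// in *allocated (if non-null). Returns nullptr on failure; release with free().
void* alloc_page_aligned(size_t n, size_t* allocated) {
  size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  size_t size = round_up(n == 0 ? 1 : n, page);
  void* p = nullptr;
  if (posix_memalign(&p, page, size) != 0) return nullptr;
  if (allocated != nullptr) *allocated = size;
  return p;
}
```

A buffer like this could be handed to a write or AIO request without first being copied into the aggregation buffer. Whether that saves enough to justify the complexity is the open question raised in this thread.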




[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-09-30 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154210#comment-14154210
 ] 

John Plevyak commented on TS-3044:
--

The perror is from the original AIO_MODE_NATIVE code.  I was just following 
the style, as this was a minimal patch to add the eventfd handling.  I 
agree that we should change that to standard ATS errors.  For most Unix/Linux 
installations, waiting for less than 10 msec is the same as waiting for 0 msec, 
and can result in busy spinning.  The iocore has a minimum wait time, so 
HRTIME_MSECONDS(4) is disingenuous, as well as being a poor idea (if it actually 
were obeyed).  

> linux native AIO should use eventfd if available to signal thread
> -
>
> Key: TS-3044
> URL: https://issues.apache.org/jira/browse/TS-3044
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: Phil Sorber
> Fix For: 5.2.0
>
> Attachments: native-aio-eventfd.patch
>
>
> Linux native AIO has the ability to signal the event thread to get off the 
> poll and service the disk via the io_set_eventfd() call.  Linux native AIO 
> scales better than thread-based IO, but the current implementation can 
> introduce delays on lightly loaded systems because the thread is waiting 
> in epoll().  This can be remedied by using io_set_eventfd().




[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110181#comment-14110181
 ] 

John Plevyak commented on TS-3044:
--

Assigned to weijin for review, as he was in charge of the native Linux AIO and 
can assess the impact.  As I remember, this isn't enabled by default because of 
the latency concerns raised by Leif.  With this patch, if the latency concerns are 
addressed, we might want to enable this feature by default.

> linux native AIO should use eventfd if available to signal thread
> -
>
> Key: TS-3044
> URL: https://issues.apache.org/jira/browse/TS-3044
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: weijin
> Attachments: native-aio-eventfd.patch
>
>
> Linux native AIO has the ability to signal the event thread to get off the 
> poll and service the disk via the io_set_eventfd() call.  Linux native AIO 
> scales better than thread-based IO, but the current implementation can 
> introduce delays on lightly loaded systems because the thread is waiting 
> in epoll().  This can be remedied by using io_set_eventfd().



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-3044:
-

Attachment: native-aio-eventfd.patch

> linux native AIO should use eventfd if available to signal thread
> -
>
> Key: TS-3044
> URL: https://issues.apache.org/jira/browse/TS-3044
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: weijin
> Attachments: native-aio-eventfd.patch
>
>
> Linux native AIO has the ability to signal the event thread to get off the 
> poll and service the disk via the io_set_eventfd() call.  Linux native AIO 
> scales better than thread-based IO, but the current implementation can 
> introduce delays on lightly loaded systems because the thread is waiting 
> in epoll().  This can be remedied by using io_set_eventfd().




[jira] [Created] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)

John Plevyak created TS-3044:


 Summary: linux native AIO should use eventfd if available to 
signal thread
 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak


Linux native AIO has the ability to signal the event thread to get off the poll 
and service the disk via the io_set_eventfd() call.  Linux native AIO scales 
better than thread-based IO, but the current implementation can introduce 
delays on lightly loaded systems because the thread is waiting in epoll().  
This can be remedied by using io_set_eventfd().
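The wakeup mechanism can be sketched with plain eventfd and epoll. An eventfd registered with epoll becomes readable once its counter is non-zero, so a thread blocked in epoll_wait() returns immediately instead of sleeping out its timeout. This sketch writes to the eventfd by hand. With libaio, calling io_set_eventfd() on an iocb asks the kernel to perform that write when the I/O completes. `wake_via_eventfd` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Hypothetical demo: signals an eventfd `signals` times, then blocks in
// epoll_wait() for up to timeout_ms. Returns true if epoll reported the
// eventfd readable and the counter read back equals `signals`.
bool wake_via_eventfd(uint64_t signals, int timeout_ms) {
  int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  int ep = epoll_create1(EPOLL_CLOEXEC);
  bool woke = false;
  if (efd >= 0 && ep >= 0) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = efd;
    bool ok = epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) == 0;
    // Producer side (e.g. an I/O completion): each write adds to the counter.
    for (uint64_t i = 0; ok && i < signals; ++i) {
      uint64_t one = 1;
      ok = write(efd, &one, sizeof one) == static_cast<ssize_t>(sizeof one);
    }
    // Consumer side: the event loop blocks here until signalled or timed out.
    epoll_event out{};
    if (ok && epoll_wait(ep, &out, 1, timeout_ms) == 1 && out.data.fd == efd) {
      uint64_t count = 0;  // read returns the summed counter and resets it to 0
      woke = read(efd, &count, sizeof count) == static_cast<ssize_t>(sizeof count) &&
             count == signals;
    }
  }
  if (ep >= 0) close(ep);
  if (efd >= 0) close(efd);
  return woke;
}
```

Because a completion wakes the event thread directly, the thread can wait with a long timeout instead of polling with a short one.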




[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-3044:
-

Assignee: weijin

> linux native AIO should use eventfd if available to signal thread
> -
>
> Key: TS-3044
> URL: https://issues.apache.org/jira/browse/TS-3044
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: weijin
>
> Linux native AIO has the ability to signal the event thread to get off the 
> poll and service the disk via the io_set_eventfd() call.  Linux native AIO 
> scales better than thread-based IO, but the current implementation can 
> introduce delays on lightly loaded systems because the thread is waiting 
> in epoll().  This can be remedied by using io_set_eventfd().




[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763225#comment-13763225
 ] 

John Plevyak commented on TS-2193:
--

This is listed as an experimental performance feature.  Personally, I would 
like to see some numbers before committing resources; otherwise I would say that 
the experiment was a failure.

> Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
> --
>
> Key: TS-2193
> URL: https://issues.apache.org/jira/browse/TS-2193
> Project: Traffic Server
>  Issue Type: Bug
>  Components: DNS
>Affects Versions: 4.1.0
>Reporter: Tommy Lee
> Fix For: 4.1.0
>
> Attachments: bt-01.txt
>
>
> Hi all,
>   I've tried to enable DNS Thread without luck.
>   When I set proxy.config.dns.dedicated_thread to 1, it crashes with the 
> information below.
>   The ATS is working in Forward Proxy mode.
>   Thanks in advance.
> --
> traffic.out
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /usr/local/cache-4.1/bin/traffic_server - STACK TRACE: 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x2af714875cb0]
> /usr/local/cache-4.1/bin/traffic_server(_Z16_acquire_sessionP13SessionBucketPK8sockaddrR7INK_MD5P6HttpSM+0x52)[0x51dac2]
> /usr/local/cache-4.1/bin/traffic_server(_ZN18HttpSessionManager15acquire_sessionEP12ContinuationPK8sockaddrPKcP17HttpClientSessionP6HttpSM+0x3d1)[0x51e0f1]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x30c)[0x53644c]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x6a0)[0x537560]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM27state_hostdb_reverse_lookupEiPv+0xb9)[0x526b99]
> /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x531be8]
> /usr/local/cache-4.1/bin/traffic_server[0x5d7c8a]
> /usr/local/cache-4.1/bin/traffic_server(_ZN18HostDBContinuation8dnsEventEiP7HostEnt+0x821)[0x5decd1]
> /usr/local/cache-4.1/bin/traffic_server(_ZN8DNSEntry9postEventEiP5Event+0x44)[0x5f7a94]
> /usr/local/cache-4.1/bin/traffic_server[0x5fd382]
> /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler8recv_dnsEiP5Event+0x852)[0x5fee72]
> /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler9mainEventEiP5Event+0x14)[0x5ffd94]
> /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x6b2a41]
> /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread7executeEv+0x514)[0x6b3534]
> /usr/local/cache-4.1/bin/traffic_server[0x6b17ea]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x2af71486de9a]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2af71558dccd]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763219#comment-13763219
 ] 

John Plevyak commented on TS-2193:
--

I am concerned about the proxy.config.dns.dedicated_thread option.  It is 
testing a configuration where event threads are not ET_NET.  While the original 
design accounted for that possibility, soon thereafter all event threads were 
ET_NET, and I am worried that there are implicit assumptions that this is the 
case (as seems to be so with the session manager).

Is this really necessary?  DNS processing should be cheap.

Several fixes come to mind:

1) Make the SessionManager not depend on being called on an ET_NET thread (this should 
probably be done in any case).  It could simply shift to any ET_NET thread if 
it was called from another.
2) Make the DNS processor call back on an ET_NET thread (this is stupid, since 
there is no good reason for it to assume the caller has such a restriction, and 
indeed what about the other ET_ types?).
3) Make the DNS processor run across threads by hashing hosts to all ET_NET 
threads.  This will fix the issue we are seeing as well as spread the load.

We should probably do both 1) and 3).  There will be a temptation to do 2) 
because it will be the "easy" fix, but I think it is the wrong way out.
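Option 3) amounts to choosing the handling thread from a hash of the host name. Lookups for a given host then always land on the same ET_NET thread, while many hosts are spread across all of them. This sketch has several assumptions. std::hash stands in for whatever hash ATS would actually use. `dns_thread_index` is a hypothetical helper that returns an index into the pool of ET_NET threads. Host names are lower-cased first because DNS names are case-insensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a host name to one of n_threads ET_NET threads.
size_t dns_thread_index(std::string host, size_t n_threads) {
  assert(n_threads > 0);
  // DNS names are case-insensitive, so normalize before hashing.
  std::transform(host.begin(), host.end(), host.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return std::hash<std::string>{}(host) % n_threads;
}
```

Because each host maps to a fixed thread, per-host DNS state needs no cross-thread locking. With many distinct hosts, the load should spread across the pool roughly in proportion to the hash's uniformity.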


[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-09 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762366#comment-13762366
 ] 

John Plevyak commented on TS-2193:
--

The code in dns_result() checks that we only call back on the same 
thread that initiated the DNS lookup:

  if (h->mutex->thread_holding == e->submit_thread) {
    // The DNS handler is running on the thread that submitted the lookup:
    // call back directly.
    MUTEX_TRY_LOCK(lock, e->action.mutex, h->mutex->thread_holding);
    if (!lock) {
      Debug("dns", "failed lock for result %s", e->qname);
      goto Lretry;
    }
    for (int i = 0; i < MAX_DNS_RETRIES; i++) {
      if (e->id[i] < 0)
        break;
      h->release_query_id(e->id[i]);
    }
    e->postEvent(0, 0);
  } else {
    // Different thread: release the query ids, then schedule postEvent()
    // on the submitting thread.
    for (int i = 0; i < MAX_DNS_RETRIES; i++) {
      if (e->id[i] < 0)
        break;
      h->release_query_id(e->id[i]);
    }
    e->mutex = e->action.mutex;
    SET_CONTINUATION_HANDLER(e, &DNSEntry::postEvent);
    e->submit_thread->schedule_imm_signal(e);
  }

There are calls which will schedule on *ANY* event thread (e.g. 
eventProcessor.schedule_XX).  These could schedule an event (e.g. a timeout) 
on the ET_DNS thread, which perhaps isn't initialized for all the 
processors (e.g. sessions).

At one point I removed all calls to the non-specific thread schedule functions, but 
it is possible that some somehow remain.
 


[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2013-09-04 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758339#comment-13758339
 ] 

John Plevyak commented on TS-947:
-

Yes, this has been fixed.

john





> AIO Race condition on non NT systems
> 
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
> Environment: stock build with static libts, running on a 4 core server
>Reporter: B Wyatt
>Assignee: John Plevyak
> Fix For: 4.2.0
>
> Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch
>
>
> Refer to the code below.  The timeslice starting when a consumer thread 
> determines that the temp_list is empty (A) and ending when it releases the 
> aio_mutex (C) is unsafe if the work queues are empty and it breaks out of the 
> loop at B.  During this timeslice (A-C) the consumer holds the aio_mutex, 
> and as a result request producers enqueue items on the temporary atomic list 
> (D).  Because a consumer in this state waits for a signal on aio_cond 
> before processing the temp_list again, any requests on the temp_list 
> are effectively stalled until a future request produces this signal or 
> manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request", and 
> the initialization sequence soft-locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}
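One way to bound the stall described above has two parts. First, the consumer drains the temporary list every time it wakes. Second, it uses a timed wait, so an op stranded on the temp list waits at most one timeout. The sketch below is a simplified model, not AIO.cc or either attached patch. A mutex-protected vector stands in for the lock-free aio_temp_list, and `ReqQueue`, `stash`, `enqueue`, and `next` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

struct ReqQueue {
  std::mutex mu;                 // plays the role of aio_mutex
  std::condition_variable cond;  // plays the role of aio_cond
  std::deque<int> todo;          // guarded by mu
  std::mutex temp_mu;            // stand-in for the lock-free aio_temp_list
  std::vector<int> temp_list;    // guarded by temp_mu

  // Contended producer path (D above): park the op without signalling.
  void stash(int op) {
    std::lock_guard<std::mutex> g(temp_mu);
    temp_list.push_back(op);
  }

  // Producer, modeled on aio_queue_req().
  void enqueue(int op) {
    std::unique_lock<std::mutex> lk(mu, std::try_to_lock);
    if (!lk.owns_lock()) {
      stash(op);
      return;
    }
    drain_temp_locked();
    todo.push_back(op);
    cond.notify_one();
  }

  // Consumer step: returns the next op, or -1 if none arrived within timeout.
  int next(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(mu);
    for (;;) {
      drain_temp_locked();  // re-check the temp list on every wakeup
      if (!todo.empty()) {
        int op = todo.front();
        todo.pop_front();
        return op;
      }
      // Timed wait: a missed signal delays an op by at most one timeout.
      if (cond.wait_for(lk, timeout) == std::cv_status::timeout) {
        drain_temp_locked();
        if (todo.empty()) return -1;
      }
    }
  }

 private:
  // Moves parked ops onto todo. Caller must hold mu.
  void drain_temp_locked() {
    std::lock_guard<std::mutex> g(temp_mu);
    todo.insert(todo.end(), temp_list.begin(), temp_list.end());
    temp_list.clear();
  }
};
```

Calling stash() directly models a producer that lost the try-lock race and sent no signal. The consumer still picks the op up on its next drain instead of waiting for an unrelated future request.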


[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670005#comment-13670005
 ] 

John Plevyak commented on TS-1648:
--

I added a patch that changes the variables I think are causing the problem to int64_t.
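The gdb output below is consistent with a 32-bit overflow in the byte offset. This explanation rests on two assumptions. First, a directory entry is 10 bytes (five 16-bit words, as `{w = {0, 0, 0, 0, 0}}` suggests). Second, the offset is computed as index times entry size in a 32-bit int. Under those assumptions, index 214748365 gives 2147483650, which no longer fits. Wrapped, that offset becomes negative, placing entry i 2^32 - 10 bytes below entry i-1, which is exactly the gap between the two pointers gdb printed. The helpers are hypothetical illustrations, not CacheDir.cc.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kDirEntryBytes = 10;  // assumed size of one directory entry

// Offset as a 32-bit computation would produce it. The multiply is done in
// uint32_t (where wraparound is well defined) and converted to int32_t to
// mimic what an overflowing signed multiply typically yields on x86.
// (The conversion is implementation-defined before C++20, modular since.)
int64_t offset_int32(int32_t i) {
  uint32_t wrapped = static_cast<uint32_t>(i) * static_cast<uint32_t>(kDirEntryBytes);
  return static_cast<int32_t>(wrapped);
}

// Offset computed in 64-bit arithmetic, as after widening to int64_t.
int64_t offset_int64(int64_t i) { return i * kDirEntryBytes; }
```

With 64-bit arithmetic, consecutive entries stay 10 bytes apart. With the 32-bit version, the step from i-1 to i jumps backwards by almost 4 GB, matching the unreadable address in the session.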

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: John Plevyak
>  Labels: A
> Fix For: 3.3.3
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, 
> cachedir_int64-jp-1.patch
>
>
> I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
> 2x 10TB raw disks. I do not use cache compression. After a few days of 
> running (this is a dev machine - not handling any traffic) ATS begins to 
> crash with a segfault shortly after start:
> [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
> snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x720ad700 (LWP 17292)]
> 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
> at CacheDir.cc:382
> 382   CacheDir.cc: No such file or directory.
>   in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377   in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
> vol=0x16057d0) at CacheDir.cc:382
> #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
> event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
> event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
> event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
> data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x00700fec in EThread::process_event (this=0x736c4010, 
> e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x007011ff in EThread::execute (this=0x736c4010) at 
> UnixEThread.cc:191
> #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x755c6b6d in clone () from /lib/libc.so.6
> #10 0x in ?? ()
> This is fixed by running "traffic_server -Kk" to clear the cache. But after a 
> few days the issue reappears.
> I will keep the current faulty setup as-is in case you need me to provide 
> more data. I tried to make a core dump but it took a couple of GB even after 
> gzip (I can however provide it on request).
> *Edit*
> OS is Debian GNU/Linux 6.0.6 with custom built kernel 
> 3.2.13-grsec--grs-ipv6-64

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1648:
-

Attachment: cachedir_int64-jp-1.patch

[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669893#comment-13669893
 ] 

John Plevyak commented on TS-1648:
--

Rather than long, we should be using int64, as "long" is not well defined (it is
platform dependent). Are those 10TB RAIDs?  If so, you are better off using them
as JBOD, since ATS assumes that there is a single disk arm (or an equal fraction)
for each "disk" in storage.config.  Because of the size of your "disk", it is
possible that you have more than 2^31 directory entries, which would account for
the overflow.  Also, given the size, the "clear" may take a long time.  Your
trace is not long enough for me to see whether it repeats.  However, if it does
repeat, it is possible that it is because dir_in_bucket also takes an int, which
is then multiplied to get a directory number.  The other possibility is (of
course) that you have memory corruption: the directory is the single largest
memory user, and it contains a linked list which can be circularized by
corruption, but let's concentrate on the other issues first.
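For what it's worth, the gdb values in the trace quoted above fit a 32-bit overflow of the directory byte offset. This is a hedged reading, assuming each directory entry is 10 bytes (the five 16-bit words of `w` that gdb prints) and that `dir_index()` multiplies the entry number by that size in `int` arithmetic. The entry count the trace reports (1246953472) is itself below 2^31, but 214748365 * 10 = 2147483650 is just past INT32_MAX (2147483647). The type and helpers below are hypothetical illustrations, not ATS code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a cache directory entry: five 16-bit words,
// matching the {w = {...}} layout printed by gdb. Not the real ATS Dir type.
struct DirEntrySketch {
  uint16_t w[5];
};
static_assert(sizeof(DirEntrySketch) == 10, "five 16-bit words, no padding");

// Byte offset of entry `index` computed in 64-bit arithmetic (the int64 fix).
int64_t dir_byte_offset64(int64_t index) {
  return index * static_cast<int64_t>(sizeof(DirEntrySketch));
}

// Byte offset as a 32-bit int computation ends up on typical two's-complement
// hardware: the product wraps modulo 2^32. Unsigned math is used so the sketch
// itself has no signed-overflow undefined behavior; the final narrowing
// conversion is modular in C++20 and on GCC/Clang.
int32_t dir_byte_offset32(int32_t index) {
  uint32_t wrapped = static_cast<uint32_t>(index) *
                     static_cast<uint32_t>(sizeof(DirEntrySketch));
  return static_cast<int32_t>(wrapped);
}

// True when the byte offset of `index` no longer fits in a 32-bit int.
bool offset_overflows_int32(int64_t index) {
  return dir_byte_offset64(index) > std::numeric_limits<int32_t>::max();
}
```

Under those assumptions, entry 214748365 wraps to byte offset -2147483646, landing 4294967286 bytes below entry 214748364, which is exactly the gap between the two dir_index() addresses in the trace (0x7ffa97a03ff8 - 0x7ff997a04002). Doing the multiplication in int64 removes the wrap.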

First, I would suggest that we change all the bucket/entry/etc. offsets to int64
(I can build a patch, but I would appreciate a review).  Second, I would suggest
(after testing to ensure that the patch fixes your problem) that you move from
RAID-0 to JBOD, or to multiple NAS volumes corresponding approximately to the
number of underlying disks, since ATS will only have one outstanding write
(although multiple reads) for each "disk" in storage.config.

[jira] [Commented] (TS-745) Support ssd

2013-05-21 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663697#comment-13663697
 ] 

John Plevyak commented on TS-745:
-

Humm... let me read over the code.  An SSD layer is necessary at this point, 
and if this is ephemeral, I am sure we can find a clean integration.

thanx!

> Support ssd
> ---
>
> Key: TS-745
> URL: https://issues.apache.org/jira/browse/TS-745
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Cache
>Reporter: mohan_zl
>Assignee: weijin
> Fix For: 3.3.5
>
> Attachments: 0001-TS-745-support-interim-caching-in-storage.patch, 
> ts-745.diff, TS-ssd-2.patch, TS-ssd.patch
>
>
> A patch for supporting, not work well for a long time with --enable-debug

[jira] [Commented] (TS-745) Support ssd

2013-05-20 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662684#comment-13662684
 ] 

John Plevyak commented on TS-745:
-

I think the idea of stealing bits from the directory which are hard coded to
point off device (off the hard disk which the directory is a part of) is a huge
design departure and a problem.  When the cache was first built, it was limited
to 8GB disks, which seemed HUGE.  For Apache I extended it to .5PB, as by then
8GB was far too small.  Currently disks are at 4TB, and this patch would
decrease the limit from .5PB to 32TB, which gives us only a few years of
headroom; not a good idea.  Furthermore, the current design lets you unplug any
cache disk from any machine, move it to another machine and have your cache
back.  This change stores SSD information in the HDD directory! Why?  Changing
the configuration, a disk or machine failure, etc. invalidates that information,
corrupting the cache.  Why not store that information in a side structure and
either keep it only in memory or store it on the SSD?
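Because these limits are powers of two, the trade-off can be counted in address bits. A small sketch of the arithmetic, assuming binary units (8GB = 2^33, .5PB = 2^49 and 32TB = 2^45 bytes); the helper names are made up for illustration. Going from .5PB to 32TB gives up 4 bits of offset, and with 4TB (2^42) disks that leaves room for only three more doublings.

```cpp
#include <cstdint>

// Binary units (an assumption; the comment does not say which units it uses).
constexpr uint64_t kGiB = uint64_t{1} << 30;
constexpr uint64_t kTiB = uint64_t{1} << 40;
constexpr uint64_t kPiB = uint64_t{1} << 50;

// Bits needed to address `limit` bytes; `limit` is assumed to be a power of two.
constexpr int address_bits(uint64_t limit) {
  int bits = 0;
  while (limit > 1) {
    limit >>= 1;
    ++bits;
  }
  return bits;
}

// Offset bits given up when the addressable limit shrinks from `before` to `after`.
constexpr int bits_lost(uint64_t before, uint64_t after) {
  return address_bits(before) - address_bits(after);
}

// Doublings of disk size that still fit under `limit`.
constexpr int doublings_left(uint64_t disk, uint64_t limit) {
  return address_bits(limit) - address_bits(disk);
}

static_assert(address_bits(8 * kGiB) == 33, "original 8GB limit");
static_assert(address_bits(kPiB / 2) == 49, ".5PB limit");
static_assert(bits_lost(kPiB / 2, 32 * kTiB) == 4, ".5PB -> 32TB costs 4 bits");
static_assert(doublings_left(4 * kTiB, 32 * kTiB) == 3, "4TB disks -> 32TB limit");
```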

The idea of storing the SSD configuration in a string in records.config is also 
a bad idea.

Overall, a stacked cache, or at least a minimally invasive extension, seems like
a better idea.  This patch is pretty invasive, duplicates code and generally
touches many bits of the code.  The RAM cache, for example, uses no bits in the
HDD directory and only a couple of entry points at well-defined places (insert,
lookup and delete/invalidate).
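To make the "few well-defined entry points" alternative concrete, here is a hypothetical sketch of a cache layer that keeps all of its state in its own in-memory side structure and exposes only insert, lookup and invalidate. `CacheLayerSketch`, `SideTableLayer` and the 64-bit key are assumptions for illustration; this is not the ATS RAM cache interface.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal layer interface: only insert, lookup and invalidate.
class CacheLayerSketch {
public:
  virtual ~CacheLayerSketch() = default;
  virtual void insert(uint64_t key, std::string data) = 0;
  virtual std::optional<std::string> lookup(uint64_t key) const = 0;
  virtual void invalidate(uint64_t key) = 0;
};

// A layer whose index lives entirely in its own side table, in memory.
// It never reads or writes another layer's directory.
class SideTableLayer : public CacheLayerSketch {
public:
  void insert(uint64_t key, std::string data) override {
    table_[key] = std::move(data);
  }

  std::optional<std::string> lookup(uint64_t key) const override {
    auto it = table_.find(key);
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }

  void invalidate(uint64_t key) override { table_.erase(key); }

private:
  std::unordered_map<uint64_t, std::string> table_;
};
```

Because the layer's index lives only in its own table, dropping or reconfiguring it cannot leave stale bits in anyone else's directory.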

This patch looks likely to incur more technical debt at a time when I think we
would like to decrease it.  For example, it would be nice to have more, smaller
locks, move the HTTP support out of the core via a well-defined interface, add
layering, etc.  Adding yet another set of core code paths is going to make those
changes harder.

my 2 cents.

[jira] [Commented] (TS-1453) remove InactivityCop and enable define INACTIVITY_TIMEOUT

2013-04-19 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636624#comment-13636624
 ] 

John Plevyak commented on TS-1453:
--

A couple of things: 1) the lock is always held over accesses to "disabled", so it
doesn't need to be volatile; 2) I would just change the callback_event to a new
EVENT_DISABLED and handle it in NetVConnection::mainEvent.  The reason is that
this will isolate the changes to the net processor, and I think the interaction
of the disabled flag with the timeouts is problematic: you are eventually going
to end up rescheduling the event as an immediate event, which will cause a lot
of busy processing.
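A hypothetical sketch of both suggestions: a `disabled` flag that is only touched with the mutex held (so a plain `bool` suffices; the mutex already orders the accesses and `volatile` adds nothing), and a dedicated disabled event code dispatched in the connection's main handler instead of being folded into the timeout path. The class, method and event names, and the "swallow the timeout" behavior, are illustrative assumptions, not the ATS NetVConnection API.

```cpp
#include <mutex>

// Hypothetical event codes; kEventDisabled plays the role of EVENT_DISABLED.
enum NetEventSketch { kEventInactivityTimeout, kEventDisabled };

class NetVCSketch {
public:
  // Stand-in for mainEvent(): every event, including the disable request,
  // is dispatched in one place with the lock held.
  // Returns true if the timeout should be delivered to the user.
  bool main_event(NetEventSketch event) {
    std::lock_guard<std::mutex> guard(mutex_);
    switch (event) {
    case kEventDisabled:
      disabled_ = true;  // handled once; nothing is rescheduled
      return false;
    case kEventInactivityTimeout:
      return !disabled_;  // in this sketch a disabled VC swallows the timeout
    }
    return false;
  }

  bool is_disabled() {
    std::lock_guard<std::mutex> guard(mutex_);
    return disabled_;
  }

private:
  std::mutex mutex_;
  bool disabled_ = false;  // plain bool: only accessed while holding mutex_
};
```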

> remove InactivityCop and enable define INACTIVITY_TIMEOUT
> -
>
> Key: TS-1453
> URL: https://issues.apache.org/jira/browse/TS-1453
> Project: Traffic Server
>  Issue Type: Sub-task
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.5
>
> Attachments: TS-1453.patch
>
>
> Once we have O(1) scheduling, we can enable the INACTIVITY_TIMEOUT define.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631112#comment-13631112
 ] 

John Plevyak commented on TS-1405:
--

A third drop in performance on any test is a red flag.  There is definitely 
something wrong.  There are two things going on in this patch.  1) it replaces 
the power of 2 buckets with a time wheel and 2) it introduces an atomic list as 
a mechanism for freeing up events quickly.  Perhaps we can test the two 
separately?  In particular, we can remove the atomic list effects by just 
having Event::cancel_event() call cancel_action() and commenting out the call 
to process_cancelled_events().

Leif, you up for running your test again with that change?



> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
> linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
> linux_time_wheel_v9jp.patch
>
>
> When there are more and more events in the event system scheduler, it performs
> worse. This is the reason why we use InactivityCop to handle keepalive. The new
> scheduler is a time wheel, which has better time complexity (O(1)).

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627913#comment-13627913
 ] 

John Plevyak commented on TS-1405:
--

So why would we be getting less thread utilization?  We could be failing to
distribute the work over the threads (you could try switching from an accept
thread to per-thread accept).  We could be blocking the threads waiting on
other threads, either because of lock contention or simply because locks are
being held.  We could be introducing a delay whereby previously "ready" events
are now waiting.  We could have turned what was previously a FIFO queue into a
LIFO queue, such that some connections are starving.
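To illustrate why a LIFO where a FIFO used to be can starve connections, here is a toy simulation (not ATS code; the function and its parameters are made up): a backlog of ready items is queued up front, and on each step one new item arrives and then one item is served.

```cpp
#include <deque>

// Returns the step at which item 0 is served, or -1 if it is still waiting
// after `steps` steps. Items 0..backlog-1 are queued up front; on each step
// one new item arrives and then one item is served.
int step_when_first_item_served(bool lifo, int backlog, int steps) {
  std::deque<int> queue;
  for (int id = 0; id < backlog; ++id) queue.push_back(id);
  int next_id = backlog;
  for (int step = 0; step < steps; ++step) {
    queue.push_back(next_id++);  // a new ready item arrives
    int served;
    if (lifo) {
      served = queue.back();     // newest first
      queue.pop_back();
    } else {
      served = queue.front();    // oldest first
      queue.pop_front();
    }
    if (served == 0) return step;
  }
  return -1;
}
```

Under FIFO the backlog drains in order and item 0 is served on the first step; under LIFO each new arrival is served immediately and item 0 is never reached while arrivals keep coming.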

Other ideas?

I'll read the patch over again.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-09 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627411#comment-13627411
 ] 

John Plevyak commented on TS-1405:
--


Weird.  The min and max are down, but the mean is up.  What happens when you go 
to 500 connections?  I am wondering if it is an efficiency or a latency issue.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625510#comment-13625510
 ] 

John Plevyak commented on TS-1405:
--

Perhaps this is a larger issue.  We use eventfd to wake up the event thread on
an unloaded system, but it would be best to avoid using it when the system
becomes loaded, as it is expensive and tends to cause spinning on moderately
loaded systems.  Perhaps instead we should have operational regimes: use
blocking IO threads on an unloaded or lightly loaded system and switch to AIO
as the system becomes more heavily loaded.  I would also be interested to see
how this interacts with SSDs, which can have wait times in the microsecond
range.  The crossover point for an SSD system is likely different than for an
HDD system.
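One way the "operational regimes" idea could be sketched: pick the IO mode from the current load, with separate up and down thresholds (hysteresis) so the system does not flap between modes near the crossover. The load metric (outstanding disk operations), the thresholds and all names are hypothetical; since the crossover likely differs between SSD and HDD systems, the thresholds are constructor parameters.

```cpp
enum class IoMode { BlockingThreads, Aio };

// Hypothetical selector: switch to AIO when outstanding operations reach `up`,
// and back to blocking IO threads only when they fall to `down` (down < up).
class IoModeSelector {
public:
  IoModeSelector(int up, int down) : up_(up), down_(down) {}

  IoMode update(int outstanding_ops) {
    if (mode_ == IoMode::BlockingThreads && outstanding_ops >= up_) {
      mode_ = IoMode::Aio;
    } else if (mode_ == IoMode::Aio && outstanding_ops <= down_) {
      mode_ = IoMode::BlockingThreads;
    }
    return mode_;
  }

private:
  int up_;
  int down_;
  IoMode mode_ = IoMode::BlockingThreads;  // start in the unloaded regime
};
```

The gap between `up` and `down` is the design choice that matters: a load hovering between them keeps whichever mode is current instead of toggling on every sample.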

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625460#comment-13625460
 ] 

John Plevyak commented on TS-1405:
--

The patch includes:

+#if AIO_MODE == AIO_MODE_NATIVE
+#define AIO_PERIOD -HRTIME_MSECONDS(4)
+#else

Even if it was set to zero, on an unloaded system it would only get
polled every 10 msecs because that is the poll rate for epoll(), so
you could potentially delay a disk IO by that amount of time.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625447#comment-13625447
 ] 

John Plevyak commented on TS-1405:
--

Sounds good.   What sort of CPU/Memory improvements are you seeing?

[jira] [Commented] (TS-1760) use linux native aio

2013-04-03 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621157#comment-13621157
 ] 

John Plevyak commented on TS-1760:
--

This patch seems to have a latency issue and a busy-wait issue.  First, the
timeout on io_getevents is 4msec, which is below the threshold for busy waiting
on some systems (often 10msec).  Second, it queues the events onto a handler in
the same thread rather than doing the io_submit itself.  Third, if the handler
which calls io_getevents runs on an EThread, there is already another call to
epoll() which is blocking the same thread.  Having two blocking calls on the
same thread is not a good idea: they will conflict, with one blocking while the
other has ready data (i.e. from the net or from the disk).

If io_submit is thread safe while there is a currently waiting io_getevents on
another thread, then Linux AIO might be viable for Traffic Server.  If
io_getevents played well with epoll(), then Linux AIO might be viable.  Really,
to get this to work, Linux would need to have an integrated async completion API.
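The conflict described above comes from having two places where one thread can block. A common alternative is to have every completion source signal a file descriptor that the thread's single epoll() set already watches, e.g. an eventfd (which, per the TS-1405 comment above, ATS already uses to wake the event thread). The Linux-only sketch below simulates that pattern with plain eventfd and epoll calls; it does not use Linux AIO, and the function name is illustrative.

```cpp
#include <cstdint>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Linux-only sketch. Posts `posted` completions by writing 1 to an eventfd
// `posted` times (standing in for a completion thread), then observes them
// with a single epoll_wait(). Returns the number of completions read back,
// or -1 on error.
long completions_seen_via_epoll(uint64_t posted) {
  int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  if (efd < 0) return -1;
  int ep = epoll_create1(EPOLL_CLOEXEC);
  if (ep < 0) {
    close(efd);
    return -1;
  }

  long result = -1;
  epoll_event ev{};
  ev.events = EPOLLIN;
  ev.data.fd = efd;  // network sockets would be registered in this same set
  bool ok = epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) == 0;

  const uint64_t one = 1;
  for (uint64_t i = 0; ok && i < posted; ++i) {
    // Each write adds to the eventfd counter and makes it readable.
    ok = write(efd, &one, sizeof one) == static_cast<ssize_t>(sizeof one);
  }

  if (ok) {
    epoll_event ready[4];
    // The thread's only blocking point (100 ms timeout for this demo).
    int n = epoll_wait(ep, ready, 4, 100);
    if (n == 0) {
      result = 0;  // nothing was posted, so the wait timed out
    } else if (n == 1 && ready[0].data.fd == efd) {
      uint64_t count = 0;
      // A read returns the accumulated counter and resets it to zero.
      if (read(efd, &count, sizeof count) == static_cast<ssize_t>(sizeof count)) {
        result = static_cast<long>(count);
      }
    }
  }
  close(ep);
  close(efd);
  return result;
}
```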

> use linux native aio
> 
>
> Key: TS-1760
> URL: https://issues.apache.org/jira/browse/TS-1760
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Reporter: weijin
>Assignee: weijin
> Fix For: 3.3.2
>
> Attachments: native_aio.patch
>
>
> add a feature that use linux native aio

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-30 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618129#comment-13618129
 ] 

John Plevyak commented on TS-1405:
--

I missed one case; fixed in v11.

I agree that you won't see the race if the timeout (50msec) is sufficiently
large and no thread fails to be rescheduled and run in that amount of time, but
I think such timing-dependent behavior is to be avoided if possible.  We have a
couple of other races of this type, uses of new_Freer() and flushing of the log
buffers, but the former uses a much larger timeout (1 minute), while the latter
may be a cause of occasional crashes which we have not been able to debug for
years.  Experiences with the log buffer flushing issue are why I am not happy
with a race in the event code.

[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-30 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v11jp.patch

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-29 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617877#comment-13617877
 ] 

John Plevyak commented on TS-1405:
--

If anyone else would like to chime in, I would appreciate it.  Race conditions 
are subtle and when they exist, lead to random crashes which are very difficult 
to debug, so I would like to be sure that we are not introducing any races with 
this change.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-29 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617873#comment-13617873
 ] 

John Plevyak commented on TS-1405:
--

No. In the current patch (v10), process_event frees the event only if 
cancelled is set to CANCEL_SET, which means that the Event is not on the 
atomic_list.

The current v10 patch is simple and fast, and it has no delay, so there is no 
opportunity for timing-related problems.

The previous patch checks Event::in_the_priority_queue, which can change state 
at any time when Event::ethread != this_ethread(). This is a race. As a result, 
the EThread cannot know during ::execute() whether the Event is on the 
atomic_list, and this will result in crashes. You may not be seeing them 
because we typically pin all transactions to a single thread (so 
Event::ethread == this_ethread()) unless proxy.config.share_server_session is 
set to 1. That is not the case in general, however. Try testing with this 
setting and the appropriate configuration, and you will see the problem.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
> linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
> linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
> linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
> linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-28 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616342#comment-13616342
 ] 

John Plevyak commented on TS-1405:
--

I am still concerned about race conditions with the v9 patch. In particular, 
when the cancelled flag is set, it is possible (but not certain) that the event 
will be in the atomic list. If it is, it must not be freed; if it is not, it 
must be. Doing the wrong thing causes either a leak or memory corruption. 
Furthermore, if we are cancelling from a different thread than the one the 
Event is on, the in_the_priority_queue flag is racy (it may change at any time) 
and should not be relied upon.

Attached please find v10.  This patch converts the 'cancelled' flag into a 
multi-state variable which captures whether or not the Event is in the atomic 
list.   All tests of the "cancelled" variable now do the right thing with 
respect to the state of the event.
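
The following C++ sketch illustrates the multi-state idea. It is not the v10 patch. Only the name CANCEL_SET comes from this discussion. `CancelState`, `CANCEL_ON_LIST`, `FakeEvent`, `CancelList` and `owner_may_free` are hypothetical names. `std::thread::id` stands in for `Event::ethread`, and a hand-written lock-free stack stands in for the atomic list. The sketch makes two points. First, the state is written before the event is published to the cancel list, and the canceller never touches the event afterwards. Second, the owning thread frees a cancelled event only when the state says the event is not on the list. The same-thread shortcut mirrors the idea, mentioned elsewhere in this thread, of skipping the atomic list on the owning thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical cancel states; only the name CANCEL_SET appears in the discussion.
enum CancelState { CANCEL_NONE, CANCEL_SET, CANCEL_ON_LIST };

struct FakeEvent {
  CancelState cancelled = CANCEL_NONE;  // assumed to be accessed only under the event's mutex
  std::thread::id owner;                // stands in for Event::ethread; default id means "no thread"
  FakeEvent *next = nullptr;            // link used by the cross-thread cancel list
};

// Minimal lock-free LIFO standing in for the atomic cancel list.
struct CancelList {
  std::atomic<FakeEvent *> head{nullptr};

  void push(FakeEvent *e) {
    FakeEvent *old = head.load(std::memory_order_relaxed);
    do {
      e->next = old;
    } while (!head.compare_exchange_weak(old, e, std::memory_order_release,
                                         std::memory_order_relaxed));
  }
};

// Caller holds the event's mutex. The state is decided before the event is
// published, and this thread never touches the event after push().
void cancel(FakeEvent *e, CancelList &list) {
  if (e->cancelled != CANCEL_NONE)
    return;                         // already cancelled: nothing to do
  if (e->owner == std::this_thread::get_id()) {
    e->cancelled = CANCEL_SET;      // same thread: never on the list, owner frees it
  } else {
    e->cancelled = CANCEL_ON_LIST;  // other thread: only the list processor may free it
    list.push(e);                   // last access to *e from this thread
  }
}

// The owner's execute loop may free a cancelled event only if it is not on the list.
bool owner_may_free(const FakeEvent &e) { return e.cancelled == CANCEL_SET; }
```

The state is a plain enum rather than an atomic because, under the stated assumption, every read and write happens while the event's mutex is held.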

Bin Chen: please take a look at this patch and consider the possible races and 
tell me what you think.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
> linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
> linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
> linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
> linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-28 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v10jp.patch

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
> linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
> linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
> linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
> linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-26 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v9jp.patch

Fix Mutex leak and remove delay.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
> linux_time_wheel_v9jp.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-26 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614895#comment-13614895
 ] 

John Plevyak commented on TS-1405:
--

I have uploaded a small modification of the recent v8 patch. This modification 
removes the delay, fixes a memory leak (of the Mutex) and avoids going through 
the atomic list if we are on the same thread (the typical case).

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
> linux_time_wheel_v9jp.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-23 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611749#comment-13611749
 ] 

John Plevyak commented on TS-1405:
--

I think depending on the delay is brittle. You can never tell how long a 
thread will be delayed in an overloaded system, and the delay increases memory 
pressure. Instead, I would remove the delay, moving the line

+  event_cancel_list_head = (Event *) ink_atomiclist_popall(&event_cancel_list);

above the loop in process_cancel_event() (and remove the time test).
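
The restructuring suggested above can be sketched in C++. This illustrates the pattern and is not the patch. A small lock-free stack built on std::atomic stands in for ink_atomiclist, whose real API is not reproduced here. `Node`, `AtomicList`, `pop_all` and `process_cancelled` are hypothetical names. The whole list is taken with a single atomic exchange before the loop, so each pass works on a private chain. Anything pushed in the meantime is picked up on the next pass, and no time test is needed.

```cpp
#include <atomic>
#include <cassert>

struct Node {
  int id;
  Node *next = nullptr;
};

// Lock-free LIFO list standing in for ink_atomiclist.
struct AtomicList {
  std::atomic<Node *> head{nullptr};

  void push(Node *n) {
    Node *old = head.load(std::memory_order_relaxed);
    do {
      n->next = old;
    } while (!head.compare_exchange_weak(old, n, std::memory_order_release,
                                         std::memory_order_relaxed));
  }

  // Detach every node at once; the caller then owns the returned chain.
  Node *pop_all() { return head.exchange(nullptr, std::memory_order_acquire); }
};

// Pop everything before the loop, then walk the private chain.
// Returns the number of nodes handled.
template <typename F>
int process_cancelled(AtomicList &list, F &&handle) {
  int handled = 0;
  Node *local = list.pop_all();  // local head, taken once at the start
  while (local) {
    Node *next = local->next;    // read the link before the handler may free the node
    handle(local);
    local = next;
    ++handled;
  }
  return handled;
}
```

The list is only ever pushed onto and drained all at once. It therefore never needs the single-node pop that makes such stacks prone to the ABA problem.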

Then I would move the assignment of cancelled = true into set_event_cancel:

if (!e->cancelled) {
  if (e->in_the_priority_queue && (e->timeout_at - e->ethread->cur_time) > HRTIME_SECONDS(event_cancel_limit)) {
    /* prevent multiple threads from racing to cancel the same event */
    e->cancelled = true;
    ink_atomiclist_push(&event_cancel_list, e);
  } else
    e->cancelled = true;
}

In fact, I would just incorporate the code in set_event_cancel into 
cancel_event() since it is only called in one place.

So I agree that the delay would most likely have prevented a problem, but I 
think it would be better not to have it. When future programmers see a 
constant delay, they might be tempted to decrease it to the point where 
problems occur.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-22 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611638#comment-13611638
 ] 

John Plevyak commented on TS-1405:
--

+TS_INLINE void
+Event::cancel_event(Continuation * c)
+{
+  if (!cancelled) {
+    ink_assert(!c || c == continuation);
+    ethread->set_event_cancel(this);
+    cancelled = true;
+  }
+}

Once set_event_cancel has run, the Event may be deleted at any time. Do not 
set the cancelled flag here; it is set in set_event_cancel() in any case. If 
you set it here, you can overwrite freed memory (or, worse, another event).

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-21 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609122#comment-13609122
 ] 

John Plevyak commented on TS-1405:
--

Why is it segfaulting? Can we back out the commit(s) that caused the 
problem?

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-20 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608142#comment-13608142
 ] 

John Plevyak commented on TS-1405:
--

There are only very limited reasons to use ink_release_assert, in particular 
when it looks like we could be returning the wrong content to a user. We 
shouldn't use it to check other invariants: such checks just slow down the 
production server and are better done during regression testing than in 
production. Moreover, a server that crashes can cause a major service 
disruption, so the assert itself may very well cause more harm than the bug it 
catches.
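
The distinction can be sketched with two hypothetical macros. These are not the real ink_assert or ink_release_assert definitions. The debug variant compiles to nothing when NDEBUG is defined, so invariant checks cost nothing in a production build. The release variant is always active and would be reserved for the rare case where continuing could serve the wrong content.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins, not Traffic Server's real macros.
#ifdef NDEBUG
#define DEBUG_ASSERT(expr) ((void)0)  // compiled out: no cost in production
#else
#define DEBUG_ASSERT(expr) \
  ((expr) ? (void)0 : (std::fprintf(stderr, "debug assert failed: %s\n", #expr), std::abort()))
#endif

// Always active: reserve for cases such as returning the wrong content to a user.
#define RELEASE_ASSERT(expr) \
  ((expr) ? (void)0 : (std::fprintf(stderr, "release assert failed: %s\n", #expr), std::abort()))

int checked_divide(int a, int b) {
  DEBUG_ASSERT(b != 0);  // invariant checked only in debug/regression builds
  return a / b;
}
```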

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-20 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608136#comment-13608136
 ] 

John Plevyak commented on TS-1405:
--

If everything is correct, there should be no race. You shouldn't be setting 
the 'cancelled' flag in cancel_event(), since it is set in set_cancelled_event. 
Remove the ink_release_assert(). We should not have any of these: they slow 
the code down and lead to crash storms, which are bad for everyone.

There is no race because the caller needs to be holding the mutex, and after 
the call to cancel_event() the event is considered dead. That is why you 
shouldn't be setting the "cancelled" flag AFTER inserting the event into the 
cancel atomic list: that is a race.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
> linux_time_wheel_v7.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-19 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606671#comment-13606671
 ] 

John Plevyak commented on TS-1405:
--

You are using EVENT_FREE, which does not release the mutex (which is reference 
counted) by setting it to NULL. Try using free_event().
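
The leak can be illustrated with std::shared_ptr standing in for the event's reference-counted mutex pointer. This is an assumption; the real Traffic Server types differ. If an event object is recycled into a freelist without clearing its mutex pointer, the reference it still holds keeps the mutex alive. Clearing the pointer first drops the reference, which is what the comment above says free_event() does. `PooledEvent`, `Pool`, `raw_free` and `proper_free` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

struct PooledEvent {
  std::shared_ptr<std::mutex> mutex;  // reference-counted, like the event's mutex
};

// A trivial freelist: objects are recycled, not destroyed, while the pool lives.
struct Pool {
  std::vector<PooledEvent *> free_items;
  ~Pool() {
    for (PooledEvent *e : free_items)
      delete e;
  }
};

// Analogue of a bare freelist release: the mutex reference survives in the pooled object.
void raw_free(Pool &pool, PooledEvent *e) { pool.free_items.push_back(e); }

// Analogue of free_event(): drop the reference (set it to null) before recycling.
void proper_free(Pool &pool, PooledEvent *e) {
  e->mutex = nullptr;
  pool.free_items.push_back(e);
}
```

Using `raw_free`, the recycled object keeps the mutex's reference count above what the live code expects, so the mutex is never released.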

Also, I think process_cancel_event shouldn't delay for 4 seconds; that is far 
too long. Perhaps 10 msec?

Finally, why is the ink_atomiclist_popall happening at the end of 
process_cancel_event? Shouldn't event_cancel_list_head be local, and the call 
happen at the start (after the delay)?

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch, linux_time_wheel_v6.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604486#comment-13604486
 ] 

John Plevyak commented on TS-1405:
--

Thanks!

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604324#comment-13604324
 ] 

John Plevyak commented on TS-1405:
--

Could you update this patch to apply against the current master branch?

I am getting a compile failure:

UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:57:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, int)':
UnixEThread.cc:79:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, Event*, ink_sem*)':
UnixEThread.cc:116:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope

and a patch failure:

--- iocore/net/P_UnixNetVConnection.h
+++ iocore/net/P_UnixNetVConnection.h
@@ -339,7 +339,7 @@
   inactivity_timeout_in = 0;
 #ifdef INACTIVITY_TIMEOUT
   if (inactivity_timeout) {
-    inactivity_timeout->cancel_action(this);
+    inactivity_timeout->cancel_event(this);
     inactivity_timeout = NULL;
   }
 #else
@@ -351,7 +351,7 @@
 UnixNetVConnection::cancel_active_timeout()
 {
   if (active_timeout) {
-    active_timeout->cancel_action(this);
+    active_timeout->cancel_event(this);
     active_timeout = NULL;
     active_timeout_in = 0;
   }

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
> linux_time_wheel_v5.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))

[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-12 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600214#comment-13600214
 ] 

John Plevyak commented on TS-1742:
--

There are still a number of volatile declarations associated with head_p, and 
they need to be made consistent. Does anyone with an ARM/i386 system want to 
do the honors?

At the very least, it looks like there are some here and in ink_queue.h, where 
head_p itself is declared volatile.

john

> Freelists to use 64bit version w/ Double Word Compare and Swap
> --
>
> Key: TS-1742
> URL: https://issues.apache.org/jira/browse/TS-1742
> Project: Traffic Server
>  Issue Type: Improvement
>Reporter: Brian Geffon
>Assignee: Brian Geffon
> Fix For: 3.3.2
>
> Attachments: 128bit_cas.patch, 128bit_cas.patch.2
>
>
> So to those of you familiar with the freelists you know that it works this 
> way the head pointer uses the upper 16 bits for a version to prevent the ABA 
> problem. The big drawback to this is that it requires the following macros to 
> get at the pointer or the version:
> {code}
> #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
>  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
> #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
>   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) << 48))
> {code}
> Additionally, since this only leaves 16 bits it limits the number of versions 
> you can have, well more and more x86_64 processors support DCAS (double word 
> compare and swap / 128bit CAS). This means that we can use 64bits for a 
> version which basically makes the versions unlimited but more importantly it 
> takes those macros above and simplifies them to:
> {code}
> #define FREELIST_POINTER(_x) (_x).s.pointer
> #define FREELIST_VERSION(_x) (_x).s.version
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
> (_x).s.pointer = _p; (_x).s.version = _v
> {code}
> As you can imagine this will have a performance improvement, in my simple 
> tests I measured a performance improvement of around 6%. Unfortunately, I'm 
> not an expert with this stuff and I would really appreciate more community 
> feedback before I commit this patch.
> Note: this only applies if you're not using a reclaimable freelist.
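
The two head layouts described in the quoted issue can be sketched in C++. This re-implements the intent of the FREELIST_* macros as functions for readability; it is not Traffic Server's code. In the 64-bit word, the low 48 bits hold the pointer and the high 16 bits hold the version. With 128-bit CAS, the pointer and a full 64-bit version sit side by side in a 16-byte, 16-byte-aligned struct. The CAS itself (for example, cmpxchg16b on x86_64) is not shown. `pack`, `version_of`, `pointer_of` and `WideHead` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPointerMask = 0x0000FFFFFFFFFFFFULL;  // low 48 bits
constexpr std::uint64_t kVersionMask = 0xFFFFULL;              // 16-bit version

inline std::uint64_t pack(const void *p, std::uint64_t version) {
  return (static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p)) & kPointerMask) |
         ((version & kVersionMask) << 48);
}

inline std::uint64_t version_of(std::uint64_t word) { return word >> 48; }

inline void *pointer_of(std::uint64_t word) {
  // Shift the version out, then arithmetic-shift back to sign-extend bit 47
  // (canonical x86_64 addresses). Assumes two's complement and an arithmetic
  // right shift, which C++20 guarantees and GCC/Clang provide in practice.
  std::int64_t v = static_cast<std::int64_t>(word << 16) >> 16;
  return reinterpret_cast<void *>(static_cast<std::intptr_t>(v));
}

// With double-word CAS the fields need no packing at all.
struct alignas(16) WideHead {
  void *pointer;
  std::uint64_t version;  // 64-bit version: wraparound is no longer a practical concern
};
```

The 16-bit version wraps after 65,536 updates, which is the limitation the issue describes. The wide layout removes it at the cost of requiring a 16-byte atomic compare-and-swap.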

[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598501#comment-13598501
 ] 

John Plevyak commented on TS-1742:
--

We can take it out because we do the loads manually for the CAS.

I always liked to use volatile as a marker that the variable was being 
accessed outside of a lock, but if it is causing a performance problem, we 
could convert the keyword into a comment:

// Warning: this variable is read and written in multiple threads without a lock; use INK_QUEUE_LD to read safely.

john

> Freelists to use 64bit version w/ Double Word Compare and Swap
> --
>
> Key: TS-1742
> URL: https://issues.apache.org/jira/browse/TS-1742
> Project: Traffic Server
>  Issue Type: Improvement
>Reporter: Brian Geffon
>Assignee: Brian Geffon
> Attachments: 128bit_cas.patch, 128bit_cas.patch.2
>
>
> So to those of you familiar with the freelists you know that it works this 
> way the head pointer uses the upper 16 bits for a version to prevent the ABA 
> problem. The big drawback to this is that it requires the following macros to 
> get at the pointer or the version:
> {code}
> #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
>  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
> #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
>   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) << 48))
> {code}
> Additionally, since this only leaves 16 bits it limits the number of versions 
> you can have, well more and more x86_64 processors support DCAS (double word 
> compare and swap / 128bit CAS). This means that we can use 64bits for a 
> version which basically makes the versions unlimited but more importantly it 
> takes those macros above and simplifies them to:
> {code}
> #define FREELIST_POINTER(_x) (_x).s.pointer
> #define FREELIST_VERSION(_x) (_x).s.version
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
> (_x).s.pointer = _p; (_x).s.version = _v
> {code}
> As you can imagine this will have a performance improvement, in my simple 
> tests I measured a performance improvement of around 6%. Unfortunately, I'm 
> not an expert with this stuff and I would really appreciate more community 
> feedback before I commit this patch.
> Note: this only applies if you're not using a reclaimable freelist.

[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598350#comment-13598350
 ] 

John Plevyak commented on TS-1742:
--

Looks good. I might consider adding an ink_debug_assert that the freelist's 
type_size is at least 16 bytes when this is enabled.

> Freelists to use 64bit version w/ Double Word Compare and Swap
> --
>
> Key: TS-1742
> URL: https://issues.apache.org/jira/browse/TS-1742
> Project: Traffic Server
>  Issue Type: Improvement
>Reporter: Brian Geffon
>Assignee: Brian Geffon
> Attachments: 128bit_cas.patch, 128bit_cas.patch.2
>
>
> So to those of you familiar with the freelists you know that it works this 
> way the head pointer uses the upper 16 bits for a version to prevent the ABA 
> problem. The big drawback to this is that it requires the following macros to 
> get at the pointer or the version:
> {code}
> #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
>  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
> #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
>   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) << 48))
> {code}
> Additionally, since this only leaves 16 bits it limits the number of versions 
> you can have, well more and more x86_64 processors support DCAS (double word 
> compare and swap / 128bit CAS). This means that we can use 64bits for a 
> version which basically makes the versions unlimited but more importantly it 
> takes those macros above and simplifies them to:
> {code}
> #define FREELIST_POINTER(_x) (_x).s.pointer
> #define FREELIST_VERSION(_x) (_x).s.version
> #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
> (_x).s.pointer = _p; (_x).s.version = _v
> {code}
> As you can imagine this will have a performance improvement, in my simple 
> tests I measured a performance improvement of around 6%. Unfortunately, I'm 
> not an expert with this stuff and I would really appreciate more community 
> feedback before I commit this patch.
> Note: this only applies if you're not using a reclaimable freelist.

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588535#comment-13588535
 ] 

John Plevyak commented on TS-1405:
--

There is still a nasty race condition. The call to set_event_cancel() is 
async, so it can happen at any time. It first sets the "cancelled" flag and 
then enqueues the event in the atomic list. Between these two steps, 
EThread::execute can pull the event from the external queue, check the flag 
and call free_event(). If this happens, the atomic list insert will clobber 
freed memory.

Instead, the only place where an event can be freed once it is "cancelled" is 
process_cancel_event(), which should probably be renamed 
process_cancelled_events().

Also, no atomic is required to set the cancelled flag. This flag can only be 
set while holding the Event's mutex, so setting it is single-threaded (there 
can be no race). I would suggest adding an ink_debug_assert() to that effect, 
but not using an atomic. An atomic isn't strictly wrong, but it gives the 
impression that the lock isn't required, which it most definitely is. Without 
the lock, there would be a race inside process_event, which takes the lock, 
checks cancelled and then runs the event. If it were possible to cancel 
without the lock, the cancellation could happen between the check and the 
running of the event, which would be "very bad".
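
The locking argument can be sketched with std::mutex standing in for the Event's mutex. `LockedEvent`, `cancel` and `process` are hypothetical names, and a blocking lock is used for simplicity. Both the canceller and the processor hold the same mutex, so a plain bool is enough: the check and the callback happen inside one critical section, and cancellation cannot slip in between them. If cancel() could run without the lock, even an atomic flag would leave that window open.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

struct LockedEvent {
  std::mutex mutex;                // stands in for the Event's mutex
  bool cancelled = false;          // plain bool: only accessed while holding `mutex`
  std::function<void()> callback;  // the event's handler
};

// Cancellation requires the event's lock.
void cancel(LockedEvent &e) {
  std::lock_guard<std::mutex> guard(e.mutex);
  e.cancelled = true;
}

// Takes the lock, checks cancelled, then runs the event; returns true if it ran.
bool process(LockedEvent &e) {
  std::lock_guard<std::mutex> guard(e.mutex);
  if (e.cancelled)
    return false;  // check and run share one critical section
  e.callback();
  return true;
}
```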

What do you think, Bin Chen? Shall I update the patch, or do you want to 
incorporate my comments?

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588500#comment-13588500
 ] 

John Plevyak commented on TS-1405:
--

The atomic list is singly linked, so you could use SLINK for clink in Event.  
There are lots of events, so saving an extra field is worthwhile.
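To illustrate the field saving, here is a hypothetical node layout and a lock-free LIFO (a Treiber stack) that only ever follows one `next` pointer. It borrows the idea behind ATS's `SLINK` but is not that macro. `pop_all()` detaches the whole list with one exchange, which sidesteps the ABA problem that single-element pops would have.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical layouts: a node used only by a singly linked atomic list
// needs one link field, not two.
struct NodeDLink { void* payload; NodeDLink* next; NodeDLink* prev; };
struct NodeSLink { void* payload; NodeSLink* next; };

// Lock-free LIFO that only follows `next`.
template <typename T>
class AtomicSList {
 public:
  void push(T* n) {
    T* head = head_.load(std::memory_order_relaxed);
    do {
      n->next = head;
    } while (!head_.compare_exchange_weak(head, n, std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Detach the entire list at once; the caller then walks it privately.
  T* pop_all() { return head_.exchange(nullptr, std::memory_order_acquire); }

 private:
  std::atomic<T*> head_{nullptr};
};
```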

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588498#comment-13588498
 ] 

John Plevyak commented on TS-1405:
--

Instance variables such as "CancelList" need to start with a lowercase letter and use 
_ to separate words (like all the other variables in this file).

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588493#comment-13588493
 ] 

John Plevyak commented on TS-1405:
--

I am getting some compilation errors with gcc 4.7.2:

UnixEThread.cc:159:83: error: no matching function for call to 
'ink_atomic_cas(int32_t*, bool, bool)'
UnixEThread.cc:159:83: note: candidate is:
In file included from ../../lib/ts/libts.h:52:0,
 from P_EventSystem.h:39,
 from UnixEThread.cc:30:
../../lib/ts/ink_atomic.h:152:1: note: template bool 
ink_atomic_cas(volatile T*, T, T)
../../lib/ts/ink_atomic.h:152:1: note:   template argument 
deduction/substitution failed:
UnixEThread.cc:159:83: note:   deduced conflicting types for parameter 'T' 
('int' and 'bool')

Also:
UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:58:81: error: 'IOCORE_ReadConfigInteger' was not declared in 
this scope
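The first error is a template argument deduction failure: the signature in the message, `ink_atomic_cas(volatile T*, T, T)`, deduces `T` from every argument, so an `int32_t*` combined with `bool` values yields both `int` and `bool`. A sketch with a hypothetical stand-in `atomic_cas` of the same shape (implemented here with the GCC/Clang `__sync_bool_compare_and_swap` builtin) shows two ways to make such a call compile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in with the same shape as ink_atomic_cas in the error.
template <typename T>
bool atomic_cas(volatile T* mem, T prev, T next) {
  return __sync_bool_compare_and_swap(mem, prev, next);
}

// Option 1: name T explicitly; false/true then convert to int32_t.
inline bool set_flag_explicit(volatile int32_t* flag) {
  // atomic_cas(flag, false, true);  // error: T deduced as both int and bool
  return atomic_cas<int32_t>(flag, false, true);
}

// Option 2: pass values whose type already matches the pointee.
inline bool set_flag_typed(volatile int32_t* flag) {
  return atomic_cas(flag, static_cast<int32_t>(0), static_cast<int32_t>(1));
}
```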



> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-25 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586722#comment-13586722
 ] 

John Plevyak commented on TS-1405:
--

Let me take a last look.  The race condition bug in the last version was
rather subtle.

john




> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: Bin Chen
>Assignee: Bin Chen
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
> linux_time_wheel_v3.patch, time_wheel_v4.patch, TS-1405.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1006) memory management, cut down memory waste ?

2012-12-11 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529116#comment-13529116
 ] 

John Plevyak commented on TS-1006:
--

I agree.  We should land this initially as a compile-time option in the dev
branch to get wider production time on it before making it the default.

The main reason is that it is invasive and complicated, particularly in the
way it will interact with the VM system and it would be nice to see how it
responds in a variety of environments.

If it is much better than TCMalloc, then perhaps we should package it up in
a more general form as well.

Was the design based on another allocator/paper?  Any references?

john




> memory management, cut down memory waste ?
> --
>
> Key: TS-1006
> URL: https://issues.apache.org/jira/browse/TS-1006
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.1.1
>Reporter: Zhao Yongming
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: 0001-Allocator-optimize-InkFreeList-memory-pool.patch, 
> 0002-Allocator-make-InkFreeList-memory-pool-configurable.patch, 
> Memory-Usage-After-Introduced-New-Allocator.png, memusage.ods, memusage.ods
>
>
> when we review the memory usage in the production, there is something 
> abnormal, ie, looks like TS take much memory than index data + common system 
> waste, and here is some memory dump result by set 
> "proxy.config.dump_mem_info_frequency"
> 1, the one on a not so busy forwarding system:
> physics memory: 32G
> RAM cache: 22G
> DISK: 6140 GB
> average_object_size 64000
> {code}
>  allocated |     in-use |  type size | free list name
> -----------|------------|------------|--------------------------------------
>  671088640 |   37748736 |    2097152 | memory/ioBufAllocator[14]
> 2248146944 | 2135949312 |    1048576 | memory/ioBufAllocator[13]
> 1711276032 | 1705508864 |     524288 | memory/ioBufAllocator[12]
> 1669332992 | 1667760128 |     262144 | memory/ioBufAllocator[11]
> 2214592512 |     221184 |     131072 | memory/ioBufAllocator[10]
> 2325741568 | 2323775488 |      65536 | memory/ioBufAllocator[9]
> 2091909120 | 2089123840 |      32768 | memory/ioBufAllocator[8]
> 1956642816 | 1956478976 |      16384 | memory/ioBufAllocator[7]
> 2094530560 | 2094071808 |       8192 | memory/ioBufAllocator[6]
>  356515840 |  355540992 |       4096 | memory/ioBufAllocator[5]
>    1048576 |      14336 |       2048 | memory/ioBufAllocator[4]
>     131072 |          0 |       1024 | memory/ioBufAllocator[3]
>      65536 |          0 |        512 | memory/ioBufAllocator[2]
>      32768 |          0 |        256 | memory/ioBufAllocator[1]
>      16384 |          0 |        128 | memory/ioBufAllocator[0]
>          0 |          0 |        576 | memory/ICPRequestCont_allocator
>          0 |          0 |        112 | memory/ICPPeerReadContAllocator
>          0 |          0 |        432 | memory/PeerReadDataAllocator
>          0 |          0 |         32 | memory/MIMEFieldSDKHandle
>          0 |          0 |        240 | memory/INKVConnAllocator
>          0 |          0 |         96 | memory/INKContAllocator
>       4096 |          0 |         32 | memory/apiHookAllocator
>          0 |          0 |        288 | memory/FetchSMAllocator
>          0 |          0 |         80 | memory/prefetchLockHandlerAllocator
>          0 |          0 |        176 | memory/PrefetchBlasterAllocator
>          0 |          0 |         80 | memory/prefetchUrlBlaster
>          0 |          0 |         96 | memory/blasterUrlList
>          0 |          0 |         96 | memory/prefetchUrlEntryAllocator
>          0 |          0 |        128 | memory/socksProxyAllocator
>          0 |          0 |        144 | memory/ObjectReloadCont
>    3258368 |     576016 |        592 | memory/httpClientSessionAllocator
>     825344 |     139568 |        208 | memory/httpServerSessionAllocator
>   22597632 |    1284848 |       9808 | memory/httpSMAllocator
>          0 |          0 |         32 | memory/CacheLookupHttpConfigAllocator
>          0

[jira] [Commented] (TS-1006) memory management, cut down memory waste ?

2012-12-10 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528528#comment-13528528
 ] 

John Plevyak commented on TS-1006:
--

Some variables that should be volatile are not declared as such (e.g. 
InkThreadCache::status).

Also, what is the purpose of this status field and how is it updated?  It is 
set to 0 in ink_freelist_new via simple assignment, then tested/assigned via a 
CAS in ink_freelist_free.  Some comments or documentation would be nice.

Have you tested this against the default memory allocator and TCMalloc?

This seems to be doing something similar to TCMalloc, and that code has been 
extensively tested.

> memory management, cut down memory waste ?
> --
>
> Key: TS-1006
> URL: https://issues.apache.org/jira/browse/TS-1006
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.1.1
>Reporter: Zhao Yongming
>Assignee: Bin Chen
> Fix For: 3.3.2
>
> Attachments: 0001-Allocator-optimize-InkFreeList-memory-pool.patch, 
> 0002-Allocator-make-InkFreeList-memory-pool-configurable.patch, 
> Memory-Usage-After-Introduced-New-Allocator.png, memusage.ods, memusage.ods
>
>
> when we review the memory usage in the production, there is something 
> abnormal, ie, looks like TS take much memory than index data + common system 
> waste, and here is some memory dump result by set 
> "proxy.config.dump_mem_info_frequency"
> 1, the one on a not so busy forwarding system:
> physics memory: 32G
> RAM cache: 22G
> DISK: 6140 GB
> average_object_size 64000
> {code}
>  allocated |     in-use |  type size | free list name
> -----------|------------|------------|--------------------------------------
>  671088640 |   37748736 |    2097152 | memory/ioBufAllocator[14]
> 2248146944 | 2135949312 |    1048576 | memory/ioBufAllocator[13]
> 1711276032 | 1705508864 |     524288 | memory/ioBufAllocator[12]
> 1669332992 | 1667760128 |     262144 | memory/ioBufAllocator[11]
> 2214592512 |     221184 |     131072 | memory/ioBufAllocator[10]
> 2325741568 | 2323775488 |      65536 | memory/ioBufAllocator[9]
> 2091909120 | 2089123840 |      32768 | memory/ioBufAllocator[8]
> 1956642816 | 1956478976 |      16384 | memory/ioBufAllocator[7]
> 2094530560 | 2094071808 |       8192 | memory/ioBufAllocator[6]
>  356515840 |  355540992 |       4096 | memory/ioBufAllocator[5]
>    1048576 |      14336 |       2048 | memory/ioBufAllocator[4]
>     131072 |          0 |       1024 | memory/ioBufAllocator[3]
>      65536 |          0 |        512 | memory/ioBufAllocator[2]
>      32768 |          0 |        256 | memory/ioBufAllocator[1]
>      16384 |          0 |        128 | memory/ioBufAllocator[0]
>          0 |          0 |        576 | memory/ICPRequestCont_allocator
>          0 |          0 |        112 | memory/ICPPeerReadContAllocator
>          0 |          0 |        432 | memory/PeerReadDataAllocator
>          0 |          0 |         32 | memory/MIMEFieldSDKHandle
>          0 |          0 |        240 | memory/INKVConnAllocator
>          0 |          0 |         96 | memory/INKContAllocator
>       4096 |          0 |         32 | memory/apiHookAllocator
>          0 |          0 |        288 | memory/FetchSMAllocator
>          0 |          0 |         80 | memory/prefetchLockHandlerAllocator
>          0 |          0 |        176 | memory/PrefetchBlasterAllocator
>          0 |          0 |         80 | memory/prefetchUrlBlaster
>          0 |          0 |         96 | memory/blasterUrlList
>          0 |          0 |         96 | memory/prefetchUrlEntryAllocator
>          0 |          0 |        128 | memory/socksProxyAllocator
>          0 |          0 |        144 | memory/ObjectReloadCont
>    3258368 |     576016 |        592 | memory/httpClientSessionAllocator
>     825344 |     139568 |        208 | memory/httpServerSessionAllocator
>   22597632 |    1284848 |       9808 | memory/httpSMAllocator
>          0 |          0 |         32 | memory/CacheLookupHttpConfigAllocator
>          0 |          0 |

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-09-06 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449953#comment-13449953
 ] 

John Plevyak commented on TS-1405:
--

weijin: I don't think freeing it as soon as possible is as important a goal as 
avoiding race conditions :)  The current code can take up to 5 seconds to 
free a cancelled event, so this code is much better in that regard, even if we 
have to wait for the next time the event loop runs.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: kuotai
>Assignee: kuotai
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-09-06 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449950#comment-13449950
 ] 

John Plevyak commented on TS-1405:
--

There is a race between the insertion into the atomic list in the cancelling 
thread, the dequeue in the controlling thread, and the setting of the 
cancelled flag in the cancelling thread.  One solution is to take the mutex 
lock in the check_ready code, since the cancelling thread must be holding that lock 
while inserting into the atomic list and setting the cancelled flag.  Note that you 
could instead set the cancelled flag before adding to the atomic list and then just 
ignore the event in process_thread() (and any other place), counting on it getting 
freed eventually via the atomic list.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: kuotai
>Assignee: kuotai
> Fix For: 3.3.1
>
> Attachments: linux_time_wheel.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-08-28 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443583#comment-13443583
 ] 

John Plevyak commented on TS-1405:
--

Sorry, the numbers for 30 seconds should be 30/5 + ~17 (every time a power-of-2 
bucket is touched, 1/2 of the elements will be moved out, and 1/2 of 
those will be moved down 2 levels, etc.) = 27 vs 7 for the time wheel.

So the time wheel, in the case of short expired timeouts, can be several times 
more efficient.

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: kuotai
>Assignee: kuotai
> Fix For: 3.3.0
>
> Attachments: time-wheel.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-08-28 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443575#comment-13443575
 ] 

John Plevyak commented on TS-1405:
--

The current code "should" have a complexity which is bounded by the need to 
scan the entire queue every 5 seconds.  This is necessary because cancelling an 
event involves setting the volatile "cancelled" flag, and not scanning the queue 
would result in running out of memory.  Assuming an event is inserted with a 
30-second timeout and waits until it runs, it will be touched 30/5 + 10 = 6 + 10 = 16 
times.  For a 300-second timeout it will be touched 300/5 + 10 = 60 + 10 = 70 times.

If an event is cancelled (the normal case for timeouts), then it will be 
touched once (after an average of 2.5 seconds).  So, at least according to the 
design, the cost of the current scheme should be only a small constant factor 
worse than the time wheel and should average slightly more than 1 touch per 
event, which is the best that can be expected.  Of course, that is the 
design; if it is causing problems, then there is likely a bug or something 
about the workload that is causing them.

The time wheel can bring this down to 1 touch every N seconds, with an expected 
1 touch per event (or 6 and 60 touches for the two timeouts above).

So, I think this is a very reasonable change, assuming that it can deal with 
the out-of-memory issue, and I am interested in seeing the benchmarks, as I am 
curious to see how theory and practice collide.
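A minimal hashed timing wheel in C++ makes the touch counts above concrete. It is a hypothetical sketch, not the code in the attached patches. It assumes one slot per 1-second tick and 5 slots, so each revolution takes 5 seconds; a timer further out than one revolution keeps a `rounds` counter and is touched once per revolution until it fires. Under those assumptions a 30-second timer is touched 6 times and a 300-second timer 60 times, matching the figures above, and insertion is O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hashed timing wheel: one slot per tick.
class TimingWheel {
 public:
  explicit TimingWheel(std::size_t num_slots) : slots_(num_slots) {}

  // O(1) insert. `delay` is in ticks and must be >= 1.
  void schedule(int id, long delay) {
    const long n = static_cast<long>(slots_.size());
    slots_[static_cast<std::size_t>((now_ + delay) % n)].push_back({id, (delay - 1) / n});
  }

  // Advance one tick; returns the ids of timers that fire on this tick.
  std::vector<int> tick() {
    ++now_;
    auto& bucket = slots_[static_cast<std::size_t>(now_ % static_cast<long>(slots_.size()))];
    std::vector<int> fired;
    for (auto it = bucket.begin(); it != bucket.end();) {
      ++touches_;  // every visit to a pending timer counts as one touch
      if (it->rounds == 0) {
        fired.push_back(it->id);
        it = bucket.erase(it);
      } else {
        --it->rounds;  // not due yet: wait for the next revolution
        ++it;
      }
    }
    return fired;
  }

  long now() const { return now_; }
  long touches() const { return touches_; }

 private:
  struct Timer {
    int id;
    long rounds;  // full revolutions still to wait
  };
  std::vector<std::vector<Timer>> slots_;
  long now_ = 0;
  long touches_ = 0;
};
```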

> apply time-wheel scheduler  about event system
> --
>
> Key: TS-1405
> URL: https://issues.apache.org/jira/browse/TS-1405
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 3.2.0
>Reporter: kuotai
>Assignee: kuotai
> Fix For: 3.3.0
>
> Attachments: time-wheel.patch
>
>
> when have more and more event in event system scheduler, it's worse. This is 
> the reason why we use inactivecop to handler keepalive. the new scheduler is 
> time-wheel. It's have better time complexity(O(1))


[jira] [Created] (TS-1264) LRU RAM cache not accounting for overhead

2012-05-19 Thread John Plevyak (JIRA)

John Plevyak created TS-1264:


 Summary: LRU RAM cache not accounting for overhead
 Key: TS-1264
 URL: https://issues.apache.org/jira/browse/TS-1264
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
Priority: Minor


The CLFUS RAM cache takes its overhead into account when determining how many 
bytes it is using.  The LRU cache does not, which makes it hard to compare 
performance between the two and hard to size the LRU RAM cache correctly.
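A sketch of what accounting for overhead could look like, assuming a hypothetical `LruCache` (not the ATS LRU RAM cache) and a caller-supplied `per_entry_overhead` estimate covering the list node, hash entry and key. Each object is charged `payload + overhead`, so `bytes_used()` tracks true memory more closely and eviction honors the configured size.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache: each entry costs its payload plus an assumed
// fixed bookkeeping overhead.
class LruCache {
 public:
  LruCache(std::size_t max_bytes, std::size_t per_entry_overhead)
      : max_bytes_(max_bytes), overhead_(per_entry_overhead) {}

  void put(const std::string& key, std::string value) {
    erase(key);  // replacing an entry refunds its old charge
    bytes_ += charge(value);
    lru_.emplace_front(key, std::move(value));
    index_[key] = lru_.begin();
    while (bytes_ > max_bytes_ && !lru_.empty()) evict_oldest();
  }

  // Returns nullptr on a miss; a hit moves the entry to the front.
  const std::string* get(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);
    return &it->second->second;
  }

  std::size_t bytes_used() const { return bytes_; }

 private:
  using Entry = std::pair<std::string, std::string>;  // (key, payload)

  std::size_t charge(const std::string& payload) const { return payload.size() + overhead_; }

  void erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    bytes_ -= charge(it->second->second);
    lru_.erase(it->second);
    index_.erase(it);
  }

  void evict_oldest() {
    std::string victim = lru_.back().first;  // copy: erase() destroys the node
    erase(victim);
  }

  std::size_t max_bytes_;
  std::size_t overhead_;
  std::size_t bytes_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With `per_entry_overhead` set to 0, the same three inserts used in the test would report 70 bytes and evict nothing, under-counting what the cache really holds.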

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209

2012-05-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277284#comment-13277284
 ] 

John Plevyak commented on TS-1240:
--

What is the downside to restoring the delete delay buffer (was the memory usage 
too high)?

> Debug assert triggered in LogBuffer.cc:209
> --
>
> Key: TS-1240
> URL: https://issues.apache.org/jira/browse/TS-1240
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.4
>Reporter: Leif Hedstrom
> Fix For: 3.1.5
>
>
> From John:
> {code}
> [May  1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running
> FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer`
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK 
> TRACE: 
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5]
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc]
> /lib64/libpthread.so.0(+0x7d90)[0x77676d90]
> /lib64/libc.so.6(clone+0x6d)[0x754f9f5d]
> {code}


[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-15 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276422#comment-13276422
 ] 

John Plevyak commented on TS-1238:
--

I think the problem is that the LRU cache doesn't account for its overhead, 
while the CLFUS cache does, which puts CLFUS at an unfair disadvantage in terms of 
true memory used per byte allocated.  The CLFUS cache is much better 
behaved when the working set is larger than the RAM cache size, and it supports 
compression.  I am going to commit this fix, leave CLFUS as the default, and 
file another bug to fix the RAM accounting in the LRU cache.  I think this 
will make the performance comparable in the best case and better in worse cases.

> RAM cache hit rate unexpectedly low
> ---
>
> Key: TS-1238
> URL: https://issues.apache.org/jira/browse/TS-1238
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.3
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 3.1.4
>
> Attachments: TS-1238-jp-1.patch
>
>
> The RAM cache is not getting the expected hit rate.  Looks like there are a 
> couple issues.


[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209

2012-05-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274396#comment-13274396
 ] 

John Plevyak commented on TS-1240:
--

I think removing the delay buffer before deleting the LogBuffer has led to 
this race condition.  The problem is that checkout_write can execute at the 
same time as the flush.  The only thing that prevents a problem is that 
checkout_write will fail when it finds that the buffer is full.  When a buffer 
becomes full there can still be other threads trying to do checkout_write, so 
the buffer must not be deleted immediately after the flush.  Instead it is 
kept around for a while, until all the other threads that might be doing 
checkout_write discover that the buffer is full, back out, and reload the 
current buffer pointer (which is going to be different).  Once all those 
threads have dropped their references to the LogBuffer, it is finally OK to delete 
it.  This should probably be documented in gory detail in the LogObject 
file.  It is a bit of a pain, but it does make it possible to implement the 
critical path of logging completely without locks.
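A C++ sketch of the scheme described above, assuming hypothetical `Buffer` and `LogObjectSketch` classes (not ATS's `LogBuffer`/`LogObject`). Writers reserve space with a single `fetch_add` and take no lock; when a buffer is full they swap in a fresh one (the only step guarded by a mutex in this sketch) and retry, and the full buffer is parked in a short delay queue instead of being deleted immediately. Keeping the last `delay` retired buffers only approximates "kept around for a while"; a production version would need reference counts or epochs to prove no writer still holds the old pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>

class Buffer {
 public:
  explicit Buffer(std::size_t cap) : data_(new char[cap]), cap_(cap) {}
  // Lock-free reservation; nullptr means "full: reload current and retry".
  char* checkout_write(std::size_t len) {
    std::size_t off = used_.fetch_add(len, std::memory_order_relaxed);
    return off + len <= cap_ ? data_.get() + off : nullptr;
  }

 private:
  std::unique_ptr<char[]> data_;
  std::size_t cap_;
  std::atomic<std::size_t> used_{0};
};

class LogObjectSketch {
 public:
  // `delay`: retired buffers kept alive before the oldest is deleted.
  LogObjectSketch(std::size_t cap, std::size_t delay)
      : cap_(cap), delay_(delay), current_(new Buffer(cap)) {}
  ~LogObjectSketch() {
    delete current_.load();
    for (Buffer* b : retired_) delete b;
  }

  char* write(std::size_t len) {
    for (int attempt = 0; attempt < 8; ++attempt) {
      Buffer* b = current_.load(std::memory_order_acquire);
      if (char* p = b->checkout_write(len)) return p;  // fast path: no lock
      retire_if_current(b);  // full: swap in a fresh buffer, then retry
    }
    return nullptr;  // e.g. len > cap; a real logger would handle this
  }

  std::size_t retired_count() {
    std::lock_guard<std::mutex> g(swap_mutex_);
    return retired_.size();
  }

 private:
  void retire_if_current(Buffer* full) {
    std::lock_guard<std::mutex> g(swap_mutex_);
    if (current_.load(std::memory_order_relaxed) != full) return;  // already swapped
    current_.store(new Buffer(cap_), std::memory_order_release);
    // A real implementation would flush `full` here.
    retired_.push_back(full);  // not deleted yet: other writers may still hold it
    if (retired_.size() > delay_) {
      delete retired_.front();
      retired_.pop_front();
    }
  }

  std::size_t cap_, delay_;
  std::atomic<Buffer*> current_;
  std::mutex swap_mutex_;
  std::deque<Buffer*> retired_;
};
```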

Do you want me to generate a patch?

> Debug assert triggered in LogBuffer.cc:209
> --
>
> Key: TS-1240
> URL: https://issues.apache.org/jira/browse/TS-1240
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.4
>Reporter: Leif Hedstrom
> Fix For: 3.1.5
>
>
> From John:
> {code}
> [May  1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running
> FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer`
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK 
> TRACE: 
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5]
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc]
> /lib64/libpthread.so.0(+0x7d90)[0x77676d90]
> /lib64/libc.so.6(clone+0x6d)[0x754f9f5d]
> {code}


[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209

2012-05-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274393#comment-13274393
 ] 

John Plevyak commented on TS-1240:
--

I can repro on my machine any time you like :)

> Debug assert triggered in LogBuffer.cc:209
> --
>
> Key: TS-1240
> URL: https://issues.apache.org/jira/browse/TS-1240
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.4
>Reporter: Leif Hedstrom
> Fix For: 3.1.5
>
>
> From John:
> {code}
> [May  1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running
> FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer`
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK 
> TRACE: 
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5]
> /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b]
> /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc]
> /lib64/libpthread.so.0(+0x7d90)[0x77676d90]
> /lib64/libc.so.6(clone+0x6d)[0x754f9f5d]
> {code}


[jira] [Updated] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-934:


Attachment: ts-934-jp1.patch

This undoes the previous patch as this issue was addressed under a different 
bug.

> Proxy Mutex null pointer crash
> --
>
> Key: TS-934
> URL: https://issues.apache.org/jira/browse/TS-934
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: Debian 6.0.2 quadcore, forward transparent proxy.
>Reporter: Alan M. Carroll
>Assignee: John Plevyak
> Fix For: 3.1.4, 3.1.1
>
> Attachments: ts-934-jp1.patch, ts-934-patch.txt
>
>
> [Client report]
> We had the cache crash gracefully twice last night on a segfault.  Both 
> times the callstack produced by trafficserver's signal handler was:
> /usr/bin/traffic_server[0x529596]
> /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
> [0x2ab09e7c0a10]
> usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
> /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
> /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
> /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
> /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
> /usr/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x69)[0x4e4623]
> I went through the disassembly and the instruction that it is on in 
> ::do_io_close is loading the value of diags (not dereferencing it) so it 
> is unlikely that that threw a segfault (unless this is somehow in 
> thread local storage and that is corrupt).
> The kernel message claimed that the instruction pointer was 0x4e438e 
> which in this build is in ProxyMutexPtr::operator ->() on the 
> instruction that dereferences the object pointer to get the stored mutex 
> pointer (bingo!), so it would seem that at some point we are 
> dereferencing a null "safe" pointer.


[jira] [Updated] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-934:


Assignee: John Plevyak  (was: Alan M. Carroll)

> Proxy Mutex null pointer crash
> --
>
> Key: TS-934
> URL: https://issues.apache.org/jira/browse/TS-934
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: Debian 6.0.2 quadcore, forward transparent proxy.
>Reporter: Alan M. Carroll
>Assignee: John Plevyak
> Fix For: 3.1.4, 3.1.1
>
> Attachments: ts-934-jp1.patch, ts-934-patch.txt
>
>
> [Client report]
> We had the cache crash gracefully twice last night on a segfault.  Both 
> times the callstack produced by trafficserver's signal handler was:
> /usr/bin/traffic_server[0x529596]
> /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
> [0x2ab09e7c0a10]
> usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
> /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
> /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
> /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
> /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
> /usr/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x69)[0x4e4623]
> I went through the disassembly and the instruction that it is on in 
> ::do_io_close is loading the value of diags (not dereferencing it) so it 
> is unlikely that that threw a segfault (unless this is somehow in 
> thread local storage and that is corrupt).
> The kernel message claimed that the instruction pointer was 0x4e438e 
> which in this build is in ProxyMutexPtr::operator ->() on the 
> instruction that dereferences the object pointer to get the stored mutex 
> pointer (bingo!), so it would seem that at some point we are 
> dereferencing a null "safe" pointer.


[jira] [Commented] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274383#comment-13274383
 ] 

John Plevyak commented on TS-934:
-

I think we should undo this as other changes fixed the bug.

> Proxy Mutex null pointer crash
> --
>
> Key: TS-934
> URL: https://issues.apache.org/jira/browse/TS-934
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: Debian 6.0.2 quadcore, forward transparent proxy.
>Reporter: Alan M. Carroll
>Assignee: Alan M. Carroll
> Fix For: 3.1.4, 3.1.1
>
> Attachments: ts-934-patch.txt
>
>
> [Client report]
> We had the cache crash gracefully twice last night on a segfault.  Both 
> times the callstack produced by trafficserver's signal handler was:
> /usr/bin/traffic_server[0x529596]
> /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
> [0x2ab09e7c0a10]
> usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
> /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
> /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
> /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
> /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
> /usr/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x69)[0x4e4623]
> I went through the disassembly and the instruction that it is on in 
> ::do_io_close is loading the value of diags (not dereferencing it) so it 
> is unlikely that that threw a segfault (unless this is somehow in 
> thread local storage and that is corrupt).
> The kernel message claimed that the instruction pointer was 0x4e438e 
> which in this build is in ProxyMutexPtr::operator ->() on the 
> instruction that dereferences the object pointer to get the stored mutex 
> pointer (bingo!), so it would seem that at some point we are 
> dereferencing a null "safe" pointer.


[jira] [Commented] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274381#comment-13274381
 ] 

John Plevyak commented on TS-934:
-

Is this still happening with the latest code?

> Proxy Mutex null pointer crash
> --
>
> Key: TS-934
> URL: https://issues.apache.org/jira/browse/TS-934
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: Debian 6.0.2 quadcore, forward transparent proxy.
>Reporter: Alan M. Carroll
>Assignee: Alan M. Carroll
> Fix For: 3.1.4, 3.1.1
>
> Attachments: ts-934-patch.txt
>
>
> [Client report]
> We had the cache crash gracefully twice last night on a segfault.  Both 
> times the callstack produced by trafficserver's signal handler was:
> /usr/bin/traffic_server[0x529596]
> /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
> [0x2ab09e7c0a10]
> usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
> /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
> /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
> /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
> /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
> /usr/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x69)[0x4e4623]
> I went through the disassembly and the instruction that it is on in 
> ::do_io_close is loading the value of diags (not dereferencing it) so it 
> is unlikely that that threw a segfault (unless this is somehow in 
> thread local storage and that is corrupt).
> The kernel message claimed that the instruction pointer was 0x4e438e 
> which in this build is in ProxyMutexPtr::operator ->() on the 
> instruction that dereferences the object pointer to get the stored mutex 
> pointer (bingo!), so it would seem that at some point we are 
> dereferencing a null "safe" pointer.


[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-08 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271042#comment-13271042
 ] 

John Plevyak commented on TS-1238:
--

It isn't committed.   Bryan was going to try it out.  It changes one of the
defaults (probably for the better) for RAM caching, but I wanted to give
him a chance to take a look.  I'll see if I can figure it out myself as
well.  It should be a very safe change.




> RAM cache hit rate unexpectedly low
> ---
>
> Key: TS-1238
> URL: https://issues.apache.org/jira/browse/TS-1238
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.3
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 3.1.4
>
> Attachments: TS-1238-jp-1.patch
>
>
> The RAM cache is not getting the expected hit rate.  Looks like there are a 
> couple issues.


[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-02 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266934#comment-13266934
 ] 

John Plevyak commented on TS-1238:
--

I agree that such a configuration would be more convenient.  Before this patch
it was 1 with LRU and 3 with CLFUS.  With this patch you can do 1 or 2 with
LRU (where 2 was the default in 2.0 and 1 was the default in 3.0) and 2 or 3
with CLFUS (where in 3.0 the default was 3).  The higher number occurs when the
seen_filter is ON.  What would be interesting to see is 1) whether LRU is better
with 1 or 2, and 2) whether CLFUS with 2 is better (or worse) than LRU with 2.
This bug is basically that LRU @ 1 > CLFUS @ 3 for some workloads.  Ultimately,
easier configuration and some guidelines, or better yet auto-tuning, would be
the objective.
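One way to run the comparisons suggested above is to replay a request trace through a simulated cache and count hits. The C++ sketch below is a hypothetical harness, not Traffic Server code: it assumes an LRU bounded by object count (the real RAM cache is bounded by bytes), and it models the seen filter as "admit a key only on its second miss" using an unbounded set. It does not model CLFUS.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical hit-rate harness: replays a trace of object keys through an
// LRU cache holding at most `capacity` objects and returns the hit count.
// With `seen_filter` set, a missed key is admitted only if it has missed
// before (an unbounded set stands in for a real, fixed-size filter).
std::size_t lru_hits(const std::vector<std::uint64_t> &trace,
                     std::size_t capacity, bool seen_filter) {
  std::list<std::uint64_t> order;  // front = most recently used
  std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where;
  std::unordered_set<std::uint64_t> seen;
  std::size_t hits = 0;

  for (std::uint64_t key : trace) {
    auto it = where.find(key);
    if (it != where.end()) {
      ++hits;
      order.splice(order.begin(), order, it->second);  // move to front
      continue;
    }
    if (seen_filter && seen.insert(key).second) {
      continue;  // first miss on this key: remember it, do not cache it
    }
    if (capacity == 0) {
      continue;
    }
    if (order.size() == capacity) {
      where.erase(order.back());  // evict the least recently used key
      order.pop_back();
    }
    order.push_front(key);
    where[key] = order.begin();
  }
  return hits;
}
```

For the trace `1 2 1 3 1 2` with room for two objects, plain LRU scores 2 hits and the filtered variant 1; a trace that small says nothing general, so real workloads would be needed to settle the questions above.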

> RAM cache hit rate unexpectedly low
> ---
>
> Key: TS-1238
> URL: https://issues.apache.org/jira/browse/TS-1238
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.3
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 3.1.4
>
> Attachments: TS-1238-jp-1.patch
>
>
> The RAM cache is not getting the expected hit rate.  Looks like there are a 
> couple issues.


[jira] [Updated] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-01 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1238:
-

Attachment: TS-1238-jp-1.patch

Add new option to disable/enable the seen_filter in the RAM cache.  Fix 
reporting of RAM cache hits to HTTP.  Fix for LRU cache.  Add back in 
seen_filter to LRU (disabled by default).  Disable seen filter by default for 
CLFUS.
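For reference, a seen filter is an admission check: an object is only placed in the RAM cache once it has been requested before, so one-time objects cannot push out useful ones. A minimal C++ sketch, assuming a fixed-size table holding one 64-bit key fingerprint per slot; the `SeenFilter` class is hypothetical and not necessarily how Traffic Server implements it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical admission filter: remembers one key per slot and admits a
// key only if it is the key most recently recorded in its slot.
// A slot value of 0 means "empty", so keys are assumed to be nonzero hashes.
class SeenFilter {
 public:
  // `slots` must be nonzero.
  explicit SeenFilter(std::size_t slots) : seen_(slots, 0) {}

  // Returns true if `key` was seen before and should be admitted;
  // otherwise records `key` in its slot and returns false.
  bool admit(std::uint64_t key) {
    std::uint64_t &slot = seen_[key % seen_.size()];
    if (slot == key) {
      return true;
    }
    slot = key;  // first sighting, or a colliding key overwrote the slot
    return false;
  }

 private:
  std::vector<std::uint64_t> seen_;
};
```

A collision simply overwrites the slot, so the filter can forget a key early; that trades some accuracy for constant memory.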

> RAM cache hit rate unexpectedly low
> ---
>
> Key: TS-1238
> URL: https://issues.apache.org/jira/browse/TS-1238
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.3
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 3.1.4
>
> Attachments: TS-1238-jp-1.patch
>
>
> The RAM cache is not getting the expected hit rate.  Looks like there are a 
> couple issues.


[jira] [Created] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-01 Thread John Plevyak (JIRA)

John Plevyak created TS-1238:


 Summary: RAM cache hit rate unexpectedly low
 Key: TS-1238
 URL: https://issues.apache.org/jira/browse/TS-1238
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1.4


The RAM cache is not getting the expected hit rate.  Looks like there are a 
couple issues.


[jira] [Updated] (TS-1225) doc_size still gets casted to int in a few places

2012-04-25 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1225:
-

Attachment: ts-1225.diff

Remove the 32-bit cast of doc_len.
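Why the cast matters: narrowing a document length above 4 GiB to 32 bits silently drops the high bits. A minimal C++ sketch with a hypothetical 5 GiB length; it narrows to `uint32_t` so the wraparound is well defined (narrowing an out-of-range value to a signed `int` is implementation-defined before C++20).

```cpp
#include <cstdint>

// Hypothetical 5 GiB document length: too large for 32 bits.
constexpr std::uint64_t kDocLen = 5ULL << 30;

// Narrowing to an unsigned 32-bit type keeps only the low 32 bits,
// i.e. the value modulo 2^32, so 5 GiB silently becomes 1 GiB.
constexpr std::uint32_t narrowed = static_cast<std::uint32_t>(kDocLen);
static_assert(narrowed == (1u << 30), "5 GiB wraps to 1 GiB");

// Carrying the length in a 64-bit type, with no cast, loses nothing.
constexpr std::uint64_t widened = kDocLen;
static_assert(widened == (5ULL << 30), "64-bit length is preserved");
```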

> doc_size still gets casted to int in a few places
> -
>
> Key: TS-1225
> URL: https://issues.apache.org/jira/browse/TS-1225
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Leif Hedstrom
>Assignee: John Plevyak
> Fix For: 3.1.4
>
> Attachments: ts-1225.diff
>
>
> This was also discussed on TS-475, and discovered by bwyatt. I'm filing a 
> separate bug, since I think this should be fixed independent of TS-475.


[jira] [Closed] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox

2011-08-08 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak closed TS-888.
---

Resolution: Fixed
  Assignee: John Plevyak  (was: Leif Hedstrom)

Fixed 1155125.

> SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
> --
>
> Key: TS-888
> URL: https://issues.apache.org/jira/browse/TS-888
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 3.0.1
> Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0
>Reporter: Kurt Huwig
>Assignee: John Plevyak
> Fix For: 3.1.0
>
> Attachments: TS-888-jp.patch
>
>
> ATS has SSL server certificates. The backend is accessed via SSL as well 
> which uses the same certificates. It fails with FireFox, but works with 
> Google Chrome.


[jira] [Updated] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox

2011-08-07 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-888:


Attachment: TS-888-jp.patch

> SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
> --
>
> Key: TS-888
> URL: https://issues.apache.org/jira/browse/TS-888
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 3.0.1
> Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0
>Reporter: Kurt Huwig
>Assignee: Leif Hedstrom
> Fix For: 3.1.0
>
> Attachments: TS-888-jp.patch
>
>
> ATS has SSL server certificates. The backend is accessed via SSL as well 
> which uses the same certificates. It fails with FireFox, but works with 
> Google Chrome.


[jira] [Commented] (TS-844) ReadFromWriter fail in CacheRead.cc

2011-08-01 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076013#comment-13076013
 ] 

John Plevyak commented on TS-844:
-

I'd like to know what the top of the stack looked like and also what "fail" 
means in this context.

The patch is safe in the sense that it is conservative, but if a write has been
closed but not yet written into the aggregation buffer, this patch will prevent
that data from being available for a ReadFromWriter.  At least that is how I
read it.

What I am wondering is: what about a closed but not yet written CacheVC is
making ReadFromWriter fail?


> ReadFromWriter fail in CacheRead.cc
> ---
>
> Key: TS-844
> URL: https://issues.apache.org/jira/browse/TS-844
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: mohan_zl
> Fix For: 3.1.0
>
> Attachments: TS-844.patch
>
>
> {code}
> #6  0x006ab4d7 in CacheVC::openReadChooseWriter (this=0x2aaaf81523d0, 
> event=1, e=0x0) at CacheRead.cc:320
> #7  0x006abdc9 in CacheVC::openReadFromWriter (this=0x2aaaf81523d0, 
> event=1, e=0x0) at CacheRead.cc:411
> #8  0x004d302f in Continuation::handleEvent (this=0x2aaaf81523d0, 
> event=1, data=0x0) at I_Continuation.h:146
> #9  0x006ae2b9 in Cache::open_read (this=0x2aaab0001c40, 
> cont=0x2aaab4472aa0, key=0x42100b10, request=0x2aaab44710f0, 
> params=0x2aaab4470928, type=CACHE_FRAG_TYPE_HTTP,
> hostname=0x2aab09581049 
> "js.tongji.linezing.comicon1.gifjs.tongji.linezing.com"...,
>  host_len=22) at CacheRead.cc:228
> #10 0x0068da30 in Cache::open_read (this=0x2aaab0001c40, 
> cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, 
> params=0x2aaab4470928,
> type=CACHE_FRAG_TYPE_HTTP) at P_CacheInternal.h:1068
> #11 0x0067d32f in CacheProcessor::open_read (this=0xf2c030, 
> cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, 
> params=0x2aaab4470928, pin_in_cache=0,
> type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3011
> #12 0x0054e058 in HttpCacheSM::do_cache_open_read 
> (this=0x2aaab4472aa0) at HttpCacheSM.cc:220
> #13 0x0054e1a7 in HttpCacheSM::open_read (this=0x2aaab4472aa0, 
> url=0x2aaab4471108, hdr=0x2aaab44710f0, params=0x2aaab4470928, 
> pin_in_cache=0) at HttpCacheSM.cc:252
> #14 0x00568404 in HttpSM::do_cache_lookup_and_read 
> (this=0x2aaab4470830) at HttpSM.cc:3893
> #15 0x005734b5 in HttpSM::set_next_state (this=0x2aaab4470830) at 
> HttpSM.cc:6436
> #16 0x0056115a in HttpSM::call_transact_and_set_next_state 
> (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
> #17 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
> HttpSM.cc:1516
> #18 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
> event=0, data=0x0) at HttpSM.cc:1448
> #19 0x0056de77 in HttpSM::do_api_callout_internal 
> (this=0x2aaab4470830) at HttpSM.cc:4345
> #20 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
> HttpSM.cc:497
> #21 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at 
> HttpSM.cc:6362
> #22 0x0056115a in HttpSM::call_transact_and_set_next_state 
> (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
> #23 0x00572faf in HttpSM::set_next_state (this=0x2aaab4470830) at 
> HttpSM.cc:6378
> #24 0x0056115a in HttpSM::call_transact_and_set_next_state 
> (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
> #25 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
> HttpSM.cc:1516
> #26 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
> event=0, data=0x0) at HttpSM.cc:1448
> #27 0x0056de77 in HttpSM::do_api_callout_internal 
> (this=0x2aaab4470830) at HttpSM.cc:4345
> #28 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
> HttpSM.cc:497
> #29 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at 
> HttpSM.cc:6362
> #30 0x0056115a in HttpSM::call_transact_and_set_next_state 
> (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
> #31 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
> HttpSM.cc:1516
> #32 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
> event=0, data=0x0) at HttpSM.cc:1448
> #33 0x0056de77 in HttpSM::do_api_callout_internal 
> (this=0x2aaab4470830) at HttpSM.cc:4345
> #34 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
> HttpSM.

[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread -> ShowCont::show

2011-07-24 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070292#comment-13070292
 ] 

John Plevyak commented on TS-848:
-

I think this is fixed in 1150526; give it a try.

> Crash Report: ShowNet::showConnectionsOnThread -> ShowCont::show
> 
>
> Key: TS-848
> URL: https://issues.apache.org/jira/browse/TS-848
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Affects Versions: 3.1.0
>Reporter: Zhao Yongming
>  Labels: http_ui, network
> Fix For: 3.1.1
>
>
> When we use the {net} http_ui network interface, it crashed with the 
> following information:
> {code}
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /usr/bin/traffic_server - STACK TRACE: 
> /usr/bin/traffic_server[0x51ba3e]
> /lib64/libpthread.so.0[0x3f89c0e7c0]
> [0x7fffd20544f8]
> /lib64/libc.so.6(vsnprintf+0x9a)[0x3f8906988a]
> /usr/bin/traffic_server(ShowCont::show(char const*, ...)+0x262)[0x638184]
> /usr/bin/traffic_server(ShowNet::showConnectionsOnThread(int, 
> Event*)+0x481)[0x6ec7bf]
> /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6f)[0x4d302f]
> /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x11e)[0x6f9978]
> /usr/bin/traffic_server(EThread::execute()+0x94)[0x6f9b6a]
> /usr/bin/traffic_server(main+0x10c7)[0x4ff74d]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f8901d994]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> [New process 31182]
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> (gdb) bt
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> #1  0x003f89046b69 in vfprintf () from /lib64/libc.so.6
> #2  0x003f8906988a in vsnprintf () from /lib64/libc.so.6
> #3  0x00638184 in ShowCont::show (this=0x2aaab44af600, 
> s=0x7732b8 
> "%d%s%d%d%s%d%d 
> secs 
> ago%d%d%d%d%d%d%d%d
>  secs%d secs<"...) at ../../proxy/Show.h:62
> #4  0x006ec7bf in ShowNet::showConnectionsOnThread 
> (this=0x2aaab44af600, event=1, e=0x2aaab5cc2080) at UnixNetPages.cc:75
> #5  0x004d302f in Continuation::handleEvent (this=0x2aaab44af600, 
> event=1, data=0x2aaab5cc2080) at I_Continuation.h:146
> #6  0x006f9978 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab5cc2080, calling_code=1) at UnixEThread.cc:140
> #7  0x006f9b6a in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:189
> #8  0x004ff74d in main (argc=3, argv=0x7fffd2054d88) at Main.cc:1958
> {code}


[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread -> ShowCont::show

2011-07-24 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070236#comment-13070236
 ] 

John Plevyak commented on TS-848:
-

Gack, many of those values (e.g. nbytes) are now 64-bit, so they need %lld rather than %d.
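A `%d` conversion handed a 64-bit argument is undefined behavior in `printf`-style functions, and on some ABIs it throws off every later conversion, including any `%s`. A minimal sketch of the portable fix using the `PRId64` macro from `<cinttypes>`; the `ConnStats` struct and `format_row` function are hypothetical, not the actual `ShowNet` code.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical per-connection counters; nbytes is 64-bit.
struct ConnStats {
  int fd;
  std::int64_t nbytes;
};

// Formats one row. PRId64 expands to the correct conversion for int64_t
// on the current platform (e.g. "ld" or "lld"), so the format string and
// the argument types stay in agreement.
std::string format_row(const ConnStats &s) {
  char buf[128];
  std::snprintf(buf, sizeof buf, "fd=%d nbytes=%" PRId64, s.fd, s.nbytes);
  return std::string(buf);
}
```

Using `%lld` with an explicit `(long long)` cast is the other common fix; `PRId64` avoids the cast.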

> Crash Report: ShowNet::showConnectionsOnThread -> ShowCont::show
> 
>
> Key: TS-848
> URL: https://issues.apache.org/jira/browse/TS-848
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Affects Versions: 3.1.0
>Reporter: Zhao Yongming
>  Labels: http_ui, network
> Fix For: 3.1.1
>
>
> When we use the {net} http_ui network interface, it crashed with the 
> following information:
> {code}
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /usr/bin/traffic_server - STACK TRACE: 
> /usr/bin/traffic_server[0x51ba3e]
> /lib64/libpthread.so.0[0x3f89c0e7c0]
> [0x7fffd20544f8]
> /lib64/libc.so.6(vsnprintf+0x9a)[0x3f8906988a]
> /usr/bin/traffic_server(ShowCont::show(char const*, ...)+0x262)[0x638184]
> /usr/bin/traffic_server(ShowNet::showConnectionsOnThread(int, 
> Event*)+0x481)[0x6ec7bf]
> /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6f)[0x4d302f]
> /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x11e)[0x6f9978]
> /usr/bin/traffic_server(EThread::execute()+0x94)[0x6f9b6a]
> /usr/bin/traffic_server(main+0x10c7)[0x4ff74d]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f8901d994]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> [New process 31182]
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> (gdb) bt
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> #1  0x003f89046b69 in vfprintf () from /lib64/libc.so.6
> #2  0x003f8906988a in vsnprintf () from /lib64/libc.so.6
> #3  0x00638184 in ShowCont::show (this=0x2aaab44af600, 
> s=0x7732b8 
> "%d%s%d%d%s%d%d 
> secs 
> ago%d%d%d%d%d%d%d%d
>  secs%d secs<"...) at ../../proxy/Show.h:62
> #4  0x006ec7bf in ShowNet::showConnectionsOnThread 
> (this=0x2aaab44af600, event=1, e=0x2aaab5cc2080) at UnixNetPages.cc:75
> #5  0x004d302f in Continuation::handleEvent (this=0x2aaab44af600, 
> event=1, data=0x2aaab5cc2080) at I_Continuation.h:146
> #6  0x006f9978 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab5cc2080, calling_code=1) at UnixEThread.cc:140
> #7  0x006f9b6a in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:189
> #8  0x004ff74d in main (argc=3, argv=0x7fffd2054d88) at Main.cc:1958
> {code}


[jira] [Issue Comment Edited] (TS-866) Need way to clear contents of a cache entry

2011-07-24 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070234#comment-13070234
 ] 

John Plevyak edited comment on TS-866 at 7/24/11 7:25 PM:
--

Sorry for the delay.  I am looking at this patch.  It needs a little bit of 
work:

1) it should be built on remove instead of read (it can still share internal 
states with read using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap 
these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  There are 
other features which need to touch this code as well, so I'll poke around.

  was (Author: jplevyak):
Sorry for the delay.  I am looking at this patch.  It needs a little bit of 
work:

1) it should be built on remove instead of read (it can still share internal 
states with using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap 
these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  There are 
other features which need to touch this code as well, so I'll poke around.
  
> Need way to clear contents of a cache entry
> ---
>
> Key: TS-866
> URL: https://issues.apache.org/jira/browse/TS-866
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Cache
>Affects Versions: 3.0.0
>Reporter: William Bardwell
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: cache_erase.diff
>
>
> I needed a way to clear a cache entry off of disk, not just forget about it.
> The concern was that if content on a server turned out to be illegal or a
> privacy violation of some sort, we wanted to be able to tell customers that
> after this step there was no way TS could serve that content again.
> The normal cache remove just clears the directory entry, but in theory a
> bug could still let that data out.  This was not intended to prevent
> forensic recovery of the data from the hardware, and bugs in low-level
> drivers or the kernel could in theory still let data survive through block
> remapping or mismanagement of disk caches.
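The distinction drawn here between a normal remove (dropping the directory entry) and an erase (overwriting the stored bytes) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `ToyCache`, `Extent`, `write`, `remove`, `erase`, and `bytes_present` are hypothetical names, the "disk" is an in-memory byte vector, and none of it is the Traffic Server cache API or the contents of cache_erase.diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model (not the Traffic Server cache API): the "disk" is a
// byte vector and the directory maps a key to the extent holding its data.
struct Extent {
  std::size_t offset;
  std::size_t length;
};

struct ToyCache {
  std::vector<std::uint8_t> disk;
  std::unordered_map<std::string, Extent> directory;

  explicit ToyCache(std::size_t size) : disk(size, 0) {}

  void write(const std::string &key, std::size_t offset, const std::string &data) {
    assert(offset + data.size() <= disk.size());
    std::memcpy(disk.data() + offset, data.data(), data.size());
    directory[key] = Extent{offset, data.size()};
  }

  // Normal remove: forget the directory entry, leave the bytes on "disk".
  bool remove(const std::string &key) { return directory.erase(key) > 0; }

  // Erase: overwrite the stored bytes first, then drop the directory entry.
  bool erase(const std::string &key) {
    auto it = directory.find(key);
    if (it == directory.end()) return false;
    std::memset(disk.data() + it->second.offset, 0, it->second.length);
    directory.erase(it);
    return true;
  }

  // True if the raw "disk" still holds `data` at `offset`.
  bool bytes_present(std::size_t offset, const std::string &data) const {
    return std::memcmp(disk.data() + offset, data.data(), data.size()) == 0;
  }
};
```

After `remove()` the key is gone from the directory but the raw bytes are still readable; after `erase()` they are zeroed. As the report says, this does nothing about copies left behind by block remapping below the cache layer.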


[jira] [Commented] (TS-866) Need way to clear contents of a cache entry

2011-07-24 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070234#comment-13070234
 ] 

John Plevyak commented on TS-866:
-

Sorry for the delay.  I am looking at this patch.  It needs a little bit of work:

1) it should be built on remove instead of read (it can still share internal
states with read using the stack mechanism)
2) it should interlock with writes from the aggregation buffer that would
overlap these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  Other
features also need to touch this code, so I'll poke around.
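The second review point, interlocking the erase writes with overlapping aggregation-buffer writes, comes down to a byte-range overlap test. A minimal sketch, assuming each pending write is described by a hypothetical `Range` covering the half-open bytes `[offset, offset + length)`; `overlaps` and `erase_must_wait` are illustrative names, not Traffic Server functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper types, not Traffic Server code. A write covers the
// half-open byte range [offset, offset + length) on a volume.
struct Range {
  std::uint64_t offset;
  std::uint64_t length;
};

// Two half-open ranges overlap iff each one starts before the other ends.
// Empty ranges never overlap anything.
bool overlaps(const Range &a, const Range &b) {
  if (a.length == 0 || b.length == 0) return false;
  return a.offset < b.offset + b.length && b.offset < a.offset + a.length;
}

// An erase has to wait (be interlocked) if any pending aggregation-buffer
// write would land on bytes the erase is about to overwrite.
bool erase_must_wait(const Range &erase, const std::vector<Range> &pending_agg_writes) {
  for (const Range &w : pending_agg_writes) {
    if (overlaps(erase, w)) return true;
  }
  return false;
}
```

Using half-open ranges means adjacent extents (one ending exactly where the next begins) do not count as overlapping, which avoids needless waits. The sketch ignores integer overflow near the top of the 64-bit offset space.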



[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-17 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051351#comment-13051351
 ] 

John Plevyak commented on TS-833:
-

I think these are related.  Either the DNSEntry is trashed or the timeout is
bad.  We are in write_dns_event writing an entry which is not the new one.
This is an odd state to be in, as entries are written immediately unless we
are over 2K entries in flight or there has been a failover (bad DNS server).
You should be able to check that in the stats.  I think it is more likely that
the DNSEntry is trashed.  If you see this again, try printing out "e" within
write_dns_event.  Also check whether you are getting failovers.  There should
be warnings in the logs:

"failover: connection to DNS server %d.%d.%d.%d lost, retrying"

Also, the number of entries in flight is stored in DNSHandler::in_flight,
which is available as h->in_flight in write_dns_event.
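The reasoning that an entry should not still be sitting in the queue can be modelled as a loop that sends queued requests until either an in-flight cap or a failover stops it. A hypothetical sketch: `ToyDNSHandler`, `flush_queue`, `queued`, `failing_over`, and `kMaxInFlight` (taken as 2048 for "2K") are assumptions for illustration. Apart from the idea of an in-flight counter, they are not the real `DNSHandler` members.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the behaviour described above, not DNSHandler:
// queued requests are sent immediately unless the in-flight cap is reached
// or the handler is in the middle of a failover.
struct ToyDNSHandler {
  static constexpr int kMaxInFlight = 2048;  // assumed value for "2K"
  std::deque<int> queued;                    // request ids not yet written
  int in_flight = 0;
  bool failing_over = false;

  // Send as many queued requests as allowed; returns how many were sent.
  int flush_queue() {
    int sent = 0;
    while (!failing_over && !queued.empty() && in_flight < kMaxInFlight) {
      queued.pop_front();  // a real handler would write the query here
      ++in_flight;
      ++sent;
    }
    return sent;
  }
};
```

In this model a request is left queued only when one of those two conditions holds, which is why checking the in-flight count and the failover warnings narrows down the cause.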

Most likely the DNSEntry has been freed, but how that could happen is beyond
me.  The only code path that frees it goes through DNSEntry::post(), which runs
after the DNSEntry has been removed from the DNSHandler::entries queue.  That
is the same queue that write_dns() walks to call write_dns_entry(), which is
where the crash happens.  The free also happens after the code that cancels
any timeout.

I just can't see the problem, but perhaps someone else can.
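The `0xefbeadde` words in the memory dumps are consistent with an object that was freed and then poisoned. Assuming the debug freelist fills freed memory with the repeating byte sequence `de ad be ef` (an assumption about this build, suggested by "0xdeadbeef" in the issue title), a little-endian load turns that into `0xefbeadde` per 32-bit word and `0xefbeaddeefbeadde` per 64-bit field, which is the `handler` value gdb printed. The helpers `poison`, `load_le32`, and `load_le64` are hypothetical and only demonstrate the byte-order arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the debug freelist poisons freed objects with the repeating
// byte sequence de ad be ef. poison() fills a buffer the same way.
void poison(unsigned char *p, std::size_t n) {
  static const unsigned char pattern[4] = {0xde, 0xad, 0xbe, 0xef};
  for (std::size_t i = 0; i < n; ++i) p[i] = pattern[i % 4];
}

// Load a 32-bit word as gdb's x/x shows it on a little-endian host such as
// x86-64: the lowest-addressed byte is the least significant one.
std::uint32_t load_le32(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// A 64-bit little-endian load, e.g. a pointer-sized field such as `handler`.
std::uint64_t load_le64(const unsigned char *p) {
  return static_cast<std::uint64_t>(load_le32(p)) |
         (static_cast<std::uint64_t>(load_le32(p + 4)) << 32);
}
```

The same 64 bits read as a signed integer give -1171307680053154338, the `this adjustment` gdb printed next to `handler`, so both halves of that pointer-to-member come from poisoned memory.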



> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff
>
>

[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050620#comment-13050620
 ] 

John Plevyak commented on TS-833:
-

mohan_zl, is this latest crash with TS-833-3.diff applied?

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff
>
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901  0x00000000  0xefbeadde  0xefbeadde
> 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
> e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
> handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
> = {
>   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb) 
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
> data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
>  Saved registers:
>   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
> (gdb) p this->handler
> $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
> {code}


[jira] [Commented] (TS-834) Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57

2011-06-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050605#comment-13050605
 ] 

John Plevyak commented on TS-834:
-

zym, do you still see this with the patch?

> Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
> -
>
> Key: TS-834
> URL: https://issues.apache.org/jira/browse/TS-834
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk( the same time as v3.0), --enable-debug
>Reporter: Zhao Yongming
>  Labels: UnixNet
> Attachments: TS-834.diff
>
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
> event=1, data=0x4b2d6d0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
> event=1, data=0x4b2d6d0) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity (this=0x4b3f780, 
> event=2, e=0x4b2d6d0) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x4b3f780, event=2, 
> data=0x4b2d6d0) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x4b2d6d0, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #5  0x004ff37d in main (argc=3, argv=0x7fff6f447418) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff6f446cb0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x7fff6f446d00
>  source language c++.
>  Arglist at 0x7fff6f446ca0, args: this=0x2aaaf4091b70, event=1, data=0x4b2d6d0
>  Locals at 0x7fff6f446ca0, Previous frame's sp is 0x7fff6f446cb0
>  Saved registers:
>   rbp at 0x7fff6f446ca0, rip at 0x7fff6f446ca8
> (gdb) x/80x this
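> 0x2aaaf4091b70: 0x0076a830  0x00000000  0x006d1902  0x00000000
> 0x2aaaf4091b80: 0x00000000  0x00000000  0x0076a290  0x00000000
> 0x2aaaf4091b90: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091ba0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bb0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bc0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bd0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091be0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bf0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c00: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c10: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c20: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c30: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c40: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c50: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c60: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c70: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c80: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c90: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091ca0: 0x00000000  0x00000000  0x00000000  0x00000000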
> {code}
> bt #2
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
> event=1, data=0x11cbc610) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
> event=1, data=0x11cbc610) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity 
> (this=0x2c001f50, event=2, e=0x11cbc610) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x2c001f50, 
> event=2, data=0x11cbc610) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2af2a010, 
> e=0x11cbc610, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2af2a010) at 
> UnixEThread.cc:217
> #5  0x006f5181 in spawn_thread_internal (a=0x11cadae0) at Thread.cc:88
> #6  0x0030ec2064a7 in start_thread () from /lib64/libpthread.so.0
> #7  0x0030eb6d3c2d in clone () from /lib64/libc.so.6
> (gdb) info f
> Stack level 0, frame at 0x4198df60:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x4198dfb0
>  source language c++.
>  Arglist at 0x4198df50, args: this=0x11ed6000,

[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-16 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050604#comment-13050604
 ] 

John Plevyak commented on TS-833:
-

Bloody unbelievable.  This is event == 2, which comes from a schedule_in, and
the object is the same size as a DNSEntry, so this is a schedule_in on a
DNSEntry.  There are only a couple of those and I have checked those paths.
Does anyone else want to take a look?
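The failure mode discussed here, a timer callback firing on an object that has already been freed, is why a pending timeout must be cancelled before its target is released. A minimal sketch with a hypothetical single-threaded scheduler: `ToyScheduler`, `ToyEvent`, `ToyEntry`, `cancel_timeout`, and `kEventInterval` are illustrative names, `kEventInterval = 2` is assumed to stand for the event code a `schedule_in` delivers, and nothing here reproduces Traffic Server's `EThread` or `Event` classes.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical single-threaded toy scheduler, not Traffic Server's EThread/Event.
constexpr int kEventInterval = 2;  // assumed: the event code a schedule_in delivers

struct ToyEvent {
  std::function<void(int)> callback;
  bool cancelled = false;
};

struct ToyScheduler {
  std::vector<std::shared_ptr<ToyEvent>> pending;

  // Queue a callback; the caller keeps the handle so it can cancel it later.
  std::shared_ptr<ToyEvent> schedule_in(std::function<void(int)> cb) {
    auto e = std::make_shared<ToyEvent>();
    e->callback = std::move(cb);
    pending.push_back(e);
    return e;
  }

  // Deliver every pending event that has not been cancelled.
  void run() {
    std::vector<std::shared_ptr<ToyEvent>> batch;
    batch.swap(pending);  // callbacks may schedule new events safely
    for (const auto &e : batch)
      if (!e->cancelled) e->callback(kEventInterval);
  }
};

// An owner that holds a timeout handle, like an entry with a pending timer.
struct ToyEntry {
  int intervals_seen = 0;
  std::shared_ptr<ToyEvent> timeout;
  void handle(int event) {
    if (event == kEventInterval) ++intervals_seen;
  }
  // Cancel the timer before the entry is released, so it can never fire late.
  void cancel_timeout() {
    if (timeout) timeout->cancelled = true;
    timeout.reset();
  }
};
```

In the reported crashes the event apparently still pointed at the freed continuation, so `handleEvent` called through a poisoned `handler` pointer; the `cancelled` check is the toy equivalent of cancelling the event before the free.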

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff
>
>

[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-14 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-833:


Attachment: TS-833-3.diff

Even more conservative coding style.

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff
>
>


[jira] [Updated] (TS-834) Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57

2011-06-14 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-834:


Attachment: TS-834.diff

Patch copied from TS-833; it is really for this bug.

> Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
> -
>
> Key: TS-834
> URL: https://issues.apache.org/jira/browse/TS-834
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk( the same time as v3.0), --enable-debug
>Reporter: Zhao Yongming
>  Labels: UnixNet
> Attachments: TS-834.diff
>
>

[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-14 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-833:


Attachment: TS-833-2.diff

This is a possible patch which deals with DNS issues.

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833-2.diff, TS-833.diff
>
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901  0x  0xefbeadde  0xefbeadde
> 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
> e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
> handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
> = {
>   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb) 
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
> data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
>  Saved registers:
>   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
> (gdb) p this->handler
> $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048892#comment-13048892
 ] 

John Plevyak commented on TS-833:
-

The previous comments are really for TS-834.  For this bug, the first value at
"this" (the object's first word) is a pointer to the next element in the freelist.
It just so happens that

0x19581df0 - 0x19581900 = 1264

which, probably not coincidentally, is the same as sizeof(DNSEntry)... so this
patch is really for TS-834, and the blasted DNSEntry is very likely the
cause of this second issue.
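The reasoning above can be made concrete with a small sketch. It assumes a simplified, hypothetical intrusive freelist (not ATS's actual `ink_freelist` code) in which freeing an object writes the address of the next free element into the object's first word and, as a debug aid, fills the remaining bytes with the byte sequence `de ad be ef`. On a little-endian machine that sequence reads back as the 32-bit word `0xefbeadde`, and the distance between a freed element and the address stored in its first word equals the element size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified intrusive freelist (not ATS's ink_freelist).
// Freeing an item stores the previous head in the item's first word and
// poisons the rest of the item with the byte sequence de ad be ef.
struct DebugFreeList {
  unsigned char *head = nullptr;
  std::size_t item_size;

  explicit DebugFreeList(std::size_t size) : item_size(size) {}

  void free_item(unsigned char *item) {
    static const unsigned char poison[4] = {0xde, 0xad, 0xbe, 0xef};
    for (std::size_t i = sizeof(void *); i < item_size; ++i)
      item[i] = poison[(i - sizeof(void *)) % 4];
    std::memcpy(item, &head, sizeof(head));  // first word = next free item
    head = item;
  }
};

// Distance in bytes between a freed item and the "next" pointer in its first word.
std::ptrdiff_t stride_to_next(const unsigned char *item) {
  unsigned char *next;
  std::memcpy(&next, item, sizeof(next));
  return item - next;
}

// First poison word after the next pointer, read as a native 32-bit word.
std::uint32_t first_poison_word(const unsigned char *item) {
  std::uint32_t word;
  std::memcpy(&word, item + sizeof(void *), sizeof(word));
  return word;
}

// The two addresses from the report differ by exactly 1264 bytes (0x4f0).
static_assert(0x19581df0 - 0x19581900 == 1264, "0x4f0 == 1264");
```

For example, freeing two adjacent 1264-byte items and inspecting the second one gives a stride of 1264 and, on little-endian hardware, a first poison word of `0xefbeadde`: the same shape as the `x/40x this` output in bt #1.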

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833.diff
>
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901  0x00000000  0xefbeadde  0xefbeadde
> 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
> e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
> handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
> = {
>   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb) 
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
> data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous

[jira] [Commented] (TS-834) Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57

2011-06-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048873#comment-13048873
 ] 

John Plevyak commented on TS-834:
-

Check out my comment on TS-833.  I don't know that they are dups, but those
comments apply here.

> Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
> -
>
> Key: TS-834
> URL: https://issues.apache.org/jira/browse/TS-834
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk (the same time as v3.0), --enable-debug
>Reporter: Zhao Yongming
>  Labels: UnixNet
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
> event=1, data=0x4b2d6d0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
> event=1, data=0x4b2d6d0) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity (this=0x4b3f780, 
> event=2, e=0x4b2d6d0) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x4b3f780, event=2, 
> data=0x4b2d6d0) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x4b2d6d0, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #5  0x004ff37d in main (argc=3, argv=0x7fff6f447418) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff6f446cb0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x7fff6f446d00
>  source language c++.
>  Arglist at 0x7fff6f446ca0, args: this=0x2aaaf4091b70, event=1, data=0x4b2d6d0
>  Locals at 0x7fff6f446ca0, Previous frame's sp is 0x7fff6f446cb0
>  Saved registers:
>   rbp at 0x7fff6f446ca0, rip at 0x7fff6f446ca8
> (gdb) x/80x this
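> 0x2aaaf4091b70: 0x0076a830  0x00000000  0x006d1902  0x00000000
> 0x2aaaf4091b80: 0x00000000  0x00000000  0x0076a290  0x00000000
> 0x2aaaf4091b90: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091ba0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bb0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bc0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bd0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091be0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091bf0: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c00: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c10: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c20: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c30: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c40: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c50: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c60: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c70: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c80: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091c90: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x2aaaf4091ca0: 0x00000000  0x00000000  0x00000000  0x00000000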
> {code}
> bt #2
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
> event=1, data=0x11cbc610) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
> event=1, data=0x11cbc610) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity 
> (this=0x2c001f50, event=2, e=0x11cbc610) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x2c001f50, 
> event=2, data=0x11cbc610) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2af2a010, 
> e=0x11cbc610, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2af2a010) at 
> UnixEThread.cc:217
> #5  0x006f5181 in spawn_thread_internal (a=0x11cadae0) at Thread.cc:88
> #6  0x0030ec2064a7 in start_thread () from /lib64/libpthread.so.0
> #7  0x0030eb6d3c2d in clone () from /lib64/libc.so.6
> (gdb) info f
> Stack level 0, frame at 0x4198df60:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x4198dfb0
>  source language c++.
>  Arglist at 0x4198df50, ar

[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-13 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-833:


Attachment: TS-833.diff

Potential patch for InactivityCop

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
> Attachments: TS-833.diff
>
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901  0x00000000  0xefbeadde  0xefbeadde
> 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
> e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
> handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
> = {
>   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb) 
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
> data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
>  Saved registers:
>   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
> (gdb) p this->handler
> $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
> {code}


[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-13 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048750#comment-13048750
 ] 

John Plevyak commented on TS-833:
-

I have a theory about this, but I am not sure why the problem has only manifested
now, as it seems to have been in the codebase for a while.  The theory is that
vc_next is bad because it has been closed as a result of the inactivity
callback.  This could be checked by walking down nh->open_list in the debugger
(or in code) to see if next_vc is still in the list.
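The suspected failure mode, and the membership check suggested above, can be sketched with a simplified list. All names here (`VC`, `OpenList`, `scan_open_list`) are hypothetical stand-ins, not the real `UnixNetVConnection`, `NetHandler`, or `InactivityCop` code. The loop caches the successor before invoking a per-connection callback; if that callback closes the cached successor, the cached pointer is stale. Checking that the successor is still in the open list before following it is the in-code version of walking `nh->open_list`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a net VConnection (not UnixNetVConnection).
struct VC {
  int id;
  VC *next = nullptr;
  VC *prev = nullptr;
  explicit VC(int i) : id(i) {}
};

// Hypothetical stand-in for the handler's open list: an intrusive
// doubly-linked list threaded through VC::next / VC::prev.
struct OpenList {
  VC *head = nullptr;

  void push(VC *vc) {
    vc->prev = nullptr;
    vc->next = head;
    if (head) head->prev = vc;
    head = vc;
  }

  void remove(VC *vc) {
    if (vc->prev) vc->prev->next = vc->next;
    else head = vc->next;
    if (vc->next) vc->next->prev = vc->prev;
    vc->next = vc->prev = nullptr;
  }

  // O(n) membership test: the "walk down open_list" check.
  bool contains(const VC *vc) const {
    for (const VC *p = head; p; p = p->next)
      if (p == vc) return true;
    return false;
  }
};

// Visit each VC, caching vc_next before the callback runs (the pattern the
// theory suspects). If the callback removed the cached successor, stop rather
// than follow it; in the real code that memory may already be freed and
// poisoned. Returns how many VCs were visited.
int scan_open_list(OpenList &list, const std::function<void(VC *)> &callback) {
  int visited = 0;
  for (VC *vc = list.head; vc;) {
    VC *vc_next = vc->next;  // cached before the callback
    callback(vc);
    ++visited;
    if (vc_next && !list.contains(vc_next))
      break;  // stale successor detected
    vc = vc_next;
  }
  return visited;
}
```

The membership check costs O(n) per step, so it is only reasonable as a debugging aid; a real fix would more likely change when or how connections are closed during the scan.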

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
> ink_freelist_free related
> ---
>
> Key: TS-833
> URL: https://issues.apache.org/jira/browse/TS-833
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 3.1.0
> Environment: current trunk, with --enable-debug
>Reporter: Zhao Yongming
>  Labels: freelist
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
> event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901  0x00000000  0xefbeadde  0xefbeadde
> 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
> data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
> e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
> handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
> = {
>   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb) 
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146 return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
> event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
> e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
> UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
> (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
> data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
>  Saved regist

[jira] [Updated] (TS-811) libtool configure warnings on Fedora 15

2011-05-30 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-811:


Priority: Major  (was: Minor)

autoreconf -i fails.  If the configure script is built on another machine, it
works fine, so this only impacts developers :)

The resulting configure script and Makefiles are not functional:

make[3]: Entering directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
/bin/sh ../../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.   
-D_LARGEFILE64_SOURCE=1 -D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT 
-Dlinux  -g -pipe -Wall -Werror -O3 -feliminate-unused-debug-symbols 
-fno-strict-aliasing -Wno-invalid-offsetof  -MT Allocator.lo -MD -MP -MF 
.deps/Allocator.Tpo -c -o Allocator.lo Allocator.cc
../../libtool: line 2089: ./Allocator.cc: Permission denied
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -D_LARGEFILE64_SOURCE=1 
-D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT -Dlinux -g -pipe -Wall 
-Werror -O3 -feliminate-unused-debug-symbols -fno-strict-aliasing 
-Wno-invalid-offsetof -MT Allocator.lo -MD -MP -MF .deps/Allocator.Tpo -c ""  
-fPIC -DPIC -o .libs/Allocator.o
g++: error: : No such file or directory
g++: fatal error: no input files
compilation terminated.
make[3]: *** [Allocator.lo] Error 1
make[3]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[2]: *** [all] Error 2
make[2]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib'


> libtool configure warnings on Fedora 15
> ---
>
> Key: TS-811
> URL: https://issues.apache.org/jira/browse/TS-811
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.9
> Environment: Fedora 15 x86_64.
>Reporter: John Plevyak
> Fix For: 3.1.0
>
>
> configure.ac:465: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected 
> in body
> ../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
> ../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
> ../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
> build/libtool.m4:1084: _LT_SYS_MODULE_PATH_AIX is expanded from...
> build/libtool.m4:5428: _LT_LANG_CXX_CONFIG is expanded from...
> build/libtool.m4:816: _LT_LANG is expanded from...
> build/libtool.m4:799: LT_LANG is expanded from...
> build/libtool.m4:827: _LT_LANG_DEFAULT_CONFIG is expanded from...
> build/libtool.m4:143: _LT_SETUP is expanded from...
> build/libtool.m4:69: LT_INIT is expanded from...
> build/libtool.m4:107: AC_PROG_LIBTOOL is expanded from...
> configure.ac:465: the top level


[jira] [Created] (TS-811) libtool configure warnings on Fedora 15

2011-05-30 Thread John Plevyak (JIRA)

libtool configure warnings on Fedora 15
---

 Key: TS-811
 URL: https://issues.apache.org/jira/browse/TS-811
 Project: Traffic Server
  Issue Type: Bug
  Components: Build
Affects Versions: 2.1.9
 Environment: Fedora 15 x86_64.
Reporter: John Plevyak
Priority: Minor
 Fix For: 3.1.0


configure.ac:465: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in 
body
../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
build/libtool.m4:1084: _LT_SYS_MODULE_PATH_AIX is expanded from...
build/libtool.m4:5428: _LT_LANG_CXX_CONFIG is expanded from...
build/libtool.m4:816: _LT_LANG is expanded from...
build/libtool.m4:799: LT_LANG is expanded from...
build/libtool.m4:827: _LT_LANG_DEFAULT_CONFIG is expanded from...
build/libtool.m4:143: _LT_SETUP is expanded from...
build/libtool.m4:69: LT_INIT is expanded from...
build/libtool.m4:107: AC_PROG_LIBTOOL is expanded from...
configure.ac:465: the top level



[jira] [Commented] (TS-178) Change INK_ prefix to TS_ and ink_ to ts_ and inkXXX to tsXXX, and change filenames and directory structure

2011-05-30 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041165#comment-13041165
 ] 

John Plevyak commented on TS-178:
-

FYI: this isn't fixed.  lib/ts includes many files prefixed with ink_ and many
functions with the same prefix.



> Change INK_ prefix to TS_ and ink_ to ts_ and inkXXX to tsXXX, and change 
> filenames and directory structure
> ---
>
> Key: TS-178
> URL: https://issues.apache.org/jira/browse/TS-178
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cleanup
>Reporter: John Plevyak
> Fix For: 3.0.1
>
>
> We should change the INK_, ink_ and ink prefixes to be TS_, ts_ and ts.
> The target for this change is 2.2; that is, we make these changes right
> before the 2.2 snap.
> Doing it earlier would make merging bug fixes into both 2.0 and 2.1 very
> hard, as patches will likely fail, particularly if we change file names.
> I would suggest that we combine this with changing the directory structure,
> since we will already be biting the bullet and making it very hard to compare
> code across these changes.
> See
> http://cwiki.apache.org/confluence/display/TS/2_2_Prefix_Changes
> for the ongoing proposal.


[jira] [Closed] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition

2011-05-20 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak closed TS-773.
---

Resolution: Fixed

This seems to be fixed now.  Tested by jplevyak and zwoop.

> Traffic server has a hard limit of 512 gigabytes per RAW disk partition
> ---
>
> Key: TS-773
> URL: https://issues.apache.org/jira/browse/TS-773
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 2.1.8
> Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64
> 12 1.5TB hard drives for cache disks. 
>Reporter: David Robinson
>Assignee: John Plevyak
> Fix For: 2.1.9
>
>
> Using 1.5TB hard drives as cache disks results in ATS using only 512GB of the
> disk. The disks are configured in RAW mode with no partition information.
> storage.config is setup like this,
> /dev/sda
> /dev/sdb
> /dev/sde
> /dev/sdf
> /dev/sdh
> /dev/sdi
> /dev/sdj
> /dev/sdk
> /dev/sdl
> /dev/sdm
> /dev/sdn
> /dev/sdo
> fdisk -l /dev/sdo
> Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
> 255 heads, 63 sectors/track, 182401 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x00000000
> Partitioning a disk into three 512G partitions and adding them to
> storage.config makes ATS use the entire 1.5TB of space.
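The partitioning workaround in this report comes down to simple arithmetic on the `fdisk` size. The sketch below assumes "512G" means 512 GiB (2^39 bytes); the report does not say which unit is meant, and the partition count is the same under a 10^9-based reading. It only reproduces the numbers and makes no claim about where the limit came from.

```cpp
#include <cassert>
#include <cstdint>

// Disk size reported by fdisk for /dev/sdo.
constexpr std::uint64_t kDiskBytes = 1500301910016ULL;

// Assumption: "512G" read as 512 GiB (2^39 bytes).
constexpr std::uint64_t kLimitBytes = 512ULL << 30;

// Number of partitions of at most `limit` bytes needed to cover `total` bytes.
constexpr std::uint64_t partitions_needed(std::uint64_t total, std::uint64_t limit) {
  return (total + limit - 1) / limit;  // ceiling division
}

static_assert(kLimitBytes == 549755813888ULL, "512 GiB in bytes");
static_assert(partitions_needed(kDiskBytes, kLimitBytes) == 3, "three partitions");
static_assert(partitions_needed(kDiskBytes, 512000000000ULL) == 3,
              "still three if 512G means 512 * 10^9 bytes");

// Bytes per disk left unused when only the first 512 GiB is used.
constexpr std::uint64_t kUnusedBytes = kDiskBytes - kLimitBytes;
static_assert(kUnusedBytes == 950546096128ULL, "about 950 GB unused per disk");
```

Under this reading, each 1.5TB disk would lose roughly 63% of its capacity without the workaround.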


[jira] [Updated] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-20 Thread John Plevyak (JIRA)


 [ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-621:


Backport to Version: 3.0.1
  Fix Version/s: (was: 2.1.9)
 3.1

This change is just too risky to land in 3.0.   We will make the change first 
thing in 3.1 and then backport if/when it proves stable.

> writing 0 bytes to the HTTP cache means only update the header... need a new 
> API: update_header_only() to allow 0 byte files to be cached
> -
>
> Key: TS-621
> URL: https://issues.apache.org/jira/browse/TS-621
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Affects Versions: 2.1.5
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 3.1
>
> Attachments: TS-621_cluster_zero_size_objects.patch, 
> ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch
>
>



[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-19 Thread John Plevyak (JIRA)


[ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036424#comment-13036424
 ] 

John Plevyak commented on TS-621:
-

Obviously the patch needs to be fixed up a bit.  The Cluster code used
CacheDataType as a message type, so I hacked in:

 enum CacheDataType
 {
   CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
-  CACHE_DATA_HTTP_INFO,
+  CACHE_DATA_HTTP_INFO_LEAVE_BODY,
+  CACHE_DATA_HTTP_INFO_REPLACE_BODY,
   CACHE_DATA_KEY,
   CACHE_DATA_RAM_CACHE_HIT_FLAG
 };

Which doesn't really make sense.  The leave/replace bit should be encoded
somewhere else in the message.
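One way to follow that suggestion is to keep `CacheDataType` with its single `CACHE_DATA_HTTP_INFO` value and carry the leave/replace choice as an independent flag in the message. The sketch below is hypothetical: `ClusterSetDataMsg`, `kReplaceBody`, and the placeholder value `0` for `VCONNECTION_CACHE_DATA_BASE` are illustrative names and values, not the actual cluster message format.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder: the real value of VCONNECTION_CACHE_DATA_BASE is not shown here.
constexpr int VCONNECTION_CACHE_DATA_BASE = 0;

// The enum as it was before the hack: one value for HTTP info.
enum CacheDataType {
  CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
  CACHE_DATA_HTTP_INFO,
  CACHE_DATA_KEY,
  CACHE_DATA_RAM_CACHE_HIT_FLAG
};

// Hypothetical option bit, kept separate from the data type.
constexpr std::uint32_t kReplaceBody = 1u << 0;

// Hypothetical message header; field names are illustrative only.
struct ClusterSetDataMsg {
  std::int32_t data_type;  // a CacheDataType value
  std::uint32_t flags;     // option bits such as kReplaceBody
};

ClusterSetDataMsg make_http_info_msg(bool replace_body) {
  ClusterSetDataMsg msg{};
  msg.data_type = CACHE_DATA_HTTP_INFO;
  msg.flags = replace_body ? kReplaceBody : 0u;
  return msg;
}

bool replaces_body(const ClusterSetDataMsg &msg) {
  return msg.data_type == CACHE_DATA_HTTP_INFO && (msg.flags & kReplaceBody) != 0;
}
```

Keeping the flag separate lets the enum keep naming what data is sent while option bits describe how to apply it, so further options do not multiply enum values.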

The changes to CacheWrite are very tricky and I have little faith in them.

We could land it, but we would need some serious testing...





> writing 0 bytes to the HTTP cache means only update the header... need a new 
> API: update_header_only() to allow 0 byte files to be cached
> -
>
> Key: TS-621
> URL: https://issues.apache.org/jira/browse/TS-621
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Affects Versions: 2.1.5
>Reporter: John Plevyak
>Assignee: John Plevyak
> Fix For: 2.1.9
>
> Attachments: TS-621_cluster_zero_size_objects.patch, 
> ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch
>
>


