Hi Dishara,

That's excellent. More evidence that read scalability is flat as the number of child entries rises. All the results are distributed around 14ms regardless of the number of child nodes, up to 1M.
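One way to make "flat" concrete is to fit mean read latency against log10 of the collection size. A slope near zero milliseconds per tenfold growth means there is no measurable dependence on the number of children. The sketch below does this with an ordinary least-squares fit. The class name and the placeholder latencies are assumptions for illustration, not part of the SLING-3026 test code or its results.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of SLING-3026): fits mean read latency against
 * log10(collection size). A slope near zero means latency does not grow with size.
 */
public class FlatnessCheck {

    /** Arithmetic mean of the given values (NaN for an empty array). */
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Ordinary least-squares slope of y against x. */
    static double slope(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += (x[i] - mx) * (y[i] - my);
            den += (x[i] - mx) * (x[i] - mx);
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Collection sizes from SLING-3026: A, B, C, D hold 10^3 .. 10^6 children,
        // so log10(size) is 3, 4, 5, 6.
        double[] logSize = {3, 4, 5, 6};
        // PLACEHOLDER per-collection mean latencies in ms, not measured values:
        // substitute the averages from the attached results file.
        double[] meanLatencyMs = {14.0, 14.0, 14.0, 14.0};
        System.out.printf("slope: %.3f ms per tenfold increase in children%n",
                slope(logSize, meanLatencyMs));
    }
}
```

Comparing the fitted slope with the run-to-run spread of the individual samples is a reasonable way to judge whether it really differs from zero. Larger collections (10M, 100M) would simply add points at log sizes 7 and 8 to the same fit.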
How much effort would it be to populate collections with 10M and 100M items?

Ian

On 27 August 2013 06:44, Dishara Wijewardana <[email protected]> wrote:
> Hi Ian,
>
> FYI, the exercise data is the first 100 even numbers and the test data is
> the first 100 odd numbers.
>
> On Tue, Aug 27, 2013 at 11:12 AM, Dishara Wijewardana <[email protected]> wrote:
>> Hi Ian,
>>
>> I have updated the JIRA https://issues.apache.org/jira/browse/SLING-3026 with
>> the new test results. I have created an integration test which runs inside
>> /launchpad/integration-tests and does exactly what you described.
>> I am writing the results to a file, which is also attached to the
>> JIRA. It also shows the test summary with average latency.
>>
>> NOTE: Here I use HTTPBase test to make the HTTP calls, and I calculate the
>> latency as the time difference in millis between just before and just after
>> each call.
>>
>> On Sat, Aug 24, 2013 at 1:06 PM, Ian Boston <[email protected]> wrote:
>>> Hi Dishara,
>>>
>>> Interesting,
>>> Read times show no correlation with the number of items in a collection
>>> (that's good!).
>>> From 1 to 1M child nodes the access time is almost identical, showing
>>> flat scalability for reads as collection size grows.
>>>
>>> Since the results are so good, I think it would be worth expanding the
>>> test to verify that this really is the case.
>>>
>>> Rather than starting a fresh server, can you randomise which node is
>>> retrieved, retrieve each node only once, and run against a server that
>>> has previously been exercised on different nodes?
>>>
>>> The test algorithm should go something like this:
>>>
>>> populate a set with 100 unique numbers in the range 0-1000 (call this
>>> the exercise set)
>>> populate a set with 100 unique numbers in the range 0-1000 not in the
>>> first set (call this the test set)
>>> for each collection (A,B,C,D):
>>>     get all the children in the exercise set.
>>>     record the time taken to get each child in the test set. (first
>>>     time results)
>>>     get all the children in the exercise set.
>>>     record the time taken to get each child in the test set. (second
>>>     time results)
>>>
>>> This may not be a perfect test, but it tries to bring the server up
>>> into a running state, eliminate first-time startup effects, and measure
>>> the time taken to get a child the first and second time. If that still
>>> shows a completely flat scaling curve from 0 to 1M items, then that
>>> becomes really interesting.
>>>
>>> Ian
>>>
>>> On 23 August 2013 03:33, Dishara Wijewardana (JIRA) <[email protected]> wrote:
>>> >
>>> > [ https://issues.apache.org/jira/browse/SLING-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>> >
>>> > Dishara Wijewardana updated SLING-3026:
>>> > ---------------------------------------
>>> >
>>> >     Attachment: SLING_CASSANDRA_LATENCY_STATS_TWO_CHART_22-08-2013.png
>>> >                 SLING_CASSANDRA_LATENCY_STATS_CHART_22-08-2013.png
>>> >
>>> > The corresponding graphs are attached herewith.
>>> >
>>> >> Cassandra Resource Provider READ Latency Stats
>>> >> -----------------------------------------------
>>> >>
>>> >>         Key: SLING-3026
>>> >>         URL: https://issues.apache.org/jira/browse/SLING-3026
>>> >>     Project: Sling
>>> >>  Issue Type: Task
>>> >>    Reporter: Dishara Wijewardana
>>> >>    Priority: Critical
>>> >> Attachments: SLING_CASSANDRA_LATENCY_STATS_22-08-2013.txt,
>>> >>              SLING_CASSANDRA_LATENCY_STATS_CHART_22-08-2013.png,
>>> >>              SLING_CASSANDRA_LATENCY_STATS_TWO_CHART_22-08-2013.png
>>> >>
>>> >> This is to keep track of latency statistics for requests made to the
>>> >> Cassandra layer through the Cassandra Resource Provider. Here we use
>>> >> Apache Benchmark.
>>> >> We have a test-profile Java component in the cassandra module that adds
>>> >> bulk test data to Cassandra:
>>> >> /content/cassandra/A/0 to /content/cassandra/A/999
>>> >> /content/cassandra/B/0 to /content/cassandra/B/9999
>>> >> /content/cassandra/C/0 to /content/cassandra/C/99999
>>> >> /content/cassandra/D/0 to /content/cassandra/D/999999
>>> >> This JIRA will then track reports of the HTTP request time to retrieve
>>> >> one node from each of these data collections.
>>
>> --
>> Thanks
>> /Dishara
>
> --
> Thanks
> /Dishara
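The warm-up-then-measure procedure discussed in the thread can be sketched in Java as follows. This is a hypothetical harness, not the integration test attached to SLING-3026. It uses the exercise and test sets Dishara describes (first 100 even and first 100 odd child indices), times each fetch with `System.nanoTime`, and assumes children are read with a plain GET on `<path>.json` via `java.net.http.HttpClient` (Java 11+). The base URL is an assumption supplied on the command line.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical sketch of the warm-up-then-measure read test (names are illustrative). */
public class WarmReadTest {

    // Exercise set: first 100 even child indices; test set: first 100 odd ones.
    static final List<Integer> EXERCISE =
            IntStream.range(0, 100).map(i -> 2 * i).boxed().collect(Collectors.toList());
    static final List<Integer> TEST =
            IntStream.range(0, 100).map(i -> 2 * i + 1).boxed().collect(Collectors.toList());

    /** Fetches every child in the exercise set; timings are discarded (warm-up only). */
    static void exercise(String collection, Consumer<String> fetch) {
        for (int i : EXERCISE) fetch.accept(collection + "/" + i);
    }

    /** Fetches each child in the test set once and returns its latency in milliseconds. */
    static List<Double> measure(String collection, Consumer<String> fetch) {
        List<Double> latencies = new ArrayList<>();
        for (int i : TEST) {
            long start = System.nanoTime();
            fetch.accept(collection + "/" + i);
            latencies.add((System.nanoTime() - start) / 1e6);
        }
        return latencies;
    }

    /** Per collection: exercise, measure (first), exercise, measure (second). */
    static Map<String, List<List<Double>>> run(List<String> collections, Consumer<String> fetch) {
        Map<String, List<List<Double>>> results = new LinkedHashMap<>();
        for (String c : collections) {
            exercise(c, fetch);
            List<Double> first = measure(c, fetch);
            exercise(c, fetch);
            List<Double> second = measure(c, fetch);
            results.put(c, List.of(first, second));
        }
        return results;
    }

    /** Assumed fetch: GET <baseUrl><path>.json against a running Sling instance. */
    static Consumer<String> httpFetch(String baseUrl) {
        HttpClient client = HttpClient.newHttpClient();
        return path -> {
            HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + path + ".json")).GET().build();
            try {
                client.send(req, HttpResponse.BodyHandlers.discarding());
            } catch (IOException e) {
                throw new RuntimeException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        };
    }

    static double avg(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<String> collections = List.of("/content/cassandra/A", "/content/cassandra/B",
                "/content/cassandra/C", "/content/cassandra/D");
        // With a base URL argument, hit a real server; otherwise use a no-op fetch
        // so only the harness itself is exercised.
        Consumer<String> fetch = args.length > 0 ? httpFetch(args[0]) : path -> { };
        run(collections, fetch).forEach((c, r) -> System.out.printf(
                "%s first=%.3f ms second=%.3f ms%n", c, avg(r.get(0)), avg(r.get(1))));
    }
}
```

For example, `java WarmReadTest.java http://localhost:8080` (host and port assumed) would run it against a populated instance. Keeping the two measured passes separate mirrors Ian's "first time" and "second time" results. To follow his original randomised variant instead, the two fixed lists could be replaced by disjoint samples drawn from a shuffled 0-999 range.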
