Hi Ian,

On Thu, Aug 29, 2013 at 12:13 PM, Ian Boston <[email protected]> wrote:
> Hi Dishara,
>
> That was definitely worth doing. Looks like it is flat scalable on
> read in this form up to 100M items per collection. Before I get too
> excited about the results, are you absolutely certain that the
> ResourceProvider is retrieving the items requested and not just
> short-circuiting somewhere?
>
I hope the results are correct :-).

- In fact, what we can currently retrieve is only the predefined Cassandra
  paths in the Map. So before running this, I also updated the provider to
  populate 100 children in each of the nodes A, B, C, D, E and F. If I try
  to obtain a node that has not been populated, it gives me an error (I
  have verified this). So it picks the nodes from exactly the places it is
  supposed to.
- The latest result I gave was after some server warm-up. In the very
  first iteration, the FIRST RUN average was about 25-26 ms (because just
  after adding 100M records my computer seemed to be overheating, etc.).
  The SECOND RUN was OK, at about 12-15 ms on average. Then I ran it again
  and got this result, which is around 12-14 ms. That is what I posted.
- Also, the nodes range from /content/cassandra/F/0 to
  /content/cassandra/F/99999999, and each node maps to a different key in
  Cassandra. So even though we have 100M records under the same column
  family, from Hector's point of view (as in a relational DB) it is just a
  set of records with 100M unique keys, and we read one record at a time
  (though I am not sure how Cassandra actually stores this data). So I
  believe that even with 1B records the latency will not differ much
  (given that we cluster Cassandra for better performance).

> I think you are clear to go onto the next phase if you are willing.
> I think you have an option: do write or do access control. Write will
> be more exciting; access control will be more thought intensive. You
> may already have most of the write code, as you just wrote 100M items!
>
> Which one?
>
I do have code that writes to Cassandra.
But if you are OK with it, I would like to go with read with access
control first. Once the API is realized, I can easily incorporate the
writing code into a Sling interface. I am not sure how complex the access
control part will be :-), so it is better to start with it :-)

> Ian
>
> On 29 August 2013 02:12, Dishara Wijewardana <[email protected]>
> wrote:
> > Hi Ian,
> >
> > I have updated the latest results in the JIRA; please see the report
> > named "CassandraLatencyReport_V1.txt" for the latest results. I also
> > improved the report to show the average latency under each node. Nodes
> > "E" and "F" hold the 10M and 100M collections.
> >
> > P.S. It took >7 hrs for me to populate a 100M collection :-). Seeing the
> > results, it seems worth spending that much time populating it.
> >
> > ===========================================================================
> > ========================= FIRST RUN TEST SUMMERY ==========================
> > [RESULT] Average Latency Under Node A(1K) = 14 (ms)
> > [RESULT] Average Latency Under Node B(10K) = 11 (ms)
> > [RESULT] Average Latency Under Node C(100K) = 12 (ms)
> > [RESULT] Average Latency Under Node D(1M) = 21 (ms)
> > [RESULT] Average Latency Under Node E(10M) = 21 (ms)
> > [RESULT] Average Latency Under Node F(100M) = 16 (ms)
> > [FIRST RUN] #TOTAL CALLS = 600 Total Average Latency = 16 (ms)
> > ===========================================================================
> > ========================= SECOND RUN TEST SUMMERY =========================
> > [RESULT] Average Latency Under Node A(1K) = 10 (ms)
> > [RESULT] Average Latency Under Node B(10K) = 15 (ms)
> > [RESULT] Average Latency Under Node C(100K) = 14 (ms)
> > [RESULT] Average Latency Under Node D(1M) = 14 (ms)
> > [RESULT] Average Latency Under Node E(10M) = 11 (ms)
> > [RESULT] Average Latency Under Node F(100M) = 14 (ms)
> > [FIRST RUN] #TOTAL CALLS = 600 Total Average Latency = 13 (ms)
> > ===========================================================================
> >
> > On Thu, Aug 29, 2013 at 6:37 AM, Dishara Wijewardana (JIRA) <
> > [email protected]> wrote:
> >>
> >> [ https://issues.apache.org/jira/browse/SLING-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >>
> >> Dishara Wijewardana updated SLING-3026:
> >> ---------------------------------------
> >>
> >>     Attachment: CassandraLatencyReport_V1.txt
> >>
> >> Here I am attaching the latest test results, which include the latency
> >> to pull data from the 10M and 100M collections.
> >>
> >> > Cassandra Resource Provider READ Latency Stats
> >> > -----------------------------------------------
> >> >
> >> >                 Key: SLING-3026
> >> >                 URL: https://issues.apache.org/jira/browse/SLING-3026
> >> >             Project: Sling
> >> >          Issue Type: Task
> >> >            Reporter: Dishara Wijewardana
> >> >            Priority: Critical
> >> >         Attachments: CassandraIntegrationTest.patch,
> >> >                      CassandraLatencyReport.txt,
> >> >                      CassandraLatencyReport_V1.txt,
> >> >                      SLING_CASSANDRA_LATENCY_STATS_22-08-2013.txt,
> >> >                      SLING_CASSANDRA_LATENCY_STATS_CHART_22-08-2013.png,
> >> >                      SLING_CASSANDRA_LATENCY_STATS_TWO_CHART_22-08-2013.png
> >> >
> >> >
> >> > This is to keep track of the latency statistics for requests made to
> >> > the Cassandra layer through the Cassandra Resource Provider. Here we
> >> > use Apache Benchmark.
> >> > We have a test profile Java component in the cassandra module to add
> >> > bulk test data to Cassandra.
> >> >     /content/cassandra/A/0 to /content/cassandra/A/999
> >> >     /content/cassandra/B/0 to /content/cassandra/B/9999
> >> >     /content/cassandra/C/0 to /content/cassandra/C/99999
> >> >     /content/cassandra/D/0 to /content/cassandra/D/999999
> >> > This JIRA will then keep track of reports on the HTTP request time to
> >> > retrieve one node from each of the above data collections.
> >> >
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> If you think it was sent incorrectly, please contact your JIRA
> >> administrators
> >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>
> >
> >
> > --
> > Thanks
> > /Dishara
>

--
Thanks
/Dishara
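The "Total Average Latency" lines in the quoted report can be cross-checked against the per-node averages. Below is a minimal Java sketch, assuming each of the six nodes A to F received the same number of calls (100 each, which is what 600 total calls over six nodes suggests; the report does not state the split). With equal call counts, the call-weighted overall average is just the mean of the per-node averages. The class and method names (LatencySummary, perNode, totalAverage) are hypothetical and not part of the test component mentioned in SLING-3026.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: recomputes a run's overall average latency from the
// per-node averages printed in the latency report.
public class LatencySummary {

    // Ordered map of node label -> average latency in ms, for nodes A..F.
    static Map<String, Integer> perNode(int a, int b, int c, int d, int e, int f) {
        Map<String, Integer> avgMs = new LinkedHashMap<>();
        avgMs.put("A(1K)", a);
        avgMs.put("B(10K)", b);
        avgMs.put("C(100K)", c);
        avgMs.put("D(1M)", d);
        avgMs.put("E(10M)", e);
        avgMs.put("F(100M)", f);
        return avgMs;
    }

    // Call-weighted mean of the per-node averages. Assumes every node received
    // callsPerNode requests, so this equals the plain mean of the node averages.
    static double totalAverage(Map<String, Integer> avgMs, int callsPerNode) {
        long weightedSumMs = 0;
        long totalCalls = 0;
        for (int nodeAvg : avgMs.values()) {
            weightedSumMs += (long) nodeAvg * callsPerNode;
            totalCalls += callsPerNode;
        }
        return (double) weightedSumMs / totalCalls;
    }

    public static void main(String[] args) {
        double first = totalAverage(perNode(14, 11, 12, 21, 21, 16), 100);
        double second = totalAverage(perNode(10, 15, 14, 14, 11, 14), 100);
        System.out.printf("FIRST RUN total average  = %.2f ms%n", first);
        System.out.printf("SECOND RUN total average = %.2f ms%n", second);
    }
}
```

With the FIRST RUN figures this gives 95 / 6, about 15.83 ms, which rounds to the reported 16 ms; the SECOND RUN gives 78 / 6 = 13 ms, matching the report. Because the per-node values are already rounded to whole milliseconds, this is only an approximate reconstruction of the totals.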

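The scalability argument in the reply above is that every Sling path under /content/cassandra maps to its own key, so each request is a single-key lookup regardless of collection size, and that asking for an unpopulated path fails instead of short-circuiting to some other record. The Java sketch below models this with an in-memory HashMap standing in for the column family. It is an analogy under that assumption, not the Cassandra Resource Provider's or Hector's API, and the names (TestPathLayout, paths, populate, read) are hypothetical. It also enumerates test paths lazily, which matters when a collection holds 100M entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical in-memory model of the test data layout; not the real provider.
public class TestPathLayout {

    static final String ROOT = "/content/cassandra/";

    // Lazily yields ROOT<node>/0 .. ROOT<node>/<size-1>, so even a 100M-item
    // collection is produced one path at a time instead of held in memory.
    static Stream<String> paths(String node, long size) {
        return LongStream.range(0, size).mapToObj(i -> ROOT + node + "/" + i);
    }

    // Stand-in for the provider's predefined path map: one entry (one distinct
    // key) per Sling path, with childrenPerNode children under each node.
    static Map<String, String> populate(String[] nodes, int childrenPerNode) {
        Map<String, String> rows = new HashMap<>();
        for (String node : nodes) {
            paths(node, childrenPerNode).forEach(path -> rows.put(path, "data for " + path));
        }
        return rows;
    }

    // A read is a single-key lookup; an unpopulated path yields Optional.empty()
    // rather than falling back to some other row.
    static Optional<String> read(Map<String, String> rows, String path) {
        return Optional.ofNullable(rows.get(path));
    }

    public static void main(String[] args) {
        Map<String, String> rows = populate(new String[] {"A", "B", "C", "D", "E", "F"}, 100);
        System.out.println(rows.size());                            // 600
        System.out.println(read(rows, ROOT + "F/99").isPresent());  // true
        System.out.println(read(rows, ROOT + "F/100").isPresent()); // false
        System.out.println(paths("D", 1_000_000).count());          // 1000000
    }
}
```

A HashMap lookup has expected constant cost, which mirrors the reasoning in the thread; whether Cassandra's storage behaves the same way at 1B rows is what the benchmark is meant to measure, and this sketch does not demonstrate it.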