Hi Ian,

On Thu, Aug 29, 2013 at 12:13 PM, Ian Boston <[email protected]> wrote:

> Hi Dishara,
> That was definitely worth doing. Looks like it is flat scalable on
> read in this form up to 100M items per collection. Before I get too
> excited about the results, are you absolutely certain that the
> ResourceProvider is retrieving the items requested and not just short
> circuiting somewhere ?
>

I hope results are correct :-).

- Currently the only things we can retrieve are the predefined Cassandra
paths in the Map. So before running this, I also updated the provider to
populate 100 children in each of the nodes A, B, C, D, E, F. If I try to
obtain a node that is not already populated, it gives me an error (I have
verified this). So it picks the nodes from exactly the places it is
supposed to pick them from.

- The latest result I gave was after some server warm-up. In the very first
iteration, the FIRST RUN average was about 25-26 ms (because just after
adding the 100M records my computer seemed to be overheated, etc.). The
SECOND RUN was OK, with an average of about 12-15 ms. Then I ran it again,
and that gave the result I posted, which is around 12-14 ms.

- Also, the nodes run from /content/cassandra/F/0 to
/content/cassandra/F/99999999. Each node in Cassandra evaluates to a
different key. So even though we have 100M under the same column family, for
Cassandra, from the Hector point of view (the relational DB point of view),
it is just a set of records with 100M unique keys, and we read one record at
a time (but I am not sure how Cassandra really stores this data). So I
believe that even with 1B records there will not be a huge latency
difference (given that we cluster Cassandra for better performance).
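
To illustrate what I mean by "each node evaluates to a different key", here is a minimal sketch in Python. It is only a toy model, not the provider code: the dict stands in for a column family, and path_to_key and ToyColumnFamily are names I made up for illustration. It does not use the Hector or Sling APIs, and real Cassandra storage works differently internally.

```python
# Toy model of a key-addressed store: every resource path maps to its own row key.
# The dict is a stand-in for a Cassandra column family (an assumption for illustration).

def path_to_key(path: str) -> str:
    """Hypothetical mapping: normalise the path and use it as the row key."""
    return "/" + "/".join(part for part in path.split("/") if part)


class ToyColumnFamily:
    def __init__(self):
        self._rows = {}

    def put(self, path, value):
        self._rows[path_to_key(path)] = value

    def get(self, path):
        # One lookup by key; the cost does not depend on how many siblings exist.
        key = path_to_key(path)
        if key not in self._rows:
            raise KeyError(f"no row for key {key}")
        return self._rows[key]


store = ToyColumnFamily()
for i in range(100):
    store.put(f"/content/cassandra/F/{i}", {"name": str(i)})

# A populated path returns its own record.
hit = store.get("/content/cassandra/F/42")

# A path that was never populated fails instead of returning some default.
try:
    store.get("/content/cassandra/F/100")
    miss_raised = False
except KeyError:
    miss_raised = True
```

A read of a populated path returns that path's record, and a read of an unpopulated path raises an error, which is the same kind of check I used to confirm that the provider is not short-circuiting.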


>
> I think you are clear to go on to the next phase if you are willing.
> I think you have an option. Do write or do access control. Write will
> be more exciting, access control will be more thought intensive. You
> may already have most of the write code, as you just wrote 100M items!
>
> Which one ?
>

I do have code that writes to Cassandra. But if you are OK with it, I would
like to go with read with access control first. Writing I can easily
incorporate into a Sling interface once I have worked out the API. I am not
sure about the complexity of the access control part :-) . So it is better
to start with it :-)


> Ian
>
> On 29 August 2013 02:12, Dishara Wijewardana <[email protected]>
> wrote:
> > Hi Ian,
> > I have updated the latest results in the JIRA; please find the report
> > named "CassandraLatencyReport_V1.txt" to get the latest results. I
> > improved the report to also give the average latency under each node. So
> > the nodes "E" and "F" will have the 10M and 100M collections.
> >
> > P.S. It took >7 hrs for me to populate a 100M collection :-). Seeing the
> > results, it seems it was worth spending that much time on populating a
> > 100M collection.
> >
> >
> ===========================================================================================================================
> > ========================================== FIRST RUN TEST
> > SUMMERY==========================================================
> > [RESULT] Average Latency Under Node A(1K)   = 14 (ms)
> > [RESULT] Average Latency Under Node B(10K)  = 11 (ms)
> > [RESULT] Average Latency Under Node C(100K) = 12 (ms)
> > [RESULT] Average Latency Under Node D(1M)   = 21 (ms)
> > [RESULT] Average Latency Under Node E(10M)  = 21 (ms)
> > [RESULT] Average Latency Under Node F(100M) = 16 (ms)
> > [FIRST RUN] #TOTAL CALLS = 600 Total Average Latency = 16 (ms)
> >
> ===========================================================================================================================
> > ========================================== SECOND RUN TEST
> > SUMMERY==========================================================
> > [RESULT] Average Latency Under Node A(1K)   = 10 (ms)
> > [RESULT] Average Latency Under Node B(10K)  = 15 (ms)
> > [RESULT] Average Latency Under Node C(100K) = 14 (ms)
> > [RESULT] Average Latency Under Node D(1M)   = 14 (ms)
> > [RESULT] Average Latency Under Node E(10M)  = 11 (ms)
> > [RESULT] Average Latency Under Node F(100M) = 14 (ms)
> > [FIRST RUN] #TOTAL CALLS = 600 Total Average Latency = 13 (ms)
> >
> ===========================================================================================================================
> >
> > On Thu, Aug 29, 2013 at 6:37 AM, Dishara Wijewardana (JIRA) <
> [email protected]
> >> wrote:
> >
> >>
> >>      [
> >>
> https://issues.apache.org/jira/browse/SLING-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
> >>
> >> Dishara Wijewardana updated SLING-3026:
> >> ---------------------------------------
> >>
> >>     Attachment: CassandraLatencyReport_V1.txt
> >>
> >> Here I am attaching the latest test results, which include the latency
> >> to pull data from the 10M and 100M collections.
> >>
> >> > Cassandra Resource Provider READ Latency Stats
> >> > -----------------------------------------------
> >> >
> >> >                 Key: SLING-3026
> >> >                 URL: https://issues.apache.org/jira/browse/SLING-3026
> >> >             Project: Sling
> >> >          Issue Type: Task
> >> >            Reporter: Dishara Wijewardana
> >> >            Priority: Critical
> >> >         Attachments: CassandraIntegrationTest.patch,
> >> CassandraLatencyReport.txt, CassandraLatencyReport_V1.txt,
> >> SLING_CASSANDRA_LATENCY_STATS_22-08-2013.txt,
> >> SLING_CASSANDRA_LATENCY_STATS_CHART_22-08-2013.png,
> >> SLING_CASSANDRA_LATENCY_STATS_TWO_CHART_22-08-2013.png
> >> >
> >> >
> >> > This is to keep track of the statistics of the latency for the
> >> > requests done on the Cassandra layer through the Cassandra Resource
> >> > Provider. Here we use Apache Benchmark.
> >> > We have a test profile Java component in the cassandra module to add
> >> > bulk test data to Cassandra.
> >> > /content/cassandra/A/0   to /content/cassandra/A/999
> >> > /content/cassandra/B/0   to /content/cassandra/B/9999
> >> > /content/cassandra/C/0   to /content/cassandra/C/99999
> >> > /content/cassandra/D/0   to /content/cassandra/D/999999
> >> > And then this JIRA will keep track of reports on the http request time
> >> to retrieve 1 node from each following data collection.
> >> >
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> If you think it was sent incorrectly, please contact your JIRA
> >> administrators
> >> For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >>
> >
> >
> >
> > --
> > Thanks
> > /Dishara
>



-- 
Thanks
/Dishara
