[ 
https://issues.apache.org/jira/browse/CASSANDRA-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Estevez updated CASSANDRA-12054:
------------------------------------------
    Description: 
cassandra-stress, like many powerful tools, is not easy to use. It requires some 
statistical understanding and syntactic skill to get a user-profile-driven test 
up and running, even for a simple data model (for details see the 
cassandra-stress docs). Furthermore, designing a good Cassandra data model 
requires a basic understanding of how CQL works and how C* data is laid out on 
disk as a result of partitioning and clustering.

The CQL DataModeler aims to simplify this task, getting users up and running 
with user-profile-powered cassandra-stress tests in minutes.
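
As a rough illustration of what such a user profile contains, the Python 
sketch below assembles a minimal profile as YAML text. It is not the 
DataModeler's code: the render_profile helper and the stress_demo keyspace, 
events table, and column names are hypothetical. Only the top-level keys 
(keyspace, table, columnspec, insert, queries) and the distribution syntax 
follow the cassandra-stress profile format.

```python
import textwrap

# Sketch only: render a minimal cassandra-stress user profile as YAML text.
# All keyspace, table, and column names below are hypothetical examples.

def render_profile(keyspace, table, table_cql, columns, insert, queries):
    lines = [
        f"keyspace: {keyspace}",
        "keyspace_definition: |",
        f"  CREATE KEYSPACE {keyspace} WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 1};",
        f"table: {table}",
        "table_definition: |",
    ]
    # Indent the CQL so it sits inside the YAML block scalar.
    for row in textwrap.dedent(table_cql).strip().splitlines():
        lines.append(f"  {row}")
    lines.append("columnspec:")
    for col in columns:
        lines.append(f"  - name: {col['name']}")
        # Each of these is a distribution such as uniform(1..10) or fixed(16).
        for key in ("size", "population", "cluster"):
            if key in col:
                lines.append(f"    {key}: {col[key]}")
    lines.append("insert:")
    for key, value in insert.items():
        lines.append(f"  {key}: {value}")
    lines.append("queries:")
    for name, query in queries.items():
        lines.append(f"  {name}:")
        lines.append(f"    cql: {query['cql']}")
        lines.append(f"    fields: {query['fields']}")
    return "\n".join(lines) + "\n"


profile = render_profile(
    keyspace="stress_demo",
    table="events",
    table_cql="""
        CREATE TABLE events (
            sensor_id text,
            ts timeuuid,
            payload text,
            PRIMARY KEY (sensor_id, ts)
        );
    """,
    columns=[
        # Partition key: value size and number of distinct values.
        {"name": "sensor_id", "size": "fixed(16)",
         "population": "uniform(1..100000)"},
        # Clustering column: rows per partition.
        {"name": "ts", "cluster": "uniform(1..50)"},
        {"name": "payload", "size": "gaussian(50..200)"},
    ],
    insert={
        "partitions": "fixed(1)",
        "batchtype": "UNLOGGED",
        "select": "uniform(1..10)/10",
    },
    queries={
        "by_sensor": {
            "cql": "SELECT * FROM events WHERE sensor_id = ? LIMIT 10",
            "fields": "samerow",
        },
    },
)
print(profile)
```

The printed text could be saved to a file and passed to cassandra-stress in 
user mode, for example with user profile=<file> ops(insert=1).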

Given the feedback the community has voiced about the usability of 
cassandra-stress at NGCC, it was suggested that I contribute the data modeler 
to the open source project so it can be maintained in tree and leveraged by 
users.

It is a simple static web application and users should be able to use it by 
just opening up index.html with their browser and populating the GUI.

Check it out here:
http://www.sestevez.com/sestevez/CassandraDataModeler/

The source code is on GitHub. Once we clean it up and know where it will live 
in tree, I'll submit a C* patch:
https://github.com/phact/CassandraDataModeler

I developed this as a side project, not as production-ready code. I 
welcome feedback on how it can be cleaned up and improved.

cc: [~tjake][~carlyeks]

Future improvements include:
1) Add cluster distributions (currently only size and population are supported)
2) Add functionality so that the histograms display overall distributions 
(combining cluster and population distributions for fields)
3) Include batch configuration and insert distribution
4) Include -pop and other command line options that are crucial for describing 
workloads
5) Add sparse table capabilities (already in stress but currently undocumented)
6) Add a few example data models to ship with the tool
7) Eventually allow users to contribute back profiles to some sort of community

IMO this JIRA should be contingent on items 1, 3, 4, and 6 being completed.

  was:
cassandra-stress, like many powerful tools, is not easy to use. It requires some 
statistical understanding and syntactic skill to get a user-profile-driven test 
up and running, even for a simple data model (for details see the 
cassandra-stress docs). Furthermore, designing a good Cassandra data model 
requires a basic understanding of how CQL works and how C* data is laid out on 
disk as a result of partitioning and clustering.

The CQL DataModeler aims to simplify this task, getting users up and running 
with user-profile-powered cassandra-stress tests in minutes.

Given the feedback the community has voiced about the usability of 
cassandra-stress at NGCC, it was suggested that I contribute the data modeler 
to the open source project so it can be maintained in tree and leveraged by 
users.

It is a simple static web application and users should be able to use it by 
just opening up index.html with their browser and populating the GUI.

Check it out here:
http://www.sestevez.com/sestevez/CassandraDataModeler/

The source code sits in github, once clean it up and know where in tree it will 
live, I'll submit a c* patch:
https://github.com/phact/CassandraDataModeler

I developed this as a side project, not as production-ready code. I 
welcome feedback on how it can be cleaned up and improved.

cc: [~tjake][~carlyeks]

Future improvements include:
1) Add cluster distributions (currently only size and population are supported)
2) Add functionality so that the histograms display overall distributions 
(combining cluster and population distributions for fields)
3) Include batch configuration and insert distribution
4) Include -pop and other command line options that are crucial for describing 
workloads
5) Add sparse table capabilities (already in stress but currently undocumented)
6) Add a few example data models to ship with the tool
7) Eventually allow users to contribute back profiles to some sort of community

IMO this JIRA should be contingent on items 1, 3, 4, and 6 being completed.


> Add CQL Data Modeler to tree
> ----------------------------
>
>                 Key: CASSANDRA-12054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12054
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Sebastian Estevez
>
> cassandra-stress, like many powerful tools, is not easy to use. It requires 
> some statistical understanding and syntactic skill to get a 
> user-profile-driven test up and running, even for a simple data model (for 
> details see the cassandra-stress docs). Furthermore, designing a good 
> Cassandra data model requires a basic understanding of how CQL works and how 
> C* data is laid out on disk as a result of partitioning and clustering.
> The CQL DataModeler aims to simplify this task, getting users up and running 
> with user-profile-powered cassandra-stress tests in minutes.
> Given the feedback the community has voiced about the usability of 
> cassandra-stress at NGCC, it was suggested that I contribute the data modeler 
> to the open source project so it can be maintained in tree and leveraged by 
> users.
> It is a simple static web application and users should be able to use it by 
> just opening up index.html with their browser and populating the GUI.
> Check it out here:
> http://www.sestevez.com/sestevez/CassandraDataModeler/
> The source code is on GitHub. Once we clean it up and know where it will 
> live in tree, I'll submit a C* patch:
> https://github.com/phact/CassandraDataModeler
> I developed this as a side project, not as production-ready code. I 
> welcome feedback on how it can be cleaned up and improved.
> cc: [~tjake][~carlyeks]
> Future improvements include:
> 1) Add cluster distributions (currently only size and population are 
> supported)
> 2) Add functionality so that the histograms display overall distributions 
> (combining cluster and population distributions for fields)
> 3) Include batch configuration and insert distribution
> 4) Include -pop and other command line options that are crucial for 
> describing workloads
> 5) Add sparse table capabilities (already in stress but currently 
> undocumented)
> 6) Add a few example data models to ship with the tool
> 7) Eventually allow users to contribute back profiles to some sort of 
> community
> IMO this JIRA should be contingent on items 1, 3, 4, and 6 being completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
