[
https://issues.apache.org/jira/browse/CASSANDRA-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118024#comment-17118024
]
David Capwell commented on CASSANDRA-15837:
---
Spoke to Marcus about this, dumping context here.
* "why not read the raw fql logs?"
The main reasons:
1) Cassandra brings in its own dependencies, and these can be very old, causing
conflicts.
2) Cassandra doesn’t have a stable API, so using a different version requires a
rewrite [1].
3) Thrift is a lot faster [2], so we spend less time reading the files and more
time trying to put load on the cluster. Right now my bottleneck isn’t reading
the files; it’s that Cassandra can’t keep up, so I need to throttle the
operations.
[1] - It is less of an issue when reading the files, but there are no guarantees
that QueryOptions won’t change its Java API at will. The main issue I had in
upgrading from 3.0 to 4.0 was the query parsing side, which classifies the
query. I added logic to annotate what the query does and touches, so tools can
filter for specific tables or replay only specific types of queries (such as
selects or updates). This logic is very useful in tooling (I am actively using
it right now), but this part is not compatible across releases.
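To illustrate the kind of annotation and filtering described in [1], here is a minimal sketch. It is not the logic from the actual tooling, which relies on Cassandra's query parser; the names `annotate` and `keep`, the keyword table, and the regular expression are all hypothetical, and the regex only handles simple single-statement CQL.

```python
import re

# Hypothetical mapping from the leading CQL keyword to a coarse query type.
_TYPE_BY_KEYWORD = {
    "SELECT": "select",
    "INSERT": "insert",
    "UPDATE": "update",
    "DELETE": "delete",
}

# Assumption: the table name directly follows FROM, INTO, or UPDATE.
_TABLE_RE = re.compile(r'\b(?:FROM|INTO|UPDATE)\s+([\w."]+)', re.IGNORECASE)


def annotate(query):
    """Return (query_type, table); type is "other" and table is None if unknown."""
    stripped = query.strip()
    keyword = stripped.split(None, 1)[0].upper() if stripped else ""
    match = _TABLE_RE.search(stripped)
    table = match.group(1).replace('"', "") if match else None
    return _TYPE_BY_KEYWORD.get(keyword, "other"), table


def keep(query, types=None, tables=None):
    """Return True if the query passes the optional type and table filters."""
    qtype, table = annotate(query)
    if types is not None and qtype not in types:
        return False
    if tables is not None and table not in tables:
        return False
    return True


queries = [
    "SELECT * FROM ks.users WHERE id = 1",
    "UPDATE ks.users SET name = 'a' WHERE id = 1",
    "INSERT INTO ks.events (id) VALUES (2)",
]
# Replay only selects, or only queries touching one table.
selects = [q for q in queries if keep(q, types={"select"})]
users_only = [q for q in queries if keep(q, tables={"ks.users"})]
```

A keyword-and-regex approach like this avoids any dependency on Cassandra classes, at the cost of missing batches, nested statements, and anything else a real parser would handle.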
[2] - Below are two consumers of the fql logs: one reading the raw logs and
ignoring the output, the other reading the thrift version and collecting stats.
Both tools read 100% of the data, sequentially.
$ time ./bin/fqltool thrift-stats ../query_logs/*/fql/fql.thrift.gz 1>/dev/null
13.67 real 14.88 user 1.59 sys
$ time ./bin/fqltool dump-thrift -- ../query_logs/*/fql/
56.97 real 76.70 user 2.47 sys
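For a rough sense of scale, the ratio between the two runs above can be worked out directly. This is only arithmetic on the reported timings; the two tools do different work with what they read (one collects stats, one dumps), so treat the result as an approximate comparison rather than a controlled benchmark.

```python
# Timings copied from the two `time` runs above, in seconds.
thrift = {"real": 13.67, "user": 14.88, "sys": 1.59}
raw = {"real": 56.97, "user": 76.70, "sys": 2.47}

# Wall-clock ratio: how much longer the raw-log reader took end to end.
wall_speedup = raw["real"] / thrift["real"]

# CPU ratio: user + sys time, which can exceed wall time on multiple cores.
cpu_speedup = (raw["user"] + raw["sys"]) / (thrift["user"] + thrift["sys"])

print(f"wall-clock: {wall_speedup:.2f}x, cpu: {cpu_speedup:.2f}x")
```

On these numbers the thrift reader is roughly 4x faster in wall-clock time and somewhat under 5x cheaper in CPU time.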
* "we just removed [thrift]"
I started on 3.0, so I used thrift because it was there; in doing so I grew to
hate it... I find protobuf to be better, so I will likely switch to that. But,
to avoid adding a dependency to Cassandra, I can take on the work to allow
FQL to have a different set of dependencies.
> Enhance fqltool to be able to export the fql log into a format which doesn't
> depend on Cassandra
>
>
> Key: CASSANDRA-15837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15837
> Project: Cassandra
> Issue Type: Improvement
> Components: Tool/fql
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently the fql log format uses Cassandra serialization within the message,
> which means that reading the file also requires Cassandra classes. To make it
> easier for outside tools to read the fql logs we should enhance the fqltool
> to be able to dump the logs to a file format using thrift or protobuf.
> Additionally we should support exporting the original query with a
> deterministic version allowing tools to have reproducible and comparable
> results.