[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319451#comment-15319451
 ] 

Sylvain Lebresne commented on CASSANDRA-7622:
---------------------------------------------

I really think we should approach this in a more restricted way first, and 
would *strongly* prefer we do so. I think initially virtual tables should be a 
purely internal mechanism that we use for:
# exposing metrics for those allergic to JMX
# exposing more of the replica state (some of which is already exposed in our 
existing system tables) when it makes sense. We might also have a few system 
tables that don't really rely on persistence and could be switched to virtual 
tables for their implementation.
I'm not saying this is the only possible use case for virtual tables, but 
that's the main use case that has been brought up so far and it's a pretty damn 
good start.

That does mean starting with read-only tables, no CQL syntax for creating 
arbitrary virtual tables for now (if I'm being honest, I'm not yet convinced we 
should ever add it) and probably sticking to local (non-replicated) tables for 
now. And I think there is enough to discuss for such an initial version that 
it's worth leaving the discussion of further extensions for later. In 
particular, that still leaves discussion on:
# how do we implement that
# what do we initially expose with this and how (typically, for metrics, what 
schema are we going for).


I'll note that on the implementation front, I think the cleanest and simplest 
solution might be to expose such a virtual table as a fake memtable that is 
never flushed. That is, as far as reading is concerned, a memtable mostly 
exposes 2 methods:
{noformat}
UnfilteredPartitionIterator makePartitionIterator(ColumnFilter columnFilter, DataRange dataRange, boolean isForThrift)
Partition getPartition(DecoratedKey key)
{noformat}
and that's what virtual tables would have to implement in general.

Of course, in practice, we wouldn't want to re-implement such fairly generic 
methods for every new virtual table. So I imagine we could create some form of 
"builder" that would let us declare each column of the virtual table and 
associate with each column a callback used to compute its value, as well as 
methods to iterate over the rows the table has. So something like this (not at 
all refined, just to give a clearer idea of what I mean):
{noformat}
public interface VirtualTableBuilder
{
    /**
     * Adds column {@code name} with type {@code type} to the virtual table. The value
     * for that column is computed using {@code callback}.
     */
    public void add(ColumnIdentifier name, AbstractType type, Callback callback);

    /**
     * Returns an iterator generating the partition keys of the virtual table.
     */
    public Iterator<DecoratedKey> partitionKeys();

    /**
     * Given a particular partition key, returns an iterator generating the rows contained
     * in this key of the virtual table.
     */
    public Iterator<Clustering> clusterings(DecoratedKey key);

    public interface Callback
    {
        /**
         * Computes a column value given the primary key to that column.
         */
        public ByteBuffer compute(DecoratedKey partitionKey, Clustering clustering);
    }
}
{noformat}
Given this, we should be able to generate the "VirtualMemtable" object I 
described above.
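
To make the builder idea a bit more concrete, here is a minimal, self-contained sketch of how per-column callbacks plus key/row iterators could be turned into a readable partition. Everything in it is hypothetical: {{VirtualTableSketch}} and its methods are made-up names, and plain Strings stand in for DecoratedKey, Clustering and ByteBuffer so that it runs without any Cassandra classes. It only illustrates the shape of the approach, not an actual implementation.

```java
import java.util.*;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for a builder-backed virtual table (Strings replace Cassandra types). */
final class VirtualTableSketch
{
    // Column name -> callback computing the value from (partition key, clustering).
    private final Map<String, BiFunction<String, String, String>> columns = new LinkedHashMap<>();
    private final Supplier<Iterator<String>> partitionKeys;
    private final Function<String, Iterator<String>> clusterings;

    VirtualTableSketch(Supplier<Iterator<String>> partitionKeys,
                       Function<String, Iterator<String>> clusterings)
    {
        this.partitionKeys = partitionKeys;
        this.clusterings = clusterings;
    }

    /** Declares a column whose value is computed on demand by the callback. */
    VirtualTableSketch add(String name, BiFunction<String, String, String> callback)
    {
        columns.put(name, callback);
        return this;
    }

    /** Rough analogue of getPartition(key): clustering -> (column -> computed value). */
    Map<String, Map<String, String>> getPartition(String key)
    {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        Iterator<String> it = clusterings.apply(key);
        while (it.hasNext())
        {
            String clustering = it.next();
            Map<String, String> row = new LinkedHashMap<>();
            columns.forEach((name, callback) -> row.put(name, callback.apply(key, clustering)));
            rows.put(clustering, row);
        }
        return rows;
    }

    /** Rough analogue of a full partition iterator: walks every partition key. */
    Map<String, Map<String, Map<String, String>>> scan()
    {
        Map<String, Map<String, Map<String, String>>> all = new LinkedHashMap<>();
        Iterator<String> keys = partitionKeys.get();
        while (keys.hasNext())
        {
            String key = keys.next();
            all.put(key, getPartition(key));
        }
        return all;
    }
}
```

Nothing is stored here: every value is recomputed from its callback at read time, which is the property that would make a never-flushed "memtable" a reasonable fit.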


bq. So at least for the things I can think of the "nicest" interface would be 
that virtual tables have some kind of "where should I get routed" function, so 
you could do custom routing.

I admit the idea of being able to query a given metric for the whole cluster 
in a single query is seductive on paper, but I'm not at all sure it's actually 
a good idea. Queries that rely on all the nodes of the cluster responding in 
order to return sound bad to me. Even if we were to handle timeouts 
differently, still returning a result but with no results for the nodes that 
didn't answer, I assume having the collection of all your metrics blocked for 
X seconds due to a single node is strictly worse for a monitoring system than 
querying the nodes themselves and thus having only the metrics of the 
unhealthy nodes be delayed.

Overall, feels like a lot of complexity on our side for doubtful usefulness.
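
As a rough illustration of the alternative, a monitoring client can poll each node on its own so that a slow node only holds back its own metrics. The sketch below is hypothetical throughout: {{PerNodePoller}}, the {{sink}} map and the {{Supplier}}-based fetchers are made-up names standing in for whatever a real monitoring system would use to query a node's local virtual tables.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Supplier;

/** Hypothetical monitoring-side poller: one independent request per node. */
final class PerNodePoller
{
    /**
     * Starts one asynchronous fetch per node. Each result is written to the sink
     * as soon as that node answers, regardless of how long the others take.
     * Returns the per-node futures so callers can wait on, or give up on, any node.
     */
    static Map<String, CompletableFuture<Void>> poll(Map<String, Supplier<Long>> fetchers,
                                                     ConcurrentMap<String, Long> sink,
                                                     Executor executor)
    {
        Map<String, CompletableFuture<Void>> pending = new HashMap<>();
        fetchers.forEach((node, fetch) ->
            pending.put(node, CompletableFuture.supplyAsync(fetch, executor)
                                               .thenAccept(value -> sink.put(node, value))));
        return pending;
    }
}
```

With this shape, metrics from healthy nodes land in the sink immediately, and only the entry for an unresponsive node is late or missing.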


> Implement virtual tables
> ------------------------
>
>                 Key: CASSANDRA-7622
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tupshin Harper
>            Assignee: Jeff Jirsa
>             Fix For: 3.x
>
>
> There are a variety of reasons to want virtual tables, which would be any 
> table that would be backed by an API, rather than data explicitly managed and 
> stored as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
