There are some ideas that development community members have kicked around that may falsify the assumption that "virtual tables are tiny and will fit in memory."

One example is CASSANDRA-14629: Abstract Virtual Table for very large result sets
https://issues.apache.org/jira/browse/CASSANDRA-14629

Chris's proposal here is to enable query results from virtual tables to be streamed to the client rather than being fully materialized. There are some neat possibilities suggested in this ticket, such as debug functionality to dump the contents of a raw SSTable via the CQL interface, or the contents of the database's internal caches. One could also imagine a feature like this providing functionality similar to a foreign data wrapper in other databases.

I don't think the assumption that "virtual tables will always be small and always fit in memory" is a safe one.

I don't think we should implicitly add "ALLOW FILTERING" to all queries against virtual tables because of this, in addition to my concern about departing from standard CQL semantics for a type of table deemed special.

– Scott

On Feb 3, 2023, at 6:52 AM, Maxim Muzafarov <mmu...@apache.org>
wrote:

Hello Stefan,

Regarding the decision to implicitly enable ALLOW FILTERING for virtual tables, which also makes sense to me, it may be necessary to consider changing the clustering columns in the virtual table metadata to regular columns as well. The reasons are the same as mentioned earlier: the virtual tables hold their data in memory, thus we do not benefit from the advantages of ordered data (e.g. the ClientsTable and its ClusteringColumn(PORT)).

Changing the clustering column to a regular column may simplify the virtual table data model, but I'm afraid it may affect users who rely on the table metadata.

On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña
<adelap...@apache.org> wrote:

I think removing the need for ALLOW FILTERING on virtual tables makes sense and would be quite useful for operators.

That guard exists for performance issues that shouldn't occur on virtual tables. We also have a flag in case some future virtual table implementation has limitations regarding filtering, although that does not seem to be the case with any of the existing virtual tables.

It is not like we would promote bad habits because virtual tables are meant to be queried by operators / administrators only.

It might even be quite the opposite, since in the current situation users might get used to routinely using ALLOW FILTERING for querying their virtual tables.

It has been mentioned on the #cassandra-dev Slack thread where this started (1) that it's kind of an API inconsistency to allow querying by non-primary-key columns on virtual tables without ALLOW FILTERING, whereas it's required for regular tables. I think that a simple doc update saying that virtual tables, which are not regular tables, support filtering would be enough. Virtual tables are well identified by both the keyspace they belong to and the documentation, so users shouldn't have trouble knowing whether a table is virtual. It would be similar to the current exception for ALLOW FILTERING, where one needs to use it unless the table has an index for the queried column.

(1) https://the-asf.slack.com/archives/CK23JSY2K/p1675352759267329

On Fri, 3 Feb 2023 at 09:09, Miklosovic, Stefan
<stefan.mikloso...@netapp.com> wrote:

Hi list,

The content of virtual tables is held in memory (and / or is fetched every time upon request). When querying such a table on a column outside of the primary key, users are normally required to specify ALLOW FILTERING. This makes total sense for "ordinary tables", so that applications have performant and effective queries, but it kind of loses its applicability for virtual tables, which literally hold just a handful of entries in memory, where it just does not matter, does it?

What do you think about implicitly allowing filtering for virtual tables, so we save ourselves from these pesky errors when we want to query an arbitrary column and need to satisfy the CQL spec just to do that?

It is not like we would promote bad habits because virtual tables are meant to be queried by operators / administrators only.

We can also explicitly document this behavior.

Among other options, we may try to implement secondary indices on virtual tables, but I am not completely sure this is what we want because of its complexity etc. Is it even necessary to put such complex logic in place just to be able to select any column on a few entries in memory?

I put together a draft here (1). It would be possible to implicitly allow filtering on virtual tables only, and it would be the implementer's responsibility to decide that, per table.

For all virtual tables we currently have, I would enable this everywhere. I do not think there is any virtual table where we would not want to enable it or where people HAVE TO specify that.

(1) https://github.com/apache/cassandra/pull/2131
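For illustration, here is a minimal Java sketch of the per-table opt-in described above, where each virtual table implementation decides whether filtering is implicitly allowed. All names in it (`VirtualTable`, `allowFilteringImplicitly()`, `requiresAllowFiltering`, `SimpleTable`, and the example tables) are hypothetical and simplified. They are not taken from Cassandra's source or from the linked pull request. The sketch also simplifies the real CQL rules: it treats any restriction on a column outside the primary key as requiring filtering, and ignores partial primary key restrictions.

```java
import java.util.Set;

public final class ImplicitFilteringSketch {

    // Hypothetical, simplified stand-in for a virtual table definition.
    interface VirtualTable {
        String name();

        Set<String> primaryKeyColumns();

        // Per-table opt-in: the implementer decides whether queries filtering
        // on non-primary-key columns may omit ALLOW FILTERING.
        default boolean allowFilteringImplicitly() {
            return true;
        }
    }

    // Returns true if the query must be rejected for lacking ALLOW FILTERING.
    // Simplification: only checks for restrictions on non-primary-key columns.
    static boolean requiresAllowFiltering(VirtualTable table,
                                          Set<String> restrictedColumns,
                                          boolean allowFilteringSpecified) {
        if (allowFilteringSpecified || table.allowFilteringImplicitly()) {
            return false;
        }
        for (String column : restrictedColumns) {
            if (!table.primaryKeyColumns().contains(column)) {
                return true; // filtering on a non-primary-key column
            }
        }
        return false;
    }

    // Hypothetical table whose opt-in is configured at construction time.
    record SimpleTable(String name, Set<String> primaryKeyColumns, boolean implicit)
            implements VirtualTable {
        @Override
        public boolean allowFilteringImplicitly() {
            return implicit;
        }
    }

    public static void main(String[] args) {
        VirtualTable clients = new SimpleTable("clients", Set.of("address", "port"), true);
        VirtualTable large = new SimpleTable("hypothetical_large_table", Set.of("id"), false);

        // Opted-in table: no ALLOW FILTERING needed for a non-key column.
        System.out.println(requiresAllowFiltering(clients, Set.of("username"), false)); // prints false
        // Opted-out table: filtering on a non-key column still requires it.
        System.out.println(requiresAllowFiltering(large, Set.of("payload"), false)); // prints true
        // Opted-out table, but the query specifies ALLOW FILTERING.
        System.out.println(requiresAllowFiltering(large, Set.of("payload"), true)); // prints false
    }
}
```

The default of `true` mirrors the suggestion to enable this for all existing virtual tables. A table that might return very large result sets, as raised for CASSANDRA-14629 earlier in the thread, could return `false` and keep requiring ALLOW FILTERING.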