[ https://issues.apache.org/jira/browse/CALCITE-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184616#comment-17184616 ]
Zachary Gramana commented on CALCITE-2696:
------------------------------------------

It's worth considering making this threshold determined by cost + statistics rather than having a simple static threshold. In the case which brought me to this issue, the threshold is exceeded but the resulting table scan is very, very expensive--much more so than the 1000 OR RexNodes. But there doubtless will be cases for us where the current table scan + semi-join will be more efficient, and I would like to enable those cases too (though 19 may or may not be the right threshold in those cases).

> Make it easier to configure SqlToRelConverter.Config.getInSubQueryThreshold()
> -----------------------------------------------------------------------------
>
>                 Key: CALCITE-2696
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2696
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: Dirk Mahler
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: calcite-in-clause.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A {{Filter}} containing an IN clause is not passed to {{Enumerable.scan}}.
> I'm using the Calcite JDBC driver with a custom SchemaFactory (defined by a
> model property) that provides a schema containing a
> ProjectableFilterableTable:
> {code:java}
> String model = "inline:" //
>     + "{" //
>     + " version: '1.0', " //
>     + " defaultSchema: 'test'," //
>     + " schemas: [" //
>     + " {" //
>     + " name: 'test'," //
>     + " type: 'custom'," //
>     + " factory: '" + TestSchemaFactory.class.getName() + "'" //
>     + " }"
>     + " ]" //
>     + "}";
> Properties properties = new Properties();
> properties.put(CalciteConnectionProperty.MODEL.camelName(), model);
> connection = DriverManager.getConnection("jdbc:calcite:", properties);
> {code}
>
> {code:java}
> class TestTable extends AbstractQueryableTable implements
>     ProjectableFilterableTable {
>   public Enumerable<Object[]> scan(DataContext root, List<RexNode> filters,
>       int[] projects) {
>     ...
>   }
>   ...
> }{code}
>
> It maps to a Java class and provides two Integer-typed columns "value1" and
> "value2".
> The following statement leads to quite expensive behavior in the scan
> method:
>
> {code:java}
> SELECT "value" FROM "TEST_TABLE" WHERE "value1" = 1 AND "value2" in
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
> {code}
> The scan method is invoked with a filter that only covers the part "value1" =
> 1; the IN clause is completely omitted. The result on the JDBC side is still
> valid, but in my case this leads to a full scan of a large underlying
> data set (millions of rows).
> Interestingly, the filter part reflecting the IN operator is provided if the
> number of elements in the list is below 20. It seems that this is controlled
> by
> org.apache.calcite.sql2rel.SqlToRelConverter.Config#getInSubQueryThreshold.
> It would be very helpful if this behavior could be configured on the JDBC
> property level.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
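
To make the trade-off raised in the comment concrete, here is a rough sketch of a static threshold next to a cost-based choice between OR expansion and a semi-join. It is only an illustration, not Calcite code: the class InListPlanChooser, its methods, the cost formulas, and the unit-cost constants are all hypothetical assumptions and do not correspond to any Calcite API or to its actual cost model.

```java
/**
 * Toy sketch of choosing a plan for an IN list.
 * All names, formulas, and constants are hypothetical; none of this is Calcite API.
 */
final class InListPlanChooser {

    enum Plan { OR_EXPANSION, SEMI_JOIN }

    // Assumed unit costs, picked only for illustration.
    static final double COST_PER_ROW_SCANNED = 1.0;
    static final double COST_PER_OR_TERM = 0.01;
    static final double COST_PER_HASH_OP = 0.05;

    /** Static rule as described in the issue: lists shorter than the threshold become ORs. */
    static Plan chooseStatic(int inListSize, int threshold) {
        return inListSize < threshold ? Plan.OR_EXPANSION : Plan.SEMI_JOIN;
    }

    /**
     * OR expansion: the IN predicate reaches the scan, so only matching rows are read.
     * Worst case assumed: every such row is compared against every OR term.
     */
    static double orExpansionCost(int inListSize, double rowsMatchingInList) {
        return rowsMatchingInList * (COST_PER_ROW_SCANNED + inListSize * COST_PER_OR_TERM);
    }

    /**
     * Semi-join: the IN predicate is not pushed down, so the scan reads every row that
     * passes the remaining filters, and each row probes a hash table built from the list.
     */
    static double semiJoinCost(int inListSize, double rowsWithoutInFilter) {
        double build = inListSize * COST_PER_HASH_OP;
        double scanAndProbe = rowsWithoutInFilter * (COST_PER_ROW_SCANNED + COST_PER_HASH_OP);
        return build + scanAndProbe;
    }

    /** Cost-based rule: pick whichever plan has the lower estimated cost. */
    static Plan chooseByCost(int inListSize, double rowsMatchingInList, double rowsWithoutInFilter) {
        return orExpansionCost(inListSize, rowsMatchingInList)
                <= semiJoinCost(inListSize, rowsWithoutInFilter)
                ? Plan.OR_EXPANSION : Plan.SEMI_JOIN;
    }

    public static void main(String[] args) {
        // A 20-element list with a static threshold of 20.
        System.out.println(chooseStatic(20, 20));
        // 1000 terms, few matching rows, millions of rows behind the scan.
        System.out.println(chooseByCost(1000, 1_000, 5_000_000));
        // 1000 terms against a small table where most rows match.
        System.out.println(chooseByCost(1000, 900, 1_000));
    }
}
```

With these made-up constants, the static rule sends a 20-element list to the semi-join regardless of table size, while the cost-based rule keeps a 1000-term list as ORs when the unfiltered scan is large and switches to the semi-join when the table is small. A real implementation would need actual row-count and selectivity statistics rather than hand-supplied numbers.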