Hi all,

# Program Sketch
1. I create a HiveContext `hiveContext`.
2. With that context, I create a DataFrame `df` from a JDBC relational table.
3. I register the DataFrame `df` via `df.registerTempTable("TESTTABLE")`.
4. I start a HiveThriftServer2 via `HiveThriftServer2.startWithContext(hiveContext)`.

TESTTABLE contains 1,000,000 entries; its columns are ID (INT) and NAME (VARCHAR):

    +-----+--------+
    | ID  | NAME   |
    +-----+--------+
    | 1   | Hello  |
    | 2   | Hello  |
    | 3   | Hello  |
    | ... | ...    |

With Beeline I access the SQL endpoint (at port 10000) of the HiveThriftServer and run a query, e.g.

    SELECT * FROM TESTTABLE WHERE ID='3'

When I inspect the database's query log to see which SQL statements were actually executed, I find

    /*SQL #:1000000 t:657*/ SELECT "ID","NAME" FROM test;

So no predicate pushdown happens: the WHERE clause is missing from the statement sent to the database.

# Questions

This gives rise to the following questions:

1. *Why is no predicate pushdown performed?*
2. *Can this be changed by not using registerTempTable? If so, how?*
3. *Or is this a known restriction of the HiveThriftServer?*

# Counterexample

If I create a DataFrame `df` in a Spark SQLContext and call

    df.filter(df("ID") === 3).show()

I observe

    /*SQL #:1*/ SELECT "ID","NAME" FROM test WHERE ID = 3;

as expected: here the WHERE clause is part of the statement sent to the database.
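
For reference, the four steps of the program sketch could look roughly like this in Scala. This is a minimal sketch, not the exact program: it assumes Spark 1.4 or later in the 1.x line (for `hiveContext.read`), submission via spark-submit, and a JDBC driver on the classpath. The object name, the application name, and passing the JDBC URL as the first command-line argument are illustrative assumptions; the source table name `test` is taken from the query log above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Hypothetical driver object; the name is an assumption for illustration.
object ThriftServerSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: the JDBC URL of the relational database is passed as the first argument.
    val jdbcUrl = args(0)

    // The master is expected to be supplied by spark-submit.
    val conf = new SparkConf().setAppName("thrift-pushdown-sketch")
    val sc = new SparkContext(conf)

    // Step 1: create the HiveContext.
    val hiveContext = new HiveContext(sc)

    // Step 2: create a DataFrame from the JDBC table ("test" as seen in the query log).
    val df = hiveContext.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "test")
      .load()

    // Step 3: register the DataFrame under the name queried from Beeline.
    df.registerTempTable("TESTTABLE")

    // Step 4: expose the context's temporary tables through the Thrift server.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}
```

With this running, Beeline can connect to the Thrift server's SQL endpoint and issue the `SELECT * FROM TESTTABLE WHERE ID='3'` query described above.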