Hi all,

# Programm Sketch

I create a HiveContext `hiveContext`
With that context, I create a DataFrame `df` from a JDBC relational table.I
register the DataFrame `df` viadf.registerTempTable("TESTTABLE")I start a
HiveThriftServer2 via
HiveThriftServer2.startWithContext(hiveContext)

The TESTTABLE contains 1,000,000 entries, columns are ID (INT) and NAME
(VARCHAR)

    +-----+--------+
    | ID  |  NAME  |
    +-----+--------+
    | 1   | Hello  |
    | 2   | Hello  |
    | 3   | Hello  |
    | ... | ...    |

With Beeline I access the SQL Endpoint (at port 10000) of the
HiveThriftServer and perform a query. E.g.

    SELECT * FROM TESTTABLE WHERE ID='3'

When I inspect the QueryLog of the DB with the SQL Statements executed I see

    /*SQL #:1000000 t:657*/  SELECT \"ID\",\"NAME\" FROM test;

So there happens no predicate pushdown , as the where clause is missing.

# Questions

This gives raise to the following questions:

Why is no predicate pushdown performed?Can this be changed by not using
registerTempTable? If so, how?
Or is this a known restriction of the HiveThriftServer?

# Counterexample

If I create a DataFrame `df` in Spark SQLContext and call

df.filter( df("ID") === 3).show()
I observe

    /*SQL #:1*/SELECT \"ID\",\"NAME\" FROM test WHERE ID = 3;

as expected.

Reply via email to