Hi all,

# Program Sketch


   1. I create a HiveContext `hiveContext`
   2. With that context, I create a DataFrame `df` from a JDBC relational
   table.
   3. I register the DataFrame `df` via

   df.registerTempTable("TESTTABLE")

   4. I start a HiveThriftServer2 via

   HiveThriftServer2.startWithContext(hiveContext)
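
For reference, the four steps above could be put together roughly as follows. This is only a minimal sketch, assuming Spark 1.4 or later (for `hiveContext.read`) with the Hive Thrift server module on the classpath. The JDBC URL and the application name are placeholders invented for illustration; the source table name `test` is taken from the query log shown further down.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftServerSketch {
  def main(args: Array[String]): Unit = {
    // Application name is a placeholder (assumption).
    val sc = new SparkContext(new SparkConf().setAppName("thrift-jdbc-sketch"))

    // 1. Create the HiveContext.
    val hiveContext = new HiveContext(sc)

    // 2. Create a DataFrame from the relational table via the JDBC data source.
    //    The URL below is hypothetical; replace it with the real connection string.
    val df = hiveContext.read
      .format("jdbc")
      .options(Map(
        "url"     -> "jdbc:postgresql://dbhost:5432/testdb", // hypothetical URL
        "dbtable" -> "test"                                  // table name as seen in the query log
      ))
      .load()

    // 3. Register the DataFrame under the name queried from Beeline.
    df.registerTempTable("TESTTABLE")

    // 4. Expose the same context through the Thrift server.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}
```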



TESTTABLE contains 1,000,000 rows; its columns are ID (INT) and NAME
(VARCHAR):

    +-----+--------+
    | ID  |  NAME  |
    +-----+--------+
    | 1   | Hello  |
    | 2   | Hello  |
    | 3   | Hello  |
    | ... | ...    |

With Beeline I connect to the SQL endpoint of the HiveThriftServer (port
10000) and run a query, e.g.

    SELECT * FROM TESTTABLE WHERE ID='3'

When I inspect the query log of the database, the SQL statement actually executed is

    /*SQL #:1000000 t:657*/  *SELECT \"ID\",\"NAME\" FROM test;*

So no predicate pushdown takes place: the WHERE clause is missing from the statement sent to the database.


# Questions

This gives rise to the following questions:

   1. *Why is no predicate pushdown performed?*
   2. *Can this be changed by not using registerTempTable? If so, how?*
   3. *Or is this a known restriction of the HiveThriftServer?*


# Counterexample

If I create a DataFrame `df` with a Spark SQLContext and call

    df.filter( df("ID") === 3).show()

I observe

    /*SQL #:1*/SELECT \"ID\",\"NAME\" FROM test *WHERE ID = 3*;

as expected.
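
To narrow down where the two cases diverge, the plans could be compared directly from Spark, without going through Beeline. The sketch below is only a suggestion: `PlanComparison` and `comparePlans` are hypothetical names, and `ctx` is assumed to be the context on which `df` was registered as TESTTABLE. Note that the Beeline query and the DataFrame filter differ in two respects, the entry path (SQL string vs. DataFrame API) and the literal (`'3'` as a string vs. `3` as an integer), so both variants are included. Whether either difference explains the missing pushdown is not confirmed here.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object PlanComparison {
  // Prints the extended plans for three variants of the same lookup.
  // `ctx` must be the context where `df` was registered as TESTTABLE
  // (a HiveContext is also a SQLContext in Spark 1.x).
  def comparePlans(ctx: SQLContext, df: DataFrame): Unit = {
    // Variant 1: the query exactly as issued through Beeline (string literal).
    ctx.sql("SELECT * FROM TESTTABLE WHERE ID='3'").explain(true)

    // Variant 2: the same SQL with an integer literal.
    ctx.sql("SELECT * FROM TESTTABLE WHERE ID=3").explain(true)

    // Variant 3: the DataFrame API call from the counterexample.
    df.filter(df("ID") === 3).explain(true)
  }
}
```

Calling `PlanComparison.comparePlans(hiveContext, df)` after registering the table prints the parsed, analyzed, optimized and physical plans for each variant, which can then be checked against the database query log.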
