Re: query pushdown into HBase subscan
The constant folding feature is turned on by default (and can be disabled with planner.enable_constant_folding). It should be able to work with UDFs, as it has access to all of the same function definitions as our standard resolution/evaluation during full execution.

In the plan that includes the full scan, does your expression appear as written in the filter above the scan (i.e. convert_from(...) = hash_to_long('key_part1')), or has the right-hand side been reduced to a constant value?

The next thing that would probably be good to debug would be pre-computing the right-hand side and seeing if that gets pushed down.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, May 31, 2016 at 5:04 PM, Aditya wrote:
> Hi Andrey,
>
> Drill currently does require a constant value on the right-hand side of a
> comparison operator to push down the filter.
>
> I believe that Jason had worked on a constant folding feature which would
> evaluate a constant expression during the planning phase and rewrite the plan
> to replace the expression with the corresponding constant value.
>
> Not sure if that works with UDFs as well.
>
> Jason?
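The two checks suggested above could be tried roughly as follows. This is only a sketch under stated assumptions: the literal 1234567890123456789 is a hypothetical placeholder for whatever hash_to_long('key_part1') actually returns, {table} is the placeholder from the original query, and the select list is shortened to the single column shown there.

```sql
-- 1. Make sure constant folding is enabled for this session (it is on by default).
ALTER SESSION SET `planner.enable_constant_folding` = true;

-- 2. Evaluate the UDF on its own to obtain the constant value.
SELECT hash_to_long('key_part1') FROM (VALUES(1));

-- 3. Plan for the original filter: check whether the right-hand side still
--    appears as a function call or has been reduced to a literal.
EXPLAIN PLAN FOR
SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long
FROM {table}
WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') = hash_to_long('key_part1')
LIMIT 10;

-- 4. Plan for the same filter with the right-hand side pre-computed.
--    The literal below is a hypothetical placeholder: substitute the value from step 2.
EXPLAIN PLAN FOR
SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long
FROM {table}
WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') = 1234567890123456789
LIMIT 10;
```

If the plan from step 4 shows non-null startRow/stopRow values in the HBaseScanSpec while the plan from step 3 still shows startRow=null, stopRow=null, that would suggest the pushdown itself works and the UDF call is simply not being folded into a constant.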
Re: query pushdown into HBase subscan
Hi Andrey,

Drill currently does require a constant value on the right-hand side of a comparison operator to push down the filter.

I believe that Jason had worked on a constant folding feature which would evaluate a constant expression during the planning phase and rewrite the plan to replace the expression with the corresponding constant value.

Not sure if that works with UDFs as well.

Jason?

On Tue, May 31, 2016 at 3:54 PM, Andrey Gusev wrote:
> SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long, ...
> from {table}
> WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') =
> hash_to_long('key_part1') limit 10
>
> How can we force the where clause to be reflected into scan bounds?
query pushdown into HBase subscan
Hello Drill,

We're noticing somewhat odd behavior with the following query against an HBase table.

The key of the table is, roughly speaking,
*8byteHash(string1)8byteHash(string2)*

    SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long, ...
    from {table}
    WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') =
    hash_to_long('key_part1') limit 10

The query does seem to work correctly in terms of the result set, but it times out on larger tables. hash_to_long is a UDF that I wrote that converts a string to a long such that the above equality can be satisfied.

It appears that the filter is not pushed down into the subscan (i.e. a prefix HBase scan), even though the operator profile shows HBASE_SUB_SCAN:

[image: Inline image 1]

The physical plan starts with an unconstrained full table scan:

    Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
    [tableName={table}, startRow=null, stopRow=null, filter=null],

How can we force the where clause to be reflected into scan bounds?

We're running the latest Drill, 1.6.

Andrey