> On April 1, 2015, 8 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java,
> >  line 82
> > <https://reviews.apache.org/r/30701/diff/6/?file=906263#file906263line82>
> >
> >     Not sure if the case-insensitive comparison should be used as default, 
> > or it should depend on the schema (i.e HBase schema could use a different 
> > sensitive policy from FileSystemSchema, etc), or it should be passed in as 
> > a parameter of udf maxdir(). In the query 
> >     " select * 
> >       from dfs.my_workspace.data_directory 
> >       where dir0 in (select MAX(dir0) from dfs.my_workspace.data_directory)"
> >     
> >     Aggregate function max() could use case sensitive string comparison. If 
> > this maxdir UDF chooses to use case-insensitive, then after partition 
> > pruning, it might return different query results.

The primary use case we had in mind for this feature was simply finding 
recent data, so all of the partition names were numeric. For date formats 
arranged so that a string comparison gives the correct result, i.e. 
YYYY-MM-DD or similar, case sensitivity doesn't matter. Users may want to 
query their partition information in many different ways, so I think it is 
best to leave the interface open for writing custom UDFs in those cases. I 
could pass a flag to this UDF, or write another UDF that does the same 
operation case-sensitively.
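
To make the concern concrete, here is a small standalone sketch in plain Java. 
It is not the code from DirectoryExplorers.java; the class and method names 
are hypothetical and only illustrate how the two comparison policies can pick 
different "maximum" directories for mixed-case names while agreeing on 
YYYY-MM-DD names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the Drill patch.
class MaxDirComparison {

    // Natural String ordering compares UTF-16 code units, so every
    // uppercase ASCII letter sorts before every lowercase one.
    static String maxCaseSensitive(List<String> dirs) {
        return Collections.max(dirs);
    }

    // Same search, but letter case is ignored when comparing.
    static String maxCaseInsensitive(List<String> dirs) {
        return Collections.max(dirs, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList("2014-12-31", "2015-04-01", "2015-03-26");
        List<String> mixed = Arrays.asList("Zebra", "apple");

        // Numeric or date-like names: both policies agree.
        System.out.println(maxCaseSensitive(dates));
        System.out.println(maxCaseInsensitive(dates));

        // Mixed-case names: the two policies disagree.
        System.out.println(maxCaseSensitive(mixed));
        System.out.println(maxCaseInsensitive(mixed));
    }
}
```

For the date-like list both methods return "2015-04-01". For the mixed-case 
list the case-sensitive version returns "apple" and the case-insensitive one 
returns "Zebra", which is the kind of mismatch with MAX(dir0) raised in the 
review comment.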


> On April 1, 2015, 8 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorer.java,
> >  line 101
> > <https://reviews.apache.org/r/30701/diff/6/?file=906270#file906270line101>
> >
> >     What's the purpose of passing partitionColumns and partitionValues? In 
> > FileSystemSchema or WorkspaceSchema getSubPartitions, those two parameters 
> > are not used. UDF DirectoryExplorers just passes two empty list. I'm not 
> > clear why the interface need these two additional parameters, on top of 
> > "schema" and "table".

These parameters were added for use with storage systems that track partition 
column names. It is true that neither of the two current implementations of 
the interface, in the file system and workspace schemas, uses them. They are 
primarily useful for Hive, as we can currently do partition pruning on 
partition columns. Adding them to the interface generalizes this 
functionality to enable future use in Hive. It would be possible to have two 
different interfaces to avoid confusion in the cases where the partition 
columns are not needed.
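
A rough sketch of what the two-interface split could look like, in plain 
Java. The interface and class names below are hypothetical, the return type 
is an assumption, and the toy implementations return hard-coded values; only 
the method name getSubPartitions and the four parameters come from the 
discussion above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical: for stores that only know schema and table (e.g. directories).
interface SimplePartitionExplorer {
    Iterable<String> getSubPartitions(String schema, String table);
}

// Hypothetical: for stores that track partition column names (e.g. Hive).
interface ColumnAwarePartitionExplorer extends SimplePartitionExplorer {
    Iterable<String> getSubPartitions(String schema, String table,
                                      List<String> partitionColumns,
                                      List<String> partitionValues);

    // With no columns given, fall back to the top-level partitions.
    @Override
    default Iterable<String> getSubPartitions(String schema, String table) {
        return getSubPartitions(schema, table,
                Collections.<String>emptyList(), Collections.<String>emptyList());
    }
}

// Toy file-system style explorer: ignores partition columns entirely.
class ToyDirectoryExplorer implements SimplePartitionExplorer {
    @Override
    public List<String> getSubPartitions(String schema, String table) {
        return Arrays.asList("2015-03-26", "2015-04-01");
    }
}

// Toy column-aware explorer: drills one level down once a column is fixed.
class ToyColumnExplorer implements ColumnAwarePartitionExplorer {
    @Override
    public List<String> getSubPartitions(String schema, String table,
                                         List<String> partitionColumns,
                                         List<String> partitionValues) {
        if (partitionColumns.isEmpty()) {
            return Arrays.asList("2014", "2015");   // values of the first column
        }
        return Arrays.asList("01", "02");           // values under the fixed column
    }
}
```

A UDF like the directory one would then depend only on the two-argument 
form, while a Hive-oriented UDF could ask for the column-aware variant. The 
trade-off is one more type in the API in exchange for not passing empty 
lists.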


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30701/#review78562
-----------------------------------------------------------


On March 26, 2015, 12:54 a.m., Jason Altekruse wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30701/
> -----------------------------------------------------------
> 
> (Updated March 26, 2015, 12:54 a.m.)
> 
> 
> Review request for drill, Jacques Nadeau, Mehant Baid, Parth Chandra, and 
> Venki Korukanti.
> 
> 
> Bugs: DRILL-2173
>     https://issues.apache.org/jira/browse/DRILL-2173
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Adds a new interface for UDFs to access partition information. Together with 
> 2060 which allows constant expression folding this will allow UDFs that can 
> query against partition information and then scan a subset of data. Example 
> use case, find the most recent directory and only that partition worth of 
> data.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseSchemaFactory.java 7b76092 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveSchemaFactory.java 023517b 
>   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java 32c42ba 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java ab121b0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/interpreter/InterpreterEvaluator.java 35c35ec 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 5e31e5c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 3b51a69 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/UdfUtilities.java f7a1a04 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 90e3ef4 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractStoragePlugin.java b032fce 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorer.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorerImpl.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionNotFoundException.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaPartitionExplorer.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 2c0d8b8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java 4a3eba9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java 7c8d9b3 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/TestConstantFolding.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30701/diff/
> 
> 
> Testing
> -------
> 
> Tests have been run on a very recent version. I have made a few minor 
> cleanup edits since and am waiting on another run, but do not anticipate 
> issues.
> 
> 
> Thanks,
> 
> Jason Altekruse
> 
>
