LakshSingla commented on code in PR #16288:
URL: https://github.com/apache/druid/pull/16288#discussion_r1571750300


##########
docs/ingestion/input-sources.md:
##########
@@ -1141,22 +1141,122 @@ To use the Delta Lake input source, load the extension 
[`druid-deltalake-extensi
 You can use the Delta input source to read data stored in a Delta Lake table. 
For a given table, the input source scans
 the latest snapshot from the configured table. Druid ingests the underlying 
delta files from the table.
 
-The following is a sample spec:
+ | Property|Description|Required|
+|---------|-----------|--------|
+| type|Set this value to `delta`.|yes|
+| tablePath|The location of the Delta table.|yes|
+| filter|The JSON Object that filters data files within a snapshot.|no|
+
+### Delta filter object
+
+You can use these filters to filter out data files from a snapshot, reducing 
the number of files Druid has to ingest from
+a Delta table. This input source provides the following filters: `and`, `or`, 
`not`, `=`, `>`, `>=`, `<`, `<=`.
+
+When a filter is applied on non-partitioned columns, the filtering is 
best-effort as the Delta Kernel solely relies
+on statistics collected when the non-partitioned table is created. In this 
scenario, this Druid connector may ingest
+data that doesn't match the filter. For guaranteed filtering behavior, only 
use filters on partitioned columns.

Review Comment:
   ```suggestion
   data that doesn't match the filter. To guarantee that the Delta Kernel 
prunes out the unnecessary data files, only use filters on partitioned columns.
   ```
   I think the wording can be slightly improved. "only" makes it sound as if 
something unintuitive will happen when the filter is applied to a 
non-partitioned column. Unless filtering on a non-partitioned column performs 
worse than having no filter at all, we can probably reword this as a guideline. 



##########
docs/ingestion/input-sources.md:
##########
@@ -1141,22 +1141,122 @@ To use the Delta Lake input source, load the extension 
[`druid-deltalake-extensi
 You can use the Delta input source to read data stored in a Delta Lake table. 
For a given table, the input source scans
 the latest snapshot from the configured table. Druid ingests the underlying 
delta files from the table.
 
-The following is a sample spec:
+ | Property|Description|Required|
+|---------|-----------|--------|
+| type|Set this value to `delta`.|yes|
+| tablePath|The location of the Delta table.|yes|
+| filter|The JSON Object that filters data files within a snapshot.|no|
+
+### Delta filter object
+
+You can use these filters to filter out data files from a snapshot, reducing 
the number of files Druid has to ingest from
+a Delta table. This input source provides the following filters: `and`, `or`, 
`not`, `=`, `>`, `>=`, `<`, `<=`.
+
+When a filter is applied on non-partitioned columns, the filtering is 
best-effort as the Delta Kernel solely relies
+on statistics collected when the non-partitioned table is created. In this 
scenario, this Druid connector may ingest
+data that doesn't match the filter. For guaranteed filtering behavior, only 
use filters on partitioned columns.
+
+
+`and` filter:
+
+| Property | Description                                                       
                                                                                
            | Required |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `and`.                                          
                                                                                
            | yes      |
+| filters  | List of Delta filter predicates that get evaluated using logical 
AND where both conditions need to be true. `and` filter requires two filter 
predicates.      | yes      |
+
+`or` filter:
+
+| Property | Description                                                       
                                                                                
              | Required |
+|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `or`.                                           
                                                                                
              | yes      |
+| filters  | List of Delta filter predicates that get evaluated using logical 
OR where only one condition needs to be true. `or` filter requires two filter 
predicates.      | yes      |
+
+`not` filter:
+
+| Property | Description                                                       
                                            | Required |
+|----------|---------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `not`.                                          
                                            | yes      |
+| filter   | The Delta filter predicate that gets evaluated using logical NOT. 
`not` filter requires one filter predicate. | yes      |
+
+`=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `=`.                   | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`>` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `>`.                   | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`>=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `>=`.                  | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`<` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `<`.                   | Yes      |
+| column   | The table column to apply the filter on. | Yes      |
+| value    | The value to use in the filter.          | Yes      |
+
+`<=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `<=`.                  | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+
+The following is a sample spec to read all records from the Delta table 
`/delta-table/foo`:
 
 ```json
 ...
     "ioConfig": {
       "type": "index_parallel",
       "inputSource": {
         "type": "delta",
-        "tablePath": "/delta-table/directory"
+        "tablePath": "/delta-table/foo"
       },
     }
-}
 ```
 
-| Property|Description|Required|
-|---------|-----------|--------|
-| type|Set this value to `delta`.|yes|
-| tablePath|The location of the Delta table.|yes|
+The following is a sample spec to read records from the Delta table 
`/delta-table/foo` that match the `and` filter
+to select records where `name = 'Employee4'` and `age >= 30`:

Review Comment:
   ```suggestion
   The following is a sample spec that reads records from the Delta table 
`/delta-table/foo` where `name = 'Employee4'` and `age >= 30`:
   ```
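   For reference, here is a minimal sketch of the `inputSource` that this sentence introduces, assembled only from the property tables in this diff. Passing `value` as the string `"30"` is an assumption; the tables don't state the expected JSON type.

   ```json
   {
     "type": "delta",
     "tablePath": "/delta-table/foo",
     "filter": {
       "type": "and",
       "filters": [
         { "type": "=", "column": "name", "value": "Employee4" },
         { "type": ">=", "column": "age", "value": "30" }
       ]
     }
   }
   ```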



##########
extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/filter/DeltaOrFilter.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.delta.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import io.delta.kernel.expressions.Or;
+import io.delta.kernel.expressions.Predicate;
+import io.delta.kernel.types.StructType;
+import org.apache.druid.error.InvalidInput;
+
+import java.util.List;
+
+public class DeltaOrFilter implements DeltaFilter
+{
+  @JsonProperty
+  private final List<DeltaFilter> filters;
+
+  @JsonCreator
+  public DeltaOrFilter(@JsonProperty("filters") final List<DeltaFilter> 
filters)
+  {
+    if (filters == null) {
+      throw InvalidInput.exception("Delta or filter requires 2 filter 
predicates and must be non-empty.");
+    }
+    if (filters.size() != 2) {
+      throw InvalidInput.exception(
+          "Delta or filter requires 2 filter predicates, but provided [%d].",
+          filters.size()

Review Comment:
   This must be mentioned in the Javadoc. 



##########
docs/ingestion/input-sources.md:
##########
@@ -1141,22 +1141,122 @@ To use the Delta Lake input source, load the extension 
[`druid-deltalake-extensi
 You can use the Delta input source to read data stored in a Delta Lake table. 
For a given table, the input source scans
 the latest snapshot from the configured table. Druid ingests the underlying 
delta files from the table.
 
-The following is a sample spec:
+ | Property|Description|Required|
+|---------|-----------|--------|
+| type|Set this value to `delta`.|yes|
+| tablePath|The location of the Delta table.|yes|
+| filter|The JSON Object that filters data files within a snapshot.|no|
+
+### Delta filter object
+
+You can use these filters to filter out data files from a snapshot, reducing 
the number of files Druid has to ingest from
+a Delta table. This input source provides the following filters: `and`, `or`, 
`not`, `=`, `>`, `>=`, `<`, `<=`.
+
+When a filter is applied on non-partitioned columns, the filtering is 
best-effort as the Delta Kernel solely relies
+on statistics collected when the non-partitioned table is created. In this 
scenario, this Druid connector may ingest
+data that doesn't match the filter. For guaranteed filtering behavior, only 
use filters on partitioned columns.
+
+
+`and` filter:
+
+| Property | Description                                                       
                                                                                
            | Required |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `and`.                                          
                                                                                
            | yes      |
+| filters  | List of Delta filter predicates that get evaluated using logical 
AND where both conditions need to be true. `and` filter requires two filter 
predicates.      | yes      |
+
+`or` filter:
+
+| Property | Description                                                       
                                                                                
              | Required |
+|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `or`.                                           
                                                                                
              | yes      |
+| filters  | List of Delta filter predicates that get evaluated using logical 
OR where only one condition needs to be true. `or` filter requires two filter 
predicates.      | yes      |
+
+`not` filter:
+
+| Property | Description                                                       
                                            | Required |
+|----------|---------------------------------------------------------------------------------------------------------------|----------|
+| type     | Set this value to `not`.                                          
                                            | yes      |
+| filter   | The Delta filter predicate that gets evaluated using logical NOT. 
`not` filter requires one filter predicate. | yes      |
+
+`=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `=`.                   | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`>` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `>`.                   | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`>=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `>=`.                  | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |
+
+`<` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `<`.                   | Yes      |
+| column   | The table column to apply the filter on. | Yes      |
+| value    | The value to use in the filter.          | Yes      |
+
+`<=` filter:
+
+| Property | Description                              | Required |
+|----------|------------------------------------------|----------|
+| type     | Set this value to `<=`.                  | yes      |
+| column   | The table column to apply the filter on. | yes      |
+| value    | The value to use in the filter.          | yes      |

Review Comment:
   There is some redundancy in the docs. Perhaps we can merge all these 
comparison filters into a single table. 
   ```suggestion
   | Property | Description                                     | Required |
   |----------|-------------------------------------------------|----------|
   | type     | Set this value to `=`, `>`, `>=`, `<`, or `<=`. | yes      |
   | column   | The table column to apply the filter on.        | yes      |
   | value    | The value to use in the filter.                 | yes      |
   ```
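   A merged table would still map onto each operator in the same way. As a sketch, assuming a hypothetical `age` column and a string-typed `value`, a single `<=` filter object might look like this:

   ```json
   {
     "type": "<=",
     "column": "age",
     "value": "30"
   }
   ```

   Swapping `type` for any of the other comparison operators leaves the rest of the object unchanged, which is why one table should be enough.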



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

