paul-rogers commented on a change in pull request #1618: DRILL-6950: Row set-based scan framework
URL: https://github.com/apache/drill/pull/1618#discussion_r259597700
##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsv.java
##########

```diff
@@ -151,12 +151,77 @@ private String makeStatement(String fileName) {
     return "SELECT * FROM `dfs.data`.`" + fileName + "`";
   }
 
-  private void buildFile(String fileName, String[] data) throws IOException {
+  private static void buildFile(String fileName, String[] data) throws IOException {
     try (PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
       for (String line : data) {
         out.println(line);
       }
     }
   }
+
+  /**
+   * Verify that the wildcard expands columns to the header names,
+   * including case.
+   */
+  @Test
+  public void testWildcard() throws IOException {
+    String sql = "SELECT * FROM `dfs.data`.`%s`";
+    RowSet actual = client.queryBuilder().sql(sql, CASE2_FILE_NAME).rowSet();
+
+    BatchSchema expectedSchema = new SchemaBuilder()
+      .add("a", MinorType.VARCHAR)
+      .add("b", MinorType.VARCHAR)
+      .add("c", MinorType.VARCHAR)
+      .build();
+    assertTrue(expectedSchema.isEquivalent(actual.batchSchema()));
+
+    RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+      .addRow("10", "foo", "bar")
+      .build();
+    RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Verify that implicit columns are recognized and populated. Sanity test
+   * of just one implicit column.
+   */
+  @Test
+  public void testImplicitColsExplicitSelect() throws IOException {
+    String sql = "SELECT A, filename FROM `dfs.data`.`%s`";
+    RowSet actual = client.queryBuilder().sql(sql, CASE2_FILE_NAME).rowSet();
+
+    BatchSchema expectedSchema = new SchemaBuilder()
+      .add("A", MinorType.VARCHAR)
+      .addNullable("filename", MinorType.VARCHAR)
```

Review comment:
   Very good question. This PR includes no changes to the CSV reader; this particular test acts as a baseline, verifying the current behavior before we make any changes. So we are asserting that the current code creates the implicit column as nullable.
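   The nullability question matters because partition directory columns genuinely can be absent for some files: files sit at different depths under the scan root, so a shallow file has no value for the deeper `dirN` columns. A standalone sketch (plain Java, no Drill dependencies; the `partitionCols` helper and its naming are illustrative, following Drill's `dir0`/`dir1` convention) shows why:

```java
import java.util.Arrays;

public class PartitionColsSketch {

    // Compute Drill-style dirN partition values for a file path relative to
    // the scan root. Files nested less deeply than maxDepth get nulls for
    // the missing directory levels.
    static String[] partitionCols(String relativePath, int maxDepth) {
        String[] parts = relativePath.split("/");
        String[] dirs = new String[maxDepth];          // all null initially
        // The last path segment is the file name; the rest are directories.
        for (int i = 0; i < maxDepth && i < parts.length - 1; i++) {
            dirs[i] = parts[i];
        }
        return dirs;
    }

    public static void main(String[] args) {
        // A file two levels deep: both partition columns are populated.
        System.out.println(Arrays.toString(partitionCols("2018/q1/data.csv", 2)));
        // → [2018, q1]
        // A file at the scan root: every partition column is null, which is
        // why dir0/dir1 must be OPTIONAL (nullable) in the scan schema. By
        // contrast, every row always has a file name, so filename could in
        // principle be REQUIRED.
        System.out.println(Arrays.toString(partitionCols("data.csv", 2)));
        // → [null, null]
    }
}
```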
   I tried changing the test to expect a required column and got this error:

```
java.lang.AssertionError: Schemas don't match.
Expected: [TupleSchema [..., [PrimitiveColumnMetadata [`filename` (VARCHAR(0, 0):REQUIRED)], projected]]
Actual: [TupleSchema [..., [PrimitiveColumnMetadata [`filename` (VARCHAR(0, 0):OPTIONAL)], projected]]
```

   And, indeed, we do add the fields as nullable: see [ScanBatch.addImplicitVectors()](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L330).

   The general theme here is that we need very thorough tests of the existing behavior of each reader, so that when we replace the readers with ResultSetLoader-aware versions we can verify that there are no user-visible behavior changes. In the new world, we could make the file name columns required, but the partition columns should still be nullable, right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services