paul-rogers commented on a change in pull request #1618: DRILL-6950: Row set-based scan framework
URL: https://github.com/apache/drill/pull/1618#discussion_r259597700
##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsv.java
##########

```diff
@@ -151,12 +151,77 @@ private String makeStatement(String fileName) {
     return "SELECT * FROM `dfs.data`.`" + fileName + "`";
   }
 
-  private void buildFile(String fileName, String[] data) throws IOException {
+  private static void buildFile(String fileName, String[] data) throws IOException {
     try (PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
       for (String line : data) {
         out.println(line);
       }
     }
   }
+
+  /**
+   * Verify that the wildcard expands columns to the header names,
+   * including case.
+   */
+  @Test
+  public void testWildcard() throws IOException {
+    String sql = "SELECT * FROM `dfs.data`.`%s`";
+    RowSet actual = client.queryBuilder().sql(sql, CASE2_FILE_NAME).rowSet();
+
+    BatchSchema expectedSchema = new SchemaBuilder()
+      .add("a", MinorType.VARCHAR)
+      .add("b", MinorType.VARCHAR)
+      .add("c", MinorType.VARCHAR)
+      .build();
+    assertTrue(expectedSchema.isEquivalent(actual.batchSchema()));
+
+    RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+      .addRow("10", "foo", "bar")
+      .build();
+    RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Verify that implicit columns are recognized and populated. Sanity test
+   * of just one implicit column.
+   */
+  @Test
+  public void testImplicitColsExplicitSelect() throws IOException {
+    String sql = "SELECT A, filename FROM `dfs.data`.`%s`";
+    RowSet actual = client.queryBuilder().sql(sql, CASE2_FILE_NAME).rowSet();
+
+    BatchSchema expectedSchema = new SchemaBuilder()
+      .add("A", MinorType.VARCHAR)
+      .addNullable("filename", MinorType.VARCHAR)
```

Review comment:
   Very good question. This PR includes no changes to the CSV reader; this particular test acts as a baseline, verifying the current behavior before we make any changes. So we are asserting that the current code creates the implicit column as nullable.
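   The nullability question matters because partition directory columns genuinely can be absent for some files: files sit at different depths under the scan root, so a shallow file has no value for the deeper `dirN` columns. A standalone sketch (plain Java, no Drill dependencies; the `partitionCols` helper and its naming are illustrative, following Drill's `dir0`/`dir1` convention) shows why:

```java
import java.util.Arrays;

public class PartitionColsSketch {

    // Compute Drill-style dirN partition values for a file path relative to
    // the scan root. Files nested less deeply than maxDepth get nulls for
    // the missing directory levels.
    static String[] partitionCols(String relativePath, int maxDepth) {
        String[] parts = relativePath.split("/");
        String[] dirs = new String[maxDepth];          // all null initially
        // The last path segment is the file name; the rest are directories.
        for (int i = 0; i < maxDepth && i < parts.length - 1; i++) {
            dirs[i] = parts[i];
        }
        return dirs;
    }

    public static void main(String[] args) {
        // A file two levels deep: both partition columns are populated.
        System.out.println(Arrays.toString(partitionCols("2018/q1/data.csv", 2)));
        // → [2018, q1]
        // A file at the scan root: every partition column is null, which is
        // why dir0/dir1 must be OPTIONAL (nullable) in the scan schema. By
        // contrast, every row always has a file name, so filename could in
        // principle be REQUIRED.
        System.out.println(Arrays.toString(partitionCols("data.csv", 2)));
        // → [null, null]
    }
}
```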
   I tried changing the test to expect a required column and got this error:

```
java.lang.AssertionError: Schemas don't match.
Expected: [TupleSchema [..., [PrimitiveColumnMetadata [`filename` (VARCHAR(0, 0):REQUIRED)], projected]]
Actual: [TupleSchema [..., [PrimitiveColumnMetadata [`filename` (VARCHAR(0, 0):OPTIONAL)], projected]]
```

   And, indeed, we do add the fields as nullable: see [ScanBatch.addImplicitVectors()](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L330).

   The general theme here is that we need very thorough tests of the existing behavior of each reader, so that when we replace the readers with ResultSetLoader-aware versions we can verify that there are no user-visible behavior changes. In the new world, we could make the file name columns required, but the partition columns should still be nullable, right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services