[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210174#comment-16210174
 ] 

ASF GitHub Bot commented on DRILL-5783:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145508960
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java ---
    @@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
         testNewColumns(false);
       }
     
    -
       @Test
       public void testNewColumnsLegacy() throws Exception {
         testNewColumns(true);
       }
     
       private void testNewColumns(boolean testLegacy) throws Exception {
         final int record_count = 10000;
    -    String dfs_temp = getDfsTestTmpSchemaLocation();
    -    System.out.println(dfs_temp);
    -    File table_dir = new File(dfs_temp, "newColumns");
    -    table_dir.mkdir();
    -    try (BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(new File(table_dir, "a.json")))) {
    -      String format = "{ a : %d, b : %d }%n";
    -      for (int i = 0; i <= record_count; i += 2) {
    -        os.write(String.format(format, i, i).getBytes());
    -      }
    +    final String tableDirName = "newColumns";
    +
    +    TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", "b"), Lists.newArrayList("%d", "%d"));
    --- End diff --
    
    Clever -- but how do we handle types? The original code created JSON of the 
form:
    
    ```
    { a : 10, b : 20 }
    ```
    
    Aside from the fact that the labels are not true JSON (not quoted), the 
mechanism does not work for strings, which need to be quoted. The mechanism 
here assumes numbers. But, of course, if we assume numbers, we don't need the 
second argument, the "%d".
    
    If you look in the `ClusterFixture` code, you'll see a method 
`ClusterFixture.stringify()` that converts an Object to a SQL-compatible 
string. We could create a similar one for JSON.
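    
    As a rough illustration of what such a JSON variant might look like, here
    is a minimal sketch. The class and method names (`JsonStringify`,
    `stringify`, `toJsonObject`) are hypothetical and are not existing Drill
    code; the sketch only assumes that strings must be quoted and escaped while
    numbers and booleans are written bare.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    
    /** Hypothetical helper for tests; not part of Drill. */
    public class JsonStringify {
    
      /** Renders a Java value as a JSON literal based on its runtime type. */
      public static String stringify(Object value) {
        if (value == null) {
          return "null";
        }
        // Numbers and booleans are written bare. NaN and infinities are not
        // valid JSON and are not handled by this sketch.
        if (value instanceof Number || value instanceof Boolean) {
          return value.toString();
        }
        // Everything else is treated as a string: quoted and escaped.
        StringBuilder buf = new StringBuilder("\"");
        for (char c : value.toString().toCharArray()) {
          switch (c) {
            case '"':  buf.append("\\\""); break;
            case '\\': buf.append("\\\\"); break;
            case '\n': buf.append("\\n"); break;
            case '\r': buf.append("\\r"); break;
            case '\t': buf.append("\\t"); break;
            default:
              if (c < 0x20) {
                buf.append(String.format("\\u%04x", (int) c));
              } else {
                buf.append(c);
              }
          }
        }
        return buf.append('"').toString();
      }
    
      /** Renders one record as a single-line JSON object with quoted keys. */
      public static String toJsonObject(Map<String, Object> row) {
        StringBuilder buf = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, Object> e : row.entrySet()) {
          buf.append(sep)
             .append(stringify(e.getKey()))
             .append(": ")
             .append(stringify(e.getValue()));
          sep = ", ";
        }
        return buf.append("}").toString();
      }
    }
    ```
    
    With something like this, the builder would take values rather than format
    strings, and the type of each value would decide how it is rendered.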
    
    But, if we take a step back, we are creating JSON. Perfectly fine JSON 
builder classes exist that can be used to build up type-aware JSON and render 
the result as a string. The one limitation is that these classes are for single 
documents; they can't handle the non-standard list-of-objects format that Drill 
uses. Still, we can use the per-object classes to build up the list of objects.
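    
    One way to picture the per-object approach: render each record on its own
    and concatenate the results, one object per line, with no enclosing array
    and no commas between objects. In the sketch below, `JsonRecordSequence`
    and `render` are hypothetical names, and the `perObject` function stands in
    for whatever JSON builder renders a single document as a string.
    
    ```java
    import java.util.function.IntFunction;
    
    /** Hypothetical helper for tests; not part of Drill. */
    public class JsonRecordSequence {
    
      /**
       * Concatenates one JSON object per line for the indexes first, first + step,
       * and so on up to and including last. perObject renders a single record.
       */
      public static String render(int first, int last, int step,
                                  IntFunction<String> perObject) {
        if (step <= 0) {
          throw new IllegalArgumentException("step must be positive");
        }
        StringBuilder buf = new StringBuilder();
        for (int i = first; i <= last; i += step) {
          buf.append(perObject.apply(i)).append('\n');
        }
        return buf.toString();
      }
    }
    ```
    
    A test could then call, say,
    `render(0, record_count, 2, i -> builderFor(i))`, where `builderFor` is a
    placeholder for the per-object builder call, and write the resulting string
    to `a.json`.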
    
    Yet another solution is to use the `RowSet` classes, which are type-aware. 
Yes, they build value vectors, but that is not important here. What is 
important is that they use the full schema power of Drill, including data type, 
repeated and nullable modes, etc. Then, we just create a RowSet-to-JSON 
converter. In fact, I may have something like that in the "Jig" project I did 
earlier; I'll rummage around and see if I can find it. 


> Make code generation in the TopN operator more modular and test it
> ------------------------------------------------------------------
>
>                 Key: DRILL-5783
>                 URL: https://issues.apache.org/jira/browse/DRILL-5783
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
