[
https://issues.apache.org/jira/browse/ORC-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136413#comment-16136413
]
ASF GitHub Bot commented on ORC-233:
------------------------------------
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/orc/pull/160#discussion_r134398544
--- Diff: java/mapreduce/src/test/org/apache/orc/mapreduce/TestMapreduceOrcOutputFormat.java ---
@@ -153,11 +153,55 @@ public void testColumnSelection() throws Exception {
assertEquals(false, reader.nextKeyValue());
}
+ @Test
+ public void testColumnSelectionBlank() throws Exception {
+ String typeStr = "struct<i:int,j:int,k:int>";
+ OrcConf.MAPRED_OUTPUT_SCHEMA.setString(conf, typeStr);
+ conf.set("mapreduce.output.fileoutputformat.outputdir", workDir.toString());
+ conf.setInt(OrcConf.ROW_INDEX_STRIDE.getAttribute(), 1000);
+ conf.setBoolean(OrcOutputFormat.SKIP_TEMP_DIRECTORY, true);
+ TaskAttemptID id = new TaskAttemptID("jt", 0, TaskType.MAP, 0, 1);
+ TaskAttemptContext attemptContext = new TaskAttemptContextImpl(conf, id);
+ OutputFormat<NullWritable, OrcStruct> outputFormat =
+ new OrcOutputFormat<OrcStruct>();
+ RecordWriter<NullWritable, OrcStruct> writer =
+ outputFormat.getRecordWriter(attemptContext);
- /**
- * Make sure that the writer ignores the OrcKey
- * @throws Exception
- */
+ // write 3000 rows with three integer columns
+ TypeDescription type = TypeDescription.fromString(typeStr);
+ OrcStruct row = (OrcStruct) OrcStruct.createValue(type);
+ NullWritable nada = NullWritable.get();
+ for (int r = 0; r < 3000; ++r) {
+ row.setFieldValue(0, new IntWritable(r));
+ row.setFieldValue(1, new IntWritable(r * 2));
+ row.setFieldValue(2, new IntWritable(r * 3));
+ writer.write(nada, row);
+ }
+ writer.close(attemptContext);
+
+ conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute(), "");
+ FileSplit split = new FileSplit(new Path(workDir, "part-m-00000.orc"),
+ 0, 1000000, new String[0]);
+ RecordReader<NullWritable, OrcStruct> reader =
+ new OrcInputFormat<OrcStruct>().createRecordReader(split,
+ attemptContext);
+ // with no columns included, every field should read back as null
+ for (int r = 0; r < 3000; ++r) {
+ assertEquals(true, reader.nextKeyValue());
+ row = reader.getCurrentValue();
+ assertEquals(null, ((IntWritable) row.getFieldValue(0)));
+ assertEquals(null, row.getFieldValue(1));
+ assertEquals(null, ((IntWritable) row.getFieldValue(2)));
+ }
+ assertEquals(false, reader.nextKeyValue());
+ }
+
+
+
+ /**
+ * Make sure that the writer ignores the OrcKey
+ * @throws Exception
+ */
--- End diff --
Could you remove the redundant blank lines here, in lines 201 ~ 204?
> Allow `orc.include.columns` to be empty
> ---------------------------------------
>
> Key: ORC-233
> URL: https://issues.apache.org/jira/browse/ORC-233
> Project: ORC
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.4.0
> Reporter: Dongjoon Hyun
>
> Apache ORC should support returning all NULLs by the following.
> {code}
> conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute(), "")
> {code}
> Currently, it raises the following exceptions.
> {code}
> For input string: ""
> java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:592)
> at java.lang.Integer.parseInt(Integer.java:615)
> at org.apache.orc.mapred.OrcInputFormat.parseInclude(OrcInputFormat.java:69)
> {code}
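> The fix could be as simple as short-circuiting on an empty include string before the integer parsing that currently throws. A minimal sketch of that idea (a hypothetical `IncludeParser` helper, not the actual `OrcInputFormat.parseInclude` code):
> {code}
> // Hypothetical sketch, assuming the include value is a comma-separated
> // list of column ids; "" or null is treated as "include no columns".
> public class IncludeParser {
>   // numColumns: number of columns in the schema (index 0 is the root).
>   public static boolean[] parseInclude(int numColumns, String value) {
>     boolean[] result = new boolean[numColumns + 1];
>     // Returning early here avoids Integer.parseInt("") and the
>     // NumberFormatException shown above; all entries stay false.
>     if (value == null || value.trim().isEmpty()) {
>       return result;
>     }
>     for (String idStr : value.split(",")) {
>       result[Integer.parseInt(idStr.trim())] = true;
>     }
>     return result;
>   }
> }
> {code}
> With that guard, an empty `orc.include.columns` yields an all-false include array, and the reader returns rows whose fields are all NULL, matching the test above.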
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)