[ https://issues.apache.org/jira/browse/ORC-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968160#comment-15968160 ]

ASF GitHub Bot commented on ORC-168:
------------------------------------

Github user Citrullin commented on a diff in the pull request:

    https://github.com/apache/orc/pull/104#discussion_r111478051
  
    --- Diff: site/_docs/core-java.md ---
    @@ -233,14 +233,15 @@ VectorizedRowBatch batch = schema.createRowBatch();
     LongColumnVector x = (LongColumnVector) batch.cols[0];
     LongColumnVector y = (LongColumnVector) batch.cols[1];
     for(int r=0; r < 10000; ++r) {
    -  int row = batch.size++;
    +  int row = batch.size;
       x.vector[row] = r;
    --- End diff --
    
    Hi, I'm not very proficient in Java; I'm more familiar with Scala. But you can take a look at the library I wrote. I created two branches in which I changed only the position of the increment.
    
    In example-1, I increment both batch.size and rowBatchSize before I add a row to the VectorizedBatch.
    [See more here](https://github.com/Citrullin/scalaOrcWriter/blob/ORC-168-wrong-example-1/src/main/scala/citrullin/orcwriter/OrcWriter.scala#L77)
    
    In example-2, I increment only batch.size before I add the row to the batch; rowBatchSize is incremented after the row is written.
    [See more here](https://github.com/Citrullin/scalaOrcWriter/blob/ORC-168-wrong-example-2/src/main/scala/citrullin/orcwriter/OrcWriter.scala#L77)
    
    The working example is in the dev branch.
    [More here](https://github.com/Citrullin/scalaOrcWriter/blob/dev/src/main/scala/citrullin/orcwriter/OrcWriter.scala#L77)
    
    You can run the Implicit ComplexMap example to see the differences.
    [Here is the source](https://github.com/Citrullin/scalaOrcWriter/blob/dev/src/main/scala/citrullin/orcwriter/examples/orcwriterimplicitapi/WriteComplexMap.scala)
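
For reference, in Java the postfix expression `batch.size++` evaluates to the value of `batch.size` before the increment, while the prefix form `++batch.size` evaluates to the value after it. The plain-Java sketch below has no ORC dependency: a local `size` counter stands in for `batch.size`, and the class and method names are hypothetical. It records which row indices each form produces, which may help when comparing where the counter is incremented in the branches above. It is an illustration only, not a reproduction of the linked Scala code.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementOrder {
    // Postfix: the row index is the counter's value before incrementing,
    // mirroring `int row = batch.size++;` from the documentation example.
    static List<Integer> postfixIndices(int rows) {
        int size = 0;                 // stands in for batch.size
        List<Integer> indices = new ArrayList<>();
        for (int r = 0; r < rows; ++r) {
            int row = size++;         // row == old size; size is now row + 1
            indices.add(row);
        }
        return indices;
    }

    // Prefix: the row index is the counter's value after incrementing,
    // i.e. `int row = ++batch.size;` (shown for contrast only).
    static List<Integer> prefixIndices(int rows) {
        int size = 0;
        List<Integer> indices = new ArrayList<>();
        for (int r = 0; r < rows; ++r) {
            int row = ++size;         // row == new size; index 0 is never used
            indices.add(row);
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println("postfix: " + postfixIndices(3)); // postfix: [0, 1, 2]
        System.out.println("prefix:  " + prefixIndices(3));  // prefix:  [1, 2, 3]
    }
}
```

With the postfix form the first row lands at index 0, so no slot is skipped; an empty first slot would arise if the counter were incremented before its value is read, as in the prefix form.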



> Documentation Writer Example: batch.size++ has a wrong position
> ---------------------------------------------------------------
>
>                 Key: ORC-168
>                 URL: https://issues.apache.org/jira/browse/ORC-168
>             Project: ORC
>          Issue Type: Improvement
>          Components: documentation, Java
>            Reporter: Philipp Blum
>            Priority: Critical
>
> There's one small mistake in the Java Core example. The for loop starts with a batch.size++:
> for(int r=0; r < 10000; ++r) {
>   int row = batch.size++;
>   x.vector[row] = r;
>   y.vector[row] = r * 3;
>   // If the batch is full, write it out and start over.
>   if (batch.size == batch.getMaxSize()) {
>     writer.addRowBatch(batch);
>     batch.reset();
>   }
> }
> If you start with batch.size++, the first index will be 1, so the first entry in the ORC file will be empty.
> The correct version is:
> for(int r=0; r < 10000; ++r) {
>   int row = batch.size;
>   x.vector[row] = r;
>   y.vector[row] = r * 3;
>   // If the batch is full, write it out and start over.
>   if (batch.size == batch.getMaxSize()) {
>     writer.addRowBatch(batch);
>     batch.reset();
>   }
>   batch.size++;
> }
> I have already tested it in Scala.
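
The two loops quoted above can be compared with a small plain-Java model. In this sketch, which is a stand-in and not the ORC API, `Batch` is a hypothetical fixed-capacity class that mimics `VectorizedRowBatch` (a `size` field, per-column `long[]` vectors, `getMaxSize()`, `reset()`), and "writing" a batch just copies its first `size` values of column x into a list. A capacity of 4 keeps the trace short; ORC's default batch size is 1024 rows. The flush of a final partial batch after the loop is also an assumption here, since the snippets in the issue stop at the loop's closing brace.

```java
import java.util.ArrayList;
import java.util.List;

class BatchLoopModel {
    // Hypothetical stand-in for VectorizedRowBatch with two long columns.
    static final class Batch {
        final long[] x;
        final long[] y;
        int size = 0;

        Batch(int maxSize) {
            x = new long[maxSize];
            y = new long[maxSize];
        }

        int getMaxSize() { return x.length; }

        void reset() { size = 0; }
    }

    // "Writes" a batch by copying its first `size` values of column x.
    static void addRowBatch(Batch batch, List<Long> out) {
        for (int i = 0; i < batch.size; ++i) {
            out.add(batch.x[i]);
        }
    }

    // Loop as in the documentation: post-increment, then flush when full.
    static List<Long> documentedLoop(int rows, int maxSize) {
        Batch batch = new Batch(maxSize);
        List<Long> written = new ArrayList<>();
        for (int r = 0; r < rows; ++r) {
            int row = batch.size++;            // first row goes to index 0
            batch.x[row] = r;
            batch.y[row] = r * 3L;
            if (batch.size == batch.getMaxSize()) {
                addRowBatch(batch, written);
                batch.reset();
            }
        }
        if (batch.size != 0) {                 // assumed flush of the last partial batch
            addRowBatch(batch, written);
            batch.reset();
        }
        return written;
    }

    // Loop as proposed in the issue: read size, check fullness, then increment.
    static List<Long> proposedLoop(int rows, int maxSize) {
        Batch batch = new Batch(maxSize);
        List<Long> written = new ArrayList<>();
        for (int r = 0; r < rows; ++r) {
            int row = batch.size;
            batch.x[row] = r;                  // throws once row == maxSize
            batch.y[row] = r * 3L;
            if (batch.size == batch.getMaxSize()) {
                addRowBatch(batch, written);
                batch.reset();
            }
            batch.size++;
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(documentedLoop(10, 4)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        try {
            proposedLoop(10, 4);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("proposed loop overflowed the batch");
        }
    }
}
```

In this model the documented post-increment loop puts value 0 at index 0 and, together with the trailing flush, writes every row in order. The proposed ordering instead indexes one slot past the end of the arrays as soon as a batch fills, because its fullness check runs before the counter can reach the capacity.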



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
