[ 
https://issues.apache.org/jira/browse/ORC-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Blum updated ORC-168:
-----------------------------
    Description: 
There's one little mistake in the Java Core Example. The for loop starts with 
a batch.size++:

for(int r=0; r < 10000; ++r) {
  int row = batch.size++;
  x.vector[row] = r;
  y.vector[row] = r * 3;
  // If the batch is full, write it out and start over.
  if (batch.size == batch.getMaxSize()) {
    writer.addRowBatch(batch);
    batch.reset();
  }
}

If you start with a batch.size++ the first index will be 1, so the first entry 
in the ORC file will be empty.
The correct version is:

for(int r=0; r < 10000; ++r) {
  int row = batch.size;
  x.vector[row] = r;
  y.vector[row] = r * 3;
  // If the batch is full, write it out and start over.
  if (batch.size == batch.getMaxSize()) {
    writer.addRowBatch(batch);
    batch.reset();
  }
  batch.size++;
}

Already tested it in Scala.
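
For anyone who wants to check the index behaviour without an ORC dependency, here is a minimal, self-contained sketch of the loop as it appears in the example. TinyBatch, writeRows and flush are hypothetical stand-ins invented for this illustration, not ORC classes; the list stands in for the writer, and the flush of the final partial batch after the loop is an assumption about how the surrounding code finishes. It relies on one language fact: in Java, the expression batch.size++ evaluates to the value of size before the increment.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexSketch {
    // Hypothetical stand-in for a row batch: one fixed-capacity column and a row count.
    static final class TinyBatch {
        final long[] x;
        int size = 0;
        TinyBatch(int maxSize) { x = new long[maxSize]; }
        int getMaxSize() { return x.length; }
        void reset() { size = 0; }
    }

    // Copies the first batch.size entries to the output list (stand-in for the writer).
    static void flush(TinyBatch batch, List<Long> out) {
        for (int i = 0; i < batch.size; i++) {
            out.add(batch.x[i]);
        }
    }

    // Runs the loop from the example and returns every flushed value in order.
    static List<Long> writeRows(int totalRows, int maxSize) {
        TinyBatch batch = new TinyBatch(maxSize);
        List<Long> flushed = new ArrayList<>();
        for (int r = 0; r < totalRows; ++r) {
            int row = batch.size++;   // row receives the old size, then size grows by one
            batch.x[row] = r;
            // If the batch is full, write it out and start over.
            if (batch.size == batch.getMaxSize()) {
                flush(batch, flushed);
                batch.reset();
            }
        }
        // Assumption: the last, partially filled batch is flushed after the loop.
        if (batch.size != 0) {
            flush(batch, flushed);
            batch.reset();
        }
        return flushed;
    }

    public static void main(String[] args) {
        System.out.println(writeRows(10, 4));
    }
}
```

Running main with 10 rows and a batch capacity of 4 prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], so the first value lands at index 0 of the first batch.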

  was:
There's one little mistake in the Java Core Example. The for loop starts with 
a batch.size++:

for(int r=0; r < 10000; ++r) {
  int row = batch.size++;
  x.vector[row] = r;
  y.vector[row] = r * 3;
  // If the batch is full, write it out and start over.
  if (batch.size == batch.getMaxSize()) {
    writer.addRowBatch(batch);
    batch.reset();
  }
}

If you start with a batch.size++ the first index will be 1, so the first entry 
in the ORC file will be empty.
The correct version is:

for(int r=0; r < 10000; ++r) {
  x.vector[row] = r;
  y.vector[row] = r * 3;
  int row = batch.size++;
  // If the batch is full, write it out and start over.
  if (batch.size == batch.getMaxSize()) {
    writer.addRowBatch(batch);
    batch.reset();
  }
}

Already tested it in Scala.


> Documentation Writer Example: batch.size++ has a wrong position
> ---------------------------------------------------------------
>
>                 Key: ORC-168
>                 URL: https://issues.apache.org/jira/browse/ORC-168
>             Project: ORC
>          Issue Type: Improvement
>          Components: documentation, Java
>            Reporter: Philipp Blum
>            Priority: Critical
>
> There's one little mistake in the Java Core Example. The for loop starts 
> with a batch.size++:
> for(int r=0; r < 10000; ++r) {
>   int row = batch.size++;
>   x.vector[row] = r;
>   y.vector[row] = r * 3;
>   // If the batch is full, write it out and start over.
>   if (batch.size == batch.getMaxSize()) {
>     writer.addRowBatch(batch);
>     batch.reset();
>   }
> }
> If you start with a batch.size++ the first index will be 1, so the first 
> entry in the ORC file will be empty.
> The correct version is:
> for(int r=0; r < 10000; ++r) {
>   int row = batch.size;
>   x.vector[row] = r;
>   y.vector[row] = r * 3;
>   // If the batch is full, write it out and start over.
>   if (batch.size == batch.getMaxSize()) {
>     writer.addRowBatch(batch);
>     batch.reset();
>   }
>   batch.size++;
> }
> Already tested it in Scala.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)