Good point, Bhupesh. Should we have maxTuplesPerWindow or emitBatchSize?

Input operators in Malhar have different configurations:
1) maxTuplesPerWindow, as in the Kafka input operator
2) emitBatchSize, as in AbstractFileInputOperator
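
To make the idea concrete, here is a minimal sketch of a per-call tuple cap
applied to a scan-style source. It is not Malhar code: BoundedEmitter is a
hypothetical class, the Iterator stands in for an open scanner, and the
Consumer stands in for the output port. Only the emitBatchSize name is
borrowed from the file input operator mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a Malhar class): emits at most emitBatchSize
// tuples per emitTuples() call and resumes where the previous call stopped.
class BoundedEmitter<T> {
  private final Iterator<T> source;  // stands in for an open scanner's iterator
  private final int emitBatchSize;   // cap on tuples per emitTuples() call

  BoundedEmitter(Iterator<T> source, int emitBatchSize) {
    if (emitBatchSize <= 0) {
      throw new IllegalArgumentException("emitBatchSize must be positive");
    }
    this.source = source;
    this.emitBatchSize = emitBatchSize;
  }

  // Emits up to emitBatchSize tuples and returns how many were emitted.
  int emitTuples(Consumer<T> outputPort) {
    int emitted = 0;
    while (emitted < emitBatchSize && source.hasNext()) {
      outputPort.accept(source.next());
      emitted++;
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    BoundedEmitter<String> emitter = new BoundedEmitter<>(
        Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator(), 2);
    // Each loop iteration plays the role of one emitTuples() invocation.
    while (emitter.emitTuples(out::add) > 0) {
      System.out.println("emitted so far: " + out.size());
    }
  }
}
```

Because the iterator is kept across calls, the source is read once but handed
downstream in bounded slices instead of in a single emitTuples() call.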

Going forward, should we have some guidelines on which parameters input
operators should define? Limits based on data size rather than number of
tuples, such as bandwidth control, should also be supported by input
operators.
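
A size-based limit could look roughly like the sketch below. Everything in it
is an assumption for illustration: ByteBudgetEmitter and maxBytesPerCall are
hypothetical names, not existing Malhar classes or properties, and records
are modelled as plain byte arrays.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical helper (not a Malhar class): caps the number of bytes
// emitted per emitTuples() call instead of the number of tuples.
class ByteBudgetEmitter {
  private final Deque<byte[]> pending;  // records waiting to be emitted
  private final long maxBytesPerCall;   // hypothetical size-based limit

  ByteBudgetEmitter(Collection<byte[]> records, long maxBytesPerCall) {
    if (maxBytesPerCall <= 0) {
      throw new IllegalArgumentException("maxBytesPerCall must be positive");
    }
    this.pending = new ArrayDeque<>(records);
    this.maxBytesPerCall = maxBytesPerCall;
  }

  // Emits records until the byte budget would be exceeded; returns bytes sent.
  long emitTuples(Consumer<byte[]> outputPort) {
    long sent = 0;
    while (!pending.isEmpty()) {
      byte[] next = pending.peekFirst();
      // Always let the first record through, so a record larger than the
      // budget cannot stall the operator forever.
      if (sent > 0 && sent + next.length > maxBytesPerCall) {
        break;
      }
      outputPort.accept(pending.pollFirst());
      sent += next.length;
    }
    return sent;
  }
}
```

The same counter could be kept per window rather than per call if the limit
is meant to approximate bandwidth over time.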

Regards,
Sandeep

On Mon, Dec 21, 2015 at 3:03 PM, Bhupesh Chawda <[email protected]>
wrote:

> Hi All,
>
> Any reason why the HBase input operators in Apex Malhar - contrib are doing
> the entire table scan in the emitTuples() method? Shouldn't this just
> return a single row each time? The current way seems to be sending the
> entire table in a single call of emitTuples().
>
> Here is the code fragment from HBaseScanOperator:
>
> >   @Override
> >   public void emitTuples()
> >   {
> >     try {
> >       HTable table = getTable();
> >       Scan scan = operationScan();
> >       ResultScanner scanner = table.getScanner(scan);
> >       for (Result result : scanner) {
> >         //KeyValue[] kvs = result.raw();
> >         //T t = getTuple(kvs);
> >         T t = getTuple(result);
> >         outputPort.emit(t);
> >       }
> >     } catch (Exception e) {
> >       e.printStackTrace();
> >     }
> >   }
> >
> >
> Thanks.
> Bhupesh
>
