Good point, Bhupesh. Should we have maxTuplesPerWindow or emitBatchSize? Input operators in Malhar currently use different configurations:
1) maxTuplesPerWindow, as in the Kafka input operator
2) emitBatchSize, as in AbstractFileInputOperator
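
To make the emitBatchSize option concrete, here is a rough sketch of capping the number of tuples per emitTuples() call. It is not Malhar code: BatchedEmitSketch is a hypothetical class, the Iterator stands in for a scanner kept open across calls, and the Consumer stands in for outputPort.emit(). The real emitTuples() returns void; the sketch returns a count only so the behaviour is easy to check.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical stand-in for an input operator; not the Malhar API.
class BatchedEmitSketch<T> {
  private final Iterator<T> rows;    // assumed: a scan cursor kept open across calls
  private final Consumer<T> output;  // assumed: stand-in for outputPort.emit(...)
  private final int emitBatchSize;   // upper bound on tuples emitted per call

  BatchedEmitSketch(Iterator<T> rows, Consumer<T> output, int emitBatchSize) {
    if (emitBatchSize <= 0) {
      throw new IllegalArgumentException("emitBatchSize must be positive");
    }
    this.rows = rows;
    this.output = output;
    this.emitBatchSize = emitBatchSize;
  }

  // Emits at most emitBatchSize tuples and returns, so the caller can invoke
  // it again later instead of draining the whole source in a single call.
  int emitTuples() {
    int emitted = 0;
    while (emitted < emitBatchSize && rows.hasNext()) {
      output.accept(rows.next());
      emitted++;
    }
    return emitted;
  }
}
```

A maxTuplesPerWindow-style limit would look much the same, except that the counter would be reset at each window boundary rather than on each call.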

Going forward, should we have some guidelines on which parameters input operators should define? Limiting by data size instead of number of tuples, as in bandwidth control, should also be part of input operators.

Regards,
Sandeep

On Mon, Dec 21, 2015 at 3:03 PM, Bhupesh Chawda <[email protected]> wrote:

> Hi All,
>
> Any reason why the HBase input operators in Apex Malhar - contrib are doing
> the entire table scan in the emitTuples() method. Shouldn't this just
> return a single row each time? The current way seems to be sending the
> entire table in a single call of emitTuples().
>
> Here is the code fragment from HBaseScanOperator:
>
> @Override
> public void emitTuples()
> {
>   try {
>     HTable table = getTable();
>     Scan scan = operationScan();
>     ResultScanner scanner = table.getScanner(scan);
>     for (Result result : scanner) {
>       //KeyValue[] kvs = result.raw();
>       //T t = getTuple(kvs);
>       T t = getTuple(result);
>       outputPort.emit(t);
>     }
>   } catch (Exception e) {
>     e.printStackTrace();
>   }
> }
>
> Thanks.
> Bhupesh
