iemejia commented on a change in pull request #10815: [BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery
URL: https://github.com/apache/beam/pull/10815#discussion_r394859324
 
 

 ##########
 File path: sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java
 ##########
 @@ -32,65 +31,50 @@
 import org.apache.hadoop.hbase.client.ConnectionFactory;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
-import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.Table;
 
 /** A SplittableDoFn to read from HBase. */
 @BoundedPerElement
-class HBaseReadSplittableDoFn extends DoFn<HBaseQuery, Result> {
-  private final SerializableConfiguration serializableConfiguration;
-
-  private transient Connection connection;
-
-  HBaseReadSplittableDoFn(SerializableConfiguration serializableConfiguration) {
-    this.serializableConfiguration = serializableConfiguration;
-  }
-
-  @Setup
-  public void setup() throws Exception {
-    connection = ConnectionFactory.createConnection(serializableConfiguration.get());
-  }
-
-  private static Scan newScanInRange(Scan scan, ByteKeyRange range) throws IOException {
-    return new Scan(scan)
-        .setStartRow(range.getStartKey().getBytes())
-        .setStopRow(range.getEndKey().getBytes());
-  }
+class HBaseReadSplittableDoFn extends DoFn<Read, Result> {
+  HBaseReadSplittableDoFn() {}
 
   @ProcessElement
-  public void processElement(ProcessContext c, RestrictionTracker<ByteKeyRange, ByteKey> tracker)
+  public void processElement(
+      @Element Read read,
+      OutputReceiver<Result> out,
+      RestrictionTracker<ByteKeyRange, ByteKey> tracker)
       throws Exception {
-    final HBaseQuery query = c.element();
-    TableName tableName = TableName.valueOf(query.getTableId());
+    Connection connection = ConnectionFactory.createConnection(read.getConfiguration());
 
 Review comment:
   I filed https://issues.apache.org/jira/browse/BEAM-9554 to track this. No, I did not test performance, because this is quite a particular case, as I mentioned: for users doing 1-to-n queries where n is big, this overhead would not be considerable. The real issue can manifest mostly in streaming pipelines, where we would like to do reads per window across multiple windows, similar to what we found for JdbcIO writes (but that case is way more common).
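The overhead being tracked comes from the new `@ProcessElement` opening a connection per input element, where the removed code cached one connection in `@Setup`. A minimal, self-contained Java sketch of the difference in connection churn (the `Connection` class below is a hypothetical stand-in with a creation counter, not the HBase client API):

```java
// Sketch comparing per-element connection creation (the new SDF shape)
// with a connection cached once per DoFn instance (the old @Setup shape).
final class ConnectionCostSketch {
    static int creations = 0;

    // Hypothetical stand-in for an expensive client connection.
    static final class Connection {
        Connection() { creations++; }   // count every connection opened
        void close() {}
    }

    // Per-element style: open and close a connection for every element.
    static int perElement(int elements) {
        creations = 0;
        for (int i = 0; i < elements; i++) {
            Connection c = new Connection(); // one connection per element
            c.close();
        }
        return creations;
    }

    // Setup-cached style: one connection reused across all elements.
    static int setupCached(int elements) {
        creations = 0;
        Connection c = new Connection();     // opened once, as in @Setup
        for (int i = 0; i < elements; i++) {
            // reuse c for each element
        }
        c.close();
        return creations;
    }

    public static void main(String[] args) {
        System.out.println(perElement(1000)); // 1000 connections opened
        System.out.println(setupCached(1000)); // 1 connection opened
    }
}
```

As the comment notes, with few large queries the per-element cost is amortized away; it is many small elements (e.g. one read per window in streaming) where the left column's churn would start to matter.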

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
