[ https://issues.apache.org/jira/browse/BEAM-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Robertson updated BEAM-4361:
--------------------------------
    Description:
Add a paragraph demonstrating the usage of {{TableSnapshotInputFormat}} as a mechanism for doing efficient full scans over HBase to https://beam.apache.org/documentation/io/built-in/hadoop/

Typically in MR / Spark this yields up to a 3-4x improvement over hitting region servers directly and keeps load (GC etc.) off those services.

I have it tested and an example ready.

  was:
Add a paragraph demonstrating the usage of {{TableSnapshotInputFormat }} as a mechanism for doing efficient full scans over HBase to https://beam.apache.org/documentation/io/built-in/hadoop/

Typically in MR / Spark this yields up to a 3-4x improvement over hitting region servers directly and keeps load (GC etc.) off those services.

I have it tested and an example ready.

> Document usage of HBase TableSnapshotInputFormat
> -------------------------------------------------
>
>                 Key: BEAM-4361
>                 URL: https://issues.apache.org/jira/browse/BEAM-4361
>             Project: Beam
>          Issue Type: Task
>          Components: website
>    Affects Versions: 2.4.0
>            Reporter: Tim Robertson
>            Assignee: Tim Robertson
>            Priority: Trivial
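
For orientation only, here is a minimal sketch of the Hadoop properties such a read would likely need. It is not the reporter's tested example. It assumes the usual HadoopInputFormatIO convention of naming the InputFormat and its key/value types through the {{mapreduce.job.inputformat.class}}, {{key.class}} and {{value.class}} properties. The helper class {{SnapshotScanConfig}}, the quorum {{zk1:2181}} and the root directory {{/hbase}} are hypothetical, and a plain {{Map}} stands in for a Hadoop {{Configuration}} so the snippet has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects the Hadoop properties for a snapshot-based HBase scan. */
class SnapshotScanConfig {

  /** Returns the properties in insertion order; both arguments are supplied by the caller. */
  static Map<String, String> hadoopProperties(String zookeeperQuorum, String hbaseRootDir) {
    Map<String, String> props = new LinkedHashMap<>();

    // Where the HBase cluster coordinates and where its files live on the file system.
    props.put("hbase.zookeeper.quorum", zookeeperQuorum);
    props.put("hbase.rootdir", hbaseRootDir);

    // The InputFormat to run, plus the key and value types it emits.
    props.put(
        "mapreduce.job.inputformat.class",
        "org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat");
    props.put("key.class", "org.apache.hadoop.hbase.io.ImmutableBytesWritable");
    props.put("value.class", "org.apache.hadoop.hbase.client.Result");
    return props;
  }

  public static void main(String[] args) {
    // Example values only; replace with the real quorum and HBase root directory.
    hadoopProperties("zk1:2181", "/hbase")
        .forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

In a real pipeline these entries would be copied into a Hadoop {{Configuration}}. The snapshot name and restore directory would still have to be registered with {{TableSnapshotInputFormat.setInput(job, snapshotName, restoreDir)}}, and the resulting job configuration passed to {{HadoopInputFormatIO.<ImmutableBytesWritable, Result>read().withConfiguration(...)}}. That wiring needs the HBase and Beam libraries on the classpath and is left out of this sketch.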