Podling Report Reminder - December 2016

2016-11-28 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 December 2016, 10:30 am PST.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, December 07).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/December2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC
-------------

[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938666
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -19,20 +19,46 @@
 
 package org.apache.tephra.hbase.coprocessor.janitor;
 
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Longs;
+import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Result;
+import org.apache.hadoop.hbase.client.ResultScanner;
+import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.Table;
 import org.apache.hadoop.hbase.util.Bytes;
 
 import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeMap;
+import java.util.TreeSet;
+import javax.annotation.Nullable;
 
 /**
  * Persist data janitor state into an HBase table.
--- End diff --

Is it true that this class is shared between the DataJanitor coprocessor
(for saving prune state) and the DataJanitorPlugin in the tx manager (for
reading the state)? If so, then this deserves mentioning in the javadocs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938828
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
+   *
+   * @param deletionPruneUpperBound prune upper bound below which regions will be deleted
+   * @param excludeRegions set of regions that should not be deleted
+   * @throws IOException when not able to delete data in HBase
+   */
+  public void deleteRegionsWithPruneUpperBoundBefore(long deletionPruneUpperBound, SortedSet<byte[]> excludeRegions)
+    throws IOException {
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (!excludeRegions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              if (pruneUpperBoundRegion < deletionPruneUpperBound) {
+                stateTable.delete(new Delete(next.getRow()));
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // ---
+  // --- Methods for regions at a given time ---
+  // Key: 0x2
+  // Col 't': 
+  // ---
+
+  /**
+   * Persist the regions for a given time
--- End diff --

what exactly does this persist? Simply what regions existed at that time?




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89391681
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
+   *
+   * @param deletionPruneUpperBound prune upper bound below which regions will be deleted
+   * @param excludeRegions set of regions that should not be deleted
+   * @throws IOException when not able to delete data in HBase
+   */
+  public void deleteRegionsWithPruneUpperBoundBefore(long deletionPruneUpperBound, SortedSet<byte[]> excludeRegions)
+    throws IOException {
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (!excludeRegions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              if (pruneUpperBoundRegion < deletionPruneUpperBound) {
+                stateTable.delete(new Delete(next.getRow()));
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // ---
+  // --- Methods for regions at a given time ---
+  // Key: 0x2
+  // Col 't': 
+  // ---
+
+  /**
+   * Persist the regions for a given time
+   *
+   * @param time timestamp in milliseconds
+   * @param regions set of regions at the time
+   * @throws IOException when not able to persist the data to HBase
+   */
+  public void saveRegionsForTime(long time, Set<byte[]> regions) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      for (byte[] region : regions) {
+        Put put = new Put(makeTimeRegionKey(timeBytes, region));
+        put.addColumn(FAMILY, REGION_TIME_COL, EMPTY_BYTE_ARRAY);
+        stateTable.put(put);
+      }
+    }
+  }
+
+  /**
+   * Return all the persisted regions for a time equal to or less than the given time
+   *
+   * @param time timestamp in milliseconds
+   * @return set of regions and time at which they were recorded
+   * @throws IOException when not able to read the data from HBase
+   */
+  @Nullable
+  public TimeRegions getRegionsOnOrBeforeTime(long time) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      Scan scan = new Scan(makeTimeRegionKey(timeBytes, EMPTY_BYTE_ARRAY), REGION_TIME_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, REGION_TIME_COL);
+
+      SortedSet<byte[]> regions = new TreeSet<>(Bytes.BYTES_COMPARATOR);
+      long currentRe

[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89207839
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
--- End diff --

period.




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89391878
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
+   *
+   * @param deletionPruneUpperBound prune upper bound below which regions will be deleted
+   * @param excludeRegions set of regions that should not be deleted
+   * @throws IOException when not able to delete data in HBase
+   */
+  public void deleteRegionsWithPruneUpperBoundBefore(long deletionPruneUpperBound, SortedSet<byte[]> excludeRegions)
+    throws IOException {
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (!excludeRegions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              if (pruneUpperBoundRegion < deletionPruneUpperBound) {
+                stateTable.delete(new Delete(next.getRow()));
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // ---
+  // --- Methods for regions at a given time ---
+  // Key: 0x2
+  // Col 't': 
+  // ---
+
+  /**
+   * Persist the regions for a given time
+   *
+   * @param time timestamp in milliseconds
+   * @param regions set of regions at the time
+   * @throws IOException when not able to persist the data to HBase
+   */
+  public void saveRegionsForTime(long time, Set<byte[]> regions) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      for (byte[] region : regions) {
+        Put put = new Put(makeTimeRegionKey(timeBytes, region));
+        put.addColumn(FAMILY, REGION_TIME_COL, EMPTY_BYTE_ARRAY);
+        stateTable.put(put);
+      }
+    }
+  }
+
+  /**
+   * Return all the persisted regions for a time equal to or less than the given time
+   *
+   * @param time timestamp in milliseconds
+   * @return set of regions and time at which they were recorded
+   * @throws IOException when not able to read the data from HBase
+   */
+  @Nullable
+  public TimeRegions getRegionsOnOrBeforeTime(long time) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      Scan scan = new Scan(makeTimeRegionKey(timeBytes, EMPTY_BYTE_ARRAY), REGION_TIME_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, REGION_TIME_COL);
+
+      SortedSet<byte[]> regions = new TreeSet<>(Bytes.BYTES_COMPARATOR);
+      long currentRe

[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89207961
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
--- End diff --

What is this used for? Might be useful to say that in the javadoc.
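
The scan-and-delete loop in the quoted diff can be mirrored with a small in-memory analogue: drop every entry whose prune upper bound is below the threshold unless the region is excluded. String keys stand in for byte[] region ids here, and `DeleteFilterSketch`/`deleteBelow` are hypothetical names, not Tephra's API:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DeleteFilterSketch {
  // Mirrors deleteRegionsWithPruneUpperBoundBefore: remove every region whose
  // recorded prune upper bound is below the threshold, unless it is excluded.
  static void deleteBelow(Map<String, Long> bounds, long threshold, Set<String> exclude) {
    bounds.entrySet().removeIf(e -> !exclude.contains(e.getKey()) && e.getValue() < threshold);
  }

  public static void main(String[] args) {
    Map<String, Long> bounds = new TreeMap<>(Map.of("r1", 100L, "r2", 300L, "r3", 50L));
    deleteBelow(bounds, 200L, new TreeSet<>(Set.of("r3")));
    System.out.println(bounds); // r1 removed; r2 above threshold; r3 excluded
  }
}
```

The exclude set presumably holds regions known to still exist, so only rows for regions that have disappeared and whose bound is stale get deleted.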




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89377823
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
+   *
+   * @param deletionPruneUpperBound prune upper bound below which regions will be deleted
+   * @param excludeRegions set of regions that should not be deleted
+   * @throws IOException when not able to delete data in HBase
+   */
+  public void deleteRegionsWithPruneUpperBoundBefore(long deletionPruneUpperBound, SortedSet<byte[]> excludeRegions)
+    throws IOException {
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (!excludeRegions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              if (pruneUpperBoundRegion < deletionPruneUpperBound) {
+                stateTable.delete(new Delete(next.getRow()));
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // ---
+  // --- Methods for regions at a given time ---
+  // Key: 0x2
+  // Col 't': 
+  // ---
+
+  /**
+   * Persist the regions for a given time
+   *
+   * @param time timestamp in milliseconds
+   * @param regions set of regions at the time
+   * @throws IOException when not able to persist the data to HBase
+   */
+  public void saveRegionsForTime(long time, Set<byte[]> regions) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      for (byte[] region : regions) {
+        Put put = new Put(makeTimeRegionKey(timeBytes, region));
+        put.addColumn(FAMILY, REGION_TIME_COL, EMPTY_BYTE_ARRAY);
+        stateTable.put(put);
+      }
+    }
+  }
+
+  /**
+   * Return all the persisted regions for a time equal to or less than the given time
+   *
--- End diff --

If I read the code correctly, then this finds the greatest time that is
less than the given time, and then returns all regions with that exact time,
but none that are older than that. Is that correct? The javadoc is not that
clear about it.
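
The inverted-timestamp behavior the reviewer describes can be sketched in memory. This assumes `getInvertedTime(t)` is `Long.MAX_VALUE - t`, a common HBase newest-first key trick that this excerpt does not confirm, and `InvertedTimeSketch`/`greatestTimeOnOrBefore` are illustrative names:

```java
import java.util.TreeSet;

public class InvertedTimeSketch {
  // Hypothetical stand-in for Tephra's getInvertedTime.
  static long invert(long time) {
    return Long.MAX_VALUE - time;
  }

  // A forward scan from the inverted query key corresponds to ceiling() on
  // the inverted key set: the first hit is the greatest recorded time that
  // is <= queryTime. Returns -1 if no recorded time qualifies.
  static long greatestTimeOnOrBefore(TreeSet<Long> recordedInverted, long queryTime) {
    Long hit = recordedInverted.ceiling(invert(queryTime));
    return hit == null ? -1 : Long.MAX_VALUE - hit;
  }

  public static void main(String[] args) {
    TreeSet<Long> recorded = new TreeSet<>();
    for (long t : new long[] {100, 200, 300}) {
      recorded.add(invert(t));
    }
    System.out.println(greatestTimeOnOrBefore(recorded, 250)); // 200
    System.out.println(greatestTimeOnOrBefore(recorded, 300)); // 300
    System.out.println(greatestTimeOnOrBefore(recorded, 50));  // -1
  }
}
```

Because keys sort by inverted time, the scan naturally stops at one snapshot: the latest recording on or before the query, matching the reviewer's reading that older snapshots are not returned.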




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89391984
  
--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DataJanitorState.java ---
@@ -58,10 +105,237 @@ public long getPruneUpperBound(byte[] regionId) throws IOException {
     }
   }
 
+  /**
+   * Get latest prune upper bounds for given regions
+   *
+   * @param regions a set of regions
+   * @return a map containing region id and its latest prune upper bound value
+   * @throws IOException when not able to read the data from HBase
+   */
+  public Map<byte[], Long> getPruneUpperBoundForRegions(SortedSet<byte[]> regions) throws IOException {
+    Map<byte[], Long> resultMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (regions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              resultMap.put(region, pruneUpperBoundRegion);
+            }
+          }
+        }
+      }
+      return resultMap;
+    }
+  }
+
+  /**
+   * Delete all regions that are not in the given exclude set and whose prune upper bound is less than a given value
+   *
+   * @param deletionPruneUpperBound prune upper bound below which regions will be deleted
+   * @param excludeRegions set of regions that should not be deleted
+   * @throws IOException when not able to delete data in HBase
+   */
+  public void deleteRegionsWithPruneUpperBoundBefore(long deletionPruneUpperBound, SortedSet<byte[]> excludeRegions)
+    throws IOException {
+    try (Table stateTable = stateTableSupplier.get()) {
+      byte[] startRow = makeRegionKey(EMPTY_BYTE_ARRAY);
+      Scan scan = new Scan(startRow, REGION_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, PRUNE_UPPER_BOUND_COL);
+
+      try (ResultScanner scanner = stateTable.getScanner(scan)) {
+        Result next;
+        while ((next = scanner.next()) != null) {
+          byte[] region = getRegionFromKey(next.getRow());
+          if (!excludeRegions.contains(region)) {
+            byte[] timeBytes = next.getValue(FAMILY, PRUNE_UPPER_BOUND_COL);
+            if (timeBytes != null) {
+              long pruneUpperBoundRegion = Bytes.toLong(timeBytes);
+              if (pruneUpperBoundRegion < deletionPruneUpperBound) {
+                stateTable.delete(new Delete(next.getRow()));
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
+  // ---
+  // --- Methods for regions at a given time ---
+  // Key: 0x2
+  // Col 't': 
+  // ---
+
+  /**
+   * Persist the regions for a given time
+   *
+   * @param time timestamp in milliseconds
+   * @param regions set of regions at the time
+   * @throws IOException when not able to persist the data to HBase
+   */
+  public void saveRegionsForTime(long time, Set<byte[]> regions) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      for (byte[] region : regions) {
+        Put put = new Put(makeTimeRegionKey(timeBytes, region));
+        put.addColumn(FAMILY, REGION_TIME_COL, EMPTY_BYTE_ARRAY);
+        stateTable.put(put);
+      }
+    }
+  }
+
+  /**
+   * Return all the persisted regions for a time equal to or less than the given time
+   *
+   * @param time timestamp in milliseconds
+   * @return set of regions and time at which they were recorded
+   * @throws IOException when not able to read the data from HBase
+   */
+  @Nullable
+  public TimeRegions getRegionsOnOrBeforeTime(long time) throws IOException {
+    byte[] timeBytes = Bytes.toBytes(getInvertedTime(time));
+    try (Table stateTable = stateTableSupplier.get()) {
+      Scan scan = new Scan(makeTimeRegionKey(timeBytes, EMPTY_BYTE_ARRAY), REGION_TIME_KEY_PREFIX_STOP);
+      scan.addColumn(FAMILY, REGION_TIME_COL);
+
+      SortedSet<byte[]> regions = new TreeSet<>(Bytes.BYTES_COMPARATOR);
+      long currentRe

[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89206584
  
--- Diff: tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
--- End diff --

Is there one per each HBase instance? Or one per HBase table?




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938407
  
--- Diff: tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
+ * to fetch the prune upper bound for each data store.
+ * Invalid transaction list will pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface DataJanitorPlugin {
+  /**
+   * Called once at the beginning to initialize the plugin
+   */
+  void initialize(Configuration conf) throws IOException;
+
+  /**
+   * Called periodically to fetch prune upper bound for a data store
+   *
+   * @param time start time of this prune iteration
+   * @param pruneUpperBoundForTime upper bound for prune tx id for the given start time
--- End diff --

This is confusing. I thought this _returns_ the prune upper bound? Why does
it need the upper bound passed in as a parameter? I think you need to explain
better what this parameter means.
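
The pruning rule stated in the quoted javadoc (the invalid transaction list is pruned only up to the minimum of the upper bounds reported by all plugins) can be sketched as follows. `PruneBoundSketch` and `globalPruneUpperBound` are illustrative names, not Tephra's actual API:

```java
import java.util.Collections;
import java.util.List;

public class PruneBoundSketch {
  // Global prune upper bound: the minimum of the bounds reported by all
  // plugins, so no data store loses state it still needs. Returns -1 when
  // no plugin reported a bound (nothing may be pruned).
  static long globalPruneUpperBound(List<Long> pluginBounds) {
    return pluginBounds.isEmpty() ? -1 : Collections.min(pluginBounds);
  }

  public static void main(String[] args) {
    // Three plugins report bounds; the slowest one (420) gates pruning.
    System.out.println(globalPruneUpperBound(List.of(500L, 420L, 610L))); // 420
  }
}
```

Taking the minimum is the conservative choice: a transaction can leave the invalid list only once every data store has confirmed it no longer needs it.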




[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938969
  
--- Diff: 
tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DefaultDataJanitorPlugin.java
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.hbase.coprocessor.janitor;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.tephra.TxConstants;
+import org.apache.tephra.hbase.coprocessor.TransactionProcessor;
+import org.apache.tephra.janitor.DataJanitorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+/**
+ * Default implementation of the {@link DataJanitorPlugin} for HBase
+ */
+@SuppressWarnings("WeakerAccess")
+public class DefaultDataJanitorPlugin implements DataJanitorPlugin {
+  public static final Logger LOG = LoggerFactory.getLogger(DefaultDataJanitorPlugin.class);
+
+  protected Configuration conf;
+  protected Connection connection;
+  protected DataJanitorState dataJanitorState;
+
+  @Override
+  public void initialize(Configuration conf) throws IOException {
+    this.conf = conf;
+    this.connection = ConnectionFactory.createConnection(conf);
+
+    final TableName stateTable = TableName.valueOf(conf.get(TxConstants.DataJanitor.PRUNE_STATE_TABLE,
+        TxConstants.DataJanitor.DEFAULT_PRUNE_STATE_TABLE));
+    LOG.info("Initializing plugin with state table {}", stateTable.getNameWithNamespaceInclAsString());
+    this.dataJanitorState = new DataJanitorState(new DataJanitorState.TableSupplier() {
+      @Override
+      public Table get() throws IOException {
+        return connection.getTable(stateTable);
+      }
+    });
+  }
+
+  @Override
+  public long fetchPruneUpperBound(long time, long pruneUpperBoundForTime) throws IOException {
+    LOG.debug("Fetching prune upper bound for time {} and max prune upper bound {}", time, pruneUpperBoundForTime);
+    if (time < 0 || pruneUpperBoundForTime < 0) {
+      return -1;
+    }
+
+    SortedSet transactionalRegions = getTransactionalRegions();
+    if (!transactionalRegions.isEmpty()) {
+      LOG.debug("Saving {} transactional regions for time {}", transactionalRegions.size(), time);
+      dataJanitorState.saveRegionsForTime(time, transactionalRegions);
--- End diff --

I am not sure I understand. Why does fetch() have to save anything? 
Shouldn't it just read (=fetch)?
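A plausible rationale for the save (my assumption, not confirmed by the PR): fetch persists the region set for this iteration's time so a later computation can tell whether every region's coprocessor has reported a bound; only a complete set yields a safe global bound. A sketch of that completeness check, where `regionsAtTime` and `boundPerRegion` are hypothetical stand-ins for what `DataJanitorState` would persist:

```java
import java.util.Map;
import java.util.Set;

class SafeBoundSketch {
    // Returns the safe prune upper bound for a saved region set, or -1 if
    // some region recorded at `time` has not yet reported a bound.
    static long computeSafeBound(Set<String> regionsAtTime, Map<String, Long> boundPerRegion) {
        if (regionsAtTime.isEmpty()) {
            return -1; // nothing to prune against
        }
        long min = Long.MAX_VALUE;
        for (String region : regionsAtTime) {
            Long bound = boundPerRegion.get(region);
            if (bound == null) {
                return -1; // incomplete: a region has not reported yet
            }
            min = Math.min(min, bound);
        }
        return min;
    }
}
```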


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938230
  
--- Diff: 
tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
+ * to fetch the prune upper bound for each data store.
+ * Invalid transaction list will pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface DataJanitorPlugin {
+  /**
+   * Called once at the beginning to initialize the plugin
+   */
+  void initialize(Configuration conf) throws IOException;
--- End diff --

At the beginning of what? The lifetime of a transaction manager? Or the
beginning of a prune operation?


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938297
  
--- Diff: 
tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
--- End diff --

It would make sense to mention here that this is a plugin for the tx manager.
Its name is a little misleading because the DataJanitor is a coprocessor that
runs in each region server.


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89939212
  
--- Diff: 
tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DefaultDataJanitorPlugin.java
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.hbase.coprocessor.janitor;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.tephra.TxConstants;
+import org.apache.tephra.hbase.coprocessor.TransactionProcessor;
+import org.apache.tephra.janitor.DataJanitorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+/**
+ * Default implementation of the {@link DataJanitorPlugin} for HBase
+ */
+@SuppressWarnings("WeakerAccess")
+public class DefaultDataJanitorPlugin implements DataJanitorPlugin {
+  public static final Logger LOG = LoggerFactory.getLogger(DefaultDataJanitorPlugin.class);
+
+  protected Configuration conf;
+  protected Connection connection;
+  protected DataJanitorState dataJanitorState;
+
+  @Override
+  public void initialize(Configuration conf) throws IOException {
+    this.conf = conf;
+    this.connection = ConnectionFactory.createConnection(conf);
+
+    final TableName stateTable = TableName.valueOf(conf.get(TxConstants.DataJanitor.PRUNE_STATE_TABLE,
+        TxConstants.DataJanitor.DEFAULT_PRUNE_STATE_TABLE));
+    LOG.info("Initializing plugin with state table {}", stateTable.getNameWithNamespaceInclAsString());
+    this.dataJanitorState = new DataJanitorState(new DataJanitorState.TableSupplier() {
+      @Override
+      public Table get() throws IOException {
+        return connection.getTable(stateTable);
+      }
+    });
+  }
+
+  @Override
+  public long fetchPruneUpperBound(long time, long pruneUpperBoundForTime) throws IOException {
+    LOG.debug("Fetching prune upper bound for time {} and max prune upper bound {}", time, pruneUpperBoundForTime);
+    if (time < 0 || pruneUpperBoundForTime < 0) {
+      return -1;
+    }
+
+    SortedSet transactionalRegions = getTransactionalRegions();
+    if (!transactionalRegions.isEmpty()) {
+      LOG.debug("Saving {} transactional regions for time {}", transactionalRegions.size(), time);
+      dataJanitorState.saveRegionsForTime(time, transactionalRegions);
+      // Save prune upper bound for time as the final step.
+      // We can then use its existence to make sure that the data for a given time is complete or not
+      LOG.debug("Saving max prune upper bound {} for time {}", pruneUpperBoundForTime, time);
+      dataJanitorState.savePruneUpperBoundForTime(time, pruneUpperBoundForTime);
+    }
+
+    return computePruneUpperBound(new TimeRegions(time, transactionalRegions));
+  }
+
+  @Override
+  public void pruneComplete(long time, long pruneUpperBound) throws IOException {
+    LOG.debug("Prune complete for time {} and prune upper bound {}", time, pruneUpperBound);
+    // Get regions for given time, so as to not delete them
+    TimeRegions regionsToExclude =

[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89206461
  
--- Diff: 
tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
+ * to fetch the prune upper bound for each data store.
+ * Invalid transaction list will pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface DataJanitorPlugin {
--- End diff --

The naming is not clear. Can we call it a TransactionPruningPlugin?


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938559
  
--- Diff: 
tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
+ * to fetch the prune upper bound for each data store.
+ * Invalid transaction list will pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface DataJanitorPlugin {
+  /**
+   * Called once at the beginning to initialize the plugin
+   */
+  void initialize(Configuration conf) throws IOException;
+
+  /**
+   * Called periodically to fetch prune upper bound for a data store
+   *
+   * @param time start time of this prune iteration
+   * @param pruneUpperBoundForTime upper bound for prune tx id for the given start time
+   */
+  long fetchPruneUpperBound(long time, long pruneUpperBoundForTime) throws IOException;
+
+  /**
+   * Called after pruning the invalid list.
+   * The plugin can use the pruneUpperBound passed to clean up its state
--- End diff --

Is this the same pruneUpperBound that was returned by fetchPruneUpperBound?
What are the semantics of this - does it mean it is guaranteed that all invalid
transactions less than that upper bound have been removed from the invalid list?
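Assuming the answer is yes (my reading, not confirmed in the PR), pruneComplete would let the plugin discard per-time state it no longer needs. A hedged sketch, where `boundByTime` is a hypothetical stand-in for state a `DataJanitorState`-like store would hold:

```java
import java.util.NavigableMap;

class PruneCompleteSketch {
    // Cleanup on pruneComplete: drop saved per-time bounds already covered
    // by the confirmed prune upper bound, keeping only entries that future
    // iterations could still need.
    static void cleanUp(long pruneUpperBound, NavigableMap<Long, Long> boundByTime) {
        boundByTime.values().removeIf(bound -> bound <= pruneUpperBound);
    }
}
```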


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89206674
  
--- Diff: 
tephra-core/src/main/java/org/apache/tephra/janitor/DataJanitorPlugin.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.janitor;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+/**
+ * Data janitor interface to manage the invalid transaction list.
+ * There will be one such plugin per data store that will be invoked periodically
+ * to fetch the prune upper bound for each data store.
+ * Invalid transaction list will pruned up to the minimum of prune upper bounds returned by all the plugins.
+ */
+public interface DataJanitorPlugin {
+  /**
+   * Called once at the beginning to initialize the plugin
+   */
+  void initialize(Configuration conf) throws IOException;
+
+  /**
+   * Called periodically to fetch prune upper bound for a data store
+   *
+   * @param time start time of this prune iteration
+   * @param pruneUpperBoundForTime upper bound for prune tx id for the given start time
+   */
+  long fetchPruneUpperBound(long time, long pruneUpperBoundForTime) throws IOException;
+
+  /**
+   * Called after pruning the invalid list.
+   * The plugin can use the pruneUpperBound passed to clean up its state
+   *
+   * @param time start time of this prune iteration
+   * @param pruneUpperBound prune upper bound used to prune the invalid list
+   */
+  void pruneComplete(long time, long pruneUpperBound) throws IOException;
+
+  /**
+   * Called once during shutdown
+   */
+  void destroy() throws IOException;
--- End diff --

Should destroy() throw exceptions? Normally, it should not.
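One common pattern for this point, sketched under the assumption that shutdown failures should be logged rather than propagated (names here are illustrative, not Tephra's API):

```java
import java.io.Closeable;
import java.io.IOException;

class QuietShutdownSketch {
    // destroy()-style cleanup that never propagates close failures;
    // returns whether the close succeeded so callers can still observe it.
    static boolean closeQuietly(Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            // In a real plugin this would go to a logger, e.g. LOG.error(...)
            System.err.println("Ignoring failure during shutdown: " + e.getMessage());
            return false;
        }
    }
}
```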


---


[GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...

2016-11-28 Thread anew
Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r89938020
  
--- Diff: 
tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/DefaultDataJanitorPlugin.java
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.hbase.coprocessor.janitor;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.tephra.TxConstants;
+import org.apache.tephra.hbase.coprocessor.TransactionProcessor;
+import org.apache.tephra.janitor.DataJanitorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+/**
+ * Default implementation of the {@link DataJanitorPlugin} for HBase
+ */
+@SuppressWarnings("WeakerAccess")
+public class DefaultDataJanitorPlugin implements DataJanitorPlugin {
--- End diff --

I think this is the HBaseJanitorPlugin? And if you had another store, it 
would be entirely different?


---