Table and Family
Hi all, My understanding of HBase tables and their column families is as follows: 1) each table can consist of multiple families; 2) when retrieving with a SingleColumnValueFilter, if the family is specified, other families in the same table are not affected. Are these claims right? But I ran into a problem that conflicts with this understanding. In the following code, even though there is no data at all in the family ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY, the for-loop runs many times if other families have the column ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN. Is that normal in HBase? If so, I think it is not a good design. Must column qualifiers never overlap among the families of the same table? Otherwise, scanning the table must waste loop iterations? Thanks so much! Best wishes, Bing

SingleColumnValueFilter dcKeyFilter = new SingleColumnValueFilter(
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY,
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN,
        CompareFilter.CompareOp.EQUAL, new SubstringComparator(dcKey));

Scan scan = new Scan();
scan.setFilter(dcKeyFilter);
scan.setCaching(Parameters.CACHING_SIZE);
scan.setBatch(Parameters.BATCHING_SIZE);

String qualifier;
String hostNodeKey = SocialRole.NO_NODE_KEY;
String groupKey = SocialGroup.NO_GROUP_KEY;
int timingScale = TimingScale.NO_TIMING_SCALE;
String key;
try
{
    ResultScanner scanner = this.neighborTable.getScanner(scan);
    for (Result result : scanner)
    {
        for (KeyValue kv : result.raw())
        {
            qualifier = Bytes.toString(kv.getQualifier());
            if (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_NODE_KEY_STRING_COLUMN))
            {
                hostNodeKey = Bytes.toString(kv.getValue());
            }
            else if (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_GROUP_KEY_STRING_COLUMN))
            {
                groupKey = Bytes.toString(kv.getValue());
            }
            else if (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_TIMING_SCALE_STRING_COLUMN))
            {
                timingScale = Bytes.toInt(kv.getValue());
            }
        }
        if (!hostNodeKey.equals(SocialRole.NO_NODE_KEY) && !groupKey.equals(SocialGroup.NO_GROUP_KEY) && timingScale != TimingScale.NO_TIMING_SCALE)
        {
            key = Tools.GetKeyOfNode(hostNodeKey, groupKey, timingScale);
            if (!neighborMap.containsKey(key))
            {
                neighborMap.put(key, new NodeNeighborInGroup(hostNodeKey, groupKey, timingScale));
            }
        }
        hostNodeKey = SocialRole.NO_NODE_KEY;
        groupKey = SocialGroup.NO_GROUP_KEY;
        timingScale = TimingScale.NO_TIMING_SCALE;
    }
    scanner.close();
}
catch (IOException e)
{
    e.printStackTrace();
}
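A likely remedy, sketched under the assumption that rows should match only when the filtered column actually exists: by default, SingleColumnValueFilter lets a row through when the tested column is missing, so call setFilterIfMissing(true), and restrict the scan to the one family being inspected with addFamily(). Both calls are part of the 0.92-era client API; the ContrivedNeighborStructure constants are taken from the code above.

SingleColumnValueFilter dcKeyFilter = new SingleColumnValueFilter(
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY,
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN,
        CompareFilter.CompareOp.EQUAL, new SubstringComparator(dcKey));
// Without this, rows that lack the column entirely are emitted, which is
// exactly the behavior described above.
dcKeyFilter.setFilterIfMissing(true);

Scan scan = new Scan();
// Fetch only the family the filter inspects, not every family in the table.
scan.addFamily(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY);
scan.setFilter(dcKeyFilter);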
Is Performance Affected? - Table and Family
Dear all, I have an additional question about tables and families. Given the same amount of data, is a table with fewer families faster than one with more families? Correct or not? Is it a higher-performance design to put fewer families into a table? Thanks so much! Best regards, Bing

On Tue, Aug 13, 2013 at 12:31 AM, Stas Maksimov maksi...@gmail.com wrote: Hi there, On your second point, I don't think column family can ever be an optional parameter, so I'm not sure this understanding is correct. Regards, Stas.

On 12 August 2013 17:22, Bing Li lbl...@gmail.com wrote: Hi all, My understanding of HBase tables and their column families is as follows: 1) each table can consist of multiple families; 2) when retrieving with a SingleColumnValueFilter, if the family is specified, other families in the same table are not affected. Are these claims right? ...
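For background on why fewer families can help: each column family is stored separately, with its own memstore and its own set of store files, so every extra family adds flush and compaction work for the same logical data. A minimal sketch of creating a single-family table with the 0.92-era admin API; the table and family names here are hypothetical.

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
// One family per distinct access pattern is the usual guidance.
HTableDescriptor desc = new HTableDescriptor("node_rank");
desc.addFamily(new HColumnDescriptor("rank"));
admin.createTable(desc);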
Re: Is synchronized required?
...

public void dispose()
{
    try
    {
        this.rankTable.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

...

On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote: Are you sharing this.rankTable between threads? HTable is not thread safe. -- Lars

From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Sent: Tuesday, February 5, 2013 8:54 AM Subject: Re: Is synchronized required?

Dear all, After synchronized was removed from the writing method, I get the following exceptions when reading. Before the removal, there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

[java] Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
[java] WARNING: Unexpected exception receiving call responses
[java] java.lang.NullPointerException
[java]     at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...
[java] Caused by: java.io.IOException: Unexpected exception receiving call responses
[java]     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

The code that causes the exceptions is as follows.

public Set<String> LoadNodeGroupNodeRankRowKeys(String hostNodeKey, String groupKey, int timingScale)
{
    List<Filter> nodeGroupFilterList = new ArrayList<Filter>();
    SingleColumnValueFilter hostNodeKeyFilter = new SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY, RankStructure.NODE_GROUP_NODE_RANK_HOST_NODE_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
    hostNodeKeyFilter.setFilterIfMissing(true);
    nodeGroupFilterList.add(hostNodeKeyFilter);
    SingleColumnValueFilter groupKeyFilter = new SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY, RankStructure.NODE_GROUP_NODE_RANK_GROUP_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new
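The method body is cut off in the archive; for reference, a filter list like the one being built above is typically finished along these lines (a sketch, assuming MUST_PASS_ALL semantics so a row must satisfy every filter):

// Combine the single-column filters; a row must pass all of them.
FilterList nodeGroupFilter = new FilterList(FilterList.Operator.MUST_PASS_ALL, nodeGroupFilterList);
Scan scan = new Scan();
scan.setFilter(nodeGroupFilter);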
Concurrently Reading Still Got Exceptions
Dear all, Some exceptions are raised when I read data from HBase concurrently. The version of HBase I use is 0.92.0. I cannot fix the problem. Could you please help me? Thanks so much! Best wishes, Bing

Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.client.ScannerCallable close
WARNING: Ignore, probably already closed
java.io.IOException: Call to greatfreeweb/127.0.1.1:60020 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy6.close(Unknown Source)
    at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
    at com.greatfree.hbase.rank.NodeRankRetriever.loadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
    at com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

I read data from HBase concurrently with the following code.

...
ExecutorService threadPool = Executors.newFixedThreadPool(100);
LoadNodeGroupNodeRankRowKeyThread thread;
Set<String> groupKeys;
for (String nodeKey : nodeKeys)
{
    groupKeys = NodeCache.WWW().getGroupKeys(nodeKey);
    for (String groupKey : groupKeys)
    {
        // Threads are initialized and executed here.
        thread = new LoadNodeGroupNodeRankRowKeyThread(nodeKey, groupKey, TimingScale.PERMANENTLY);
        threadPool.execute(thread);
    }
}
Scanner in = new Scanner(System.in);
in.nextLine();
threadPool.shutdownNow();
...

The code of LoadNodeGroupNodeRankRowKeyThread is as follows.

...
public void run()
{
    NodeRankRetriever retriever = new NodeRankRetriever();
    Set<String> rowKeys = retriever.loadNodeGroupNodeRankRowKeys(this.hostNodeKey, this.groupKey, this.timingScale);
    if (rowKeys.size() > 0)
    {
        for (String rowKey : rowKeys)
        {
            System.out.println(rowKey);
        }
    }
    else
    {
        System.out.println("No data loaded");
    }
    retriever.dispose();
}
...

The constructor of NodeRankRetriever() just gets an instance of HTable from HTablePool with the following method.

...
public HTableInterface getTable(String
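Since HTable is not thread safe, the usual 0.92-era pattern is one HTableInterface per thread, checked out from a shared HTablePool and returned when the thread is done. A sketch under that assumption; the table name is hypothetical.

HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);

// Inside each worker thread:
HTableInterface table = pool.getTable("rank"); // hypothetical table name
try
{
    // ... scans or puts against this thread's own table ...
}
finally
{
    pool.putTable(table); // return it to the pool (later APIs: table.close())
}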
Re: Is synchronized required?
Dear Lars, I am now running HBase in pseudo-distributed mode. Does the updated HTable constructor also work in that mode? Thanks so much! Bing

On Wed, Feb 6, 2013 at 3:44 PM, lars hofhansl la...@apache.org wrote: Don't use a pool at all. With HBASE-4805 (https://issues.apache.org/jira/browse/HBASE-4805) you can precreate an HConnection and ExecutorService and then create HTables cheaply on demand every time you need one. Check out HConnectionManager.createConnection(...) and the HTable constructors. I need to document this somewhere. -- Lars

From: Bing Li lbl...@gmail.com To: user user@hbase.apache.org; lars hofhansl la...@apache.org; hbase-u...@hadoop.apache.org Sent: Tuesday, February 5, 2013 10:36 PM Subject: Re: Is synchronized required?

Lars, I found that at least the exceptions have nothing to do with a shared HTable. To save resources, I designed a pool for the classes that write to and read from HBase. The primary resource consumed in those classes is HTable. The pool has some bugs. My question is whether it is necessary to design such a pool. Is it fine to create an instance of HTable for each thread? I noticed that HBase has a class, HTablePool. Maybe the pool I designed is not required? Thanks so much! Best wishes! Bing

On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote: Are you sharing this.rankTable between threads? HTable is not thread safe. -- Lars

From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Sent: Tuesday, February 5, 2013 8:54 AM Subject: Re: Is synchronized required?

Dear all, After synchronized was removed from the writing method, I get the following exceptions when reading. Before the removal, there were no such exceptions. Could you help me solve this? Thanks so much!
Best wishes, Bing

[java] Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
[java] WARNING: Unexpected exception receiving call responses
[java] java.lang.NullPointerException
[java]     at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...
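A sketch of the HBASE-4805 pattern Lars describes, assuming the 0.92+ client API: precreate one HConnection and one ExecutorService, then build short-lived HTable instances on demand. The table name here is hypothetical.

Configuration conf = HBaseConfiguration.create();
HConnection connection = HConnectionManager.createConnection(conf);
ExecutorService workers = Executors.newFixedThreadPool(20);

// Per request / per thread:
HTable table = new HTable(Bytes.toBytes("rank"), connection, workers);
try
{
    // ... reads or writes ...
}
finally
{
    table.close(); // cheap; the shared connection and thread pool are reused
}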
Re: Is synchronized required?
);
scan.setBatch(Parameters.BATCHING_SIZE);
Set<String> rowKeySet = Sets.newHashSet();
try
{
    ResultScanner scanner = this.rankTable.getScanner(scan);
    for (Result result : scanner) // EXCEPTIONS are raised at this line.
    {
        for (KeyValue kv : result.raw())
        {
            rowKeySet.add(Bytes.toString(kv.getRow()));
            break;
        }
    }
    scanner.close();
}
catch (IOException e)
{
    e.printStackTrace();
}
return rowKeySet;
}

On Tue, Feb 5, 2013 at 4:20 AM, Bing Li lbl...@gmail.com wrote: Dear all, When writing data into HBase, I sometimes get exceptions. I guess they might be caused by concurrent writes, but I am not sure. My question is whether it is necessary to put synchronized on the writing methods. The following lines are the sample code. I think the synchronized keyword must lower write performance, and concurrent writing is sometimes needed in my system. Thanks so much! Best wishes, Bing

public synchronized void AddDomainNodeRanks(String domainKey, int timingScale, Map<String, Double> nodeRankMap) { ... }
The Exceptions When Concurrently Writing and Reading
Dear all, To improve the performance of writing data into HBase, the synchronized keyword was removed from the writing method. But after the removal, I get the following exceptions when reading; before the removal there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.client.ScannerCallable close
WARNING: Ignore, probably already closed
java.io.IOException: Call to greatfreeweb/127.0.1.1:60020 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy6.close(Unknown Source)
    at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
    at com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
    at com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

The writing method is as follows.

// The synchronized keyword was removed to improve performance.
// public synchronized void AddNodeViewGroupNodeRanks(String hostNodeKey, String groupKey, int timingScale, Map<String, Double> groupNodeRankMap)
public void AddNodeViewGroupNodeRanks(String hostNodeKey, String groupKey, int timingScale, Map<String, Double> groupNodeRankMap)
{
    List<Put> puts = new ArrayList<Put>();
    Put hostNodeKeyPut;
    Put groupKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put rankPut;
    byte[] groupNodeRankRowKey;
    for (Map.Entry<String, Double> nodeRankEntry : groupNodeRankMap.entrySet())
    {
        groupNodeRankRowKey = Bytes.toBytes(...);
        hostNodeKeyPut = new Put(groupNodeRankRowKey);
        hostNodeKeyPut.add(...);
        puts.add(hostNodeKeyPut);
        ..
        rankPut = new Put(groupNodeRankRowKey);
        rankPut.add(...);
        puts.add(rankPut);
    }
    try
    {
        this.rankTable.put(puts);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

The reading method that causes the exceptions is as follows.

public Set<String> LoadNodeGroupNodeRankRowKeys(String hostNodeKey, String groupKey, int timingScale)
{
    List<Filter> nodeGroupFilterList = new ArrayList<Filter>();
    SingleColumnValueFilter hostNodeKeyFilter = new SingleColumnValueFilter(...);
    hostNodeKeyFilter.setFilterIfMissing(true);
    nodeGroupFilterList.add(hostNodeKeyFilter);
    ..
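One way to drop synchronized without sharing the non-thread-safe HTable is to give each writer thread its own instance, for example via ThreadLocal. A sketch under that assumption; the table name is hypothetical and error handling is minimal.

private static final ThreadLocal<HTable> RANK_TABLE = new ThreadLocal<HTable>()
{
    @Override
    protected HTable initialValue()
    {
        try
        {
            // Each thread lazily creates and keeps its own HTable.
            return new HTable(HBaseConfiguration.create(), "rank");
        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }
    }
};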
Re: The Exceptions When Concurrently Writing and Reading
Dear Ted, My HBase is 0.92. Thanks! Bing

On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote: To help us more easily correlate line numbers, can you tell us the version of HBase you're using? Thanks

On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote: Dear all, To improve the performance of writing data into HBase, the synchronized keyword was removed from the writing method. But after the removal, I get the following exceptions when reading; before the removal there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...

The writing method is as follows. ...
Re: The Exceptions When Concurrently Writing and Reading
Ted, The version is 0.92.0. Is that what you need? BTW, I now run HBase in pseudo-distributed mode. Thanks! Bing

On Wed, Feb 6, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote: There are several 0.92 releases; can you be more specific? Thanks

On Tue, Feb 5, 2013 at 10:46 AM, Bing Li lbl...@gmail.com wrote: Dear Ted, My HBase is 0.92. Thanks! Bing

On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote: To help us more easily correlate line numbers, can you tell us the version of HBase you're using? Thanks

On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote: Dear all, To improve the performance of writing data into HBase, the synchronized keyword was removed from the writing method. But after the removal, I get the following exceptions when reading; before the removal there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...

The writing method is as follows. ...
Re: The Exceptions When Concurrently Writing and Reading
Dear all, Sorry, I just found that the same exceptions occur even when synchronized is added. Some other problem may exist; I am checking now. Do you have any suggestions? Thanks so much! Best regards, Bing

On Wed, Feb 6, 2013 at 3:00 AM, Bing Li lbl...@gmail.com wrote: Ted, The version is 0.92.0. Is that what you need? BTW, I now run HBase in pseudo-distributed mode. Thanks! Bing

On Wed, Feb 6, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote: There are several 0.92 releases; can you be more specific? Thanks

On Tue, Feb 5, 2013 at 10:46 AM, Bing Li lbl...@gmail.com wrote: Dear Ted, My HBase is 0.92. Thanks! Bing

On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote: To help us more easily correlate line numbers, can you tell us the version of HBase you're using? Thanks

On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote: Dear all, To improve the performance of writing data into HBase, the synchronized keyword was removed from the writing method. But after the removal, I get the following exceptions when reading; before the removal there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...

The writing method is as follows. ...
Re: Is synchronized required?
Lars, I found that at least the exceptions have nothing to do with a shared HTable. To save resources, I designed a pool for the classes that write to and read from HBase. The primary resource consumed in those classes is HTable. The pool has some bugs. My question is whether it is necessary to design such a pool. Is it fine to create an instance of HTable for each thread? I noticed that HBase has a class, HTablePool. Maybe the pool I designed is not required? Thanks so much! Best wishes! Bing

On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote: Are you sharing this.rankTable between threads? HTable is not thread safe. -- Lars

From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Sent: Tuesday, February 5, 2013 8:54 AM Subject: Re: Is synchronized required?

Dear all, After synchronized was removed from the writing method, I get the following exceptions when reading. Before the removal, there were no such exceptions. Could you help me solve this? Thanks so much! Best wishes, Bing

[java] Feb 6, 2013 12:21:31 AM org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
[java] WARNING: Unexpected exception receiving call responses
[java] java.lang.NullPointerException
[java]     at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
...

The code that causes the exceptions is as follows.

public Set<String> LoadNodeGroupNodeRankRowKeys(String hostNodeKey, String groupKey, int timingScale)
{
    List<Filter> nodeGroupFilterList = new ArrayList<Filter>();
    SingleColumnValueFilter hostNodeKeyFilter = new SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY, RankStructure.NODE_GROUP_NODE_RANK_HOST_NODE_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
    hostNodeKeyFilter.setFilterIfMissing(true);
    nodeGroupFilterList.add(hostNodeKeyFilter
Is synchronized required?
Dear all, When writing data into HBase, I sometimes get exceptions. I guess they might be caused by concurrent writes, but I am not sure. My question is whether it is necessary to put synchronized on the writing methods. The following lines are the sample code. I think the synchronized keyword must lower write performance, and concurrent writing is sometimes needed in my system. Thanks so much! Best wishes, Bing

public synchronized void AddDomainNodeRanks(String domainKey, int timingScale, Map<String, Double> nodeRankMap)
{
    List<Put> puts = new ArrayList<Put>();
    Put domainKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put rankPut;
    byte[] domainNodeRankRowKey;
    for (Map.Entry<String, Double> nodeRankEntry : nodeRankMap.entrySet())
    {
        domainNodeRankRowKey = Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW + Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));
        domainKeyPut = new Put(domainNodeRankRowKey);
        domainKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN, Bytes.toBytes(domainKey));
        puts.add(domainKeyPut);
        timingScalePut = new Put(domainNodeRankRowKey);
        timingScalePut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN, Bytes.toBytes(timingScale));
        puts.add(timingScalePut);
        nodeKeyPut = new Put(domainNodeRankRowKey);
        nodeKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN, Bytes.toBytes(nodeRankEntry.getKey()));
        puts.add(nodeKeyPut);
        rankPut = new Put(domainNodeRankRowKey);
        rankPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN, Bytes.toBytes(nodeRankEntry.getValue()));
        puts.add(rankPut);
    }
    try
    {
        this.rankTable.put(puts);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
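A side note on the writing method itself: the loop above creates four Puts per row key. Since Put.add() can be called repeatedly on one Put, the same cells can be written with a single Put per row, which means fewer objects and one mutation per row. A sketch using the constants from the code above:

Put put = new Put(domainNodeRankRowKey);
put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN, Bytes.toBytes(domainKey));
put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN, Bytes.toBytes(timingScale));
put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN, Bytes.toBytes(nodeRankEntry.getKey()));
put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN, Bytes.toBytes(nodeRankEntry.getValue()));
puts.add(put); // one Put per row instead of four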
Re: Is synchronized required?
Dear Ted and Harsh, I am sorry, I didn't keep the exceptions; they occurred many days ago. My current version is 0.92. Now synchronized is removed. Is that correct? I will test whether such exceptions are raised and let you know. Thanks! Best wishes, Bing

On Tue, Feb 5, 2013 at 4:25 AM, Ted Yu yuzhih...@gmail.com wrote: Bing: Use pastebin.com instead of attaching the exception report. What version of HBase are you using? Thanks

On Mon, Feb 4, 2013 at 12:21 PM, Harsh J ha...@cloudera.com wrote: What exceptions do you actually receive - can you send them here? Knowing that is key to addressing your issue.

On Tue, Feb 5, 2013 at 1:50 AM, Bing Li lbl...@gmail.com wrote: Dear all, When writing data into HBase, I sometimes get exceptions. I guess they might be caused by concurrent writes, but I am not sure. My question is whether it is necessary to put synchronized on the writing methods. The following lines are the sample code. I think the synchronized keyword must lower write performance, and concurrent writing is sometimes needed in my system. Thanks so much! Best wishes, Bing

public synchronized void AddDomainNodeRanks(String domainKey, int timingScale, Map<String, Double> nodeRankMap) { ... }

-- Harsh J
Re: Is synchronized required?
Dear Nicolas, If synchronized is required, the performance must be very low, right? Are there any other ways to minimize the synchronization granularity? Thanks so much! Bing

On Tue, Feb 5, 2013 at 5:31 AM, Nicolas Liochon nkey...@gmail.com wrote: Yes, HTable is not thread safe, and using synchronized around it could work, but would be implementation dependent. You can have one HTable per request at a reasonable cost since https://issues.apache.org/jira/browse/HBASE-4805. It seems to be available in 0.92 as well. Cheers, Nicolas

On Mon, Feb 4, 2013 at 10:13 PM, Adrien Mogenet adrien.moge...@gmail.com wrote: Beware, HTablePool is not totally thread-safe either: https://issues.apache.org/jira/browse/HBASE-6651.

On Mon, Feb 4, 2013 at 9:42 PM, Haijia Zhou leons...@gmail.com wrote: Hi Bing, I am not sure about your scenario, but the HTable class is not thread safe for either reads or writes. If you want to write to or read from a table from multiple threads, consider using HTablePool. Hope this helps. HJ

On Mon, Feb 4, 2013 at 3:32 PM, Bing Li lbl...@gmail.com wrote: Dear Ted and Harsh, I am sorry, I didn't keep the exceptions; they occurred many days ago. My current version is 0.92. Now synchronized is removed. Is that correct? I will test whether such exceptions are raised and let you know. Thanks! Best wishes, Bing

On Tue, Feb 5, 2013 at 4:25 AM, Ted Yu yuzhih...@gmail.com wrote: Bing: Use pastebin.com instead of attaching the exception report. What version of HBase are you using? Thanks

On Mon, Feb 4, 2013 at 12:21 PM, Harsh J ha...@cloudera.com wrote: What exceptions do you actually receive - can you send them here? Knowing that is key to addressing your issue.

On Tue, Feb 5, 2013 at 1:50 AM, Bing Li lbl...@gmail.com wrote: Dear all, When writing data into HBase, I sometimes get exceptions. I guess they might be caused by concurrent writes, but I am not sure. My question is whether it is necessary to put synchronized on the writing methods. The following lines are the sample code. I think the synchronized keyword must lower write performance, and concurrent writing is sometimes needed in my system. Thanks so much! Best wishes, Bing

public synchronized void AddDomainNodeRanks(String domainKey, int timingScale, Map<String, Double> nodeRankMap) { ... }

-- Harsh J

-- Adrien Mogenet 06.59.16.64.22 http://www.mogenet.me
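On minimizing granularity: with one HTable per thread there is nothing to synchronize, and write throughput can instead be tuned with the client-side write buffer. A sketch with the 0.92-era API; the buffer size is arbitrary.

// Buffer puts client-side and ship them in batches from a single thread.
table.setAutoFlush(false);
table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB, an arbitrary choice
// ... many table.put(...) calls ...
table.flushCommits(); // send whatever is still buffered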
Pseudo-Distributed Mode Multi-Thread Accessing
Dear all, Pseudo-distributed mode is still in use since I am still coding. When scanning a table, I noticed that a single thread is much faster than each thread in a multi-threaded setup. For example, the following method completes in 2 or 3 ms with a single thread. If 30 threads execute the method together, each thread takes about 150 ms. Each thread gets its own HTableInterface from an HTablePool, so I think performance should not be this low. Maybe the pseudo-distributed mode causes the problem? Thanks so much! Best regards, Bing

public Set<String> GetOutgoingHHNeighborKeys(String hubKey, String groupKey, int timingScale)
{
    List<Filter> hhNeighborFilterList = new ArrayList<Filter>();
    SingleColumnValueFilter hubKeyFilter = new SingleColumnValueFilter(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY, NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new SubstringComparator(hubKey));
    hubKeyFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(hubKeyFilter);
    SingleColumnValueFilter groupKeyFilter = new SingleColumnValueFilter(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY, NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new SubstringComparator(groupKey));
    groupKeyFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(groupKeyFilter);
    SingleColumnValueFilter timingScaleFilter = new SingleColumnValueFilter(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY, NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN, CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(timingScale)));
    timingScaleFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(timingScaleFilter);
    FilterList hhNeighborFilter = new FilterList(hhNeighborFilterList);
    Scan scan = new Scan();
    scan.setFilter(hhNeighborFilter);
    scan.setCaching(Parameters.CACHING_SIZE);
    scan.setBatch(Parameters.BATCHING_SIZE);
    Set<String> neighborKeySet = Sets.newHashSet();
    String qualifier;
    try
    {
        ResultScanner scanner = this.neighborTable.getScanner(scan);
        for (Result result : scanner)
        {
            for (KeyValue kv : result.raw())
            {
                qualifier = Bytes.toString(kv.getQualifier());
                if (qualifier.equals(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_STRING_COLUMN))
                {
                    neighborKeySet.add(Bytes.toString(kv.getValue()));
                }
            }
        }
        scanner.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return neighborKeySet;
}
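One factor that would hurt regardless of mode: the method filters every row of the table on three columns, so each of the 30 threads triggers a full-table scan. If row keys were built by concatenating the lookup components (rather than hashing them all together, as the GetAHash keys elsewhere in these threads do), the scan could be bounded to just the matching rows. A sketch under that hypothetical key design; HUB_HUB_NEIGHBOR_ROW is a made-up constant by analogy with the other *_ROW prefixes in this code.

// Bound the scan by row-key range instead of filtering every row.
byte[] prefix = Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_ROW + hubKey + groupKey + timingScale);
Scan scan = new Scan();
scan.setStartRow(prefix);
// Stop just past the prefix so only matching rows are read.
scan.setStopRow(Bytes.add(prefix, new byte[] { (byte) 0xFF }));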
Re: Is it correct and required to keep consistency this way?
Dear Jieshan, Thanks so much for your reply! Locking is currently not set on the reading methods in my system, and that seems to be fine. But I noticed exceptions when no locking was put on the writing method. If multiple threads write to HBase concurrently, do you think it is safe without locking? Best regards, Bing

On Thu, Sep 20, 2012 at 10:22 AM, Bijieshan bijies...@huawei.com wrote: You can prevent reads and writes from running in parallel at the application level, if I read your mail correctly; you can use a ReentrantReadWriteLock if that is your intention. But it's not recommended. HBase has its own mechanism (MVCC) to manage read/write consistency. When we start a scan, the latest data that has not yet been committed by MVCC may not be visible (depending on configuration). Jieshan

-----Original Message----- From: Bing Li [mailto:lbl...@gmail.com] Sent: Thursday, September 20, 2012 10:02 AM To: hbase-u...@hadoop.apache.org; user Subject: Is it correct and required to keep consistency this way?

Dear all, Sorry for sending this email multiple times! An error in the previous email has been corrected. I am not sure whether it is correct and required to keep consistency as follows when saving to and reading from HBase. Your help is highly appreciated. Best regards, Bing

// Writing
public void AddOutgoingNeighbor(String hostNodeKey, String groupKey, int timingScale, String neighborKey)
{
    List<Put> puts = new ArrayList<Put>();
    Put hostNodeKeyPut;
    Put groupKeyPut;
    Put topGroupKeyPut;
    Put timingScalePut;
    Put neighborKeyPut;
    byte[] outgoingRowKey = Bytes.toBytes(NeighborStructure.NODE_OUTGOING_NEIGHBOR_ROW + Tools.GetAHash(hostNodeKey + groupKey + timingScale + neighborKey));
    hostNodeKeyPut = new Put(outgoingRowKey);
    hostNodeKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN, Bytes.toBytes(hostNodeKey));
    puts.add(hostNodeKeyPut);
    groupKeyPut = new Put(outgoingRowKey);
    groupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_GROUP_KEY_COLUMN, Bytes.toBytes(groupKey));
    puts.add(groupKeyPut);
    topGroupKeyPut = new Put(outgoingRowKey);
    topGroupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_TOP_GROUP_KEY_COLUMN, Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupKey)));
    puts.add(topGroupKeyPut);
    timingScalePut = new Put(outgoingRowKey);
    timingScalePut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN, Bytes.toBytes(timingScale));
    puts.add(timingScalePut);
    neighborKeyPut = new Put(outgoingRowKey);
    neighborKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_NEIGHBOR_KEY_COLUMN, Bytes.toBytes(neighborKey));
    puts.add(neighborKeyPut);
    try
    {
        // Locking is here
        this.lock.writeLock().lock();
        this.neighborTable.put(puts);
        this.lock.writeLock().unlock();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

// Reading
public Set<String> GetOutgoingNeighborKeys(String hostNodeKey, int timingScale)
{
    List<Filter> outgoingNeighborsList = new ArrayList<Filter>();
    SingleColumnValueFilter hostNodeKeyFilter = new SingleColumnValueFilter(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN, CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
    hostNodeKeyFilter.setFilterIfMissing(true);
    outgoingNeighborsList.add(hostNodeKeyFilter);
    SingleColumnValueFilter timingScaleFilter = new SingleColumnValueFilter(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY, NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN, CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(timingScale)));
    timingScaleFilter.setFilterIfMissing(true);
    outgoingNeighborsList.add(timingScaleFilter);
    FilterList outgoingNeighborFilter = new FilterList(outgoingNeighborsList);
    Scan scan = new Scan();
    scan.setFilter(outgoingNeighborFilter);
    scan.setCaching(Parameters.CACHING_SIZE);
    scan.setBatch(Parameters.BATCHING_SIZE);
    String qualifier;
    Set<String>
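One concrete issue in the writing method quoted above: writeLock().unlock() sits inside the try block, so if put() throws an IOException the lock is never released and every later writer blocks forever. If a lock is used at all, release it in finally; a minimal sketch of the same block:

this.lock.writeLock().lock();
try
{
    this.neighborTable.put(puts);
}
catch (IOException e)
{
    e.printStackTrace();
}
finally
{
    this.lock.writeLock().unlock(); // always released, even on exception
}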
Re: Is it correct and required to keep consistency this way?
Sorry, I didn't keep the exceptions. I will post the exceptions if I get them again. But after putting synchronized on the writing methods, the exceptions are gone. I am a little confused. HTable must be the interface to write/read data from HBase. If it is not safe, that means locking must be set as shown in my code, doesn't it? Thanks so much! Bing On Thu, Sep 20, 2012 at 11:00 AM, Bijieshan bijies...@huawei.com wrote: Yes. It should be safe. What you need to pay attention to is that HTable is not thread-safe. What are the exceptions? Jieshan -Original Message- From: Bing Li [mailto:lbl...@gmail.com] Sent: Thursday, September 20, 2012 10:52 AM To: user@hbase.apache.org Cc: hbase-u...@hadoop.apache.org; Zhouxunmiao Subject: Re: Is it correct and required to keep consistency this way? [earlier quoted messages snipped]
Re: Is it correct and required to keep consistency this way?
Jieshan, Thanks! HTablePool is used in my system. Best, Bing On Thu, Sep 20, 2012 at 11:19 AM, Bijieshan bijies...@huawei.com wrote: If it is not safe, it means locking must be set as what is shown in my code, doesn't it? You should not use one HTableInterface instance in multiple threads (sharing one HTableInterface across threads plus a lock will degrade the performance). There are 2 options: 1. Create one HTableInterface instance in each thread. 2. Use HTablePool to get an HTableInterface. See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html . Hope it helps. Jieshan. -Original Message- From: Bing Li [mailto:lbl...@gmail.com] Sent: Thursday, September 20, 2012 11:07 AM To: user@hbase.apache.org Cc: hbase-u...@hadoop.apache.org; Zhouxunmiao Subject: Re: Is it correct and required to keep consistency this way? [earlier quoted messages snipped]
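As a side note, a minimal sketch of option 2 above, assuming the 0.9x-era client API; the table name is hypothetical, and each worker thread borrows a table from a shared pool instead of locking one shared HTable:

// uses org.apache.hadoop.hbase.client.{HTablePool, HTableInterface, Put}
void saveNeighbors(HTablePool pool, List<Put> puts) throws IOException {
    HTableInterface table = pool.getTable("neighborTable"); // hypothetical table name
    try {
        table.put(puts);
    } finally {
        pool.putTable(table); // return it to the pool; newer clients do this via table.close()
    }
}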
HBase Is So Slow To Save Data?
Dear all, According to my experience, it seems very slow for HBase to save data. Am I right? For example, today I needed to save the data in a HashMap to HBase. It took more than three hours. However, when saving the same HashMap to a file in text format with redirected System.out, it took only 4.5 seconds! Why is HBase so slow? Is it indexing? My code to save data to HBase is as follows. I think the code must be correct. .. public synchronized void AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale) { List<Put> puts = new ArrayList<Put>(); String hhNeighborRowKey; Put hubKeyPut; Put groupKeyPut; Put topGroupKeyPut; Put timingScalePut; Put nodeKeyPut; Put hubNeighborTypePut; for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet()) { for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet()) { for (String neighborKey : groupNeighborEntry.getValue()) { hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey); hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey)); hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN), Bytes.toBytes(sourceHubGroupNeighborEntry.getKey())); puts.add(hubKeyPut); groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey)); groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN), Bytes.toBytes(groupNeighborEntry.getKey())); puts.add(groupKeyPut); topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey)); topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN), Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey()))); puts.add(topGroupKeyPut); timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey)); timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN), Bytes.toBytes(timingScale)); puts.add(timingScalePut); nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey)); nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN), Bytes.toBytes(neighborKey)); puts.add(nodeKeyPut); hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey)); hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN), Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR)); puts.add(hubNeighborTypePut); } } } try { this.neighborTable.put(puts); } catch (IOException e) { e.printStackTrace(); } } .. Thanks so much! Best regards, Bing
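For what it's worth, a common first step when a large List<Put> is slow with this era's client API is to enable the client-side write buffer; a minimal sketch, assuming the table is a plain HTable and the name is hypothetical:

// uses org.apache.hadoop.hbase.client.HTable
HTable table = new HTable(conf, "hubNeighborTable"); // hypothetical name
table.setAutoFlush(false);                 // buffer puts on the client side
table.setWriteBufferSize(8 * 1024 * 1024); // send puts in roughly 8 MB batches
table.put(puts);                           // fills the buffer, flushing as it crosses the threshold
table.flushCommits();                      // push whatever is still buffered

With auto-flush on (the default), the buffer is flushed on every put call, so the batching effect is lost for loops of single puts.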
Re: HBase Is So Slow To Save Data?
Dear all, By the way, my HBase is in the pseudo-distributed mode. Thanks! Best regards, Bing On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote: [quoted message snipped]
Re: HBase Is So Slow To Save Data?
Dear N Keywal, Thanks so much for your reply! The total amount of data is about 110M. The available memory is enough, 2G. In Java, I just set a collection to null to let it be garbage collected. Do you think that is fine? Best regards, Bing On Wed, Aug 29, 2012 at 11:22 PM, N Keywal nkey...@gmail.com wrote: Hi Bing, You should expect HBase to be slower in the generic case: 1) it writes much more data (see the HBase data model), with extra column qualifiers, timestamps, and so on. 2) the data is written multiple times: once in the write-ahead log, once per replica on the datanodes, and so on again. 3) there are inter-process calls and inter-machine calls on the critical path. This is the cost of the atomicity, reliability, and scalability features. With these features in mind, HBase is reasonably fast at saving data on a cluster. In your specific case (without points 2 and 3 above), the performance seems to be very bad. You should first look at: - how much time is spent in the put vs. preparing the list - do you have garbage collection going on? even swap? - what's the size of your final array vs. the available memory? Cheers, N. On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote: [earlier quoted messages snipped]
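A tiny sketch of N Keywal's first check, splitting the build time from the put time (the field names follow the posted code):

long start = System.currentTimeMillis();
// ... the three nested loops above that build the puts list ...
long built = System.currentTimeMillis();
this.neighborTable.put(puts);
long done = System.currentTimeMillis();
System.out.println("build: " + (built - start) + " ms, put: " + (done - built) + " ms");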
Re: HBase Is So Slow To Save Data?
I see. Thanks so much! Bing On Wed, Aug 29, 2012 at 11:59 PM, N Keywal nkey...@gmail.com wrote: It's not useful here: if you have a memory issue, it hits while you're using the list, not after you have finished with it and set it to null. You need to monitor the memory consumption of the JVM, both the client and the server. Google around these keywords; there are many examples on the web. Google as well for ArrayList initialization. Note as well that what matters is not the size of the structure on disk but the size of the List<Put> puts = new ArrayList<Put>(); before the table put. On Wed, Aug 29, 2012 at 5:42 PM, Bing Li lbl...@gmail.com wrote: [earlier quoted messages snipped]
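On the ArrayList initialization point, a small sketch; the count here is a hypothetical estimate, but the posted loop adds six Puts per neighbor, so pre-sizing the list avoids repeated internal array copies as it grows:

int estimatedNeighbors = 100000; // hypothetical estimate of the total neighbor count
List<Put> puts = new ArrayList<Put>(estimatedNeighbors * 6); // 6 Puts per neighbor in the code above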
Re: HBase Is So Slow To Save Data?
Dear Cristofer, Thanks so much for the reminder! Best regards, Bing On Thu, Aug 30, 2012 at 12:32 AM, Cristofer Weber cristofer.we...@neogrid.com wrote: There are also a lot of conversions of the same values to their byte array representation, e.g., your NeighborStructure constants. You should do each conversion only once to save time, since you are doing them inside 3 nested loops. Not sure how much this can improve things, but you should try it as well. Best regards, Cristofer -Original Message- From: Bing Li [mailto:lbl...@gmail.com] Sent: Wednesday, 29 August 2012 13:07 To: user@hbase.apache.org Cc: hbase-u...@hadoop.apache.org Subject: Re: HBase Is So Slow To Save Data? [earlier quoted messages snipped]
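A sketch of Cristofer's suggestion, hoisting the constant conversions out of the loops (the constant names follow the posted code; the byte[] field names are illustrative):

// uses org.apache.hadoop.hbase.util.Bytes
private static final byte[] HH_FAMILY = Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY);
private static final byte[] HH_HUB_KEY_COLUMN = Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN);
// ... one byte[] constant per family/qualifier ...
// Inside the loops, reuse the precomputed arrays instead of converting every time:
hubKeyPut.add(HH_FAMILY, HH_HUB_KEY_COLUMN, Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));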
Min/Max Column Value and Row Count
Dear all, I noticed that there is no way to get the min/max of a specific column value using the currently available filters. Right? Is there a more convenient approach to get the row count of a family? I plan to use FamilyFilter to do that. Thanks so much! Best regards, Bing
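For reference, a client-side sketch under that assumption: scan the column and track the extremes and the row count yourself (the table and column names are hypothetical; in 0.92+, the aggregation coprocessor can push such work server-side):

// uses org.apache.hadoop.hbase.client.{Scan, Result, ResultScanner} and Bytes
byte[] family = Bytes.toBytes("ClassmateFamily"); // hypothetical names
byte[] qualifier = Bytes.toBytes("Salary");
Scan scan = new Scan();
scan.addColumn(family, qualifier);
long min = Long.MAX_VALUE, max = Long.MIN_VALUE, rowCount = 0;
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    long value = Bytes.toLong(result.getValue(family, qualifier));
    min = Math.min(min, value);
    max = Math.max(max, value);
    rowCount++; // counts the rows that carry this column
}
scanner.close();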
Re: Is HBase Thread-Safety?
NNever, Thanks so much for your answers! On Fri, Apr 13, 2012 at 10:50 AM, NNever nnever...@gmail.com wrote: 1. A per-row lock is held during the update, so other clients will block while one client performs an update (see the annotation of HRegion.put); no exception. On the client side, while a process is updating, the writes may not have reached the buffer size yet, so another process may read the original value, I think. 2. What kind of inconsistency? Different values on the same row's qualifier? The inconsistency means that for the same retrieval, such as a scan, different values are obtained in different threads from the multiple instances of HTable in them, respectively. Is that possible? In my case, a little inconsistency is not so critical. So I will not worry about the thread-safety issue. It should be fine, right? 3. I don't know how it is truly realized in the code. There is caching, but every time you call methods like HTable.get, it still needs to connect to the server to retrieve the data, so it is not as fast as in memory, is it? I plan to design a read-only mechanism, with only periodic updates to HBase, to raise the performance of my system. Locking must affect the performance. If caching is not fast enough in HBase, the design might not be good? Thanks again! Best, Bing Best regards, nn 2012/4/13 Bing Li lbl...@gmail.com Dear Lars, Thanks so much for your reply! In my case, I need to overwrite or update an HTable. If reading happens during the process of updating or overwriting, will any exceptions be thrown by HBase? If multiple instances of an HTable are used by multiple threads, there must be inconsistency among them, right? I guess caching must be done in HBase. So retrieving from an HTable must be almost as fast as from memory? Best regards, Bing On Fri, Apr 13, 2012 at 6:17 AM, lars hofhansl lhofha...@yahoo.com wrote: Hi Bing, Which part? The server certainly is thread safe. The client is not, at least not all the way through. The main consideration is HTable, which is not thread safe; you need to create one instance for each thread (HBASE-4805 makes that much cheaper), store the HTable in a ThreadLocal after creation, or use HTablePool. Please let me know if that answers your question. Thanks. -- Lars - Original Message - From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Cc: Sent: Thursday, April 12, 2012 3:10 PM Subject: Is HBase Thread-Safety? Dear all, Is HBase thread-safe? Do I need to consider consistency issues when manipulating HBase? Thanks so much! Best regards, Bing
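A minimal sketch of the ThreadLocal approach Lars mentions, assuming a hypothetical table name; each thread lazily creates and keeps its own HTable:

// uses org.apache.hadoop.hbase.HBaseConfiguration and org.apache.hadoop.hbase.client.HTable
private static final ThreadLocal<HTable> LOCAL_TABLE = new ThreadLocal<HTable>() {
    @Override
    protected HTable initialValue() {
        try {
            return new HTable(HBaseConfiguration.create(), "neighborTable"); // hypothetical name
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
};
// Each thread then calls LOCAL_TABLE.get() and uses it without external locking.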
Re: Is HBase Thread-Safety?
Dear Lars, Thanks so much for your reply! In my case, I need to overwrite or update an HTable. If reading happens during the process of updating or overwriting, will any exceptions be thrown by HBase? If multiple instances of an HTable are used by multiple threads, there must be inconsistency among them, right? I guess caching must be done in HBase. So retrieving from an HTable must be almost as fast as from memory? Best regards, Bing On Fri, Apr 13, 2012 at 6:17 AM, lars hofhansl lhofha...@yahoo.com wrote: Hi Bing, Which part? The server certainly is thread safe. The client is not, at least not all the way through. The main consideration is HTable, which is not thread safe; you need to create one instance for each thread (HBASE-4805 makes that much cheaper), store the HTable in a ThreadLocal after creation, or use HTablePool. Please let me know if that answers your question. Thanks. -- Lars - Original Message - From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Cc: Sent: Thursday, April 12, 2012 3:10 PM Subject: Is HBase Thread-Safety? Dear all, Is HBase thread-safe? Do I need to consider consistency issues when manipulating HBase? Thanks so much! Best regards, Bing
Re: NotServingRegionException in Pseudo-Distributed Mode
Dear all, By the way, I didn't see any severe exceptions in the logs, nor anything else related to NotServingRegionException. Thanks so much! Bing On Thu, Apr 12, 2012 at 12:27 AM, Bing Li lbl...@gmail.com wrote: Dear all, I got an exception as follows when running HBase. My Hadoop is set up in the pseudo-distributed mode. The exception happens after the system runs for about one hour. The specification of NotServingRegionException says it is thrown by a region server if it is sent a request for a region it is not serving. I cannot figure out how to solve it in my case. Could you please help me with this? Thanks so much! Bing [java] org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 14 actions: NotServingRegionException: 14 times, servers with issues: greatfreeweb:60020, [java] at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641) [java] at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409) [java] at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900) [java] at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777) [java] at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760) [java] at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402) [java] at com.greatfree.hbase.NeighborPersister.ReplicateNodeNeighbor(NeighborPersister.java:550) [java] at com.greatfree.hbase.thread.ReplicateNodeNeighborThread.run(ReplicateNodeNeighborThread.java:50) [java] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [java] at java.lang.Thread.run(Thread.java:722)
Re: NotServingRegionException in Pseudo-Distributed Mode
Dear Shashwat, I appreciate your reply so much! But I still cannot solve the problem with the links in your email. In my case, the environment is simple. Everything runs on a single machine. Does the exception affect anything? Will some data be lost, or anything else? I found another link which wondered whether NSRE was really an exception. http://mail-archives.apache.org/mod_mbox/hbase-user/201003.mbox/%3c4bb3617a.4020...@gmx.de%3E Any further help? Thanks so much! Best regards, Bing On Thu, Apr 12, 2012 at 1:42 AM, Shashwat dwivedishash...@gmail.com wrote: Check out this thread; maybe it will provide some help: http://mail-archives.apache.org/mod_mbox/hbase-user/201201.mbox/%3CCAHau4ys9eTj_ek_jP=bnpovsprrayuyn4fhtd51dgpdgyvy...@mail.gmail.com%3E and http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg01180.html -Original Message- From: Bing Li [mailto:lbl...@gmail.com] Sent: 11 April 2012 21:58 To: hbase-u...@hadoop.apache.org; user Subject: NotServingRegionException in Pseudo-Distributed Mode [quoted message and stack trace snipped]
Methods Missing in HTableInterface
Dear all, I found that some methods that exist in HTable are not in HTableInterface. setAutoFlush setWriteBufferSize ... In most cases, I manipulate HBase through an HTableInterface from HTablePool. If I need to use the above methods, how can I do that? I am considering writing my own table pool if there is no proper way. Is that fine? Thanks so much! Best regards, Bing
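Until the interface gains those methods, a hedged workaround sketch: construct a plain HTable directly wherever the buffer-tuning calls are needed, since the pooled wrapper does not expose them (the table name is hypothetical):

HTable table = new HTable(conf, "neighborTable"); // hypothetical name
table.setAutoFlush(false);
table.setWriteBufferSize(4 * 1024 * 1024);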
Re: Methods Missing in HTableInterface
I just did that. Thanks so much for your help! Best, Bing Methods Missing in HTableInterface -- Key: HBASE-5728 URL: https://issues.apache.org/jira/browse/HBASE-5728 Project: HBase Issue Type: Improvement Components: client Reporter: Bing Li On Thu, Apr 5, 2012 at 11:32 PM, Lars George lars.geo...@gmail.com wrote: +1, there are quite a few missing that should be in there. Please create a JIRA issue so that we can discuss and agree on which to add. Lars On Apr 5, 2012, at 6:23 PM, Stack wrote: On Thu, Apr 5, 2012 at 4:20 AM, Bing Li lbl...@gmail.com wrote: Dear all, I found that some methods that exist in HTable are not in HTableInterface. setAutoFlush setWriteBufferSize ... Make a patch to add them? Thanks, St.Ack
Re: Starting Abnormally After Shutting Down For Some Time
Dear Manish and Jean-Daniel, After starting DFS (/opt/hadoop/bin/start-dfs.sh), I got the following daemons after typing jps. 5212 Jps 5150 SecondaryNameNode 4932 DataNode 4737 NameNode Then, I started HBase (/opt/hbase/bin/start-hbase.sh). The following daemons were available. 5797 Jps 5526 HMaster 5150 SecondaryNameNode 5711 HRegionServer 4932 DataNode 4737 NameNode 5456 HQuorumPeer HMaster was started. It seemed that everything was fine. But when typing status in the HBase shell, the following error still occurred. ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times In the master log, the following exception was found. 2012-03-28 13:40:01,193 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095) at org.apache.hadoop.ipc.Client.call(Client.java:1071) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy10.setSafeMode(Unknown Source) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy10.setSafeMode(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:1120) at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:423) at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:439) at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:323) at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128) at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:113) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:722) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202) at org.apache.hadoop.ipc.Client.call(Client.java:1046) ... 17 more 2012-03-28 13:40:01,195 INFO org.apache.hadoop.hbase.master.HMaster: Aborting What is the problem? Why does it happen after HBase/Hadoop is shut down for a couple of days? Thanks so much! Bing On Wed, Mar 28, 2012 at 11:09 AM, Manish Bhoge manishbh...@rocketmail.com wrote: It says you have not started the HBase master. Once you restarted the system, have you confirmed whether all Hadoop daemons are running? sudo jps If you are using the CDH package, then you can automatically start the Hadoop daemons on boot using the reconfig package.
Sent from my BlackBerry, pls excuse typo -Original Message- From: Bing Li lbl...@gmail.com Date: Wed, 28 Mar 2012 03:52:12 To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Reply-To: user@hbase.apache.org Subject: Starting Abnormally After Shutting Down For Some Time [quoted message snipped]
Re: Starting Abnormally After Shutting Down For Some Time
Jean-Daniel, I changed dfs.data.dir and dfs.name.dir to new paths in hdfs-site.xml. I really cannot figure out why HBase/Hadoop has a problem after being shut down for a couple of days. If I use it frequently, no such master problem happens. Each time, I have to reinstall not only HBase/Hadoop but also Ubuntu because of the problem. It has wasted a lot of my time. Thanks so much! Bing On Wed, Mar 28, 2012 at 4:46 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Hi Bing, Two questions: - Can you look at the master log and see what's preventing the master from starting? - Did you change dfs.data.dir and dfs.name.dir in hdfs-site.xml? By default it writes to /tmp, which can get cleaned up. J-D On Tue, Mar 27, 2012 at 12:52 PM, Bing Li lbl...@gmail.com wrote: [quoted message snipped]
Re: Starting Abnormally After Shutting Down For Some Time
Dear Manish, I appreciate your replies so much! The system tmp directory is changed to another location in my hdfs-site.xml. If I run $HADOOP_HOME/bin/start-all.sh, all of the services are listed, including the job tracker and task tracker. 10211 SecondaryNameNode 10634 Jps 9992 DataNode 10508 TaskTracker 10312 JobTracker 9797 NameNode In the job tracker's log, one exception was found. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /home/libing/GreatFreeLabs/Hadoop/FS/mapred/system. Name node is in safe mode. In my system, I didn't see the directory, ~/mapred. How should I configure it? As for the properties you listed, they were not set in my system. Are they required? Since they have default values (http://hbase.apache.org/docs/r0.20.6/hbase-conf.html), do I need to update them? - hbase.zookeeper.property.clientPort - hbase.zookeeper.quorum - hbase.zookeeper.property.dataDir Now the system has been reinstalled. At least, the pseudo-distributed mode runs well. I also tried to shut down the Ubuntu machine and start it again. The system worked fine. But I worry that the master-related problem will happen again if the machine is shut down for a longer time. I really don't understand the reason. Thanks so much! Best, Bing On Wed, Mar 28, 2012 at 3:11 PM, Manish Bhoge manishbh...@rocketmail.com wrote: Bing, As per my experience with the configuration, I can list some points, one of which may be your solution. - First and foremost, don't store your service metadata in the system tmp directory, because it may get cleaned up on every start and you lose all your job tracker and datanode information. It is as good as formatting your namenode. - If you're using CDH, make sure you set up permissions perfectly for root, the dfs data directory, and the mapred directories (refer to the CDH documentation). - I didn't see the job tracker in your service list. It should be up and running. Check the job tracker log for any permission issue when starting the job tracker and task tracker. - Before trying your stuff on the HBase setup, make sure all your Hadoop services are up and running. You can check that by running a sample program and checking whether the job tracker and task tracker respond for your mapred.system and mapred.local directories to create intermediate files. - Once you have all Hadoop services up, don't set/change any permissions. As far as the HBase configuration is concerned, there are 2 paths for setup: either you set up zookeeper within hbase-site.xml or you configure it separately via zoo.cfg. If you are going with the hbase setting for zookeeper, then confirm the following settings: - hbase.zookeeper.property.clientPort - hbase.zookeeper.quorum - hbase.zookeeper.property.dataDir Once you have the right settings for these and set up the root directory for hbase, then not much more is required. (Make sure the zookeeper service is up before you start hbase.) I think if you follow the above rules you should be fine. There is no issue caused by a long shutdown or frequent machine restarts. Champ, moreover you need a good amount of patience to understand the problem :) I do understand how frustrating it is when you set up everything and the next day you find that things are completely down.
Sent from my BlackBerry, pls excuse typo -Original Message- From: Bing Li lbl...@gmail.com Date: Wed, 28 Mar 2012 14:32:12 To: user@hbase.apache.org; hbase-u...@hadoop.apache.org Reply-To: user@hbase.apache.org Subject: Re: Starting Abnormally After Shutting Down For Some Time [earlier quoted messages snipped]
Re: Starting Abnormally After Shutting Down For Some Time
Dear Peter, When I just started the Ubuntu machine, there was nothing in /tmp. After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the following files were under /tmp. Do you think anything is wrong? Thanks! libing@greatfreeweb:/tmp$ ls -alrt total 112 drwxr-xr-x 22 root root 4096 2012-03-28 14:17 .. -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hadoop-libing-namenode.pid -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hadoop-libing-datanode.pid -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hadoop-libing-secondarynamenode.pid -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hbase-libing-zookeeper.pid drwxr-xr-x 3 libing libing 4096 2012-03-29 04:48 hbase-libing -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hbase-libing-master.pid -rw-r--r-- 1 libing libing 5 2012-03-29 04:48 hbase-libing-regionserver.pid drwxr-xr-x 2 libing libing 4096 2012-03-29 04:48 hsperfdata_libing drwxrwxrwt 4 root root 4096 2012-03-29 04:48 . -rw-r--r-- 1 libing libing 71819 2012-03-29 04:48 jffi5395899026867792565.tmp libing@greatfreeweb:/tmp$ Best, Bing On Thu, Mar 29, 2012 at 3:19 AM, Peter Vandenabeele pe...@vandenabeele.com wrote: On Wed, Mar 28, 2012 at 7:27 PM, Bing Li lbl...@gmail.com wrote: Dear all, I found that some configuration information was saved in /tmp in my system. So when some of that information is lost, HBase cannot be started normally. But in my system, I have tried to change the HDFS directory to another location. Why are there still some files under /tmp? I have a pseudo-distributed set-up (Cloudera cdh3u2) with a local data directory (not /tmp) and as a test: * stopped the hbase service * stopped the hadoop services * moved all hadoop-related files from /tmp to an ORIG directory [see below] * restarted all (5) hadoop services * restarted the hbase service All of that worked stably, so I presume there is no immediate dependency on the /tmp files. The files that are recreated are these: peterv@e6500:/tmp$ ls -alrt ... drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 20:07 Jetty_0_0_0_0_50070_hdfsw2cu08 drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 20:07 Jetty_0_0_0_0_50075_datanodehwtdwq drwxr-xr-x 2 hdfs hdfs 4096 2012-03-28 20:07 hsperfdata_hdfs drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 20:07 Jetty_0_0_0_0_50090_secondaryy6aanv drwxr-xr-x 4 mapred mapred 4096 2012-03-28 20:07 Jetty_0_0_0_0_50030_jobyn7qmk drwxr-xr-x 2 mapred mapred 4096 2012-03-28 20:07 hsperfdata_mapred drwxr-xr-x 2 root root 4096 2012-03-28 20:07 hsperfdata_root drwxr-xr-x 4 mapred mapred 4096 2012-03-28 20:07 Jetty_0_0_0_0_50060_task.2vcltf The files that I had moved to the side (to ORIG) were: peterv@e6500:/tmp$ ls -alrt ORIG/ total 44 drwxr-xr-x 4 mapred mapred 4096 2012-03-28 19:58 Jetty_0_0_0_0_50030_jobyn7qmk drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 19:58 Jetty_0_0_0_0_50070_hdfsw2cu08 drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 19:58 Jetty_0_0_0_0_50090_secondaryy6aanv drwxr-xr-x 4 hdfs hdfs 4096 2012-03-28 19:58 Jetty_0_0_0_0_50075_datanodehwtdwq drwxr-xr-x 4 mapred mapred 4096 2012-03-28 19:59 Jetty_0_0_0_0_50060_task.2vcltf drwxr-xr-x 2 peterv peterv 4096 2012-03-28 20:05 hsperfdata_peterv drwxr-xr-x 2 hdfs hdfs 4096 2012-03-28 20:05 hsperfdata_hdfs drwxr-xr-x 2 mapred mapred 4096 2012-03-28 20:05 hsperfdata_mapred drwxr-xr-x 2 root root 4096 2012-03-28 20:06 hsperfdata_root Which hadoop/hbase files do you still see in your /tmp directory? HTH, Peter
Starting Abnormally After Shutting Down For Some Time
Dear all, I got a weird problem when programming on the pseudo-distributed mode of HBase/Hadoop. HBase/Hadoop were installed correctly, and they also ran well with my Java code. However, after the server is shut down for some time, for example, four or five days, HBase/Hadoop gets a problem. I get an ERROR when typing status in the shell of HBase. ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times Such a problem has happened three times in the past three weeks. HBase/Hadoop are installed on Ubuntu 10. Have you encountered such a problem? How can it be solved? Thanks so much! Best regards, Bing
Re: Setting Up Pseudo-Distributed Mode Failed On Ubuntu 11
After installing on Ubuntu Server 11, I found two errors. 1) In the HBase shell, the error is that the master node is not started; the system reports that it retried seven times. 2) Sometimes I also saw the following problem, and HBase cannot be stopped: 0 servers, 0 dead, NaN average load On Ubuntu Server 10, there were no such problems. Thanks so much! Bing On Sun, Mar 11, 2012 at 12:01 PM, Gopal absoft...@gmail.com wrote: On 03/10/2012 10:23 PM, Bing Li wrote: Dear all, Yesterday I tried to set up the pseudo-distributed mode for HBase on Ubuntu 11 (64-bit). But I failed to do that. What I did is exactly the same as on Ubuntu 10. On Ubuntu 10, I set it up successfully. I am not sure what the possible problems are. Could you give me some hints? Thanks so much! Best regards, Bing List the error you are getting. Dump the Java stack trace. Thanks
Setting Up Pseudo-Distributed Mode Failed On Ubuntu 11
Dear all, Yesterday I tried to set up the pseudo-distributed mode for HBase on Ubuntu 11 (64-bit). But I failed to do that. What I did is exactly the same as on Ubuntu 10. On Ubuntu 10, I set it up successfully. I am not sure what the possible problems are. Could you give me some hints? Thanks so much! Best regards, Bing
RowFilter - Each Time It Should Be Initialized?
Dear all, I am now using RowFilter to retrieve multiple rows. Do I need to call the following line each time? filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("Classmate2")); I checked the relevant APIs. There is a method, reset(). But it seems that I can only use the constructor to set new parameters. Does that approach consume many resources if the number of rows is large? Thanks, Bing
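For reference, a sketch of the pattern in question; reset() is invoked by the scan framework between rows to clear the filter's internal state, so constructing a fresh, short-lived RowFilter per scan is the intended usage and is cheap:

Scan scan = new Scan();
scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
        new SubstringComparator("Classmate2")));
ResultScanner scanner = table.getScanner(scan); // table is an open HTable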
Retrieving by Counters and ValueFilter
Dear all, HBase has the feature to treat columns as counters. So I attempted to retrieve data based on the value of counters. Usually, the counters are of the long type. But the filters' constructors in HBase, such as ValueFilter's, do not have a parameter of the long type. If so, may I still retrieve by ValueFilter on counters? Thanks so much! Best regards, Bing
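Counters are stored as 8-byte big-endian longs, so one sketch is to compare them through Bytes.toBytes(long) with a BinaryComparator; the byte-wise comparison then matches numeric order for non-negative values (the column names here are hypothetical):

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("hits")); // hypothetical counter column
scan.setFilter(new ValueFilter(CompareFilter.CompareOp.GREATER,
        new BinaryComparator(Bytes.toBytes(100L)))); // note the long literal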
The Problems When Retrieving By BinaryComparator
Dear all, I created a table as follows. I need to retrieve by the column of Salary, which holds long data. I got some errors, as follows. ROW COLUMN+CELL Classmate1 column=ClassmateFamily:Address, timestamp=1330118559432, value=Canada Classmate1 column=ClassmateFamily:Age, timestamp=1330118559429, value=42 Classmate1 column=ClassmateFamily:Career, timestamp=1330118559431, value=Faculty Classmate1 column=ClassmateFamily:Hobby, timestamp=1330118559433, value=Soccer Classmate1 column=ClassmateFamily:Name, timestamp=1330118559427, value=Bing Classmate1 column=ClassmateFamily:Salary, timestamp=1330121577483, value=\x00\x00\x00\x00\x00\x00\x03\xEA (1002 - long) Classmate2 column=ClassmateFamily:Address, timestamp=1330118559436, value=US Classmate2 column=ClassmateFamily:Age, timestamp=1330118559434, value=52 Classmate2 column=ClassmateFamily:Career, timestamp=1330118559435, value=Educator Classmate2 column=ClassmateFamily:Hobby, timestamp=1330118559437, value=Music Classmate2 column=ClassmateFamily:Name, timestamp=1330118559433, value=GreatFree Classmate2 column=ClassmateFamily:Salary, timestamp=1330118559393, value=\x00\x00\x00\x00\x00\x00\x05\xDC (1500 - long) Classmate3 column=ClassmateFamily:Address, timestamp=1330118559440, value=US Classmate3 column=ClassmateFamily:Age, timestamp=1330118559438, value=100 Classmate3 column=ClassmateFamily:Career, timestamp=1330118559439, value=Researcher Classmate3 column=ClassmateFamily:Hobby, timestamp=1330118559442, value=Science Classmate3 column=ClassmateFamily:Name, timestamp=1330118559437, value=LBLabs Classmate3 column=ClassmateFamily:Salary, timestamp=1330118559397, value=\x00\x00\x00\x00\x00\x00\x07\x08 (1800 - long) Classmate4 column=ClassmateFamily:Address, timestamp=1330118559445, value=Baoji Classmate4 column=ClassmateFamily:Age, timestamp=1330118559443, value=41 Classmate4 column=ClassmateFamily:Career, timestamp=1330118559444, value=Lawyer Classmate4 column=ClassmateFamily:Hobby, timestamp=1330118559446, value=Drawing Classmate4 column=ClassmateFamily:Name, timestamp=1330118559442, value=Dezhi Classmate4 column=ClassmateFamily:Salary, timestamp=1330118559399, value=\x00\x00\x00\x00\x00\x00\x03 (800 - long) The code is listed below. Filter filter = new ValueFilter(CompareFilter.CompareOp.LESS, new BinaryComparator(Bytes.toBytes(1000))); // The filter line Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("ClassmateFamily"), Bytes.toBytes("Salary")); scan.setFilter(filter); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { for (KeyValue kv : result.raw()) { System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(kv.getValue())); } } scanner.close(); System.out.println(); Get get = new Get(Bytes.toBytes("Classmate3")); get.setFilter(filter); Result result = table.get(get); for (KeyValue kv : result.raw()) { System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(kv.getValue())); } I think the correct result should be like the one below. Only the rows with values less than 1000 should be returned, right? [java] KV: Classmate4/ClassmateFamily:Salary/1330118559399/Put/vlen=8, Value: 800 [java] But the actual result is as follows. Some rows with values higher than 1000 are returned. Why?
[java] KV: Classmate1/ClassmateFamily:Salary/1330121577483/Put/vlen=8, Value: 1002 [java] KV: Classmate2/ClassmateFamily:Salary/1330118559393/Put/vlen=8, Value: 1500 [java] KV: Classmate3/ClassmateFamily:Salary/1330118559397/Put/vlen=8, Value: 1800 [java] KV: Classmate4/ClassmateFamily:Salary/1330118559399/Put/vlen=8, Value: 800 [java] [java] KV: Classmate3/ClassmateFamily:Salary/1330118559397/Put/vlen=8, Value: 1800 If I change the filter line to the following one, Filter filter = new ValueFilter(CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes(1000))); // The
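A minimal demonstration of what is likely going wrong here (Mr Gupta's reply below confirms it): Bytes.toBytes(1000) encodes a 4-byte int, while the stored salaries are 8-byte longs, and BinaryComparator compares raw bytes lexicographically, so values of different widths sort in an order that has nothing to do with numeric order. The class name below is made up for illustration:

import org.apache.hadoop.hbase.util.Bytes;

public class WidthMismatchDemo {
    public static void main(String[] args) {
        byte[] asInt = Bytes.toBytes(1000);   // 4 bytes: \x00\x00\x03\xE8
        byte[] asLong = Bytes.toBytes(1000L); // 8 bytes: \x00\x00\x00\x00\x00\x00\x03\xE8
        System.out.println(asInt.length + " vs " + asLong.length); // 4 vs 8

        // Every 8-byte salary begins with four zero bytes, so it compares
        // lexicographically LESS than the 4-byte pattern for the int 1000.
        // That is why the LESS filter above matched every row.
        byte[] salary800 = Bytes.toBytes(800L);
        System.out.println(Bytes.compareTo(salary800, asInt)); // negative
    }
}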
Re: The Problems When Retrieving By BinaryComparator
Mr Gupta, Yes, you are right. After changing Bytes.toBytes(1000) to Bytes.toBytes(1000L), it works fine. However, the following exception still occurs.

[java] Exception in thread "main" java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 2
[java] at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:527)
[java] at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:505)
[java] at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:478)
[java] at com.greatfree.testing.hbase.OrderedQualifierValue.main(Unknown Source)

After searching on the Web, one post said the cause might be that an int was inserted into the table while a long was read back. I created the table again and inserted long values, but I still got the exception. I am trying to solve the problem. Thanks so much! Bing

On Sat, Feb 25, 2012 at 8:31 AM, T Vinod Gupta tvi...@readypulse.com wrote: when you do Bytes.toBytes(1000), you are not telling it whether 1000 is an integer or a long.. you have to be super careful here.. i didn't read the flow fully but this caught my eye immediately.. try repopulating properly and use proper types when using Bytes. thanks

On Fri, Feb 24, 2012 at 4:25 PM, Bing Li lbl...@gmail.com wrote: Dear all, I created a table as follows. I need to retrieve by the column of Salary, which holds data of the long type, but I get wrong results as shown below.

ROW COLUMN+CELL
Classmate1 column=ClassmateFamily:Address, timestamp=1330118559432, value=Canada
Classmate1 column=ClassmateFamily:Age, timestamp=1330118559429, value=42
Classmate1 column=ClassmateFamily:Career, timestamp=1330118559431, value=Faculty
Classmate1 column=ClassmateFamily:Hobby, timestamp=1330118559433, value=Soccer
Classmate1 column=ClassmateFamily:Name, timestamp=1330118559427, value=Bing
Classmate1 column=ClassmateFamily:Salary, timestamp=1330121577483, value=\x00\x00\x00\x00\x00\x00\x03\xEA (1002 - long)
Classmate2 column=ClassmateFamily:Address, timestamp=1330118559436, value=US
Classmate2 column=ClassmateFamily:Age, timestamp=1330118559434, value=52
Classmate2 column=ClassmateFamily:Career, timestamp=1330118559435, value=Educator
Classmate2 column=ClassmateFamily:Hobby, timestamp=1330118559437, value=Music
Classmate2 column=ClassmateFamily:Name, timestamp=1330118559433, value=GreatFree
Classmate2 column=ClassmateFamily:Salary, timestamp=1330118559393, value=\x00\x00\x00\x00\x00\x00\x05\xDC (1500 - long)
Classmate3 column=ClassmateFamily:Address, timestamp=1330118559440, value=US
Classmate3 column=ClassmateFamily:Age, timestamp=1330118559438, value=100
Classmate3 column=ClassmateFamily:Career, timestamp=1330118559439, value=Researcher
Classmate3 column=ClassmateFamily:Hobby, timestamp=1330118559442, value=Science
Classmate3 column=ClassmateFamily:Name, timestamp=1330118559437, value=LBLabs
Classmate3 column=ClassmateFamily:Salary, timestamp=1330118559397, value=\x00\x00\x00\x00\x00\x00\x07\x08 (1800 - long)
Classmate4 column=ClassmateFamily:Address, timestamp=1330118559445, value=Baoji
Classmate4 column=ClassmateFamily:Age, timestamp=1330118559443, value=41
Classmate4 column=ClassmateFamily:Career, timestamp=1330118559444, value=Lawyer
Classmate4 column=ClassmateFamily:Hobby, timestamp=1330118559446, value=Drawing
Classmate4 column=ClassmateFamily:Name, timestamp=1330118559442, value=Dezhi
Classmate4 column=ClassmateFamily:Salary, timestamp=1330118559399, value=\x00\x00\x00\x00\x00\x00\x03 (800 - long)

The code is listed below.

Filter filter = new ValueFilter(CompareFilter.CompareOp.LESS, new BinaryComparator(Bytes.toBytes(1000))); // The filter line *
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("ClassmateFamily"), Bytes.toBytes("Salary"));
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    for (KeyValue kv : result.raw()) {
        System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(kv.getValue()));
    }
}
scanner.close();
System.out.println
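The "capacity of the array: 2" message means Bytes.toLong was handed a cell whose value is only 2 bytes, for instance the Age value "42" stored as a string. That happens when the Get is not restricted to the Salary column and the loop applies Bytes.toLong to every returned cell. A hedged sketch of a guarded version, reusing the names from the message above and assuming that diagnosis:

// Restrict the Get to the long-typed column and guard the decode,
// so string-valued cells such as Age ("42", two bytes) are never
// passed to Bytes.toLong.
Get get = new Get(Bytes.toBytes("Classmate3"));
get.addColumn(Bytes.toBytes("ClassmateFamily"), Bytes.toBytes("Salary"));
get.setFilter(filter);
Result result = table.get(get);
for (KeyValue kv : result.raw()) {
    byte[] value = kv.getValue();
    if (value.length == Bytes.SIZEOF_LONG) {
        System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(value));
    } else {
        System.out.println("KV: " + kv + ", non-long value: " + Bytes.toString(value));
    }
}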
Re: Solr HBase - Re: How is Data Indexed in HBase?
Dear Mr Gupta, Your understanding of my solution is correct. Now both HBase and Solr are used in my system. I hope it works. Thanks so much for your reply! Best regards, Bing

On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta tvi...@readypulse.com wrote: regarding your question on hbase support for high performance and consistency - i would say hbase is highly scalable and performant. how it does what it does can be understood by reading the relevant chapters around architecture and design in the hbase book. with regards to ranking, i see your problem. but if you split the problem into an hbase specific solution and a solr based solution, you can probably achieve the results. maybe you do the ranking and store the rank in hbase, then use solr to get the results and use hbase as a lookup to get the rank. or you can put the rank as part of the document schema and index the rank too for range queries and such. is my understanding of your scenario wrong? thanks

On Wed, Feb 22, 2012 at 9:51 AM, Bing Li lbl...@gmail.com wrote: Mr Gupta, Thanks so much for your reply! Retrieving data by keyword is one of my use cases, and I think Solr is a proper choice for it. However, Solr does not provide flexible enough support for ranking, and frequent updating is also not suitable for Solr, so it is difficult to retrieve data ranked by values other than keyword frequency in text. For that case, I attempt to use HBase. But I don't know how HBase supports high performance when it needs to keep consistency in a large scale distributed system. Now both of them are used in my system. I will check out ElasticSearch. Best regards, Bing

On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.com wrote: Bing, Its a classic battle on whether to use solr or hbase or a combination of both. both systems are very different but there is some overlap in the utility. they also differ vastly when it comes to computation power, storage needs, etc. so in the end, it all boils down to your use case. you need to pick the technology that is best suited to your needs. im still not clear on your use case though. btw, if you haven't started using solr yet - then you might want to check out ElasticSearch. I spent over a week researching solr vs ES and eventually chose ES due to its cool merits. thanks

On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote: There is no secondary index support in HBase at the moment. It's on our road map. FYI

On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote: Jacques, Yes. But I still have questions about that. In my system, when users search with an arbitrary keyword, the query is forwarded to Solr. No updates other than appending new indexes are performed on the Solr-managed data. When I need to retrieve data by ranking values, HBase is used, and the ranking values need to be updated all the time. Is that correct? My question is that performance must be low if consistency is kept in a large scale distributed environment. How does HBase handle this issue? Thanks so much! Bing

On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote: It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges.

On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in an inverted index. Such an index is suitable for retrieving huge amounts of unstructured data. How does HBase deal with the issue?
May I replace Solr with HBase? Thanks so much! Best regards, Bing
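Mr Gupta's suggested split (Solr retrieves, HBase holds the frequently-updated ranks) can be sketched as a batched lookup. The table name DocRanks, family RankFamily, and qualifier Rank below are invented for illustration, and the sketch assumes ranks are stored as longs:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RankLookupSketch {
    // Solr returns matching document ids; HBase is then consulted
    // for the frequently-updated rank of each id.
    public static void printRanks(List<String> solrDocIds) throws Exception {
        HTable rankTable = new HTable(HBaseConfiguration.create(), "DocRanks");
        List<Get> gets = new ArrayList<Get>();
        for (String id : solrDocIds) {
            Get get = new Get(Bytes.toBytes(id));
            get.addColumn(Bytes.toBytes("RankFamily"), Bytes.toBytes("Rank"));
            gets.add(get);
        }
        // One batched round trip instead of one RPC per document.
        Result[] results = rankTable.get(gets);
        for (Result r : results) {
            byte[] rank = r.getValue(Bytes.toBytes("RankFamily"), Bytes.toBytes("Rank"));
            if (rank != null) {
                System.out.println(Bytes.toString(r.getRow()) + " -> " + Bytes.toLong(rank));
            }
        }
        rankTable.close();
    }
}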
How is Data Indexed in HBase?
Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in an inverted index. Such an index is suitable for retrieving huge amounts of unstructured data. How does HBase deal with the issue? May I replace Solr with HBase? Thanks so much! Best regards, Bing
Solr HBase - Re: How is Data Indexed in HBase?
Jacques, Yes. But I still have questions about that. In my system, when users search with an arbitrary keyword, the query is forwarded to Solr. No updates other than appending new indexes are performed on the Solr-managed data. When I need to retrieve data by ranking values, HBase is used, and the ranking values need to be updated all the time. Is that correct? My question is that performance must be low if consistency is kept in a large scale distributed environment. How does HBase handle this issue? Thanks so much! Bing On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote: It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges. On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in an inverted index. Such an index is suitable for retrieving huge amounts of unstructured data. How does HBase deal with the issue? May I replace Solr with HBase? Thanks so much! Best regards, Bing
Re: Solr HBase - Re: How is Data Indexed in HBase?
Mr Gupta, Thanks so much for your reply! Retrieving data by keyword is one of my use cases, and I think Solr is a proper choice for it. However, Solr does not provide flexible enough support for ranking, and frequent updating is also not suitable for Solr, so it is difficult to retrieve data ranked by values other than keyword frequency in text. For that case, I attempt to use HBase. But I don't know how HBase supports high performance when it needs to keep consistency in a large scale distributed system. Now both of them are used in my system. I will check out ElasticSearch. Best regards, Bing On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.com wrote: Bing, Its a classic battle on whether to use solr or hbase or a combination of both. both systems are very different but there is some overlap in the utility. they also differ vastly when it comes to computation power, storage needs, etc. so in the end, it all boils down to your use case. you need to pick the technology that is best suited to your needs. im still not clear on your use case though. btw, if you haven't started using solr yet - then you might want to check out ElasticSearch. I spent over a week researching solr vs ES and eventually chose ES due to its cool merits. thanks On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote: There is no secondary index support in HBase at the moment. It's on our road map. FYI On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote: Jacques, Yes. But I still have questions about that. In my system, when users search with an arbitrary keyword, the query is forwarded to Solr. No updates other than appending new indexes are performed on the Solr-managed data. When I need to retrieve data by ranking values, HBase is used, and the ranking values need to be updated all the time. Is that correct? My question is that performance must be low if consistency is kept in a large scale distributed environment. How does HBase handle this issue? Thanks so much! Bing On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote: It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges. On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in an inverted index. Such an index is suitable for retrieving huge amounts of unstructured data. How does HBase deal with the issue? May I replace Solr with HBase? Thanks so much! Best regards, Bing
TimeStampFilter - type int out of range
Dear all, I am running the following sample with TimestampsFilter.

List<Long> ts = new ArrayList<Long>();
ts.add(new Long(1329640759364));
ts.add(new Long(1329640759372));
ts.add(new Long(1329640759378));
Filter filter = new TimestampsFilter(ts);

When compiling the above code, I get the error "type int out of range". But the timestamps really are that long. How can I handle this problem? Thanks so much! Best regards, Bing
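The compiler is rejecting the literal, not the filter: a plain numeric literal is an int in Java, and 1329640759364 does not fit in 32 bits. Appending the L suffix makes it a long literal, and autoboxing then wraps it into a Long for the list. A corrected sketch:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.TimestampsFilter;

// The L suffix marks each literal as a long, so it no longer
// overflows the int range at compile time.
List<Long> ts = new ArrayList<Long>();
ts.add(1329640759364L);
ts.add(1329640759372L);
ts.add(1329640759378L);
Filter filter = new TimestampsFilter(ts);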
Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
Stack, The link only describes the standalone mode for HBase. If possible, I think the pseudo-distributed mode should also be covered. Thanks, Bing On Fri, Feb 17, 2012 at 11:10 PM, Stack st...@duboce.net wrote: On Thu, Feb 16, 2012 at 11:03 PM, Bing Li lbl...@gmail.com wrote: I just made a summary of my experiences setting up HBase in pseudo-distributed mode. Thank you for the writeup. What would you have us change in here: http://hbase.apache.org/book/quickstart.html? Thanks, St.Ack
Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
Yes, I noticed that. But it misses some things I mentioned in my previous email. Thanks, Bing On Sat, Feb 18, 2012 at 12:11 AM, Stack st...@duboce.net wrote: The next page is on pseudo-distributed: http://hbase.apache.org/book/standalone_dist.html#distributed St.Ack On Fri, Feb 17, 2012 at 7:18 AM, Bing Li lbl...@gmail.com wrote: Stack, The link only describes the standalone mode for HBase. If possible, I think the pseudo-distributed mode should also be covered. Thanks, Bing On Fri, Feb 17, 2012 at 11:10 PM, Stack st...@duboce.net wrote: On Thu, Feb 16, 2012 at 11:03 PM, Bing Li lbl...@gmail.com wrote: I just made a summary of my experiences setting up HBase in pseudo-distributed mode. Thank you for the writeup. What would you have us change in here: http://hbase.apache.org/book/quickstart.html? Thanks, St.Ack
Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
Dear Jean-Daniel, The issue is solved. I think the book HBase: The Definitive Guide does not give a sufficient description of the pseudo-distributed mode. Thanks so much! Bing

On Tue, Feb 14, 2012 at 7:27 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Is zookeeper running properly? Is it where your shell expects it to be? Can you access HBase's web ui on port 60010? J-D

On Sun, Feb 12, 2012 at 1:00 PM, Bing Li lbl...@gmail.com wrote: Dear all, I am a new learner of HBase. I tried to set up my HBase on a pseudo-distributed HDFS. After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I started the HBase shell.

./hbase shell

It was started properly. However, when I typed the status command as follows,

hbase(main):001:0> status

I got the following exception. Since I have very limited experience with HBase, I could not figure out what the problem was.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
 at org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)

Could you please give me a hand? Thanks so much! Best regards, Bing
ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times
Dear all, After searching the Web and asking friends for help, I noticed that the pseudo-distributed configuration in the book HBase: The Definitive Guide was not complete. Now the ZooKeeper related exception is fixed. However, I got another error when typing status in the HBase shell.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

I am trying to fix it myself. Your help is highly appreciated. Thanks so much! Bing Li

On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote: Dear all, I am a new learner of HBase. I tried to set up my HBase on a pseudo-distributed HDFS. After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I started the HBase shell.

./hbase shell

It was started properly. However, when I typed the status command as follows,

hbase(main):001:0> status

I got the following exception. Since I have very limited experience with HBase, I could not figure out what the problem was.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
 at org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)

Could you please give me a hand?
Thanks so much! Best regards, Bing
Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times
Dear Jimmy, Thanks so much for your reply! I hadn't set up hbase.zookeeper.quorum. After getting your email, I made a change. Now my hbase-site.xml is as follows.

<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
 </property>
</configuration>

The previous error still exists. I find it strange that the HBase developers cannot provide a reliable description of their work. Best, Bing

On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: What's your hbase.zookeeper.quorum configuration? You can check out this quick start guide: http://hbase.apache.org/book/quickstart.html Thanks, Jimmy

On Mon, Feb 13, 2012 at 10:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, After searching the Web and asking friends for help, I noticed that the pseudo-distributed configuration in the book HBase: The Definitive Guide was not complete. Now the ZooKeeper related exception is fixed. However, I got another error when typing status in the HBase shell. ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times I am trying to fix it myself. Your help is highly appreciated. Thanks so much! Bing Li

On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote: Dear all, I am a new learner of HBase. I tried to set up my HBase on a pseudo-distributed HDFS. After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I started the HBase shell.

./hbase shell

It was started properly. However, when I typed the status command as follows,

hbase(main):001:0> status

I got the following exception. Since I have very limited experience with HBase, I could not figure out what the problem was.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
 at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
 at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
 at org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
 at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)

Could you please give me a hand? Thanks so much! Best regards, Bing
Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times
Dear Jimmy, I am a new user of HBase. My experience with HBase and Hadoop is very limited. I just tried to follow some books, such as Hadoop/HBase: The Definitive Guide. However, I still got some problems. What I am trying to do is just to set up a pseudo-distributed HBase environment on a single node. After that, I will start my system programming in Java. I hope I can deploy the system in fully distributed mode when my system is done. So what I am configuring is very simple. Do I need to set up the ZooKeeper port in hbase-site.xml? Thanks so much! Best, Bing

On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: Have you restarted your HBase after the change? Which ZooKeeper port does your HMaster use? Can you run the following to check where your HMaster is?

hbase zkcli

then:

get /hbase/master

It should show you the master location. It seems you have a distributed installation. How many regionservers do you have? Can you check your master web UI to make sure all looks fine. Thanks, Jimmy

On Mon, Feb 13, 2012 at 10:51 AM, Bing Li lbl...@gmail.com wrote: Dear Jimmy, Thanks so much for your reply! I hadn't set up hbase.zookeeper.quorum. After getting your email, I made a change. Now my hbase-site.xml is as follows.

<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
 </property>
</configuration>

The previous error still exists. I find it strange that the HBase developers cannot provide a reliable description of their work. Best, Bing

On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: What's your hbase.zookeeper.quorum configuration? You can check out this quick start guide: http://hbase.apache.org/book/quickstart.html Thanks, Jimmy

On Mon, Feb 13, 2012 at 10:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, After searching the Web and asking friends for help, I noticed that the pseudo-distributed configuration in the book HBase: The Definitive Guide was not complete. Now the ZooKeeper related exception is fixed. However, I got another error when typing status in the HBase shell. ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times I am trying to fix it myself. Your help is highly appreciated. Thanks so much! Bing Li

On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote: Dear all, I am a new learner of HBase. I tried to set up my HBase on a pseudo-distributed HDFS. After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I started the HBase shell.

./hbase shell

It was started properly. However, when I typed the status command as follows,

hbase(main):001:0> status

I got the following exception. Since I have very limited experience with HBase, I could not figure out what the problem was.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method
Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times
Dear Jimmy, I configured the standalone mode successfully. But I wonder why the pseudo-distributed one does not work. I checked the logs and got the following exceptions. Does the information give you some hints? Thanks so much for your help again! Best, Bing

2012-02-13 18:25:49,782 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
 at org.apache.hadoop.ipc.Client.call(Client.java:1071)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at $Proxy10.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
 at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:238)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:203)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:471)
 at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:94)
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
 at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
 at org.apache.hadoop.ipc.Client.call(Client.java:1046)
 ... 18 more
2012-02-13 18:25:49,787 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-02-13 18:25:49,787 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

Thanks so much! Bing

On Tue, Feb 14, 2012 at 3:35 AM, Jimmy Xiang jxi...@cloudera.com wrote: In this case, you may just use the standalone mode. You can follow the quick start step by step. The default ZooKeeper port is 2181; you don't need to configure it.

On Mon, Feb 13, 2012 at 11:28 AM, Bing Li lbl...@gmail.com wrote: Dear Jimmy, I am a new user of HBase. My experience with HBase and Hadoop is very limited. I just tried to follow some books, such as Hadoop/HBase: The Definitive Guide. However, I still got some problems. What I am trying to do is just to set up a pseudo-distributed HBase environment on a single node. After that, I will start my system programming in Java. I hope I can deploy the system in fully distributed mode when my system is done. So what I am configuring is very simple. Do I need to set up the ZooKeeper port in hbase-site.xml? Thanks so much! Best, Bing

On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: Have you restarted your HBase after the change? Which ZooKeeper port does your HMaster use? Can you run the following to check where your HMaster is? hbase zkcli then: get /hbase/master It should show you the master location. It seems you have a distributed installation. How many regionservers do you have? Can you check your master web UI to make sure all looks fine. Thanks, Jimmy

On Mon, Feb 13, 2012 at 10:51 AM, Bing Li lbl...@gmail.com wrote: Dear Jimmy, Thanks so much for your reply! I hadn't set up hbase.zookeeper.quorum. After getting your email, I made a change. Now my hbase-site.xml is as follows.

<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
 </property>
</configuration>

The previous error still exists. I find it strange that the HBase developers cannot provide a reliable description of their work. Best, Bing

On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: What's your hbase.zookeeper.quorum configuration? You can check out this quick start guide: http
Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times
Dear Jimmy, Thanks so much for your instant reply! My hbase-site.xml is like the following.

<property>
 <name>hbase.rootdir</name>
 <value>hdfs://localhost:9000/hbase</value>
</property>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>
<property>
 <name>hbase.master</name>
 <value>localhost:6</value>
</property>
<property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
</property>
<property>
 <name>hbase.zookeeper.quorum</name>
 <value>localhost</value>
</property>

When I run hadoop fs -ls /, the directories and files under the Linux root are displayed. Best, Bing

On Tue, Feb 14, 2012 at 3:48 AM, Jimmy Xiang jxi...@cloudera.com wrote: Which port does your HDFS listen to? It is not 9000, right?

<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>

You need to fix this and make sure your HDFS is working; for example, the following command should work for you: hadoop fs -ls /

On Mon, Feb 13, 2012 at 11:44 AM, Bing Li lbl...@gmail.com wrote: Dear Jimmy, I configured the standalone mode successfully. But I wonder why the pseudo-distributed one does not work. I checked the logs and got the following exceptions. Does the information give you some hints? Thanks so much for your help again! Best, Bing

2012-02-13 18:25:49,782 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
 at org.apache.hadoop.ipc.Client.call(Client.java:1071)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at $Proxy10.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
 at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:238)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:203)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:471)
 at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:94)
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
 at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
 at org.apache.hadoop.ipc.Client.call(Client.java:1046)
 ... 18 more
2012-02-13 18:25:49,787 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-02-13 18:25:49,787 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

Thanks so much! Bing

On Tue, Feb 14, 2012 at 3:35 AM, Jimmy Xiang jxi...@cloudera.com wrote: In this case, you may just use the standalone mode. You can follow the quick start step by step. The default ZooKeeper port is 2181; you don't need to configure it.

On Mon, Feb 13, 2012 at 11:28 AM, Bing Li lbl...@gmail.com wrote: Dear Jimmy, I am a new user of HBase. My experience with HBase and Hadoop is very limited. I just tried to follow some books, such as Hadoop/HBase: The Definitive Guide. However, I still got some problems. What I am trying to do is just to set up a pseudo-distributed HBase environment on a single node. After that, I will start my system programming in Java. I hope I can deploy the system in fully distributed mode when my system is done. So what I am configuring is very simple. Do I need to set up the ZooKeeper port in hbase-site.xml? Thanks so much! Best, Bing

On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.com wrote: Have you restarted your HBase after the change? Which ZooKeeper port does your HMaster use? Can you run the following to check where your HMaster is? hbase zkcli then: get /hbase/master It should show you the master location. It seems
Re: Why Cannot the Data/Name Directory Be Changed?
Dear all, I fixed the problem from the previous email by doing the setup on Ubuntu 10 instead of RedHat 9. RedHat 9 might be too old? Thanks so much! Bing

On Tue, Feb 14, 2012 at 1:00 PM, Bing Li lbl...@gmail.com wrote: Dear all, I am a new user of HDFS. The default Data/Name directory is /tmp, and I would like to change it. The hdfs-site.xml is updated as follows.

<property>
 <name>dfs.replication</name>
 <value>1</value>
 <description>The actual number of replications can be specified when the file is created.</description>
</property>
<property>
 <name>hadoop.tmp.dir</name>
 <value>/home/bing/GreatFreeLabs/Hadoop/FS</value>
</property>
<property>
 <name>dfs.name.dir</name>
 <value>${hadoop.tmp.dir}/dfs/name/</value>
</property>
<property>
 <name>dfs.data.dir</name>
 <value>${hadoop.tmp.dir}/dfs/data/</value>
</property>

But when formatting with the following command, I was asked whether to re-format the directory under /tmp. Why?

$ hadoop namenode -format
Re-format filesystem in /tmp/hadoop-libing/dfs/name ? (Y or N) N

Because the updated name directory is never formatted, the name node cannot be started. How can I solve the problem? Thanks so much! Best regards, Bing
Which Version of Hadoop Should I Use?
Dear all, I am starting to learn how to use HBase, and I am a little bit confused about which version of Hadoop I should use. According to the book HBase: The Definitive Guide, page 47, "The current version of HBase will only run on Hadoop 0.20.x." But the page at http://hbase.apache.org/book/hadoop.html says: "HBase will lose data unless it is running on an HDFS that has a durable sync implementation. Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0 DO NOT have this attribute. Currently only Hadoop versions 0.20.205.x or any release in excess of this version -- this includes hadoop 1.0.0 -- have a working, durable sync." If so, Hadoop 0.20.x can NOT be used with the latest version of HBase? The version of HBase I am learning is 0.92, and I noticed that a jar file, hadoop-core-1.0.0.jar, was shipped with it. It seems that HBase can run with Hadoop 1.0? Could you please give me a hand on this? Thanks so much! Best regards, Bing
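If in doubt, one way to confirm which Hadoop build is actually on HBase's classpath is to ask the Hadoop jar itself via its VersionInfo class. A small sketch; the class name WhichHadoop is made up for illustration:

import org.apache.hadoop.util.VersionInfo;

public class WhichHadoop {
    public static void main(String[] args) {
        // Prints the version of the hadoop-core jar found on the classpath,
        // e.g. 1.0.0 for the hadoop-core-1.0.0.jar bundled with HBase 0.92.
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
        System.out.println("Built from revision " + VersionInfo.getRevision());
    }
}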
Fwd: How to Rank in HBase?
Another question is whether it is proper to update data in HBase frequently? Thanks, Bing -- Forwarded message -- From: Bing Li lbl...@gmail.com Date: Mon, Jan 30, 2012 at 4:00 AM Subject: How to Rank in HBase? To: user@hbase.apache.org Dear all, I am a new user of HBase. I wonder about the ranking strategy in HBase. I am now using Solr to manage the large amount of data in my system, and I ran into one issue when loading data from Solr. In most cases, data is loaded and ranked in Solr according to the degree of partial keyword matching. My case is different: I hope data can be loaded by an exact match on another field, e.g., the author of the data. I noticed that Solr could not rank the data properly for exact matching. I guess I can do the same thing in HBase too, right? My question is whether it is possible to rank data in HBase according to a customized strategy, like PageRank? Thanks, Bing
Re: How to Rank in HBase?
Dear Stack, Thanks so much for your reply! According to my understanding, a large scale distributed system prefers write-once-read-many. Frequent updating must impose a heavy load because of the consistency issue, so performance must be lowered. Therefore HBase must not be suitable for frequent updates, right? Best regards, Bing On Mon, Jan 30, 2012 at 1:51 PM, Stack st...@duboce.net wrote: On Sun, Jan 29, 2012 at 12:02 PM, Bing Li lbl...@gmail.com wrote: Another question is whether it is proper to update data in HBase frequently? This is 'normal', yes. St.Ack
Re: How to Rank in HBase?
Dear Ian, I appreciate your detailed reply so much! I will read the book about HBase. Best regards, Bing

On Mon, Jan 30, 2012 at 2:36 PM, Ian Varley ivar...@salesforce.com wrote: Bing, HBase uses an approach to structuring its storage known as Log Structured Merge Trees, which you can learn more about here: http://scholar.google.com/scholar?q=log+structured+merge+tree&hl=en&as_sdt=0&as_vis=1&oi=scholart As well as in Lars George's great book, here: http://shop.oreilly.com/product/0636920014348.do It does all of these frequent updates just in memory, which is very fast; at the same time, it writes a simple forward-only log of all edits (known as the Write Ahead Log, or WAL) to disk in order to provide durability in the event of machine failure. It periodically writes the in-memory data to disk in big immutable ordered chunks, called store files, which is very efficient. Future reads of the data then merge the on-disk store file data with the current state in memory, to get the full picture of the state of any row. Over time, the many small store files get compacted into bigger files, so that individual reads don't have too many files to read from. Each get or scan operation can just read small blocks of the store files; when you ask for one record, it doesn't have to read gigabytes of data from the disk, it can just read a small block. As such, random small reads and writes on a very big data set can be done efficiently. Furthermore, it's fine to update the data store frequently. For any given record, you can make as many updates as you want to the in-memory structures, and these aren't written to disk until the memory store is flushed (and into the WAL, but that's also efficient b/c it's ordered by update time, not record key). It all happens in memory, which is very fast (but, again, it's safe b/c of the WAL). There are even some recent JIRAs that make that process more efficient, for example HBASE-4241 https://issues.apache.org/jira/browse/HBASE-4241. One way to think about it is that HBase is *precisely* a layer that adds these efficient random read/write capabilities on top of the Hadoop distributed file system (HDFS), and takes care of doing that in a way that parallelizes nicely across a large cluster of machines, deals with machine failures, etc. Ian

On Jan 29, 2012, at 10:16 PM, Bing Li wrote: Dear Stack, Thanks so much for your reply! According to my understanding, a large scale distributed system prefers write-once-read-many. Frequent updating must impose a heavy load because of the consistency issue, so performance must be lowered. Therefore HBase must not be suitable for frequent updates, right? Best regards, Bing

On Mon, Jan 30, 2012 at 1:51 PM, Stack st...@duboce.net wrote: On Sun, Jan 29, 2012 at 12:02 PM, Bing Li lbl...@gmail.com wrote: Another question is whether it is proper to update data in HBase frequently? This is 'normal', yes. St.Ack
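Ian's description of the write path (in-memory edits, a sequential WAL for durability, and periodic flushes into immutable sorted store files) can be made concrete with a toy sketch. This is only an illustration of the idea under simplifying assumptions (no compaction, no merging reads), not HBase's actual implementation; all names are invented:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of an LSM-style write path: every edit is appended
// to a write-ahead log for durability, applied to a sorted in-memory
// store, and periodically flushed as an immutable sorted run.
public class ToyLsmStore {
    private final TreeMap<String, String> memstore = new TreeMap<String, String>();
    private final PrintWriter wal;
    private final int flushThreshold;
    private int flushCount = 0;

    public ToyLsmStore(String walPath, int flushThreshold) throws IOException {
        this.wal = new PrintWriter(new FileWriter(walPath, true)); // append-only log
        this.flushThreshold = flushThreshold;
    }

    public void put(String key, String value) throws IOException {
        wal.println(key + "\t" + value); // 1. durable, sequential WAL append
        wal.flush();
        memstore.put(key, value);        // 2. fast in-memory update
        if (memstore.size() >= flushThreshold) {
            flush();                     // 3. occasional big sequential write
        }
    }

    private void flush() throws IOException {
        // Write the sorted memstore out as one immutable "store file";
        // reads would merge these runs with the live memstore.
        PrintWriter out = new PrintWriter(new FileWriter("storefile-" + (flushCount++)));
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            out.println(e.getKey() + "\t" + e.getValue());
        }
        out.close();
        memstore.clear();
    }
}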