Table and Family

2013-08-12 Thread Bing Li
Hi, all,

My understanding of an HBase table and its column families is as follows.

1) Each table can consist of multiple column families;

2) When retrieving with SingleColumnValueFilter, if a family is
specified, the other families in the same table are not involved.

Are these claims right? I ran into a problem that conflicts with the
above understanding.

In the following code, even though there is no data at all in the family
ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY, the
for-loop runs many times if other families have the column
ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN.

Is that normal in HBase? If so, I think it is not a good design. Must
column qualifiers never overlap among the families of the same table?
Otherwise, does retrieving from the table waste scanning loops?

Thanks so much!

Best wishes,
Bing

SingleColumnValueFilter dcKeyFilter = new
SingleColumnValueFilter(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY,
ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN,
CompareFilter.CompareOp.EQUAL, new SubstringComparator(dcKey));
Scan scan = new Scan();
scan.setFilter(dcKeyFilter);
scan.setCaching(Parameters.CACHING_SIZE);
scan.setBatch(Parameters.BATCHING_SIZE);

String qualifier;
String hostNodeKey = SocialRole.NO_NODE_KEY;
String groupKey = SocialGroup.NO_GROUP_KEY;
int timingScale = TimingScale.NO_TIMING_SCALE;
String key;
try
{
ResultScanner scanner = this.neighborTable.getScanner(scan);
for (Result result : scanner)
{
for (KeyValue kv : result.raw())
{
qualifier = Bytes.toString(kv.getQualifier());
if
(qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_NODE_KEY_STRING_COLUMN))
{
hostNodeKey = Bytes.toString(kv.getValue());
}
else if
(qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_GROUP_KEY_STRING_COLUMN))
{
groupKey = Bytes.toString(kv.getValue());
}
else if
(qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_TIMING_SCALE_STRING_COLUMN))
{
timingScale = Bytes.toInt(kv.getValue());
}
}
if (!hostNodeKey.equals(SocialRole.NO_NODE_KEY) &&
!groupKey.equals(SocialGroup.NO_GROUP_KEY) && timingScale !=
TimingScale.NO_TIMING_SCALE)
{
key = Tools.GetKeyOfNode(hostNodeKey, groupKey,
timingScale);
if (!neighborMap.containsKey(key))
{
neighborMap.put(key, new
NodeNeighborInGroup(hostNodeKey, groupKey, timingScale));
}
}
hostNodeKey = SocialRole.NO_NODE_KEY;
groupKey = SocialGroup.NO_GROUP_KEY;
timingScale = TimingScale.NO_TIMING_SCALE;
}
}
catch (IOException e)
{
e.printStackTrace();
}
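
A likely explanation: by default, SingleColumnValueFilter also emits rows that
do not contain the tested column at all, and a Scan returns cells from every
family unless it is restricted. A minimal sketch of the same scan, assuming the
0.92-era client API used above, that skips rows missing the column and returns
only the BASICS family:

SingleColumnValueFilter dcKeyFilter = new SingleColumnValueFilter(
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY,
        ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN,
        CompareFilter.CompareOp.EQUAL, new SubstringComparator(dcKey));
// Skip rows that have no DC_KEY column in the BASICS family at all.
dcKeyFilter.setFilterIfMissing(true);

Scan scan = new Scan();
// Return only cells of the BASICS family, not the other families of the table.
scan.addFamily(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY);
scan.setFilter(dcKeyFilter);
scan.setCaching(Parameters.CACHING_SIZE);
scan.setBatch(Parameters.BATCHING_SIZE);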


Performance Are Affected? - Table and Family

2013-08-12 Thread Bing Li
Dear all,

I have one additional question about table and family.

Is a table with fewer families faster than one with more families if
the amount of data is the same?

Is it a higher-performance design to put fewer families into a table?

Thanks so much!

Best regards,
Bing
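
Each column family is stored in its own files, while flushes and compactions
happen per region, so a family that forces a flush drags the other families in
the same regions along with it; the usual guidance is therefore to keep the
number of families per table small and group related columns into one family.
A sketch of creating such a table with the 0.92-era admin API; the table and
family names here are placeholders, not taken from the thread:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

// One wide family instead of several narrow ones.
HTableDescriptor descriptor = new HTableDescriptor("neighbor");
descriptor.addFamily(new HColumnDescriptor("basics"));
admin.createTable(descriptor);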


On Tue, Aug 13, 2013 at 12:31 AM, Stas Maksimov maksi...@gmail.com wrote:
 Hi there,

 On your second point, I don't think column family can ever be an optional
 parameter, so I'm not sure this understanding is correct.

 Regards,
 Stas.


 On 12 August 2013 17:22, Bing Li lbl...@gmail.com wrote:

 Hi, all,

 My understandings about HBase table and its family are as follows.

 1) Each table can consist of multiple families;

 2) When retrieving with SingleColumnValueFilter, if the family is
 specified, other families contained in the same table are not
 affected.

 Are these claims right? I ran into a problem that conflicts with the
 above understanding.

 In the following code, even though there is no data at all in the family
 ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY, the
 for-loop runs many times if other families have the column
 ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN.

 Is that normal in HBase? If so, I think it is not a good design. Must
 column qualifiers never overlap among the families of the same table?
 Otherwise, does retrieving from the table waste scanning loops?

 Thanks so much!

 Best wishes,
 Bing

 SingleColumnValueFilter dcKeyFilter = new

 SingleColumnValueFilter(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_BASICS_FAMILY,
 ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_DC_KEY_COLUMN,
 CompareFilter.CompareOp.EQUAL, new SubstringComparator(dcKey));
 Scan scan = new Scan();
 scan.setFilter(dcKeyFilter);
 scan.setCaching(Parameters.CACHING_SIZE);
 scan.setBatch(Parameters.BATCHING_SIZE);

 String qualifier;
 String hostNodeKey = SocialRole.NO_NODE_KEY;
 String groupKey = SocialGroup.NO_GROUP_KEY;
 int timingScale = TimingScale.NO_TIMING_SCALE;
 String key;
 try
 {
 ResultScanner scanner = this.neighborTable.getScanner(scan);
 for (Result result : scanner)
 {
 for (KeyValue kv : result.raw())
 {
 qualifier = Bytes.toString(kv.getQualifier());
 if

 (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_NODE_KEY_STRING_COLUMN))
 {
 hostNodeKey = Bytes.toString(kv.getValue());
 }
 else if

 (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_HOST_GROUP_KEY_STRING_COLUMN))
 {
 groupKey = Bytes.toString(kv.getValue());
 }
 else if

 (qualifier.equals(ContrivedNeighborStructure.NODE_NEIGHBOR_IN_GROUP_TIMING_SCALE_STRING_COLUMN))
 {
 timingScale = Bytes.toInt(kv.getValue());
 }
 }
 if (!hostNodeKey.equals(SocialRole.NO_NODE_KEY) &&
 !groupKey.equals(SocialGroup.NO_GROUP_KEY) && timingScale !=
 TimingScale.NO_TIMING_SCALE)
 {
 key = Tools.GetKeyOfNode(hostNodeKey, groupKey,
 timingScale);
 if (!neighborMap.containsKey(key))
 {
 neighborMap.put(key, new
 NodeNeighborInGroup(hostNodeKey, groupKey, timingScale));
 }
 }
 hostNodeKey = SocialRole.NO_NODE_KEY;
 groupKey = SocialGroup.NO_GROUP_KEY;
 timingScale = TimingScale.NO_TIMING_SCALE;
 }
 }
 catch (IOException e)
 {
 e.printStackTrace();
 }




Re: Is synchronized required?

2013-02-07 Thread Bing Li
.

...
public void dispose()
{
try
{
this.rankTable.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
...
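
Since HTable is not thread safe, a common arrangement is for each
retriever/persister object to own its HTable, be used by exactly one thread,
and release the table in dispose(). A sketch of that lifecycle, assuming a
0.92-era client; the class and table names are placeholders:

public class RankTableWorker
{
    private final HTable rankTable;

    public RankTableWorker(Configuration conf) throws IOException
    {
        // Created by the thread that will use it, never shared.
        this.rankTable = new HTable(conf, "rank");
    }

    public void dispose()
    {
        try
        {
            this.rankTable.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}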

On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote:
 Are you sharing this.rankTable between threads? HTable is not thread safe.

 -- Lars



 
  From: Bing Li lbl...@gmail.com
 To: hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org; user 
 user@hbase.apache.org
 Sent: Tuesday, February 5, 2013 8:54 AM
 Subject: Re: Is synchronized required?

 Dear all,

 After synchronized is removed from the method of writing, I get the
 following exceptions when reading. Before the removal, no such
 exceptions.

 Could you help me how to solve it?

 Thanks so much!

 Best wishes,
 Bing

  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
  [java] WARNING: Unexpected exception receiving call responses
  [java] java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.client.ScannerCallable close
  [java] WARNING: Ignore, probably already closed
  [java] java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
 failed on local exception: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
  [java] at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
  [java] at $Proxy6.close(Unknown Source)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
  [java] at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
  [java] at
 com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
  [java] at
 com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  [java] at java.lang.Thread.run(Thread.java:662)
  [java] Caused by: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
  [java] Caused by: java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)


 The code that causes the exceptions is as follows.

 public Set<String> LoadNodeGroupNodeRankRowKeys(String
 hostNodeKey, String groupKey, int timingScale)
 {
 List<Filter> nodeGroupFilterList = new ArrayList<Filter>();

 SingleColumnValueFilter hostNodeKeyFilter = new
 SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY,
 RankStructure.NODE_GROUP_NODE_RANK_HOST_NODE_KEY_COLUMN,
 CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
 hostNodeKeyFilter.setFilterIfMissing(true);
 nodeGroupFilterList.add(hostNodeKeyFilter);

 SingleColumnValueFilter groupKeyFilter = new
 SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY,
 RankStructure.NODE_GROUP_NODE_RANK_GROUP_KEY_COLUMN,
 CompareFilter.CompareOp.EQUAL, new

Concurrently Reading Still Got Exceptions

2013-02-06 Thread Bing Li
Dear all,

Some exceptions are raised when I concurrently read data from HBase.
The version of HBase I used is 0.92.0.

I cannot fix the problem. Could you please help me?

Thanks so much!

Best wishes,
Bing

  Feb 6, 2013 12:21:31 AM
org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
  WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
  Feb 6, 2013 12:21:31 AM
org.apache.hadoop.hbase.client.ScannerCallable close
  WARNING: Ignore, probably already closed
  java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
failed on local exception: java.io.IOException: Unexpected exception
receiving call responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
  at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
  at $Proxy6.close(Unknown Source)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
  at 
com.greatfree.hbase.rank.NodeRankRetriever.loadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
  at 
com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662) Caused by:
java.io.IOException: Unexpected exception receiving call responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
  Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

I read data from HBase concurrently with the following code.

...
ExecutorService threadPool = Executors.newFixedThreadPool(100);
LoadNodeGroupNodeRankRowKeyThread thread;
Set<String> groupKeys;
for (String nodeKey : nodeKeys)
{
groupKeys = NodeCache.WWW().getGroupKeys(nodeKey);
for (String groupKey : groupKeys)
{
// Threads are initialized and executed here.
thread = new
LoadNodeGroupNodeRankRowKeyThread(nodeKey, groupKey,
TimingScale.PERMANENTLY);
threadPool.execute(thread);
}
}
Scanner in = new Scanner(System.in);
in.nextLine();
threadPool.shutdownNow();
...

The code of LoadNodeGroupNodeRankRowKeyThread is as follows,

...
public void run()
{
NodeRankRetriever retriever = new NodeRankRetriever();
Set<String> rowKeys =
retriever.loadNodeGroupNodeRankRowKeys(this.hostNodeKey,
this.groupKey, this.timingScale);
if (rowKeys.size() > 0)
{
for (String rowKey : rowKeys)
{
System.out.println(rowKey);
}
}
else
{
System.out.println("No data loaded");
}
retriever.dispose();
}
...

The constructor of NodeRankRetriever() just gets an instance of HTable
from HTablePool via the following method.

...
public HTableInterface getTable(String 
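
For reference, a sketch of how an HTablePool is typically used with a 0.92-era
client: each call checks a table out, uses it from that thread only, and closes
it, which in this era is expected to return the table to the pool rather than
destroy it. The table name is a placeholder:

HTablePool tablePool = new HTablePool(HBaseConfiguration.create(), 100);

// Inside a method that declares throws IOException:
HTableInterface table = tablePool.getTable("rank");
try
{
    // ... scans or puts issued by this thread only ...
}
finally
{
    // Hands the instance back to the pool.
    table.close();
}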

Re: Is synchronized required?

2013-02-06 Thread Bing Li
Dear Lars,

I am now running HBase in pseudo-distributed mode. Does the updated
HTable constructor also work there?

Thanks so much!
Bing

On Wed, Feb 6, 2013 at 3:44 PM, lars hofhansl la...@apache.org wrote:
 Don't use a pool at all.
 With HBASE-4805 (https://issues.apache.org/jira/browse/HBASE-4805) you can 
 precreate an HConnection and ExecutorService and then make HTable cheaply on 
 demand every time you need one.

 Checkout HConnectionManager.createConnection(...) and the HTable constructors.

 I need to document this somewhere.


 -- Lars
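
A sketch of the pattern Lars describes, assuming a client that includes
HBASE-4805: one shared HConnection and ExecutorService, and a lightweight
HTable created per request. The class and table names are placeholders:

public class RankTables
{
    private final HConnection connection;
    private final ExecutorService pool;

    public RankTables(Configuration conf) throws IOException
    {
        this.connection = HConnectionManager.createConnection(conf);
        this.pool = Executors.newFixedThreadPool(20);
    }

    // Cheap to call for every request; close() the returned table when done.
    public HTable openRankTable() throws IOException
    {
        return new HTable(Bytes.toBytes("rank"), this.connection, this.pool);
    }

    public void shutdown() throws IOException
    {
        this.pool.shutdown();
        // How the connection is released varies slightly across versions.
        this.connection.close();
    }
}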



 
  From: Bing Li lbl...@gmail.com
 To: user user@hbase.apache.org; lars hofhansl la...@apache.org; 
 hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org
 Sent: Tuesday, February 5, 2013 10:36 PM
 Subject: Re: Is synchronized required?

 Lars,

 I found that at least the exceptions have nothing to do with shared HTable.

 To save the resources, I designed a pool for the classes that write
 and read from HBase. The primary resources consumed in the classes are
 HTable. The pool has some bugs.

 My question is whether it is necessary to design such a pool. Is it
 fine to create an instance of HTable for each thread?

 I noticed that HBase has a class, HTablePool. Maybe the pool I
 designed is NOT required?

 Thanks so much!

 Best wishes!
 Bing

 On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote:
 Are you sharing this.rankTable between threads? HTable is not thread safe.

 -- Lars



 
  From: Bing Li lbl...@gmail.com
 To: hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org; user 
 user@hbase.apache.org
 Sent: Tuesday, February 5, 2013 8:54 AM
 Subject: Re: Is synchronized required?

 Dear all,

 After synchronized is removed from the method of writing, I get the
 following exceptions when reading. Before the removal, no such
 exceptions.

 Could you help me how to solve it?

 Thanks so much!

 Best wishes,
 Bing

  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
  [java] WARNING: Unexpected exception receiving call responses
  [java] java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.client.ScannerCallable close
  [java] WARNING: Ignore, probably already closed
  [java] java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
 failed on local exception: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
  [java] at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
  [java] at $Proxy6.close(Unknown Source)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
  [java] at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
  [java] at
 com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
  [java] at
 com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  [java] at java.lang.Thread.run(Thread.java:662)
  [java] Caused by: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
  [java] Caused by: java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java

Re: Is synchronized required?

2013-02-05 Thread Bing Li
);
scan.setBatch(Parameters.BATCHING_SIZE);

Set<String> rowKeySet = Sets.newHashSet();
try
{
ResultScanner scanner = this.rankTable.getScanner(scan);
for (Result result : scanner)  // EXCEPTIONS are raised at this line.
{
for (KeyValue kv : result.raw())
{

rowKeySet.add(Bytes.toString(kv.getRow()));
break;
}
}
scanner.close();
}
catch (IOException e)
{
e.printStackTrace();
}
return rowKeySet;
}
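
One small hardening worth noting for the method above: scanner.close() is only
reached when the loop finishes normally, so the exception shown in this thread
leaves the scanner open. A sketch of the same loop with the scanner released in
a finally block:

Set<String> rowKeySet = Sets.newHashSet();
ResultScanner scanner = null;
try
{
    scanner = this.rankTable.getScanner(scan);
    for (Result result : scanner)
    {
        for (KeyValue kv : result.raw())
        {
            rowKeySet.add(Bytes.toString(kv.getRow()));
            break;
        }
    }
}
catch (IOException e)
{
    e.printStackTrace();
}
finally
{
    // Release the server-side scanner even if iteration failed part-way.
    if (scanner != null)
    {
        scanner.close();
    }
}
return rowKeySet;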


On Tue, Feb 5, 2013 at 4:20 AM, Bing Li lbl...@gmail.com wrote:
 Dear all,

 When writing data into HBase, sometimes I got exceptions. I guess they
 might be caused by concurrent writings. But I am not sure.

 My question is whether it is necessary to put synchronized before
 the writing methods? The following lines are the sample code.

 I think the directive, synchronized, must lower the performance of
 writing. Sometimes concurrent writing is needed in my system.

 Thanks so much!

 Best wishes,
 Bing

 public synchronized void AddDomainNodeRanks(String domainKey, int
 timingScale, Map<String, Double> nodeRankMap)
 {
   List<Put> puts = new ArrayList<Put>();
   Put domainKeyPut;
   Put timingScalePut;
   Put nodeKeyPut;
   Put rankPut;

   byte[] domainNodeRankRowKey;

   for (Map.Entry<String, Double> nodeRankEntry : nodeRankMap.entrySet())
   {
   domainNodeRankRowKey =
 Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW +
 Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));

  domainKeyPut = new Put(domainNodeRankRowKey);
  domainKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN,
 Bytes.toBytes(domainKey));
  puts.add(domainKeyPut);

  timingScalePut = new Put(domainNodeRankRowKey);
  timingScalePut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN,
 Bytes.toBytes(timingScale));
 puts.add(timingScalePut);

 nodeKeyPut = new Put(domainNodeRankRowKey);
 nodeKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN,
 Bytes.toBytes(nodeRankEntry.getKey()));
 puts.add(nodeKeyPut);

 rankPut = new Put(domainNodeRankRowKey);
 rankPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN,
 Bytes.toBytes(nodeRankEntry.getValue()));
 puts.add(rankPut);
  }

  try
  {
  this.rankTable.put(puts);
  }
  catch (IOException e)
  {
  e.printStackTrace();
  }
 }


The Exceptions When Concurrently Writing and Reading

2013-02-05 Thread Bing Li
Dear all,

To improve the performance of writing data into HBase, synchronized
was removed from the writing method.

But after synchronized was removed from the writing method, I get
the following exceptions when reading. Before the removal, there were
no such exceptions.

Could you help me solve this?

Thanks so much!

Best wishes,
Bing

  Feb 6, 2013 12:21:31 AM
org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
  WARNING: Unexpected exception receiving call responses
java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
  Feb 6, 2013 12:21:31 AM
org.apache.hadoop.hbase.client.ScannerCallable close
  WARNING: Ignore, probably already closed
  java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
failed on local exception: java.io.IOException: Unexpected exception
receiving call responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
  at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
  at $Proxy6.close(Unknown Source)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
  at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
  at 
org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
  at 
com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
  at 
com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662) Caused by:
java.io.IOException: Unexpected exception receiving call responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
  Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

The writing method is as follows.


// synchronized was removed to improve performance.
// public synchronized void AddNodeViewGroupNodeRanks(String hostNodeKey,
//     String groupKey, int timingScale, Map<String, Double> groupNodeRankMap)

public void AddNodeViewGroupNodeRanks(String hostNodeKey, String groupKey,
        int timingScale, Map<String, Double> groupNodeRankMap)
{
    List<Put> puts = new ArrayList<Put>();
    Put hostNodeKeyPut;
    Put groupKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put rankPut;

    byte[] groupNodeRankRowKey;

    for (Map.Entry<String, Double> nodeRankEntry : groupNodeRankMap.entrySet())
    {
        groupNodeRankRowKey = Bytes.toBytes(...);

        hostNodeKeyPut = new Put(groupNodeRankRowKey);
        hostNodeKeyPut.add(...);
        puts.add(hostNodeKeyPut);
        ..

        rankPut = new Put(groupNodeRankRowKey);
        rankPut.add(...);
        puts.add(rankPut);
    }

    try
    {
        this.rankTable.put(puts);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}


The reading method that causes the exceptions is as follows.

public Set<String> LoadNodeGroupNodeRankRowKeys(String hostNodeKey,
        String groupKey, int timingScale)
{
    List<Filter> nodeGroupFilterList = new ArrayList<Filter>();

    SingleColumnValueFilter hostNodeKeyFilter = new SingleColumnValueFilter(...);
    hostNodeKeyFilter.setFilterIfMissing(true);
    nodeGroupFilterList.add(hostNodeKeyFilter);

    ..

   

Re: The Exceptions When Concurrently Writing and Reading

2013-02-05 Thread Bing Li
Dear Ted,

My HBase is 0.92.

Thanks!
Bing

On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote:
 To help us more easily correlate line numbers, can you tell us the version
 of HBase you're using ?

 Thanks

 On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 To raise the performance of writing data into HBase, the
 synchronized is removed from the writing method.

 But after synchronized is removed from the method of writing, I get
 the following exceptions when reading. Before the removal, no such
 exceptions.

 Could you help me how to solve it?

 Thanks so much!

 Best wishes,
 Bing

   Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
   WARNING: Unexpected exception receiving call responses
 java.lang.NullPointerException
   at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
   at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
   at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
   at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
   Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.client.ScannerCallable close
   WARNING: Ignore, probably already closed
   java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
 failed on local exception: java.io.IOException: Unexpected exception
 receiving call responses
   at
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
   at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
   at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
   at $Proxy6.close(Unknown Source)
   at
 org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
   at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
   at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
   at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
   at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
   at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
   at
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
   at
 com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
   at
 com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
   at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662) Caused by:
 java.io.IOException: Unexpected exception receiving call responses
   at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
   Caused by: java.lang.NullPointerException
   at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
   at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
   at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
   at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

 The writing method is as follows.


 // The synchronized is removed to raise the performance.
 // public synchronized void AddNodeViewGroupNodeRanks(String
 hostNodeKey, String groupKey, int timingScale, MapString, Double
 groupNodeRankMap)

 public void AddNodeViewGroupNodeRanks(String hostNodeKey, String
 groupKey, int timingScale, MapString, Double groupNodeRankMap)
 {
 ListPut puts = new ArrayListPut();
 Put hostNodeKeyPut;
 Put groupKeyPut;
 Put timingScalePut;
 Put nodeKeyPut;
 Put rankPut;

 byte[] groupNodeRankRowKey;

 for (Map.EntryString, Double nodeRankEntry :
 groupNodeRankMap.entrySet())
 {
groupNodeRankRowKey = Bytes.toBytes(...);

hostNodeKeyPut = new Put(groupNodeRankRowKey);
hostNodeKeyPut.add(...);
 puts.add(hostNodeKeyPut);
..

 rankPut = new Put(groupNodeRankRowKey);
rankPut.add(...);
 puts.add(rankPut);
 }

 try
 {
 this.rankTable.put(puts);
 }
 catch (IOException e)
 {
 e.printStackTrace();
 }
 }


 The reading method that causes the exceptions is as follows.

 public SetString LoadNodeGroupNodeRankRowKeys(String

Re: The Exceptions When Concurrently Writing and Reading

2013-02-05 Thread Bing Li
Ted,

The version is 0.92.0. Is that what you need?

BTW, I am now running HBase in pseudo-distributed mode.

Thanks!
Bing


On Wed, Feb 6, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote:
 There're several 0.92 releases, can you be more specific ?

 Thanks


 On Tue, Feb 5, 2013 at 10:46 AM, Bing Li lbl...@gmail.com wrote:

 Dear Ted,

 My HBase is 0.92.

 Thanks!
 Bing

 On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote:
  To help us more easily correlate line numbers, can you tell us the
  version
  of HBase you're using ?
 
  Thanks
 
  On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote:
 
  Dear all,
 
  To raise the performance of writing data into HBase, the
  synchronized is removed from the writing method.
 
  But after synchronized is removed from the method of writing, I get
  the following exceptions when reading. Before the removal, no such
  exceptions.
 
  Could you help me how to solve it?
 
  Thanks so much!
 
  Best wishes,
  Bing
 
Feb 6, 2013 12:21:31 AM
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
  java.lang.NullPointerException
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
Feb 6, 2013 12:21:31 AM
  org.apache.hadoop.hbase.client.ScannerCallable close
WARNING: Ignore, probably already closed
java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
  failed on local exception: java.io.IOException: Unexpected exception
  receiving call responses
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
at
  org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
at
 
  org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy6.close(Unknown Source)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
at
 
  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
at
 
  com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
at
 
  com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
at
 
  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) Caused by:
  java.io.IOException: Unexpected exception receiving call responses
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
Caused by: java.lang.NullPointerException
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
 
  The writing method is as follows.
 
 
  // The synchronized is removed to raise the performance.
  // public synchronized void AddNodeViewGroupNodeRanks(String
  hostNodeKey, String groupKey, int timingScale, MapString, Double
  groupNodeRankMap)
 
  public void AddNodeViewGroupNodeRanks(String hostNodeKey, String
  groupKey, int timingScale, MapString, Double groupNodeRankMap)
  {
  ListPut puts = new ArrayListPut();
  Put hostNodeKeyPut;
  Put groupKeyPut;
  Put timingScalePut;
  Put nodeKeyPut;
  Put rankPut;
 
  byte[] groupNodeRankRowKey;
 
  for (Map.EntryString, Double nodeRankEntry :
  groupNodeRankMap.entrySet())
  {
 groupNodeRankRowKey = Bytes.toBytes(...);
 
 hostNodeKeyPut = new Put(groupNodeRankRowKey);
 hostNodeKeyPut.add

Re: The Exceptions When Concurrently Writing and Reading

2013-02-05 Thread Bing Li
Dear all,

Sorry, I just found that the same exceptions occur even when synchronized is added.

Some other problems may exist. I am now checking.

Do you have any suggestions?

Thanks so much!

Best regards,
Bing

On Wed, Feb 6, 2013 at 3:00 AM, Bing Li lbl...@gmail.com wrote:
 Ted,

 The version is 0.92.0. Is it what you need?

 BTW, now I runs HBase in the pseudo-distributed mode.

 Thanks!
 Bing


 On Wed, Feb 6, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote:
 There're several 0.92 releases, can you be more specific ?

 Thanks


 On Tue, Feb 5, 2013 at 10:46 AM, Bing Li lbl...@gmail.com wrote:

 Dear Ted,

 My HBase is 0.92.

 Thanks!
 Bing

 On Wed, Feb 6, 2013 at 2:45 AM, Ted Yu yuzhih...@gmail.com wrote:
  To help us more easily correlate line numbers, can you tell us the
  version
  of HBase you're using ?
 
  Thanks
 
  On Tue, Feb 5, 2013 at 10:39 AM, Bing Li lbl...@gmail.com wrote:
 
  Dear all,
 
  To raise the performance of writing data into HBase, the
  synchronized is removed from the writing method.
 
  But after synchronized is removed from the method of writing, I get
  the following exceptions when reading. Before the removal, no such
  exceptions.
 
  Could you help me how to solve it?
 
  Thanks so much!
 
  Best wishes,
  Bing
 
Feb 6, 2013 12:21:31 AM
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
WARNING: Unexpected exception receiving call responses
  java.lang.NullPointerException
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
Feb 6, 2013 12:21:31 AM
  org.apache.hadoop.hbase.client.ScannerCallable close
WARNING: Ignore, probably already closed
java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
  failed on local exception: java.io.IOException: Unexpected exception
  receiving call responses
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
at
  org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
at
 
  org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy6.close(Unknown Source)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
at
 
  org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
at
 
  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
at
 
  org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
at
 
  com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
at
 
  com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
at
 
  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) Caused by:
  java.io.IOException: Unexpected exception receiving call responses
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
Caused by: java.lang.NullPointerException
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
at
 
  org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
at
 
  org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
 
  The writing method is as follows.
 
 
  // The synchronized is removed to raise the performance.
  // public synchronized void AddNodeViewGroupNodeRanks(String
  hostNodeKey, String groupKey, int timingScale, MapString, Double
  groupNodeRankMap)
 
  public void AddNodeViewGroupNodeRanks(String hostNodeKey, String
  groupKey, int timingScale, MapString, Double groupNodeRankMap)
  {
  ListPut puts = new ArrayListPut();
  Put hostNodeKeyPut;
  Put groupKeyPut;
  Put timingScalePut;
  Put nodeKeyPut;
  Put rankPut;
 
  byte[] groupNodeRankRowKey

Re: Is synchronized required?

2013-02-05 Thread Bing Li
Lars,

I found that at least the exceptions have nothing to do with shared HTable.

To save resources, I designed a pool for the classes that write to
and read from HBase. The primary resource consumed in those classes is
HTable. The pool has some bugs.

My question is whether it is necessary to design such a pool. Is it
fine to create an instance of HTable for each thread?

I noticed that HBase has a class, HTablePool. Maybe the pool I
designed is NOT required?

Thanks so much!

Best wishes!
Bing

On Wed, Feb 6, 2013 at 1:05 PM, lars hofhansl la...@apache.org wrote:
 Are you sharing this.rankTable between threads? HTable is not thread safe.

 -- Lars



 
  From: Bing Li lbl...@gmail.com
 To: hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org; user 
 user@hbase.apache.org
 Sent: Tuesday, February 5, 2013 8:54 AM
 Subject: Re: Is synchronized required?

 Dear all,

 After synchronized is removed from the method of writing, I get the
 following exceptions when reading. Before the removal, no such
 exceptions.

 Could you help me how to solve it?

 Thanks so much!

 Best wishes,
 Bing

  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection run
  [java] WARNING: Unexpected exception receiving call responses
  [java] java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
  [java] Feb 6, 2013 12:21:31 AM
 org.apache.hadoop.hbase.client.ScannerCallable close
  [java] WARNING: Ignore, probably already closed
  [java] java.io.IOException: Call to greatfreeweb/127.0.1.1:60020
 failed on local exception: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:934)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
  [java] at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
  [java] at $Proxy6.close(Unknown Source)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:112)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:74)
  [java] at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
  [java] at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1167)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1296)
  [java] at
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1356)
  [java] at
 com.greatfree.hbase.rank.NodeRankRetriever.LoadNodeGroupNodeRankRowKeys(NodeRankRetriever.java:348)
  [java] at
 com.greatfree.ranking.PersistNodeGroupNodeRanksThread.run(PersistNodeGroupNodeRanksThread.java:29)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  [java] at java.lang.Thread.run(Thread.java:662)
  [java] Caused by: java.io.IOException: Unexpected exception
 receiving call responses
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:509)
  [java] Caused by: java.lang.NullPointerException
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
  [java] at
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
  [java] at
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)


 The code that causes the exceptions is as follows.

 public Set<String> LoadNodeGroupNodeRankRowKeys(String
 hostNodeKey, String groupKey, int timingScale)
 {
 List<Filter> nodeGroupFilterList = new ArrayList<Filter>();

 SingleColumnValueFilter hostNodeKeyFilter = new
 SingleColumnValueFilter(RankStructure.NODE_GROUP_NODE_RANK_FAMILY,
 RankStructure.NODE_GROUP_NODE_RANK_HOST_NODE_KEY_COLUMN,
 CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
 hostNodeKeyFilter.setFilterIfMissing(true);
 nodeGroupFilterList.add(hostNodeKeyFilter

Is synchronized required?

2013-02-04 Thread Bing Li
Dear all,

When writing data into HBase, I sometimes get exceptions. I guess they
might be caused by concurrent writes, but I am not sure.

My question is whether it is necessary to put synchronized on the
writing methods. The following lines are the sample code.

I think the synchronized keyword must lower the performance of
writing, and concurrent writing is sometimes needed in my system.

Thanks so much!

Best wishes,
Bing

public synchronized void AddDomainNodeRanks(String domainKey, int timingScale,
        Map<String, Double> nodeRankMap)
{
    List<Put> puts = new ArrayList<Put>();
    Put domainKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put rankPut;

    byte[] domainNodeRankRowKey;

    for (Map.Entry<String, Double> nodeRankEntry : nodeRankMap.entrySet())
    {
        domainNodeRankRowKey = Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW +
                Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));

        domainKeyPut = new Put(domainNodeRankRowKey);
        domainKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
                RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN,
                Bytes.toBytes(domainKey));
        puts.add(domainKeyPut);

        timingScalePut = new Put(domainNodeRankRowKey);
        timingScalePut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
                RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN,
                Bytes.toBytes(timingScale));
        puts.add(timingScalePut);

        nodeKeyPut = new Put(domainNodeRankRowKey);
        nodeKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
                RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN,
                Bytes.toBytes(nodeRankEntry.getKey()));
        puts.add(nodeKeyPut);

        rankPut = new Put(domainNodeRankRowKey);
        rankPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
                RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN,
                Bytes.toBytes(nodeRankEntry.getValue()));
        puts.add(rankPut);
    }

    try
    {
        this.rankTable.put(puts);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
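
Independent of the synchronization question, the four single-column Puts
created for every row above share one row key, so they can be collapsed into a
single Put carrying all four columns. A sketch of the same method written that
way, reusing the RankStructure constants from the original:

public void AddDomainNodeRanks(String domainKey, int timingScale, Map<String, Double> nodeRankMap)
{
    // synchronized omitted here; see the rest of the thread for whether it is needed.
    List<Put> puts = new ArrayList<Put>();

    for (Map.Entry<String, Double> nodeRankEntry : nodeRankMap.entrySet())
    {
        byte[] rowKey = Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW +
                Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));

        // One Put per row; all four columns are added to it.
        Put put = new Put(rowKey);
        put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN, Bytes.toBytes(domainKey));
        put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN, Bytes.toBytes(timingScale));
        put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN, Bytes.toBytes(nodeRankEntry.getKey()));
        put.add(RankStructure.DOMAIN_NODE_RANK_FAMILY, RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN, Bytes.toBytes(nodeRankEntry.getValue()));
        puts.add(put);
    }

    try
    {
        this.rankTable.put(puts);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}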


Re: Is synchronized required?

2013-02-04 Thread Bing Li
Dear Ted and Harsh,

I am sorry, I didn't keep the exceptions; they occurred many days ago. My
current version is 0.92.

Now synchronized is removed. Is that correct?

I will test if such exceptions are raised. I will let you know.

Thanks!

Best wishes,
Bing


On Tue, Feb 5, 2013 at 4:25 AM, Ted Yu yuzhih...@gmail.com wrote:
 Bing:
 Use pastebin.com instead of attaching exception report.

 What version of HBase are you using ?

 Thanks


 On Mon, Feb 4, 2013 at 12:21 PM, Harsh J ha...@cloudera.com wrote:

 What exceptions do you actually receive - can you send them here?
 Knowing that is key to addressing your issue.

 On Tue, Feb 5, 2013 at 1:50 AM, Bing Li lbl...@gmail.com wrote:
  Dear all,
 
  When writing data into HBase, sometimes I got exceptions. I guess they
  might be caused by concurrent writings. But I am not sure.
 
  My question is whether it is necessary to put synchronized before
  the writing methods? The following lines are the sample code.
 
  I think the directive, synchronized, must lower the performance of
  writing. Sometimes concurrent writing is needed in my system.
 
  Thanks so much!
 
  Best wishes,
  Bing
 
  public synchronized void AddDomainNodeRanks(String domainKey, int
  timingScale, MapString, Double nodeRankMap)
  {
ListPut puts = new ArrayListPut();
Put domainKeyPut;
Put timingScalePut;
Put nodeKeyPut;
Put rankPut;
 
byte[] domainNodeRankRowKey;
 
for (Map.EntryString, Double nodeRankEntry :
  nodeRankMap.entrySet())
{
domainNodeRankRowKey =
  Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW +
  Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));
 
   domainKeyPut = new Put(domainNodeRankRowKey);
   domainKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
  RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN,
  Bytes.toBytes(domainKey));
   puts.add(domainKeyPut);
 
   timingScalePut = new Put(domainNodeRankRowKey);
   timingScalePut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
  RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN,
  Bytes.toBytes(timingScale));
  puts.add(timingScalePut);
 
  nodeKeyPut = new Put(domainNodeRankRowKey);
  nodeKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
  RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN,
  Bytes.toBytes(nodeRankEntry.getKey()));
  puts.add(nodeKeyPut);
 
  rankPut = new Put(domainNodeRankRowKey);
  rankPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
  RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN,
  Bytes.toBytes(nodeRankEntry.getValue()));
  puts.add(rankPut);
   }
 
   try
   {
   this.rankTable.put(puts);
   }
   catch (IOException e)
   {
   e.printStackTrace();
   }
  }



 --
 Harsh J




Re: Is synchronized required?

2013-02-04 Thread Bing Li
Dear Nicolas,

If using synchronized is required, won't the performance be too low?
Are there any other ways to minimize the synchronization
granularity?

Thanks so much!
Bing

On Tue, Feb 5, 2013 at 5:31 AM, Nicolas Liochon nkey...@gmail.com wrote:
 Yes, HTable is not thread safe, and using synchronized around them could
 work, but would be implementation dependent.
 You can have one HTable per request at a reasonable cost since
 https://issues.apache.org/jira/browse/HBASE-4805. It's seems to be
 available in 0.92 as well.

 Cheers,

 Nicolas


 On Mon, Feb 4, 2013 at 10:13 PM, Adrien Mogenet 
 adrien.moge...@gmail.comwrote:

 Beware, HTablePool is not totally thread-safe as well:
 https://issues.apache.org/jira/browse/HBASE-6651.


 On Mon, Feb 4, 2013 at 9:42 PM, Haijia Zhou leons...@gmail.com wrote:

  Hi, Bing,
 
   Not sure about your scenario, but the HTable class is not thread safe for
  either reads or writes.
   If you read from or write to a table in a multi-threaded way,
  you can consider using HTablePool.
 
   Hope it helps
 
  HJ
 
 
  On Mon, Feb 4, 2013 at 3:32 PM, Bing Li lbl...@gmail.com wrote:
 
   Dear Ted and Harsh,
  
   I am sorry I didn't keep the exceptions. It occurred many days ago. My
   current version is 0.92.
  
   Now synchronized is removed. Is it correct?
  
   I will test if such exceptions are raised. I will let you know.
  
   Thanks!
  
   Best wishes,
   Bing
  
  
   On Tue, Feb 5, 2013 at 4:25 AM, Ted Yu yuzhih...@gmail.com wrote:
Bing:
Use pastebin.com instead of attaching exception report.
   
What version of HBase are you using ?
   
Thanks
   
   
On Mon, Feb 4, 2013 at 12:21 PM, Harsh J ha...@cloudera.com wrote:
   
What exceptions do you actually receive - can you send them here?
Knowing that is key to addressing your issue.
   
On Tue, Feb 5, 2013 at 1:50 AM, Bing Li lbl...@gmail.com wrote:
 Dear all,

 When writing data into HBase, sometimes I got exceptions. I guess
  they
 might be caused by concurrent writings. But I am not sure.

 My question is whether it is necessary to put synchronized
 before
 the writing methods? The following lines are the sample code.

 I think the directive, synchronized, must lower the performance of
 writing. Sometimes concurrent writing is needed in my system.

 Thanks so much!

 Best wishes,
 Bing

 public synchronized void AddDomainNodeRanks(String domainKey, int
 timingScale, MapString, Double nodeRankMap)
 {
   ListPut puts = new ArrayListPut();
   Put domainKeyPut;
   Put timingScalePut;
   Put nodeKeyPut;
   Put rankPut;

   byte[] domainNodeRankRowKey;

   for (Map.EntryString, Double nodeRankEntry :
 nodeRankMap.entrySet())
   {
   domainNodeRankRowKey =
 Bytes.toBytes(RankStructure.DOMAIN_NODE_RANK_ROW +
 Tools.GetAHash(domainKey + timingScale + nodeRankEntry.getKey()));

  domainKeyPut = new Put(domainNodeRankRowKey);
  domainKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_DOMAIN_KEY_COLUMN,
 Bytes.toBytes(domainKey));
  puts.add(domainKeyPut);

  timingScalePut = new Put(domainNodeRankRowKey);
  timingScalePut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_TIMING_SCALE_COLUMN,
 Bytes.toBytes(timingScale));
 puts.add(timingScalePut);

 nodeKeyPut = new Put(domainNodeRankRowKey);
 nodeKeyPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_NODE_KEY_COLUMN,
 Bytes.toBytes(nodeRankEntry.getKey()));
 puts.add(nodeKeyPut);

 rankPut = new Put(domainNodeRankRowKey);
 rankPut.add(RankStructure.DOMAIN_NODE_RANK_FAMILY,
 RankStructure.DOMAIN_NODE_RANK_RANKS_COLUMN,
 Bytes.toBytes(nodeRankEntry.getValue()));
 puts.add(rankPut);
  }

  try
  {
  this.rankTable.put(puts);
  }
  catch (IOException e)
  {
  e.printStackTrace();
  }
 }
   
   
   
--
Harsh J
   
   
  
 



 --
 Adrien Mogenet
 06.59.16.64.22
 http://www.mogenet.me



Pseudo-Distributed Mode Multi-Thread Accessing

2012-09-21 Thread Bing Li
Dear all,

I am still using pseudo-distributed mode since I am still coding.

When scanning a table, I noticed that a single thread is much faster than
each thread in a multi-threaded setup.

For example, the following method completes in 2 or 3 ms with a single
thread. If 30 threads execute the method together, each thread takes
about 150 ms.

Each of the threads gets an HTableInterface from HTablePool, so I think
the performance should not be so low.

Maybe pseudo-distributed mode causes the problem?

Thanks so much!

Best regards,
Bing

public Set<String> GetOutgoingHHNeighborKeys(String hubKey, String groupKey,
        int timingScale)
{
    List<Filter> hhNeighborFilterList = new ArrayList<Filter>();

    SingleColumnValueFilter hubKeyFilter = new SingleColumnValueFilter(
            NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY,
            NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN,
            CompareFilter.CompareOp.EQUAL, new SubstringComparator(hubKey));
    hubKeyFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(hubKeyFilter);

    SingleColumnValueFilter groupKeyFilter = new SingleColumnValueFilter(
            NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY,
            NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN,
            CompareFilter.CompareOp.EQUAL, new SubstringComparator(groupKey));
    groupKeyFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(groupKeyFilter);

    SingleColumnValueFilter timingScaleFilter = new SingleColumnValueFilter(
            NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY,
            NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN,
            CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes(timingScale)));
    timingScaleFilter.setFilterIfMissing(true);
    hhNeighborFilterList.add(timingScaleFilter);

    FilterList hhNeighborFilter = new FilterList(hhNeighborFilterList);
    Scan scan = new Scan();
    scan.setFilter(hhNeighborFilter);
    scan.setCaching(Parameters.CACHING_SIZE);
    scan.setBatch(Parameters.BATCHING_SIZE);

    Set<String> neighborKeySet = Sets.newHashSet();
    String qualifier;
    try
    {
        ResultScanner scanner = this.neighborTable.getScanner(scan);
        for (Result result : scanner)
        {
            for (KeyValue kv : result.raw())
            {
                qualifier = Bytes.toString(kv.getQualifier());
                if (qualifier.equals(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_STRING_COLUMN))
                {
                    neighborKeySet.add(Bytes.toString(kv.getValue()));
                }
            }
        }
        scanner.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return neighborKeySet;
}
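
One observation on the numbers above: in pseudo-distributed mode every thread
talks to the same single RegionServer, so 30 concurrent scans largely queue
behind one another and per-thread latency grows even though total throughput
may be similar. A rough harness for checking that, assumed to run inside the
class that defines GetOutgoingHHNeighborKeys; the input values are placeholders:

final String hubKey = "...";
final String groupKey = "...";
final int timingScale = TimingScale.PERMANENTLY;

int threads = 30;                       // compare a run with 1 against a run with 30
ExecutorService executor = Executors.newFixedThreadPool(threads);
final CountDownLatch done = new CountDownLatch(threads);

long start = System.currentTimeMillis();
for (int i = 0; i < threads; i++)
{
    executor.execute(new Runnable()
    {
        public void run()
        {
            try
            {
                GetOutgoingHHNeighborKeys(hubKey, groupKey, timingScale);
            }
            finally
            {
                done.countDown();
            }
        }
    });
}
done.await();                           // surrounding method declares InterruptedException
System.out.println("Total elapsed: " + (System.currentTimeMillis() - start) + " ms");
executor.shutdown();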


Re: Is it correct and required to keep consistency this way?

2012-09-19 Thread Bing Li
Dear Jieshan,

Thanks so much for your reply!

Locking is now not used on the reading methods in my system, and that
seems to be fine.

But I noticed exceptions when no locking was put on the writing method. If
multiple threads write to HBase concurrently, do you think it is safe
without locking?

Best regards,
Bing

On Thu, Sep 20, 2012 at 10:22 AM, Bijieshan bijies...@huawei.com wrote:

 You can avoid read & write running in parallel at your application level,
 if I read your mail correctly. You can use ReentrantReadWriteLock if your
 intention is like that, but it's not recommended.
 HBase has its own mechanism (MVCC) to manage read/write consistency.
 When we start a scan, the latest data that has not been committed by MVCC may not
 be visible (according to our configuration).

 Jieshan
 -Original Message-
 From: Bing Li [mailto:lbl...@gmail.com]
 Sent: Thursday, September 20, 2012 10:02 AM
 To: hbase-u...@hadoop.apache.org; user
 Subject: Is it correct and required to keep consistency this way?

 Dear all,

 Sorry to send the email multiple times! An error in the previous email is
 corrected.

 I am not exactly sure if it is correct and required to keep consistency as
 follows when saving and reading from HBase? Your help is highly
 appreciated.

 Best regards,
 Bing

 // Writing
 public void AddOutgoingNeighbor(String hostNodeKey, String
 groupKey, int timingScale, String neighborKey)
 {
 List<Put> puts = new ArrayList<Put>();
 Put hostNodeKeyPut;
 Put groupKeyPut;
 Put topGroupKeyPut;
 Put timingScalePut;
 Put neighborKeyPut;

 byte[] outgoingRowKey =
 Bytes.toBytes(NeighborStructure.NODE_OUTGOING_NEIGHBOR_ROW +
 Tools.GetAHash(hostNodeKey + groupKey + timingScale + neighborKey));

 hostNodeKeyPut = new Put(outgoingRowKey);

 hostNodeKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN,
 Bytes.toBytes(hostNodeKey));
 puts.add(hostNodeKeyPut);

 groupKeyPut = new Put(outgoingRowKey);

 groupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_GROUP_KEY_COLUMN,
 Bytes.toBytes(groupKey));
 puts.add(groupKeyPut);

 topGroupKeyPut = new Put(outgoingRowKey);

 topGroupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_TOP_GROUP_KEY_COLUMN,
 Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupKey)));
 puts.add(topGroupKeyPut);

 timingScalePut = new Put(outgoingRowKey);

 timingScalePut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN,
 Bytes.toBytes(timingScale));
 puts.add(timingScalePut);

 neighborKeyPut = new Put(outgoingRowKey);

 neighborKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_NEIGHBOR_KEY_COLUMN,
 Bytes.toBytes(neighborKey));
 puts.add(neighborKeyPut);

 try
 {
 // Locking is here
 this.lock.writeLock().lock();
 this.neighborTable.put(puts);
 this.lock.writeLock().unlock();
 }
 catch (IOException e)
 {
 e.printStackTrace();
 }
 }

 // Reading
 public Set<String> GetOutgoingNeighborKeys(String hostNodeKey, int
 timingScale)
 {
 List<Filter> outgoingNeighborsList = new
 ArrayList<Filter>();

 SingleColumnValueFilter hostNodeKeyFilter = new
 SingleColumnValueFilter(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN,
 CompareFilter.CompareOp.EQUAL, new SubstringComparator(hostNodeKey));
 hostNodeKeyFilter.setFilterIfMissing(true);
 outgoingNeighborsList.add(hostNodeKeyFilter);

 SingleColumnValueFilter timingScaleFilter = new
 SingleColumnValueFilter(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
 NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN,
 CompareFilter.CompareOp.EQUAL, new
 BinaryComparator(Bytes.toBytes(timingScale)));
 timingScaleFilter.setFilterIfMissing(true);
 outgoingNeighborsList.add(timingScaleFilter);

 FilterList outgoingNeighborFilter = new
 FilterList(outgoingNeighborsList);
 Scan scan = new Scan();
 scan.setFilter(outgoingNeighborFilter);
 scan.setCaching(Parameters.CACHING_SIZE);
 scan.setBatch(Parameters.BATCHING_SIZE);

 String qualifier;
 SetString

Re: Is it correct and required to keep consistency this way?

2012-09-19 Thread Bing Li
Sorry, I didn't keep the exceptions. I will post the exceptions if I get
them again.

But after putting synchronized on the writing methods, the exceptions are
gone.

I am a little confused. HTable is the interface for writing/reading data
to/from HBase. If it is not thread-safe, doesn't that mean locking must be
applied as shown in my code?

Thanks so much!
Bing

On Thu, Sep 20, 2012 at 11:00 AM, Bijieshan bijies...@huawei.com wrote:

 Yes. It should be safe. What you need to pay attention to is that HTable is
 not thread safe. What are the exceptions?

 Jieshan
 -Original Message-
 From: Bing Li [mailto:lbl...@gmail.com]
 Sent: Thursday, September 20, 2012 10:52 AM
 To: user@hbase.apache.org
 Cc: hbase-u...@hadoop.apache.org; Zhouxunmiao
 Subject: Re: Is it correct and required to keep consistency this way?

 Dear Jieshan,

 Thanks so much for your reply!

 Now locking is not set on the reading methods in my system. It seems to be
 fine with that.

 But I noticed exceptions when no locking was put on the writing method. If
 multiple threads are writing to HBase concurrently, do you think it is safe
 without locking?

 Best regards,
 Bing

 On Thu, Sep 20, 2012 at 10:22 AM, Bijieshan bijies...@huawei.com wrote:

  You can avoid read & write running in parallel from your application level,
  if I read your mail correctly. You can use ReentrantReadWriteLock if your
  intention is like that. But it's not recommended.
  HBase has its own mechanism(MVCC) to manage the read/write consistency.
  When we start a scanning, the latest data has not committed by MVCC may
 not
  be visible(According to our configuration).
 
  Jieshan
  -Original Message-
  From: Bing Li [mailto:lbl...@gmail.com]
  Sent: Thursday, September 20, 2012 10:02 AM
  To: hbase-u...@hadoop.apache.org; user
  Subject: Is it correct and required to keep consistency this way?
 
  Dear all,
 
  Sorry to send the email multiple times! An error in the previous email is
  corrected.
 
  I am not exactly sure if it is correct and required to keep consistency
 as
  follows when saving and reading from HBase? Your help is highly
  appreciated.
 
  Best regards,
  Bing
 
  // Writing
  public void AddOutgoingNeighbor(String hostNodeKey, String
  groupKey, int timingScale, String neighborKey)
  {
  ListPut puts = new ArrayListPut();
  Put hostNodeKeyPut;
  Put groupKeyPut;
  Put topGroupKeyPut;
  Put timingScalePut;
  Put neighborKeyPut;
 
  byte[] outgoingRowKey =
  Bytes.toBytes(NeighborStructure.NODE_OUTGOING_NEIGHBOR_ROW +
  Tools.GetAHash(hostNodeKey + groupKey + timingScale + neighborKey));
 
  hostNodeKeyPut = new Put(outgoingRowKey);
 
  hostNodeKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN,
  Bytes.toBytes(hostNodeKey));
  puts.add(hostNodeKeyPut);
 
  groupKeyPut = new Put(outgoingRowKey);
 
  groupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_GROUP_KEY_COLUMN,
  Bytes.toBytes(groupKey));
  puts.add(groupKeyPut);
 
  topGroupKeyPut = new Put(outgoingRowKey);
 
  topGroupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_TOP_GROUP_KEY_COLUMN,
  Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupKey)));
  puts.add(topGroupKeyPut);
 
  timingScalePut = new Put(outgoingRowKey);
 
  timingScalePut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN,
  Bytes.toBytes(timingScale));
  puts.add(timingScalePut);
 
  neighborKeyPut = new Put(outgoingRowKey);
 
  neighborKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_NEIGHBOR_KEY_COLUMN,
  Bytes.toBytes(neighborKey));
  puts.add(neighborKeyPut);
 
  try
  {
  // Locking is here
  this.lock.writeLock().lock();
  this.neighborTable.put(puts);
  this.lock.writeLock().unlock();
  }
  catch (IOException e)
  {
  e.printStackTrace();
  }
  }
 
  // Reading
  public SetString GetOutgoingNeighborKeys(String hostNodeKey,
 int
  timingScale)
  {
  ListFilter outgoingNeighborsList = new
  ArrayListFilter();
 
  SingleColumnValueFilter hostNodeKeyFilter = new
  SingleColumnValueFilter(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
  NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN,
  CompareFilter.CompareOp.EQUAL, new

Re: Is it correct and required to keep consistency this way?

2012-09-19 Thread Bing Li
Jieshan,

Thanks! HTablePool is used in my system.
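
Just to confirm my usage, each operation obtains and returns the table roughly
like this (a sketch; the tablePool field name and the table name "Neighbor" are
placeholders, and whether to use putTable or close depends on the HBase
version):

    // A sketch of per-operation usage with the HTablePool API of that era;
    // the pool field name and the table name "Neighbor" are placeholders.
    public void PutNeighbors(List<Put> puts) throws IOException
    {
        HTableInterface table = this.tablePool.getTable("Neighbor");
        try
        {
            table.put(puts);
        }
        finally
        {
            this.tablePool.putTable(table); // newer releases return it with table.close() instead
        }
    }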

Best,
Bing

On Thu, Sep 20, 2012 at 11:19 AM, Bijieshan bijies...@huawei.com wrote:

 If it is not safe, it means locking must be set as what is
 shown in my code, doesn't it?

 You should not use one HTableInterface instance across multiple threads
 (sharing one HTableInterface across threads plus a lock will degrade the
 performance).
 There are 2 options:
 1. Create one HTableInterface instance in each thread.
 2. Use HTablePool to get an HTableInterface. See
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html
 .

 Hope it helps.
 Jieshan.
 -Original Message-
 From: Bing Li [mailto:lbl...@gmail.com]
 Sent: Thursday, September 20, 2012 11:07 AM
 To: user@hbase.apache.org
 Cc: hbase-u...@hadoop.apache.org; Zhouxunmiao
 Subject: Re: Is it correct and required to keep consistency this way?

 Sorry, I didn't keep the exceptions. I will post the exceptions if I get
 them again.

 But after putting synchronized on the writing methods, the exceptions are
 gone.

 I am a little confused. HTable must be the interface to write/read data
 from HBase. If it is not safe, it means locking must be set as what is
 shown in my code, doesn't it?

 Thanks so much!
 Bing

 On Thu, Sep 20, 2012 at 11:00 AM, Bijieshan bijies...@huawei.com wrote:

  Yes. It should be safe. What you need to pay attention is HTable is not
  thread safe. What are the exceptions?
 
  Jieshan
  -Original Message-
  From: Bing Li [mailto:lbl...@gmail.com]
  Sent: Thursday, September 20, 2012 10:52 AM
  To: user@hbase.apache.org
  Cc: hbase-u...@hadoop.apache.org; Zhouxunmiao
  Subject: Re: Is it correct and required to keep consistency this way?
 
  Dear Jieshan,
 
  Thanks so much for your reply!
 
  Now locking is not set on the reading methods in my system. It seems to
 be
  fine with that.
 
  But I noticed exceptions when no locking was put on the writing method.
 If
  multiple threads are writing to HBase concurrently, do you think it is
 safe
  without locking?
 
  Best regards,
  Bing
 
  On Thu, Sep 20, 2012 at 10:22 AM, Bijieshan bijies...@huawei.com
 wrote:
 
   You can avoid read  write running parallel from your application
 level,
   if I read your mail correctly. You can use ReentrantReadWriteLock if
 your
   intention is like that. But it's not recommended.
   HBase has its own mechanism(MVCC) to manage the read/write consistency.
   When we start a scanning, the latest data has not committed by MVCC may
  not
   be visible(According to our configuration).
  
   Jieshan
   -Original Message-
   From: Bing Li [mailto:lbl...@gmail.com]
   Sent: Thursday, September 20, 2012 10:02 AM
   To: hbase-u...@hadoop.apache.org; user
   Subject: Is it correct and required to keep consistency this way?
  
   Dear all,
  
   Sorry to send the email multiple times! An error in the previous email
 is
   corrected.
  
   I am not exactly sure if it is correct and required to keep consistency
  as
   follows when saving and reading from HBase? Your help is highly
   appreciated.
  
   Best regards,
   Bing
  
   // Writing
   public void AddOutgoingNeighbor(String hostNodeKey, String
   groupKey, int timingScale, String neighborKey)
   {
   ListPut puts = new ArrayListPut();
   Put hostNodeKeyPut;
   Put groupKeyPut;
   Put topGroupKeyPut;
   Put timingScalePut;
   Put neighborKeyPut;
  
   byte[] outgoingRowKey =
   Bytes.toBytes(NeighborStructure.NODE_OUTGOING_NEIGHBOR_ROW +
   Tools.GetAHash(hostNodeKey + groupKey + timingScale + neighborKey));
  
   hostNodeKeyPut = new Put(outgoingRowKey);
  
   hostNodeKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
   NeighborStructure.NODE_OUTGOING_NEIGHBOR_HOST_NODE_KEY_COLUMN,
   Bytes.toBytes(hostNodeKey));
   puts.add(hostNodeKeyPut);
  
   groupKeyPut = new Put(outgoingRowKey);
  
   groupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
   NeighborStructure.NODE_OUTGOING_NEIGHBOR_GROUP_KEY_COLUMN,
   Bytes.toBytes(groupKey));
   puts.add(groupKeyPut);
  
   topGroupKeyPut = new Put(outgoingRowKey);
  
   topGroupKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
   NeighborStructure.NODE_OUTGOING_NEIGHBOR_TOP_GROUP_KEY_COLUMN,
   Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupKey)));
   puts.add(topGroupKeyPut);
  
   timingScalePut = new Put(outgoingRowKey);
  
   timingScalePut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY,
   NeighborStructure.NODE_OUTGOING_NEIGHBOR_TIMING_SCALE_COLUMN,
   Bytes.toBytes(timingScale));
   puts.add(timingScalePut);
  
   neighborKeyPut = new Put(outgoingRowKey);
  
   neighborKeyPut.add(NeighborStructure.NODE_OUTGOING_NEIGHBOR_FAMILY

HBase Is So Slow To Save Data?

2012-08-29 Thread Bing Li
Dear all,

In my experience, it is very slow for HBase to save data. Am I
right?

For example, today I needed to save the data in a HashMap to HBase. It took
more than three hours. However, saving the same HashMap to a text file with
redirected System.out took only 4.5 seconds!

Why is HBase so slow? Is it the indexing?

My code to save data in HBase is as follows. I think the code must be
correct.

..
public synchronized void
AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String,
ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
{
List<Put> puts = new ArrayList<Put>();

String hhNeighborRowKey;
Put hubKeyPut;
Put groupKeyPut;
Put topGroupKeyPut;
Put timingScalePut;
Put nodeKeyPut;
Put hubNeighborTypePut;

for (Map.Entry<String, ConcurrentHashMap<String,
Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
{
for (Map.Entry<String, Set<String>>
groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
{
for (String neighborKey :
groupNeighborEntry.getValue())
{
hhNeighborRowKey =
NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
groupNeighborEntry.getKey() + timingScale + neighborKey);

hubKeyPut = new
Put(Bytes.toBytes(hhNeighborRowKey));

hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
puts.add(hubKeyPut);

groupKeyPut = new
Put(Bytes.toBytes(hhNeighborRowKey));

groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
Bytes.toBytes(groupNeighborEntry.getKey()));
puts.add(groupKeyPut);

topGroupKeyPut = new
Put(Bytes.toBytes(hhNeighborRowKey));

topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
puts.add(topGroupKeyPut);

timingScalePut = new
Put(Bytes.toBytes(hhNeighborRowKey));

timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
Bytes.toBytes(timingScale));
puts.add(timingScalePut);

nodeKeyPut = new
Put(Bytes.toBytes(hhNeighborRowKey));

nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
Bytes.toBytes(neighborKey));
puts.add(nodeKeyPut);

hubNeighborTypePut = new
Put(Bytes.toBytes(hhNeighborRowKey));

hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
puts.add(hubNeighborTypePut);
}
}
}

try
{
this.neighborTable.put(puts);
}
catch (IOException e)
{
e.printStackTrace();
}
}
..

Thanks so much!

Best regards,
Bing


Re: HBase Is So Slow To Save Data?

2012-08-29 Thread Bing Li
Dear all,

By the way, my HBase is in the pseudo-distributed mode. Thanks!

Best regards,
Bing

On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 According to my experiences, it is very slow for HBase to save data? Am I
 right?

 For example, today I need to save data in a HashMap to HBase. It took
 about more than three hours. However when saving the same HashMap in a file
 in the text format with the redirected System.out, it took only 4.5 seconds!

 Why is HBase so slow? It is indexing?

 My code to save data in HBase is as follows. I think the code must be
 correct.

 ..
 public synchronized void
 AddVirtualOutgoingHHNeighbors(ConcurrentHashMapString,
 ConcurrentHashMapString, SetString hhOutNeighborMap, int timingScale)
 {
 ListPut puts = new ArrayListPut();

 String hhNeighborRowKey;
 Put hubKeyPut;
 Put groupKeyPut;
 Put topGroupKeyPut;
 Put timingScalePut;
 Put nodeKeyPut;
 Put hubNeighborTypePut;

 for (Map.EntryString, ConcurrentHashMapString,
 SetString sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
 {
 for (Map.EntryString, SetString
 groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
 {
 for (String neighborKey :
 groupNeighborEntry.getValue())
 {
 hhNeighborRowKey =
 NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
 Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
 groupNeighborEntry.getKey() + timingScale + neighborKey);

 hubKeyPut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
 Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
 puts.add(hubKeyPut);

 groupKeyPut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
 Bytes.toBytes(groupNeighborEntry.getKey()));
 puts.add(groupKeyPut);

 topGroupKeyPut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
 Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey(;
 puts.add(topGroupKeyPut);

 timingScalePut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
 Bytes.toBytes(timingScale));
 puts.add(timingScalePut);

 nodeKeyPut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
 Bytes.toBytes(neighborKey));
 puts.add(nodeKeyPut);

 hubNeighborTypePut = new
 Put(Bytes.toBytes(hhNeighborRowKey));

 hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
 Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
 puts.add(hubNeighborTypePut);
 }
 }
 }

 try
 {
 this.neighborTable.put(puts);
 }
 catch (IOException e)
 {
 e.printStackTrace();
 }
 }
 ..

 Thanks so much!

 Best regards,
 Bing



Re: HBase Is So Slow To Save Data?

2012-08-29 Thread Bing Li
Dear N Keywal,

Thanks so much for your reply!

The total amount of data is about 110M. The available memory is enough, 2G.

In Java, I just set a collection to NULL to collect garbage. Do you think
it is fine?
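
To see how much time goes into the put versus preparing the list, I will
measure roughly like this (a sketch; buildPuts is a hypothetical helper that
just wraps the nested loops of my method):

    // A rough measurement sketch: time list preparation and the table put
    // separately; buildPuts is a hypothetical helper wrapping my nested loops.
    try
    {
        long start = System.currentTimeMillis();
        List<Put> puts = buildPuts(hhOutNeighborMap, timingScale);
        long prepared = System.currentTimeMillis();
        this.neighborTable.put(puts);
        long finished = System.currentTimeMillis();
        System.out.println("prepare: " + (prepared - start) + "ms, put: "
                + (finished - prepared) + "ms");
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }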

Best regards,
Bing

On Wed, Aug 29, 2012 at 11:22 PM, N Keywal nkey...@gmail.com wrote:

 Hi Bing,

 You should expect HBase to be slower in the generic case:
 1) it writes much more data (see the hbase data model), with extra column
 qualifiers, timestamps & so on.
 2) the data is written multiple times: once in the write-ahead-log, once
 per replica on a datanode & so on again.
 3) there are inter-process calls & inter-machine calls on the critical
 path.

 This is the cost of the atomicity, reliability and scalability features.
 With these features in mind, HBase is reasonably fast to save data on a
 cluster.

 On your specific case (without the points 2 & 3 above), the performance
 seems to be very bad.

 You should first look at:
 - how much is spent in the put vs. preparing the list
 - do you have garbage collection going on? even swap?
 - what's the size of your final Array vs. the available memory?

 Cheers,

 N.



 On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 By the way, my HBase is in the pseudo-distributed mode. Thanks!

 Best regards,
 Bing

 On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote:

  Dear all,
 
  According to my experiences, it is very slow for HBase to save data? Am
 I
  right?
 
  For example, today I need to save data in a HashMap to HBase. It took
  about more than three hours. However when saving the same HashMap in a
 file
  in the text format with the redirected System.out, it took only 4.5
 seconds!
 
  Why is HBase so slow? It is indexing?
 
  My code to save data in HBase is as follows. I think the code must be
  correct.
 
  ..
  public synchronized void
  AddVirtualOutgoingHHNeighbors(ConcurrentHashMapString,
  ConcurrentHashMapString, SetString hhOutNeighborMap, int
 timingScale)
  {
  ListPut puts = new ArrayListPut();
 
  String hhNeighborRowKey;
  Put hubKeyPut;
  Put groupKeyPut;
  Put topGroupKeyPut;
  Put timingScalePut;
  Put nodeKeyPut;
  Put hubNeighborTypePut;
 
  for (Map.EntryString, ConcurrentHashMapString,
  SetString sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
  {
  for (Map.EntryString, SetString
  groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
  {
  for (String neighborKey :
  groupNeighborEntry.getValue())
  {
  hhNeighborRowKey =
  NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
  Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
  groupNeighborEntry.getKey() + timingScale + neighborKey);
 
  hubKeyPut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
  hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
  Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
  puts.add(hubKeyPut);
 
  groupKeyPut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
 
 groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
  Bytes.toBytes(groupNeighborEntry.getKey()));
  puts.add(groupKeyPut);
 
  topGroupKeyPut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
 
 topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
 
 Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey(;
  puts.add(topGroupKeyPut);
 
  timingScalePut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
 
 timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
  Bytes.toBytes(timingScale));
  puts.add(timingScalePut);
 
  nodeKeyPut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
  nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
  Bytes.toBytes(neighborKey));
  puts.add(nodeKeyPut);
 
  hubNeighborTypePut = new
  Put(Bytes.toBytes(hhNeighborRowKey));
 
 
 hubNeighborTypePut.add

Re: HBase Is So Slow To Save Data?

2012-08-29 Thread Bing Li
I see. Thanks so much!
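
As a first rough check on my side, I will log the client JVM heap around the
put (a sketch only; real monitoring would use jstat or a profiler on both the
client and the server):

    // A rough client-side check only; proper monitoring of the JVM (client
    // and server) would use jstat, VisualVM or GC logs instead.
    try
    {
        Runtime rt = Runtime.getRuntime();
        long usedBefore = rt.totalMemory() - rt.freeMemory();
        this.neighborTable.put(puts);
        long usedAfter = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap used before put: " + (usedBefore >> 20)
                + "MB, after: " + (usedAfter >> 20) + "MB");
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }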

Bing


On Wed, Aug 29, 2012 at 11:59 PM, N Keywal nkey...@gmail.com wrote:

 It's not useful here: if you have a memory issue, it's when you're using the
 list, not when you have finished with it and set it to null.
 You need to monitor the memory consumption of the JVM, both the client and
 the server.
 Google around these keywords, there are many examples on the web.
 Google as well arrayList initialization.

 Note as well that what matters is not the memory size of the structure on
 disk but the size of the List<Put> puts = new ArrayList<Put>(); before
 the table put.

 On Wed, Aug 29, 2012 at 5:42 PM, Bing Li lbl...@gmail.com wrote:

  Dear N Keywal,
 
  Thanks so much for your reply!
 
  The total amount of data is about 110M. The available memory is enough,
 2G.
 
  In Java, I just set a collection to NULL to collect garbage. Do you think
  it is fine?
 
  Best regards,
  Bing
 
 
  On Wed, Aug 29, 2012 at 11:22 PM, N Keywal nkey...@gmail.com wrote:
 
  Hi Bing,
 
  You should expect HBase to be slower in the generic case:
  1) it writes much more data (see hbase data model), with extra columns
  qualifiers, timestamps  so on.
  2) the data is written multiple times: once in the write-ahead-log, once
  per replica on datanode  so on again.
  3) there are inter process calls  inter machine calls on the critical
  path.
 
  This is the cost of the atomicity, reliability and scalability features.
  With these features in mind, HBase is reasonably fast to save data on a
  cluster.
 
  On your specific case (without the points 2  3 above), the performance
  seems to be very bad.
 
  You should first look at:
  - how much is spent in the put vs. preparing the list
  - do you have garbage collection going on? even swap?
  - what's the size of your final Array vs. the available memory?
 
  Cheers,
 
  N.
 
 
 
  On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote:
 
  Dear all,
 
  By the way, my HBase is in the pseudo-distributed mode. Thanks!
 
  Best regards,
  Bing
 
  On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote:
 
   Dear all,
  
   According to my experiences, it is very slow for HBase to save data?
  Am I
   right?
  
   For example, today I need to save data in a HashMap to HBase. It took
   about more than three hours. However when saving the same HashMap in
 a
  file
   in the text format with the redirected System.out, it took only 4.5
  seconds!
  
   Why is HBase so slow? It is indexing?
  
   My code to save data in HBase is as follows. I think the code must be
   correct.
  
   ..
   public synchronized void
   AddVirtualOutgoingHHNeighbors(ConcurrentHashMapString,
   ConcurrentHashMapString, SetString hhOutNeighborMap, int
  timingScale)
   {
   ListPut puts = new ArrayListPut();
  
   String hhNeighborRowKey;
   Put hubKeyPut;
   Put groupKeyPut;
   Put topGroupKeyPut;
   Put timingScalePut;
   Put nodeKeyPut;
   Put hubNeighborTypePut;
  
   for (Map.EntryString, ConcurrentHashMapString,
   SetString sourceHubGroupNeighborEntry :
  hhOutNeighborMap.entrySet())
   {
   for (Map.EntryString, SetString
   groupNeighborEntry :
 sourceHubGroupNeighborEntry.getValue().entrySet())
   {
   for (String neighborKey :
   groupNeighborEntry.getValue())
   {
   hhNeighborRowKey =
   NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
   Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
   groupNeighborEntry.getKey() + timingScale + neighborKey);
  
   hubKeyPut = new
   Put(Bytes.toBytes(hhNeighborRowKey));
  
  
 hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
   Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
   Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
   puts.add(hubKeyPut);
  
   groupKeyPut = new
   Put(Bytes.toBytes(hhNeighborRowKey));
  
  
 
 groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
   Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
   Bytes.toBytes(groupNeighborEntry.getKey()));
   puts.add(groupKeyPut);
  
   topGroupKeyPut = new
   Put(Bytes.toBytes(hhNeighborRowKey));
  
  
 
 topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
  
 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
  
 
 Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey(;
   puts.add(topGroupKeyPut

Re: HBase Is So Slow To Save Data?

2012-08-29 Thread Bing Li
Dear Cristofer,

Thanks so much for your reminding!
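
I will convert the NeighborStructure constants to byte arrays once, outside
the three nested loops, roughly like this (a sketch; the field names are only
illustrative):

    // Convert the repeated constants to byte[] once (sketch; field names are
    // illustrative), then reuse them inside the nested loops.
    private static final byte[] HH_FAMILY =
            Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY);
    private static final byte[] HH_HUB_KEY_QUALIFIER =
            Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN);

    // ... inside the loops ...
    hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
    hubKeyPut.add(HH_FAMILY, HH_HUB_KEY_QUALIFIER,
            Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
    puts.add(hubKeyPut);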

Best regards,
Bing

On Thu, Aug 30, 2012 at 12:32 AM, Cristofer Weber 
cristofer.we...@neogrid.com wrote:

 There are also a lot of conversions of the same values to their byte array
 representation, e.g., your NeighborStructure constants. You should do this
 conversion only once to save time, since you are doing it inside 3 nested
 loops. Not sure how much this can improve, but you should try this
 also.

 Best regards,
 Cristofer

 -Mensagem original-
 De: Bing Li [mailto:lbl...@gmail.com]
 Enviada em: quarta-feira, 29 de agosto de 2012 13:07
 Para: user@hbase.apache.org
 Cc: hbase-u...@hadoop.apache.org
 Assunto: Re: HBase Is So Slow To Save Data?

 I see. Thanks so much!

 Bing


 On Wed, Aug 29, 2012 at 11:59 PM, N Keywal nkey...@gmail.com wrote:

  It's not useful here: if you have a memory issue, it's when your using
  the list, not when you have finished with it and set it to null.
  You need to monitor the memory consumption of the jvm, both the client
   the server.
  Google around these keywords, there are many examples on the web.
  Google as well arrayList initialization.
 
  Note as well that the important is not the memory size of the
  structure on disk but the size of the ListPut puts = new
  ArrayListPut(); before the table put.
 
  On Wed, Aug 29, 2012 at 5:42 PM, Bing Li lbl...@gmail.com wrote:
 
   Dear N Keywal,
  
   Thanks so much for your reply!
  
   The total amount of data is about 110M. The available memory is
   enough,
  2G.
  
   In Java, I just set a collection to NULL to collect garbage. Do you
   think it is fine?
  
   Best regards,
   Bing
  
  
   On Wed, Aug 29, 2012 at 11:22 PM, N Keywal nkey...@gmail.com wrote:
  
   Hi Bing,
  
   You should expect HBase to be slower in the generic case:
   1) it writes much more data (see hbase data model), with extra
   columns qualifiers, timestamps  so on.
   2) the data is written multiple times: once in the write-ahead-log,
   once per replica on datanode  so on again.
   3) there are inter process calls  inter machine calls on the
   critical path.
  
   This is the cost of the atomicity, reliability and scalability
 features.
   With these features in mind, HBase is reasonably fast to save data
   on a cluster.
  
   On your specific case (without the points 2  3 above), the
   performance seems to be very bad.
  
   You should first look at:
   - how much is spent in the put vs. preparing the list
   - do you have garbage collection going on? even swap?
   - what's the size of your final Array vs. the available memory?
  
   Cheers,
  
   N.
  
  
  
   On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   By the way, my HBase is in the pseudo-distributed mode. Thanks!
  
   Best regards,
   Bing
  
   On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote:
  
Dear all,
   
According to my experiences, it is very slow for HBase to save
 data?
   Am I
right?
   
For example, today I need to save data in a HashMap to HBase. It
took about more than three hours. However when saving the same
HashMap in
  a
   file
in the text format with the redirected System.out, it took only
4.5
   seconds!
   
Why is HBase so slow? It is indexing?
   
My code to save data in HBase is as follows. I think the code
must be correct.
   
..
public synchronized void
AddVirtualOutgoingHHNeighbors(ConcurrentHashMapString,
ConcurrentHashMapString, SetString hhOutNeighborMap, int
   timingScale)
{
ListPut puts = new ArrayListPut();
   
String hhNeighborRowKey;
Put hubKeyPut;
Put groupKeyPut;
Put topGroupKeyPut;
Put timingScalePut;
Put nodeKeyPut;
Put hubNeighborTypePut;
   
for (Map.EntryString, ConcurrentHashMapString,
SetString sourceHubGroupNeighborEntry :
   hhOutNeighborMap.entrySet())
{
for (Map.EntryString, SetString
groupNeighborEntry :
  sourceHubGroupNeighborEntry.getValue().entrySet())
{
for (String neighborKey :
groupNeighborEntry.getValue())
{
hhNeighborRowKey =
NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
groupNeighborEntry.getKey() + timingScale + neighborKey);
   
hubKeyPut = new
Put(Bytes.toBytes(hhNeighborRowKey));
   
   
  hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY)
  ,
Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN)
, Bytes.toBytes(sourceHubGroupNeighborEntry.getKey

Min/Max Column Value and Row Count

2012-04-18 Thread Bing Li
Dear all,

I noticed that there were no ways to get the min/max of a specific column
value using the current available filters. Right?

Any more convenient approaches to get the row count of a family? I plan to
use FamilyFilter to do that.

Thanks so much!

Best regards,
Bing


Re: Is HBase Thread-Safety?

2012-04-13 Thread Bing Li
NNever,

Thanks so much for your answers!


On Fri, Apr 13, 2012 at 10:50 AM, NNever nnever...@gmail.com wrote:

 1. A per-row lock is held during the update, so other clients will block
 while one client performs an update (see the annotation on HRegion.put); no
 exception is thrown.
 On the client side, while a process is updating, it may not have reached the
 buffer size, so the other process may read the original value, I think.

 2. What kind of inconsistency? different value on the same row's
 qualifier?


By inconsistency I mean that, for the same retrieval, such as a scan,
different threads get different values through their own HTable instances.
Is that possible?

In my case, a little inconsistency is not critical, so I will not
worry about the thread-safety issue. Should that be fine?




 3. I don't know how it is truly realized in the code. There is caching, but
 every time you call a method like HTable.get, it still needs to connect to
 the server to retrieve data, so it is not as fast as in memory, is it?


I plan to design a mostly read-only mechanism for my system, with only
periodic updates to HBase, in order to raise the performance. Locking must
affect the performance. If caching in HBase is not fast enough, the design
might not be good?

Thanks again!

Best,
Bing



 Best regards,
 nn

 2012/4/13 Bing Li lbl...@gmail.com

 Dear Iars,

 Thanks so much for your reply!

 In my case, I need to overwrite or update a HTable. If reading during the
 process of updating or overwriting, any exceptions will be thrown by
 HBase?

 If multiple instances for a HTable are used by multiple threads, there
 must
 be inconsistency among them, right?

 I guess caching must be done in HBase. So retrieving in HTable must be
 almost as fast as in memory?

 Best regards,
 Bing

 On Fri, Apr 13, 2012 at 6:17 AM, lars hofhansl lhofha...@yahoo.com
 wrote:

  Hi Bing,
 
  Which part? The server certainly is thread safe.
  The client is not, at least not all the way through.
 
  The main consideration is HTable, which is not thread safe, you need to
  create one instance for each thread
  (HBASE-4805 makes that much cheaper), store the HTable in a ThreadLocal
  after creation, or use HTablePool.
 
  Please let me know if that answers your question.
 
  Thanks.
 
  -- Lars
 
 
  - Original Message -
  From: Bing Li lbl...@gmail.com
  To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org
  Cc:
  Sent: Thursday, April 12, 2012 3:10 PM
  Subject: Is HBase Thread-Safety?
 
  Dear all,
 
  Is HBase thread-safety? Do I need to consider the consistency issue when
  manipulating HBase?
 
  Thanks so much!
 
  Best regards,
  Bing
 
 





Re: Is HBase Thread-Safety?

2012-04-12 Thread Bing Li
Dear Lars,

Thanks so much for your reply!

In my case, I need to overwrite or update an HTable. If a read happens during
the process of updating or overwriting, will any exceptions be thrown by HBase?

If multiple instances of an HTable are used by multiple threads, there must
be inconsistency among them, right?

I guess caching must be done in HBase, so retrieving from an HTable should be
almost as fast as from memory?

Best regards,
Bing

On Fri, Apr 13, 2012 at 6:17 AM, lars hofhansl lhofha...@yahoo.com wrote:

 Hi Bing,

 Which part? The server certainly is thread safe.
 The client is not, at least not all the way through.

 The main consideration is HTable, which is not thread safe, you need to
 create one instance for each thread
 (HBASE-4805 makes that much cheaper), store the HTable in a ThreadLocal
 after creation, or use HTablePool.

 Please let me know if that answers your question.

 Thanks.

 -- Lars


 - Original Message -
 From: Bing Li lbl...@gmail.com
 To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org
 Cc:
 Sent: Thursday, April 12, 2012 3:10 PM
 Subject: Is HBase Thread-Safety?

 Dear all,

 Is HBase thread-safety? Do I need to consider the consistency issue when
 manipulating HBase?

 Thanks so much!

 Best regards,
 Bing




Re: NotServingRegionException in Pseudo-Distributed Mode

2012-04-11 Thread Bing Li
Dear all,

By the way, I didn't see any severe exceptions in the logs, nor anything
related to NotServingRegionException.

Thanks so much!
Bing

On Thu, Apr 12, 2012 at 12:27 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 I got an exception as follows when running HBase. My Hadoop is set up in
 the pseudo-distributed mode. The exception happens after the system runs
 for about one hour.

 The specification of NotServingRegionException says thrown by a region
 server if it is sent a request for a region it is not serving. I cannot
 figure out how to solve it in my case.

 Could you please help me on this?

 Thanks so much!
 Bing

  [java]
 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
 14 actions: NotServingRegionException: 14 times, servers with issues:
 greatfreeweb:60020,
  [java] at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
  [java] at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
  [java] at
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
  [java] at
 org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
  [java] at
 org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
  [java] at
 org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402)
  [java] at
 com.greatfree.hbase.NeighborPersister.ReplicateNodeNeighbor(NeighborPersister.java:550)
  [java] at
 com.greatfree.hbase.thread.ReplicateNodeNeighborThread.run(ReplicateNodeNeighborThread.java:50)
  [java] at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  [java] at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  [java] at java.lang.Thread.run(Thread.java:722)




Re: NotServingRegionException in Pseudo-Distributed Mode

2012-04-11 Thread Bing Li
Dear Shashwat,

I appreciate so much for your reply!

But I still cannot solve the problem with the links in your email. In my
case, the environment is simple: everything runs on a single machine.

Does the exception affect anything? Will some data be lost, or anything
else?

I found another link that wondered whether NSRE is really an exception.

http://mail-archives.apache.org/mod_mbox/hbase-user/201003.mbox/%3c4bb3617a.4020...@gmx.de%3E

Any further help? Thanks so much!

Best regards,
Bing

On Thu, Apr 12, 2012 at 1:42 AM, Shashwat dwivedishash...@gmail.com wrote:

 Check out this thread may be this will provide some help :


 http://mail-archives.apache.org/mod_mbox/hbase-user/201201.mbox/%3CCAHau4ys9
 eTj_ek_jP=bnpovsprrayuyn4fhtd51dgpdgyvy...@mail.gmail.com%3E
 and
 http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg01180.html


 -Original Message-
 From: Bing Li [mailto:lbl...@gmail.com]
 Sent: 11 April 2012 21:58
 To: hbase-u...@hadoop.apache.org; user
 Subject: NotServingRegionException in Pseudo-Distributed Mode

 Dear all,

 I got an exception as follows when running HBase. My Hadoop is set up in
 the
 pseudo-distributed mode. The exception happens after the system runs for
 about one hour.

 The specification of NotServingRegionException says thrown by a region
 server if it is sent a request for a region it is not serving. I cannot
 figure out how to solve it in my case.

 Could you please help me on this?

 Thanks so much!
 Bing

 [java]
 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
 14 actions: NotServingRegionException: 14 times, servers with issues:
 greatfreeweb:60020,
 [java] at

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
 processBatchCallback(HConnectionManager.java:1641)
 [java] at

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
 processBatch(HConnectionManager.java:1409)
 [java] at
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
 [java] at
 org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
 [java] at
 org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
 [java] at

 org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:4
 02)
 [java] at

 com.greatfree.hbase.NeighborPersister.ReplicateNodeNeighbor(NeighborPersiste
 r.java:550)
 [java] at

 com.greatfree.hbase.thread.ReplicateNodeNeighborThread.run(ReplicateNodeNeig
 hborThread.java:50)
 [java] at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
 10)
 [java] at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
 03)
 [java] at java.lang.Thread.run(Thread.java:722)




Methods Missing in HTableInterface

2012-04-05 Thread Bing Li
Dear all,

I found that some methods that exist in HTable are not in HTableInterface.

   setAutoFlush
   setWriteBufferSize
   ...

In most cases, I manipulate HBase through HTableInterface from HTablePool.
If I need to use the above methods, how can I do that?

I am considering writing my own table pool if no proper ways. Is it fine?
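
Roughly, the pool I have in mind keeps one HTable per thread, something like
this (a sketch; the Configuration field and the table name "Neighbor" are
placeholders):

    // A sketch of the "own pool" idea: one HTable per thread via ThreadLocal,
    // so HTable-only methods such as setAutoFlush become reachable.
    // The conf field and the table name "Neighbor" are placeholders.
    private final Configuration conf = HBaseConfiguration.create();

    private final ThreadLocal<HTable> localTable = new ThreadLocal<HTable>()
    {
        @Override
        protected HTable initialValue()
        {
            try
            {
                HTable table = new HTable(conf, "Neighbor");
                table.setAutoFlush(false);
                table.setWriteBufferSize(4 * 1024 * 1024);
                return table;
            }
            catch (IOException e)
            {
                throw new RuntimeException(e);
            }
        }
    };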

Thanks so much!

Best regards,
Bing


Re: Methods Missing in HTableInterface

2012-04-05 Thread Bing Li
I just did that.

Thanks so much for your help!

Best,
Bing

Methods Missing in HTableInterface
--

Key: HBASE-5728
URL: https://issues.apache.org/jira/browse/HBASE-5728
Project: HBase
 Issue Type: Improvement
 Components: client
   Reporter: Bing Li

On Thu, Apr 5, 2012 at 11:32 PM, Lars George lars.geo...@gmail.com wrote:

 +1, there are quiet a few missing that should be in there. Please create a
 JIRA issue so that we can discuss and agree on which to add.

 Lars

 On Apr 5, 2012, at 6:23 PM, Stack wrote:

  On Thu, Apr 5, 2012 at 4:20 AM, Bing Li lbl...@gmail.com wrote:
  Dear all,
 
  I found some methods existed in HTable were not in HTableInterface.
 
setAutoFlush
setWriteBufferSize
...
 
 
  Make a patch to add them?
  Thanks,
  St.Ack




Re: Starting Abnormally After Shutting Down For Some Time

2012-03-28 Thread Bing Li
Dear Manish and Jean-Daniel,

After starting DFS (/opt/hadoop/bin/start-dfs.sh), I got the following
daemons after tying jps.

5212 Jps
5150 SecondaryNameNode
4932 DataNode
4737 NameNode

Then, I started the HBase (/opt/hbase/bin/start-hbase.sh). The following
daemons were available.

5797 Jps
5526 HMaster
5150 SecondaryNameNode
5711 HRegionServer
4932 DataNode
4737 NameNode
5456 HQuorumPeer

HMaster was started. It seemed that everything was fine.

But when typing status in the HBase shell, the following error still occurred.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times

In the master log, the following exception was found.

2012-03-28 13:40:01,193 FATAL org.apache.hadoop.hbase.master.HMaster:
Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy10.setSafeMode(Unknown Source)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy10.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:1120)
at
org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:423)
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:439)
at
org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:323)
at
org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
at
org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
at
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
at org.apache.hadoop.ipc.Client.call(Client.java:1046)
... 17 more
2012-03-28 13:40:01,195 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting

What is the problem? Why does it happen after HBase/Hadoop has been shut down
for a couple of days?

Thanks so much!

Bing

On Wed, Mar 28, 2012 at 11:09 AM, Manish Bhoge
manishbh...@rocketmail.comwrote:

 It says you have not started the hbase master. Once you restarted the
 system have you confirmed whether all hadoop daemons are running?
 sudo jps
 If you are using CDH package then you can automatically start the hadoop
 daemons on boot using reconfig package.

 Sent from my BlackBerry, pls excuse typo

 -Original Message-
 From: Bing Li lbl...@gmail.com
 Date: Wed, 28 Mar 2012 03:52:12
 To: hbase-u...@hadoop.apache.org; useruser@hbase.apache.org
 Reply-To: user@hbase.apache.org
 Subject: Starting Abnormally After Shutting Down For Some Time

 Dear all,

 I got a weird problem when programming on the pseudo-distributed mode of
 HBase/Hadoop.

 The HBase/Hadoop were installed correctly. It also ran well with my Java
 code.

 However, if after shutting down the server for some time, for example, four
 or five days, I noticed that HBase/Hadoop got a problem. I got an ERROR
 when typing status in the shell of HBase.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
 times

 Such a problem had happened for three times in the three weeks.

 The HBase/Hadoop are installed on Ubuntu 10.

 Have you encountered such a problem? How to solve it?

 Thanks so much!

 Best regards,
 Bing




Re: Starting Abnormally After Shutting Down For Some Time

2012-03-28 Thread Bing Li
Jean-Daniel,

I changed dfs.data.dir and dfs.name.dir to new paths in the hdfs-site.xml.

I really cannot figure out why HBase/Hadoop develops a problem after being
shut down for a couple of days. If I use it frequently, no such master
problem happens.

Each time, I have to reinstall not only HBase/Hadoop but also Ubuntu because
of the problem. It has wasted a lot of my time.

Thanks so much!

Bing



On Wed, Mar 28, 2012 at 4:46 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Hi Bing,

 Two questions:

 - Can you look at the master log and see what's preventing the master
 from starting?

 - Did you change dfs.data.dir and dfs.name.dir in hdfs-site.xml? By
 default it writes to /tmp which can get cleaned up.

 J-D

 On Tue, Mar 27, 2012 at 12:52 PM, Bing Li lbl...@gmail.com wrote:
  Dear all,
 
  I got a weird problem when programming on the pseudo-distributed mode of
  HBase/Hadoop.
 
  The HBase/Hadoop were installed correctly. It also ran well with my Java
  code.
 
  However, if after shutting down the server for some time, for example,
 four
  or five days, I noticed that HBase/Hadoop got a problem. I got an ERROR
  when typing status in the shell of HBase.
 
 ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
  times
 
  Such a problem had happened for three times in the three weeks.
 
  The HBase/Hadoop are installed on Ubuntu 10.
 
  Have you encountered such a problem? How to solve it?
 
  Thanks so much!
 
  Best regards,
  Bing



Re: Starting Abnormally After Shutting Down For Some Time

2012-03-28 Thread Bing Li
Dear Manish,

I appreciate so much for your replies!

The system tmp directory has been changed to another location in my hdfs-site.xml.

If I ran $HADOOP_HOME/bin/start-all.sh, all of the services were listed,
including job tracker and task tracker.

10211 SecondaryNameNode
10634 Jps
9992 DataNode
10508 TaskTracker
10312 JobTracker
9797 NameNode

In the job tracker's log, one exception was found.

    org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
/home/libing/GreatFreeLabs/Hadoop/FS/mapred/system. Name node is in safe mode.

In my system, I didn't see the directory ~/mapred. How should I configure
it?

For the properties you listed, they were not set in my system. Are they
required? Since they have default values (
http://hbase.apache.org/docs/r0.20.6/hbase-conf.html), do I need to update
them?

 - hbase.zookeeper.property.clientPort.
 - hbase.zookeeper.quorum.
 - hbase.zookeeper.property.dataDir

Now the system has been reinstalled. At least the pseudo-distributed mode runs
well. I also tried shutting down the Ubuntu machine and starting it again, and
the system worked fine. But I worry the master-related problem will happen
again if the machine stays shut down for a longer time. I really don't
understand the reason.

Thanks so much!

Best,
Bing

On Wed, Mar 28, 2012 at 3:11 PM, Manish Bhoge manishbh...@rocketmail.comwrote:

 Bing,

 As per my experience on the configuration I can list down some points one
 of which may be your solution.

 - first and foremost don't store your service metadata into system tmp
 directory because it may get cleaned up in every start and you loose all
 your job tracker, datanode information. It is as good as you're formatting
 your namenode.
 - if you're using CDH make sure you set up permission perfectly for root,
 dfs data directory and mapred directories.(Refer CDH documentation)
 - I didn't see job tracker in your service list. It should be up and
 running. Check the job tracker log if there is any permission issue when
 starting job tracker and task tracker.
 - before trying your stuff on Hbase set up make sure all your Hadoop
 services are up and running. You can check that by running a sample program
 and check whether job tracker, task tracker responding for your
 mapred.system and mapred.local directories to create intermediate files.
 - once you have all hadoop services up don't set/change any permission.

 As far as Hbase configuration is concerned there are 2 path for set up:
 either you set up zookeeper within hbase-site.xml Or configure separately
 via zoo.cfg. If you are going with hbase setting for zookeeper then confirm
 following setting:
 - hbase.zookeeper.property.clientPort.
 - hbase.zookeeper.quorum.
 - hbase.zookeeper.property.dataDir
 Once you have right setting for these and set up root directory for hbase
 then there not much excercise is required.(Make sure zookeeper service is
 up before you start hbase)

 I think if you follow above rules you should be fine. There is no issue
 because of long time shutdown or frequent machine restart.

 Champ, moreover you need to have a good amount of patience to understand
 the problem :) I do understand how frustrating it is when you set up
 everything and the next day you find that things are completely down.

 Sent from my BlackBerry, pls excuse typo

 -Original Message-
 From: Bing Li lbl...@gmail.com
 Date: Wed, 28 Mar 2012 14:32:12
 To: user@hbase.apache.org; hbase-u...@hadoop.apache.org
 Reply-To: user@hbase.apache.org
 Subject: Re: Starting Abnormally After Shutting Down For Some Time

 Jean-Daniel,

 I changed dfs.data.dir and dfs.name.dir to new paths in the hdfs-site.xml.

 I really cannot figure out why the HBase/Hadoop got a problem after a
 couple of days of shutting down. If I use it frequently, no such a master
 problem happens.

 Each time, I have to reinstall not only HBase/Hadoop but also Ubuntu for
 the problem. It wasted me a lot of time.

 Thanks so much!

 Bing



 On Wed, Mar 28, 2012 at 4:46 AM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:

  Hi Bing,
 
  Two questions:
 
  - Can you look at the master log and see what's preventing the master
  from starting?
 
  - Did you change dfs.data.dir and dfs.name.dir in hdfs-site.xml? By
  default it writes to /tmp which can get cleaned up.
 
  J-D
 
  On Tue, Mar 27, 2012 at 12:52 PM, Bing Li lbl...@gmail.com wrote:
   Dear all,
  
   I got a weird problem when programming on the pseudo-distributed mode
 of
   HBase/Hadoop.
  
   The HBase/Hadoop were installed correctly. It also ran well with my
 Java
   code.
  
   However, if after shutting down the server for some time, for example,
  four
   or five days, I noticed that HBase/Hadoop got a problem. I got an ERROR
   when typing status in the shell of HBase.
  
  ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
   times
  
   Such a problem had happened for three times in the three weeks

Re: Starting Abnormally After Shutting Down For Some Time

2012-03-28 Thread Bing Li
Dear Peter,

When I just started the Ubuntu machine, there was nothing in /tmp.

After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
following files were under /tmp. Do you see anything wrong? Thanks!

libing@greatfreeweb:/tmp$ ls -alrt
total 112
drwxr-xr-x 22 root   root4096 2012-03-28 14:17 ..
-rw-r--r--  1 libing libing 5 2012-03-29 04:48
hadoop-libing-namenode.pid
-rw-r--r--  1 libing libing 5 2012-03-29 04:48
hadoop-libing-datanode.pid
-rw-r--r--  1 libing libing 5 2012-03-29 04:48
hadoop-libing-secondarynamenode.pid
-rw-r--r--  1 libing libing 5 2012-03-29 04:48
hbase-libing-zookeeper.pid
drwxr-xr-x  3 libing libing  4096 2012-03-29 04:48 hbase-libing
-rw-r--r--  1 libing libing 5 2012-03-29 04:48 hbase-libing-master.pid
-rw-r--r--  1 libing libing 5 2012-03-29 04:48
hbase-libing-regionserver.pid
drwxr-xr-x  2 libing libing  4096 2012-03-29 04:48 hsperfdata_libing
drwxrwxrwt  4 root   root4096 2012-03-29 04:48 .
-rw-r--r--  1 libing libing 71819 2012-03-29 04:48
jffi5395899026867792565.tmp
libing@greatfreeweb:/tmp$

Best,
Bing

On Thu, Mar 29, 2012 at 3:19 AM, Peter Vandenabeele
pe...@vandenabeele.comwrote:

 On Wed, Mar 28, 2012 at 7:27 PM, Bing Li lbl...@gmail.com wrote:
  Dear all,
 
  I found some configuration information was saved in /tmp in my system. So
  when some of the information is lost, the HBase cannot be started
 normally.
 
  But in my system, I have tried to change the HDFS directory to another
  location. Why are there still some files under /tmp?

 I have a pseudo-distributed set-up (Cloudera cdh3u2) with local
 directory (not /tmp)
 and as a test:

 * stopped the hbase service
 * stopped the hadoop services
 * moved all hadoop related files from tmp to an ORIG directory [see below]
 * restarted all (5) hadoop services
 * restarted the hbase service

 All of that worked stable, so I presume no immediate dependency on the
 /tmp files. The files that are recreated are these:

 peterv@e6500:/tmp$ ls -alrt
 ...
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 20:07
 Jetty_0_0_0_0_50070_hdfsw2cu08
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 20:07
 Jetty_0_0_0_0_50075_datanodehwtdwq
 drwxr-xr-x  2 hdfs   hdfs   4096 2012-03-28 20:07 hsperfdata_hdfs
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 20:07
 Jetty_0_0_0_0_50090_secondaryy6aanv
 drwxr-xr-x  4 mapred mapred 4096 2012-03-28 20:07
 Jetty_0_0_0_0_50030_jobyn7qmk
 drwxr-xr-x  2 mapred mapred 4096 2012-03-28 20:07 hsperfdata_mapred
 drwxr-xr-x  2 root   root   4096 2012-03-28 20:07 hsperfdata_root
 drwxr-xr-x  4 mapred mapred 4096 2012-03-28 20:07
 Jetty_0_0_0_0_50060_task.2vcltf

 The files that I had moved aside (to ORIG) were:

 peterv@e6500:/tmp$ ls -alrt ORIG/
 total 44
 drwxr-xr-x  4 mapred mapred 4096 2012-03-28 19:58
 Jetty_0_0_0_0_50030_jobyn7qmk
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 19:58
 Jetty_0_0_0_0_50070_hdfsw2cu08
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 19:58
 Jetty_0_0_0_0_50090_secondaryy6aanv
 drwxr-xr-x  4 hdfs   hdfs   4096 2012-03-28 19:58
 Jetty_0_0_0_0_50075_datanodehwtdwq
 drwxr-xr-x  4 mapred mapred 4096 2012-03-28 19:59
 Jetty_0_0_0_0_50060_task.2vcltf
 drwxr-xr-x  2 peterv peterv 4096 2012-03-28 20:05 hsperfdata_peterv
 drwxr-xr-x  2 hdfs   hdfs   4096 2012-03-28 20:05 hsperfdata_hdfs
 drwxr-xr-x  2 mapred mapred 4096 2012-03-28 20:05 hsperfdata_mapred
 drwxr-xr-x  2 root   root   4096 2012-03-28 20:06 hsperfdata_root

 Which hadoop/hbase files do you still see in your /tmp directory?

 HTH,

 Peter



Starting Abnormally After Shutting Down For Some Time

2012-03-27 Thread Bing Li
Dear all,

I got a weird problem when programming on the pseudo-distributed mode of
HBase/Hadoop.

The HBase/Hadoop were installed correctly. It also ran well with my Java
code.

However, if after shutting down the server for some time, for example, four
or five days, I noticed that HBase/Hadoop got a problem. I got an ERROR
when typing status in the shell of HBase.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
times

Such a problem has happened three times in the past three weeks.

The HBase/Hadoop are installed on Ubuntu 10.

Have you encountered such a problem? How to solve it?

Thanks so much!

Best regards,
Bing


Re: Setting Up Pseudo-Distributed Mode Failed On Ubuntu 11

2012-03-11 Thread Bing Li
After installing on Ubuntu Server 11, I found two errors.

1) In the HBase shell, the error is that the master node is not started;
the system reports that it retried seven times.

2) Sometimes I also saw the following problem, and HBase cannot be
stopped.

   0 servers, 0 dead, NaN average load

On Ubuntu Server 10, no such problems.

Thanks so much!
Bing

On Sun, Mar 11, 2012 at 12:01 PM, Gopal absoft...@gmail.com wrote:

 On 03/10/2012 10:23 PM, Bing Li wrote:

 Dear all,

 Yesterday I tried to set up the pseudo-distributed mode for HBase on
 Ubuntu
 11 (64-bit). But I failed to do that. What I have done is exactly the same
 as on Ubuntu 10. On Ubuntu 10, I set it up successfully.

 I am not sure what are the possible problems. Could you give me some
 hints?
 Thanks so much!

 Best regards,
 Bing



 List the error you are getting. Dump the Java stack trace.

 Thanks



Setting Up Pseudo-Distributed Mode Failed On Ubuntu 11

2012-03-10 Thread Bing Li
Dear all,

Yesterday I tried to set up the pseudo-distributed mode for HBase on Ubuntu
11 (64-bit). But I failed to do that. What I have done is exactly the same
as on Ubuntu 10. On Ubuntu 10, I set it up successfully.

I am not sure what the possible problems are. Could you give me some hints?
Thanks so much!

Best regards,
Bing


RowFilter - Each Time It Should Be Initialized?

2012-03-01 Thread Bing Li
Dear all,

I am now using RowFilter to retrieve multiple rows. Do I need to call the
following line each time?

 filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
         new SubstringComparator("Classmate2"));

I checked the relevant APIs. There is a method, reset(), but it seems that I
have to use the constructor to set new parameters. Does that approach
consume many resources if the count of rows is large?

Thanks,
Bing
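
A sketch of the usual pattern, as far as I understand it (not a confirmed
answer from this thread): a new RowFilter is simply built for each Scan. The
filter is a small client-side object that gets serialized to the region
servers, and reset() is invoked by the scanning framework between rows, not
by client code. The class and method names below are placeholders:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;

public class RowFilterSketch {
    // Build a fresh Scan with a fresh RowFilter per query; constructing the
    // filter each time is cheap compared with the scan itself.
    public static Scan buildScan(String rowSubstring) {
        Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                new SubstringComparator(rowSubstring));
        Scan scan = new Scan();
        scan.setFilter(filter);
        return scan;
    }
}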


Retrieving by Counters and ValueFilter

2012-02-24 Thread Bing Li
Dear all,

HBase has a feature to treat columns as counters, so I attempted to
retrieve data based on the values of counters. Usually, the counters are of
the long type.

But the filters' constructors in HBase, such as ValueFilter's, do not take a
parameter of the long type. If so, may I still retrieve by ValueFilter on
counters?

Thanks so much!

Best regards,
Bing
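
A sketch of what I believe the usual workaround is (hedged, not confirmed in
this thread): the comparators take a byte[], so a long threshold can be
passed through Bytes.toBytes(long), which is the same 8-byte big-endian
encoding counters are stored in; for non-negative values the byte-wise
comparison of BinaryComparator then matches numeric order. The family and
qualifier names below are placeholders:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Keep only counter cells whose value is >= 100.
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("MyFamily"), Bytes.toBytes("MyCounter"));
scan.setFilter(new ValueFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes(100L))));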


The Problems When Retrieving By BinaryComparator

2012-02-24 Thread Bing Li
Dear all,

I created a table as follows. I need to retrieve by the Salary column,
which holds long values. I got some errors, shown below.

ROW COLUMN+CELL

 Classmate1 column=ClassmateFamily:Address,
timestamp=1330118559432, value=Canada
 Classmate1 column=ClassmateFamily:Age,
timestamp=1330118559429, value=42
 Classmate1 column=ClassmateFamily:Career,
timestamp=1330118559431, value=Faculty
 Classmate1 column=ClassmateFamily:Hobby,
timestamp=1330118559433, value=Soccer
 Classmate1 column=ClassmateFamily:Name,
timestamp=1330118559427, value=Bing
 Classmate1 column=ClassmateFamily:Salary,
timestamp=1330121577483, value=\x00\x00\x00\x00\x00\x00\x03\xEA  (1002 -
long)
 Classmate2 column=ClassmateFamily:Address,
timestamp=1330118559436, value=US
 Classmate2 column=ClassmateFamily:Age,
timestamp=1330118559434, value=52
 Classmate2 column=ClassmateFamily:Career,
timestamp=1330118559435, value=Educator
 Classmate2 column=ClassmateFamily:Hobby,
timestamp=1330118559437, value=Music
 Classmate2 column=ClassmateFamily:Name,
timestamp=1330118559433, value=GreatFree
 Classmate2 column=ClassmateFamily:Salary,
timestamp=1330118559393, value=\x00\x00\x00\x00\x00\x00\x05\xDC  (1500 -
long)
 Classmate3 column=ClassmateFamily:Address,
timestamp=1330118559440, value=US
 Classmate3 column=ClassmateFamily:Age,
timestamp=1330118559438, value=100
 Classmate3 column=ClassmateFamily:Career,
timestamp=1330118559439, value=Researcher
 Classmate3 column=ClassmateFamily:Hobby,
timestamp=1330118559442, value=Science
 Classmate3 column=ClassmateFamily:Name,
timestamp=1330118559437, value=LBLabs
 Classmate3 column=ClassmateFamily:Salary,
timestamp=1330118559397, value=\x00\x00\x00\x00\x00\x00\x07\x08  (1800 -
long)
 Classmate4 column=ClassmateFamily:Address,
timestamp=1330118559445, value=Baoji
 Classmate4 column=ClassmateFamily:Age,
timestamp=1330118559443, value=41
 Classmate4 column=ClassmateFamily:Career,
timestamp=1330118559444, value=Lawyer
 Classmate4 column=ClassmateFamily:Hobby,
timestamp=1330118559446, value=Drawing
 Classmate4 column=ClassmateFamily:Name,
timestamp=1330118559442, value=Dezhi
 Classmate4 column=ClassmateFamily:Salary,
timestamp=1330118559399, value=\x00\x00\x00\x00\x00\x00\x03   (800 - long)

The code is listed below.

Filter filter = new ValueFilter(CompareFilter.CompareOp.LESS,
        new BinaryComparator(Bytes.toBytes(1000))); // The filter line *

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("ClassmateFamily"), Bytes.toBytes("Salary"));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner)
{
    for (KeyValue kv : result.raw())
    {
        System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(kv.getValue()));
    }
}
scanner.close();

System.out.println();

Get get = new Get(Bytes.toBytes("Classmate3"));
get.setFilter(filter);
Result result = table.get(get);
for (KeyValue kv : result.raw())
{
    System.out.println("KV: " + kv + ", Value: " + Bytes.toLong(kv.getValue()));
}

I think the correct result should be like the one below. Only the rows whose
Salary values are less than 1000 should be returned, right?

 [java] KV: Classmate4/ClassmateFamily:Salary/1330118559399/Put/vlen=8,
Value: 800
 [java] 


But the actual result is as follows. Some rows whose values are higher than
1000 are returned. Why?

 [java] KV: Classmate1/ClassmateFamily:Salary/1330121577483/Put/vlen=8,
Value: 1002
 [java] KV: Classmate2/ClassmateFamily:Salary/1330118559393/Put/vlen=8,
Value: 1500
 [java] KV: Classmate3/ClassmateFamily:Salary/1330118559397/Put/vlen=8,
Value: 1800
 [java] KV: Classmate4/ClassmateFamily:Salary/1330118559399/Put/vlen=8,
Value: 800
 [java] 
 [java] KV: Classmate3/ClassmateFamily:Salary/1330118559397/Put/vlen=8,
Value: 1800

If I change the filter line to the following one,

 Filter filter = new ValueFilter(CompareFilter.CompareOp.GREATER,
new BinaryComparator(Bytes.toBytes(1000))); // The 

Re: The Problems When Retrieving By BinaryComparator

2012-02-24 Thread Bing Li
Mr Gupta,

Yes, you are right. After changing Bytes.toBytes(1000) to
Bytes.toBytes(1000L), it works fine.

However, the following exception still exists.

 [java] Exception in thread main java.lang.IllegalArgumentException:
offset (0) + length (8) exceed the capacity of the array: 2
 [java]  at
org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:527)
 [java]  at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:505)
 [java]  at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:478)
 [java]  at
com.greatfree.testing.hbase.OrderedQualifierValue.main(Unknown Source)

After searching on the Web, one post said it was possible that an int value
was inserted into the table while a long value was being read back. I created
the table again and inserted long values, but I still got the exception. I am
trying to solve the problem.

Thanks so much!
Bing
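
A small sketch of what I think is going on (my own reading, not confirmed in
this thread): Bytes.toBytes(1000) picks the int overload and writes 4 bytes,
while Bytes.toBytes(1000L) writes 8, and Bytes.toLong() throws exactly this
IllegalArgumentException whenever the stored cell is shorter than 8 bytes,
for example if a cell that was written as a short string (like the Age values
above) is read back as a long:

import org.apache.hadoop.hbase.util.Bytes;

public class BytesLengthSketch {
    public static void main(String[] args) {
        System.out.println(Bytes.toBytes(1000).length);   // 4 bytes: the int overload
        System.out.println(Bytes.toBytes(1000L).length);  // 8 bytes: the long overload

        byte[] twoBytes = Bytes.toBytes("42");            // e.g. a cell stored as text
        // Bytes.toLong(twoBytes) would throw:
        // IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 2
        System.out.println(twoBytes.length);
    }
}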


On Sat, Feb 25, 2012 at 8:31 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 when you do Bytes.toBytes(1000), you are not telling it whether 1000 is an
 integer or a long.. you have to be super careful here..
 i didn't read the flow fully but this caught my eye immediately.. try
 repopulating properly and use proper types when using Bytes.

 thanks

 On Fri, Feb 24, 2012 at 4:25 PM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 I created a table as follows. I need to retrieve by the column of
 Salary,
 which is a long type data. Some errors are got as follows.

 ROW COLUMN+CELL

  Classmate1 column=ClassmateFamily:Address,
 timestamp=1330118559432, value=Canada
  Classmate1 column=ClassmateFamily:Age,
 timestamp=1330118559429, value=42
  Classmate1 column=ClassmateFamily:Career,
 timestamp=1330118559431, value=Faculty
  Classmate1 column=ClassmateFamily:Hobby,
 timestamp=1330118559433, value=Soccer
  Classmate1 column=ClassmateFamily:Name,
 timestamp=1330118559427, value=Bing
  Classmate1 column=ClassmateFamily:Salary,
 timestamp=1330121577483, value=\x00\x00\x00\x00\x00\x00\x03\xEA  (1002 -
 long)
  Classmate2 column=ClassmateFamily:Address,
 timestamp=1330118559436, value=US
  Classmate2 column=ClassmateFamily:Age,
 timestamp=1330118559434, value=52
  Classmate2 column=ClassmateFamily:Career,
 timestamp=1330118559435, value=Educator
  Classmate2 column=ClassmateFamily:Hobby,
 timestamp=1330118559437, value=Music
  Classmate2 column=ClassmateFamily:Name,
 timestamp=1330118559433, value=GreatFree
  Classmate2 column=ClassmateFamily:Salary,
 timestamp=1330118559393, value=\x00\x00\x00\x00\x00\x00\x05\xDC  (1500 -
 long)
  Classmate3 column=ClassmateFamily:Address,
 timestamp=1330118559440, value=US
  Classmate3 column=ClassmateFamily:Age,
 timestamp=1330118559438, value=100
  Classmate3 column=ClassmateFamily:Career,
 timestamp=1330118559439, value=Researcher
  Classmate3 column=ClassmateFamily:Hobby,
 timestamp=1330118559442, value=Science
  Classmate3 column=ClassmateFamily:Name,
 timestamp=1330118559437, value=LBLabs
  Classmate3 column=ClassmateFamily:Salary,
 timestamp=1330118559397, value=\x00\x00\x00\x00\x00\x00\x07\x08  (1800 -
 long)
  Classmate4 column=ClassmateFamily:Address,
 timestamp=1330118559445, value=Baoji
  Classmate4 column=ClassmateFamily:Age,
 timestamp=1330118559443, value=41
  Classmate4 column=ClassmateFamily:Career,
 timestamp=1330118559444, value=Lawyer
  Classmate4 column=ClassmateFamily:Hobby,
 timestamp=1330118559446, value=Drawing
  Classmate4 column=ClassmateFamily:Name,
 timestamp=1330118559442, value=Dezhi
  Classmate4 column=ClassmateFamily:Salary,
 timestamp=1330118559399, value=\x00\x00\x00\x00\x00\x00\x03   (800 - long)

 The code is listed below.

Filter filter = new
 ValueFilter(CompareFilter.CompareOp.LESS, new
 BinaryComparator(Bytes.toBytes(1000))); // The filter line *

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes(ClassmateFamily),
 Bytes.toBytes(Salary));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner)
{
for (KeyValue kv : result.raw())
{
System.out.println(KV:  + kv + , Value:
  + Bytes.toLong(kv.getValue()));
}
}
scanner.close();

System.out.println

Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-23 Thread Bing Li
Dear Mr Gupta,

Your understanding about my solution is correct. Now both HBase and Solr
are used in my system. I hope it could work.

Thanks so much for your reply!

Best regards,
Bing

On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 regarding your question on hbase support for high performance and
 consistency - i would say hbase is highly scalable and performant. how it
 does what it does can be understood by reading relevant chapters around
 architecture and design in the hbase book.

 with regards to ranking, i see your problem. but if you split the problem
 into hbase specific solution and solr based solution, you can achieve the
 results probably. may be you do the ranking and store the rank in hbase and
 then use solr to get the results and then use hbase as a lookup to get the
 rank. or you can put the rank as part of the document schema and index the
 rank too for range queries and such. is my understanding of your scenario
 wrong?

 thanks


 On Wed, Feb 22, 2012 at 9:51 AM, Bing Li lbl...@gmail.com wrote:

 Mr Gupta,

 Thanks so much for your reply!

 In my use cases, retrieving data by keyword is one of them. I think Solr
 is a proper choice.

 However, Solr does not provide a complex enough support to rank. And,
 frequent updating is also not suitable in Solr. So it is difficult to
 retrieve data randomly based on the values other than keyword frequency in
 text. In this case, I attempt to use HBase.

 But I don't know how HBase support high performance when it needs to keep
 consistency in a large scale distributed system.

 Now both of them are used in my system.

 I will check out ElasticSearch.

 Best regards,
 Bing


 On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 Bing,
 Its a classic battle on whether to use solr or hbase or a combination of
 both. both systems are very different but there is some overlap in the
 utility. they also differ vastly when it compares to computation power,
 storage needs, etc. so in the end, it all boils down to your use case. you
 need to pick the technology that it best suited to your needs.
 im still not clear on your use case though.

 btw, if you haven't started using solr yet - then you might want to
 checkout ElasticSearch. I spent over a week researching between solr and ES
 and eventually chose ES due to its cool merits.

 thanks


 On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 There is no secondary index support in HBase at the moment.

 It's on our road map.

 FYI

 On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote:

  Jacques,
 
  Yes. But I still have questions about that.
 
  In my system, when users search with a keyword arbitrarily, the query
 is
  forwarded to Solr. No any updating operations but appending new
 indexes
  exist in Solr managed data.
 
  When I need to retrieve data based on ranking values, HBase is used.
 And,
  the ranking values need to be updated all the time.
 
  Is that correct?
 
  My question is that the performance must be low if keeping
 consistency in a
  large scale distributed environment. How does HBase handle this issue?
 
  Thanks so much!
 
  Bing
 
 
  On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:
 
   It is highly unlikely that you could replace Solr with HBase.
  They're
   really apples and oranges.
  
  
   On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   I wonder how data in HBase is indexed? Now Solr is used in my
 system
   because data is managed in inverted index. Such an index is
 suitable to
   retrieve unstructured and huge amount of data. How does HBase deal
 with
   the
   issue? May I replaced Solr with HBase?
  
   Thanks so much!
  
   Best regards,
   Bing
  
  
  
 







How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Dear all,

I wonder how data in HBase is indexed. Now Solr is used in my system
because data is managed in an inverted index. Such an index is suitable for
retrieving unstructured and huge amounts of data. How does HBase deal with this
issue? May I replace Solr with HBase?

Thanks so much!

Best regards,
Bing


Solr HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Jacques,

Yes. But I still have questions about that.

In my system, when users search with an arbitrary keyword, the query is
forwarded to Solr. There are no updating operations on the Solr-managed data,
only the appending of new indexes.

When I need to retrieve data based on ranking values, HBase is used. And
the ranking values need to be updated all the time.

Is that correct?

My question is that performance must be low if consistency has to be kept in a
large-scale distributed environment. How does HBase handle this issue?

Thanks so much!

Bing


On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:

 It is highly unlikely that you could replace Solr with HBase.  They're
 really apples and oranges.


 On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 I wonder how data in HBase is indexed? Now Solr is used in my system
 because data is managed in inverted index. Such an index is suitable to
 retrieve unstructured and huge amount of data. How does HBase deal with
 the
 issue? May I replaced Solr with HBase?

 Thanks so much!

 Best regards,
 Bing





Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Mr Gupta,

Thanks so much for your reply!

Retrieving data by keyword is one of my use cases. I think Solr is
a proper choice for it.

However, Solr does not provide rich enough support for ranking, and
frequent updating is also not suitable in Solr. So it is difficult to
retrieve data based on values other than keyword frequency in
text. In this case, I attempt to use HBase.

But I don't know how HBase supports high performance when it needs to keep
consistency in a large-scale distributed system.

Now both of them are used in my system.

I will check out ElasticSearch.

Best regards,
Bing


On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 Bing,
 Its a classic battle on whether to use solr or hbase or a combination of
 both. both systems are very different but there is some overlap in the
 utility. they also differ vastly when it compares to computation power,
 storage needs, etc. so in the end, it all boils down to your use case. you
 need to pick the technology that it best suited to your needs.
 im still not clear on your use case though.

 btw, if you haven't started using solr yet - then you might want to
 checkout ElasticSearch. I spent over a week researching between solr and ES
 and eventually chose ES due to its cool merits.

 thanks


 On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 There is no secondary index support in HBase at the moment.

 It's on our road map.

 FYI

 On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote:

  Jacques,
 
  Yes. But I still have questions about that.
 
  In my system, when users search with a keyword arbitrarily, the query is
  forwarded to Solr. No any updating operations but appending new indexes
  exist in Solr managed data.
 
  When I need to retrieve data based on ranking values, HBase is used.
 And,
  the ranking values need to be updated all the time.
 
  Is that correct?
 
  My question is that the performance must be low if keeping consistency
 in a
  large scale distributed environment. How does HBase handle this issue?
 
  Thanks so much!
 
  Bing
 
 
  On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:
 
   It is highly unlikely that you could replace Solr with HBase.  They're
   really apples and oranges.
  
  
   On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   I wonder how data in HBase is indexed? Now Solr is used in my system
   because data is managed in inverted index. Such an index is suitable
 to
   retrieve unstructured and huge amount of data. How does HBase deal
 with
   the
   issue? May I replaced Solr with HBase?
  
   Thanks so much!
  
   Best regards,
   Bing
  
  
  
 





TimeStampFilter - type int out of range

2012-02-19 Thread Bing Li
Dear all,

I am running the sample about TimestampsFilter as follows.

List<Long> ts = new ArrayList<Long>();
ts.add(new Long(1329640759364));
ts.add(new Long(1329640759372));
ts.add(new Long(1329640759378));
Filter filter = new TimestampsFilter(ts);

When compiling the above code, the error is "type int out of range". But
the timestamps are long values. How can I handle this problem?

Thanks so much!

Best regards,
Bing
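
Assuming the error is the compiler rejecting the bare integer literals (my
guess from the message, not a confirmed answer in this thread), adding an L
suffix so each literal is a long should fix it. A minimal sketch:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.TimestampsFilter;

List<Long> ts = new ArrayList<Long>();
ts.add(1329640759364L);   // the L suffix makes these long literals,
ts.add(1329640759372L);   // so they no longer overflow the int range
ts.add(1329640759378L);
Filter filter = new TimestampsFilter(ts);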


Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-17 Thread Bing Li
Stack,

The link just describes a standalone mode for HBase. If possible, I think a
pseudo-distributed mode is also preferred.

Thanks,
Bing

On Fri, Feb 17, 2012 at 11:10 PM, Stack st...@duboce.net wrote:

 On Thu, Feb 16, 2012 at 11:03 PM, Bing Li lbl...@gmail.com wrote:
  I just made summary about the experiences to set up a pseudo-distributed
  mode HBase.
 

 Thank you for the writeup.  What would you have us change in here:
 http://hbase.apache.org/book/quickstart.html?

 Thanks,
 St.Ack



Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-17 Thread Bing Li
Yes, I noticed that. But it missed something I mentioned in my previous
email.

Thanks,
Bing

On Sat, Feb 18, 2012 at 12:11 AM, Stack st...@duboce.net wrote:

 The next page is on pseudo-distributed:
 http://hbase.apache.org/book/standalone_dist.html#distributed

 St.Ack

 On Fri, Feb 17, 2012 at 7:18 AM, Bing Li lbl...@gmail.com wrote:
  Stack,
 
  The link just describes a standalone mode for HBase. If possible, I
 think a
  pseudo-distributed mode is also preferred.
 
  Thanks,
  Bing
 
 
  On Fri, Feb 17, 2012 at 11:10 PM, Stack st...@duboce.net wrote:
 
  On Thu, Feb 16, 2012 at 11:03 PM, Bing Li lbl...@gmail.com wrote:
   I just made summary about the experiences to set up a
 pseudo-distributed
   mode HBase.
  
 
  Thank you for the writeup.  What would you have us change in here:
  http://hbase.apache.org/book/quickstart.html?
 
  Thanks,
  St.Ack
 
 



Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-14 Thread Bing Li
Dear Jean-Daniel,

The issue is solved. I think the book, HBase: The Definitive Guide,
does not give a sufficient description of the pseudo-distributed mode.

Thanks so much!
Bing

On Tue, Feb 14, 2012 at 7:27 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Is zookeeper running properly? Is it where your shell expects it to
 be? Can you access HBase's web ui on port 60010?

 J-D

 On Sun, Feb 12, 2012 at 1:00 PM, Bing Li lbl...@gmail.com wrote:
  Dear all,
 
  I am a new learner of HBase. I tried to set up my HBase on a
  pseudo-distributed HDFS.
 
  After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I
  started the HBase shell.
 
./hbase shell
 
  It was started properly. However, when I typed the command, status, as
  follows.
 
hbase(main):001:0 status
 
  It got the following exception. Since I had very limited experiences to
 use
  HBase, I could not figure out what the problem was.
 
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in
 
 [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in
 
 [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
  explanation.
  12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists
  failed after 3 retries
  12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set
 watcher
  on znode /hbase/master
  org.apache.zookeeper.KeeperException$ConnectionLossException:
  KeeperErrorCode = ConnectionLoss for /hbase/master
 at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at
 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at
 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at
 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at
  org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
  Method)
 at
 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at
 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at
 
 org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
 at
 
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
 at
 
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
 at
 
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 at
 
 org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
 at
  org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
 at
 
 org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
 at
 
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 
  Could you please give me a hand? Thanks so much!
 
  Best regards,
  Bing



ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-13 Thread Bing Li
Dear all,

After searching on the Web and asking for help from friends, I noticed that
the pseudo distributed configuration in the book, HBase the Definitive
Guide, was not complete. Now the ZooKeeper related exception is fixed.
However, I got another error when typing status in the HBase shell.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
Times

I am trying to fix it myself. Your help is highly appreciated.

Thanks so much!
Bing Li

On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 I am a new learner of HBase. I tried to set up my HBase on a
 pseudo-distributed HDFS.

 After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I
 started the HBase shell.

./hbase shell

 It was started properly. However, when I typed the command, status, as
 follows.

hbase(main):001:0 status

 It got the following exception. Since I had very limited experiences to
 use HBase, I could not figure out what the problem was.

 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in
 [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in
 [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
 explanation.
 12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists
 failed after 3 retries
 12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set watcher
 on znode /hbase/master
 org.apache.zookeeper.KeeperException$ConnectionLossException:
 KeeperErrorCode = ConnectionLoss for /hbase/master
 at
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
 at
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at
 org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
 at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
 at
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
 at
 org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at
 org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
 at
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
 at
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
 at
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 at
 org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
 at
 org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
 at
 org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
 at
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)

 Could you please give me a hand? Thanks so much!

 Best regards,
 Bing





Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-13 Thread Bing Li
Dear Jimmy,

Thanks so much for your reply!

I didn't set up hbase.zookeeper.quorum. After getting your email, I made a
change. Now my hbase-site.xml is as follows.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>

The previous error still exists. I find it strange that the HBase developers
cannot provide a reliable description of their work.

Best,
Bing


On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 What's your hbase.zookeeper.quorom configuration?   You can check out this
 quick start guide:

 http://hbase.apache.org/book/quickstart.html

 Thanks,
 Jimmy


 On Mon, Feb 13, 2012 at 10:09 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 After searching on the Web and asking for help from friends, I noticed
 that
 the pseudo distributed configuration in the book, HBase the Definitive
 Guide, was not complete. Now the ZooKeeper related exception is fixed.
 However, I got another error when typing status in the HBase shell.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
 Times

 I am trying to fix it myself. Your help is highly appreciated.

 Thanks so much!
 Bing Li

 On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote:

  Dear all,
 
  I am a new learner of HBase. I tried to set up my HBase on a
  pseudo-distributed HDFS.
 
  After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I
  started the HBase shell.
 
 ./hbase shell
 
  It was started properly. However, when I typed the command, status, as
  follows.
 
 hbase(main):001:0 status
 
  It got the following exception. Since I had very limited experiences to
  use HBase, I could not figure out what the problem was.
 
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in
 
 [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in
 
 [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
  explanation.
  12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists
  failed after 3 retries
  12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set
 watcher
  on znode /hbase/master
  org.apache.zookeeper.KeeperException$ConnectionLossException:
  KeeperErrorCode = ConnectionLoss for /hbase/master
  at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
  at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
  at
 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
  at
 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
  at
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
  at
  org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
  Method)
  at
 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at
 java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at
 
 org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
  at
 
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
  at
 
 org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
  at
 
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
  at
 
 org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:178)
  at
 
 org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
  at
 
 org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:47)
  at
 
 org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:322)
 
  Could you please give me a hand? Thanks so much!
 
  Best regards,
  Bing
 
 
 





Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-13 Thread Bing Li
Dear Jimmy,

I am a new user of HBase. My experience with HBase and Hadoop is very
limited. I just tried to follow some books, such as Hadoop/HBase: The
Definitive Guide. However, I still got some problems.

What I am trying to do is just to set up a pseudo distributed HBase
environment on a single node. After that, I will start my system
programming in Java. I hope I could deploy the system in fully distributed
mode when my system is done.

So what I am configuring is very simple. Do I need to set up the zookeeper
port in hbase-site.xml?

Thanks so much!

Best,
Bing


On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 Have you restarted your HBase after the change?  What's the zookeeper port
 does your HMaster use?

 Can you run the following to checkout where is your HMaster as below?

 hbase zkcli
   then:  get /hbase/master
  It should show you master location.

 It seems you have a distributed installation.  How many regionservers do
 you have?  Can you check your
 master web UI to make sure all look fine.

 Thanks,
 Jimmy


 On Mon, Feb 13, 2012 at 10:51 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 Thanks so much for your reply!

 I didn't set up the zookeeper.quorom. After getting your email, I made a
 change. Now my hbase-site.xml is as follows.

 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:9000/hbase</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
 </configuration>

 The previous error is still existed. I feel weird why HBase developers
 cannot provide a reliable description about their work.

 Best,
 Bing


 On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 What's your hbase.zookeeper.quorom configuration?   You can check out
 this quick start guide:

 http://hbase.apache.org/book/quickstart.html

 Thanks,
 Jimmy


 On Mon, Feb 13, 2012 at 10:09 AM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 After searching on the Web and asking for help from friends, I noticed
 that
 the pseudo distributed configuration in the book, HBase the Definitive
 Guide, was not complete. Now the ZooKeeper related exception is fixed.
 However, I got another error when typing status in the HBase shell.

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7
 Times

 I am trying to fix it myself. Your help is highly appreciated.

 Thanks so much!
 Bing Li

 On Mon, Feb 13, 2012 at 5:00 AM, Bing Li lbl...@gmail.com wrote:

  Dear all,
 
  I am a new learner of HBase. I tried to set up my HBase on a
  pseudo-distributed HDFS.
 
  After starting HDFS by running ./start-dfs.sh and ./start-hbase.sh, I
  started the HBase shell.
 
 ./hbase shell
 
  It was started properly. However, when I typed the command, status, as
  follows.
 
 hbase(main):001:0 status
 
  It got the following exception. Since I had very limited experiences
 to
  use HBase, I could not figure out what the problem was.
 
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in
 
 [jar:file:/opt/hbase-0.92.0/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in
 
 [jar:file:/opt/hadoop-1.0.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
  explanation.
  12/02/13 04:34:01 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper
 exists
  failed after 3 retries
  12/02/13 04:34:01 WARN zookeeper.ZKUtil: hconnection Unable to set
 watcher
  on znode /hbase/master
  org.apache.zookeeper.KeeperException$ConnectionLossException:
  KeeperErrorCode = ConnectionLoss for /hbase/master
  at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
  at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
  at
 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
  at
 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
  at
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:569)
  at
 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
  at
  org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:98)
  at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
  Method

Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-13 Thread Bing Li
Dear Jimmy,

I configured the standalone mode successfully. But I wonder why the
pseudo-distributed one does not work.

I checked the logs and got the following exceptions. Does this information
give you any hints?

Thanks so much for your help again!

Best,
Bing

2012-02-13 18:25:49,782 FATAL org.apache.hadoop.hbase.master.HMaster:
Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
connection exception: java.net.ConnectException: Connection refuse
d
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy10.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:203)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:471)
at
org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:94)
at
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
at org.apache.hadoop.ipc.Client.call(Client.java:1046)
... 18 more
2012-02-13 18:25:49,787 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2012-02-13 18:25:49,787 DEBUG org.apache.hadoop.hbase.master.HMaster:
Stopping service threads


Thanks so much!
Bing

On Tue, Feb 14, 2012 at 3:35 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 In this case, you may just use the standalone mode.  You can follow the
 quick start step by step.

 The default zookeeper port is 2181, you don't need to configure it.



 On Mon, Feb 13, 2012 at 11:28 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I am a new user of HBase. My experiences in HBase and Hadoop is very
 limited. I just tried to follow some books, such as Hadoop/HBase the
 Definitive Guide. However, I still got some problems.

 What I am trying to do is just to set up a pseudo distributed HBase
 environment on a single node. After that, I will start my system
 programming in Java. I hope I could deploy the system in fully distributed
 mode when my system is done.

 So what I am configuring is very simple. Do I need to set up the
 zookeeper port in hbase-site.xml?

 Thanks so much!

 Best,
 Bing


 On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 Have you restarted your HBase after the change?  What's the zookeeper
 port does your HMaster use?

 Can you run the following to checkout where is your HMaster as below?

 hbase zkcli
   then:  get /hbase/master
  It should show you master location.

 It seems you have a distributed installation.  How many regionservers do
 you have?  Can you check your
 master web UI to make sure all look fine.

 Thanks,
 Jimmy


 On Mon, Feb 13, 2012 at 10:51 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 Thanks so much for your reply!

 I didn't set up the zookeeper.quorom. After getting your email, I made
 a change. Now my hbase-site.xml is as follows.

 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:9000/hbase</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
 </configuration>

 The previous error is still existed. I feel weird why HBase developers
 cannot provide a reliable description about their work.

 Best,
 Bing


 On Tue, Feb 14, 2012 at 2:16 AM, Jimmy Xiang jxi...@cloudera.comwrote:

 What's your hbase.zookeeper.quorom configuration?   You can check out
 this quick start guide:

 http

Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 Times

2012-02-13 Thread Bing Li
Dear Jimmy,

Thanks so much for your instant reply!

My hbase-site.xml is like the following.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:6</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>

When I run hadoop fs -ls /, the directories and files under the linux root
are displayed.

Best,
Bing
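
One guess, not confirmed anywhere in this thread: if hadoop fs -ls / shows
the local Linux root directories rather than HDFS contents, the Hadoop client
is probably falling back to the local file system because fs.default.name is
not set, which would also be consistent with the connection refused on port
9000. A core-site.xml sketch that would match the hbase.rootdir above:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>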

On Tue, Feb 14, 2012 at 3:48 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 Which port does your HDFS listen to? It is not 9000, right?

  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>

 You need to fix this and make sure your HDFS is working, for example,
 the following command should work for you.

 hadoop fs -ls /



 On Mon, Feb 13, 2012 at 11:44 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I configured the standalone mode successfully. But I wonder why the
 pseudo distributed one does work.

 I checked in logs and got the following exceptions. Does the information
 give you some hints?

 Thanks so much for your help again!

 Best,
 Bing

 2012-02-13 18:25:49,782 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
 connection exception: java.net.ConnectException: Connection refuse
 d
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
  at org.apache.hadoop.ipc.Client.call(Client.java:1071)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
  at $Proxy10.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
 at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:238)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:203)
  at
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
  at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:471)
 at
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:94)
  at
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
  at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
 at
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
  at
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
 at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
 at org.apache.hadoop.ipc.Client.call(Client.java:1046)
  ... 18 more
 2012-02-13 18:25:49,787 INFO org.apache.hadoop.hbase.master.HMaster:
 Aborting
 2012-02-13 18:25:49,787 DEBUG org.apache.hadoop.hbase.master.HMaster:
 Stopping service threads


 Thanks so much!
 Bing


 On Tue, Feb 14, 2012 at 3:35 AM, Jimmy Xiang jxi...@cloudera.com wrote:

 In this case, you may just use the standalone mode.  You can follow the
 quick start step by step.

 The default zookeeper port is 2181, you don't need to configure it.



 On Mon, Feb 13, 2012 at 11:28 AM, Bing Li lbl...@gmail.com wrote:

 Dear Jimmy,

 I am a new user of HBase. My experiences in HBase and Hadoop is very
 limited. I just tried to follow some books, such as Hadoop/HBase the
 Definitive Guide. However, I still got some problems.

 What I am trying to do is just to set up a pseudo distributed HBase
 environment on a single node. After that, I will start my system
 programming in Java. I hope I could deploy the system in fully distributed
 mode when my system is done.

 So what I am configuring is very simple. Do I need to set up the
 zookeeper port in hbase-site.xml?

 Thanks so much!

 Best,
 Bing


 On Tue, Feb 14, 2012 at 3:16 AM, Jimmy Xiang jxi...@cloudera.comwrote:

 Have you restarted your HBase after the change?  What's the zookeeper
 port does your HMaster use?

 Can you run the following to checkout where is your HMaster as below?

 hbase zkcli
   then:  get /hbase/master
  It should show you master location.

 It seems

Re: Why Cannot the Data/Name Directory Be Changed?

2012-02-13 Thread Bing Li
Dear all,

I fixed the problem in the previous email by doing that on Ubuntu 10
instead of RedHat 9. RedHat 9 might be too old?

Thanks so much!
Bing

On Tue, Feb 14, 2012 at 1:00 PM, Bing Li lbl...@gmail.com wrote:

 Dear all,

 I am a new user of HDFS. The default Data/Name directory is /tmp. I would
 like to change it. The hdfs-site.xml is updated as follows.

   <property>
     <name>dfs.replication</name>
     <value>1</value>
     <description>The actual number of replications can be specified when
     the file is created.</description>
   </property>
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/bing/GreatFreeLabs/Hadoop/FS</value>
   </property>
   <property>
     <name>dfs.name.dir</name>
     <value>${hadoop.tmp.dir}/dfs/name/</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>${hadoop.tmp.dir}/dfs/data/</value>
   </property>

 But when formatting by running the following command, I was asked to
 format the directory under /tmp. Why?

$ hadoop namenode -format
Re-format filesystem in /tmp/hadoop-libing/dfs/name ? (Y or N) N

 Because the newly configured name directory is not formatted, the name node
 cannot be started.

 How to solve the problem? Thanks so much!
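
 One guess from my side, not something confirmed in this thread: hadoop.tmp.dir
 is conventionally placed in core-site.xml rather than hdfs-site.xml, and the
 format command only sees the configuration files in the conf directory it is
 actually launched with, so both points may be worth checking. A core-site.xml
 sketch using the same path as above:

   <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/bing/GreatFreeLabs/Hadoop/FS</value>
   </property>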

 Best regards,
 Bing



Which Version of Hadoop Should I Use?

2012-02-04 Thread Bing Li
Dear all,

I am starting to learn how to use HBase. I am a little bit confused about
the version of Hadoop. Which one should I use?

According to the book, HBase: The Definitive Guide, page 47, it is said
that "The current version of HBase will only run on Hadoop 0.20.x."

But on the page http://hbase.apache.org/book/hadoop.html, it is said that
"HBase will lose data unless it is running on an HDFS that has a durable
sync implementation. Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop
0.20.204.0 DO NOT have this attribute. Currently only Hadoop versions
0.20.205.x or any release in excess of this version -- this includes hadoop
1.0.0 -- have a working, durable sync." If so, Hadoop 0.20.x can NOT be
used with the latest version of HBase?

Now the version of HBase I am learning is 0.92. I noticed that a jar
file, hadoop-core-1.0.0.jar, was there. It seems that HBase can run
with Hadoop 1.0?

Could you please give me a hand on this?

Thanks so much!

Best regards,
Bing


Fwd: How to Rank in HBase?

2012-01-29 Thread Bing Li
Another question: is it proper to update data in HBase frequently?

Thanks,
Bing

-- Forwarded message --
From: Bing Li lbl...@gmail.com
Date: Mon, Jan 30, 2012 at 4:00 AM
Subject: How to Rank in HBase?
To: user@hbase.apache.org


Dear all,

I am a new user of HBase. I wonder about the ranking strategy in HBase.

I am now using Solr to manage the large amount of data in my system. I got
one issue when loading data from Solr. In most cases, data is loaded and
ranked by Solr according to the degree of partial keyword matching. Well, my
case is different. I hope data can be loaded by another, exact-matching
field, e.g., the author of the data. I noticed that Solr could not rank the
data properly for exact matching.

I guess I can do the same thing in HBase too, right?

My question is whether it is possible to rank data in HBase according to a
customized strategy, like PageRank.

Thanks,
Bing


Re: How to Rank in HBase?

2012-01-29 Thread Bing Li
Dear Stack,

Thanks so much for your reply!

According to my understanding, a large-scale distributed system prefers
write-once-read-many. Frequent updating must bring a heavy load for
consistency, and performance must be lowered. So HBase must not
be suitable for frequent updates, right?

Best regards,
Bing

On Mon, Jan 30, 2012 at 1:51 PM, Stack st...@duboce.net wrote:

 On Sun, Jan 29, 2012 at 12:02 PM, Bing Li lbl...@gmail.com wrote:
  Another question is whether it is proper to update data in HBase
 frequently?
 

 This is 'normal', yes.
 St.Ack



Re: How to Rank in HBase?

2012-01-29 Thread Bing Li
Dear Ian,

I appreciate so much for your detailed reply! I will read the book about
HBase.

Best regards,
Bing

On Mon, Jan 30, 2012 at 2:36 PM, Ian Varley ivar...@salesforce.com wrote:

 Bing,

 HBase uses an approach to structuring its storage known as Log Structured
 Merge Trees, which you can learn more about here:


  http://scholar.google.com/scholar?q=log+structured+merge+tree&hl=en&as_sdt=0&as_vis=1&oi=scholart

 As well as in Lars George's great book, here:

 http://shop.oreilly.com/product/0636920014348.do

 It does all of these frequent updates just in memory, which is very
 fast; at the same time, it writes a simple forward-only log of all edits
 (known as the Write Ahead Log, or WAL) to disk in order to provide
 durability in the event of machine failure. It periodically writes the
 in-memory data to disk in big immutable ordered chunks, called store
 files, which is very efficient. Future reads of the data then merge the
 on-disk store file data with the current state in memory, to get the full
 picture of the state of any row. Over time, the many small store files get
 compacted into bigger files, so that individual reads don't have too many
 files to read from. Each get or scan operation can just read small
 blocks of the store files; when you ask for one record, it doesn't have to
 read gigabytes of data from the disk, it can just read a small block. As
 such, random small reads and writes on a very big data set can be done
 efficiently.

 Furthermore, it's fine to update the data store frequently. For any given
 record, you can make as many updates as you want to the in-memory
 structures, and these aren't written to disk until the memory store is
 flushed (and into the WAL, but that's also efficient b/c it's ordered by
 update time, not record key). It all happens in memory, which is very fast
 (but, again, it's safe b/c of the WAL). There are even some recent JIRAs
 that make that process more efficient, by, for example, HBASE-4241
 https://issues.apache.org/jira/browse/HBASE-4241.

 One way to think about it is that HBase is *precisely* a layer that adds
 these efficient random read/write capabilities on top of the Hadoop
 distributed file system (HDFS), and takes care of doing that in a way that
 parallelizes nicely across a large cluster of machines, deals with machine
 failures, etc.

 Ian
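
A toy sketch of the log-structured merge idea described above (my own
illustration, not HBase's actual classes): writes go to an in-memory sorted
map and an append-only log, and a flush writes the sorted contents to an
immutable file and clears the memory store.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;
import java.util.TreeMap;

public class TinyLsmSketch {
    private final TreeMap<String, String> memstore = new TreeMap<String, String>();
    private final PrintWriter wal;
    private int flushCount = 0;

    public TinyLsmSketch(String walPath) throws IOException {
        this.wal = new PrintWriter(new FileWriter(walPath, true)); // append-only log
    }

    public void put(String rowKey, String value) {
        wal.println(rowKey + "\t" + value);  // durability: log the edit first
        wal.flush();
        memstore.put(rowKey, value);         // fast in-memory update; later puts overwrite
    }

    // Flush the sorted in-memory data into an immutable "store file".
    public void flush() throws IOException {
        PrintWriter storeFile = new PrintWriter(new FileWriter("storefile-" + (flushCount++)));
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            storeFile.println(e.getKey() + "\t" + e.getValue());
        }
        storeFile.close();
        memstore.clear();                    // reads would now merge memstore + store files
    }
}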

 On Jan 29, 2012, at 10:16 PM, Bing Li wrote:

 Dear Stack,

 Thanks so much for your reply!

 According to my understanding, in a large scale distributed system, it
 prefers write-once-read-many. Frequent-updating must bring heavy load for
 the consistency issue and the performance must be lowered. HBase must not
 be suitable to be updated frequently, right?

 Best regards,
 Bing

 On Mon, Jan 30, 2012 at 1:51 PM, Stack st...@duboce.netmailto:
 st...@duboce.net wrote:

 On Sun, Jan 29, 2012 at 12:02 PM, Bing Li lbl...@gmail.commailto:
 lbl...@gmail.com wrote:
 Another question is whether it is proper to update data in HBase
 frequently?


 This is 'normal', yes.
 St.Ack