RE: Speeding up the row count

2013-04-19 Thread Omkar Joshi
Hi,

I have a 2-node (VM) Hadoop cluster on top of which HBase is running in
distributed mode.

I have a table named ORDERS with >100,000 rows.

NOTE : Since my cluster is ultra-small, I didn't pre-split the table.

ORDERS
  rowkey : ORDER_ID

  column family : ORDER_DETAILS
    columns : CUSTOMER_ID
              PRODUCT_ID
              REQUEST_DATE
              PRODUCT_QUANTITY
              PRICE
              PAYMENT_MODE

The java client code to simply check the count of the records is :

public long getTableCount(String tableName, String columnFamilyName) {

    AggregationClient aggregationClient = new AggregationClient(config);
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(columnFamilyName));
    scan.setFilter(new FirstKeyOnlyFilter());

    long rowCount = 0;

    try {
        rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName), null, scan);
        System.out.println("No. of rows in " + tableName + " is " + rowCount);
    } catch (Throwable e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return rowCount;
}

It is running for more than 6 minutes now :(

What shall I do to speed up the execution to milliseconds (or at least a
couple of seconds)?

Regards,
Omkar Joshi


-Original Message-
From: Vedad Kirlic [mailto:kirl...@gmail.com]
Sent: Thursday, April 18, 2013 12:22 AM
To: user@hbase.apache.org
Subject: Re: Speeding up the row count

Hi Omkar,

If you are not interested in occurrences of a specific column (e.g. name,
email, ...) and just want the total number of rows regardless of their
content (i.e. columns), you should avoid adding any columns to the Scan. In
that case the coprocessor implementation behind AggregationClient will add a
FirstKeyOnlyFilter to the Scan itself, so unnecessary columns are not loaded,
which should result in some speed up.
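
Concretely, for Omkar's getTableCount() snippet that would mean something like
this (a sketch reusing the same 0.94 AggregationClient API and the config /
tableName / columnFamilyName variables from the original code):

    Scan scan = new Scan();
    // family only - no addColumn() calls and no explicit filter; the
    // aggregation endpoint adds FirstKeyOnlyFilter itself when no qualifier
    // is specified
    scan.addFamily(Bytes.toBytes(columnFamilyName));
    long rowCount = new AggregationClient(config).rowCount(
            Bytes.toBytes(tableName), null, scan);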

This is a similar approach to what the hbase shell 'count' implementation
does, although the reduction in overhead in that case is bigger, since data
transfer from the region server to the client (shell) is minimized, whereas in
the case of the coprocessor the data does not leave the region server, so most
of the improvement should come from avoiding loading of unnecessary files. Not
sure how this will apply to your particular case, given that the data set per
row seems to be rather small. Also, in the case of AggregationClient you will
benefit if/when your tables span multiple regions. Essentially, performance of
this approach will 'degrade' as your table gets bigger, but only to the point
when it splits, from which point it should be pretty constant. Having this in
mind, and your type of data, you might consider pre-splitting your tables.
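
If you do go the pre-splitting route, a minimal sketch (assuming the 0.94
client API; the table name and split points below are purely illustrative and
should be chosen to match your ORDER_ID key distribution):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePreSplitOrders {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = new HTableDescriptor("ORDERS_PRESPLIT");
            desc.addFamily(new HColumnDescriptor("ORDER_DETAILS"));
            // Illustrative split points - each byte[] becomes a region boundary.
            byte[][] splitKeys = new byte[][] {
                    Bytes.toBytes("ORDER_25000"),
                    Bytes.toBytes("ORDER_50000"),
                    Bytes.toBytes("ORDER_75000") };
            admin.createTable(desc, splitKeys);
            admin.close();
        }
    }

Each region then gets its own coprocessor invocation, so the aggregation can
run on the regions in parallel.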

DISCLAIMER: this is mostly theoretical, since I'm not an expert in hbase
internals :), so your best bet is to try it - I'm too lazy to verify the
impact myself ;)

Finally, if your case can tolerate eventual consistency of the counters with
the actual number of rows, you can, as already suggested, have the RowCounter
MapReduce job run every once in a while, write the counter(s) back to hbase,
and read those when you need to obtain the number of rows.
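
For reference, the bundled RowCounter job can be launched straight from the
command line (assuming the hbase launcher script is on the PATH; "ORDERS" is
the table name from the original post):

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter ORDERS

The total shows up as the ROWS counter of the finished MapReduce job, and that
number can then be written back to an HBase cell as described above.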

Regards,
Vedad





Re: Slow region server recoveries

2013-04-19 Thread Nicolas Liochon
Hey Varun,

Could you please share the logs and the configuration (hdfs / hbase
settings + cluster description). What's the failure scenario?
From an HDFS pov, HDFS 3703 does not change the dead node status, but these
nodes will be given the lowest priority when reading.


Cheers,

Nicolas


On Fri, Apr 19, 2013 at 3:01 AM, Varun Sharma  wrote:

> Hi,
>
> We are facing problems with really slow HBase region server recoveries (~20
> minutes). Version is hbase 0.94.3 compiled with hadoop.profile=2.0.
>
> Hadoop version is CDH 4.2 with HDFS 3703 and HDFS 3912 patched and stale
> node timeouts configured correctly. Time for dead node detection is still
> 10 minutes.
>
> We see that our region server, while trying to read an HLog, is stuck there
> for a long time. Logs here:
>
> 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> connect to /10.156.194.251:50010 for file
>
> /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
> for block
>
> BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
> 15000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818
> remote=/
> 10.156.194.251:50010]
>
> I would think that HDFS 3703 would make the server fail fast and go to the
> third datanode. Currently, the recovery seems way too slow for production
> usage...
>
> Varun
>


Re: Speeding up the row count

2013-04-19 Thread Ted Yu
Since there is only one region in your table, using aggregation coprocessor has 
no advantage. 
I think there may be some issue with your cluster - row count should finish 
within 6 minutes.

Have you checked server logs ?

Thanks

On Apr 19, 2013, at 12:33 AM, Omkar Joshi  wrote:

> Hi,
> 
> I'm having a 2-node(VMs) Hadoop cluster atop which HBase is running in the 
> distributed mode.
> 
> I'm having a table named ORDERS with >100,000 rows.
> 
> NOTE : Since my cluster is ultra-small, I didn't pre-split the table.
> 
> ORDERS
> rowkey :ORDER_ID
> 
> column family : ORDER_DETAILS
>columns : CUSTOMER_ID
>PRODUCT_ID
>REQUEST_DATE
>PRODUCT_QUANTITY
>PRICE
>PAYMENT_MODE
> 
> The java client code to simply check the count of the records is :
> 
> public long getTableCount(String tableName, String columnFamilyName) {
> 
>AggregationClient aggregationClient = new 
> AggregationClient(config);
>Scan scan = new Scan();
>scan.addFamily(Bytes.toBytes(columnFamilyName));
>scan.setFilter(new FirstKeyOnlyFilter());
> 
>long rowCount = 0;
> 
>try {
>rowCount = 
> aggregationClient.rowCount(Bytes.toBytes(tableName),
>null, scan);
>System.out.println("No. of rows in " + tableName + " 
> is "
>+ rowCount);
>} catch (Throwable e) {
>// TODO Auto-generated catch block
>e.printStackTrace();
>}
> 
>return rowCount;
>}
> 
> It is running for more than 6 minutes now :(
> 
> What shall I do to speed up the execution to milliseconds(at least a couple 
> of seconds)?
> 
> Regards,
> Omkar Joshi
> 
> 
> -Original Message-
> From: Vedad Kirlic [mailto:kirl...@gmail.com]
> Sent: Thursday, April 18, 2013 12:22 AM
> To: user@hbase.apache.org
> Subject: Re: Speeding up the row count
> 
> Hi Omkar,
> 
> If you are not interested in occurrences of specific column (e.g. name,
> email ... ), and just want to get total number of rows (regardless of their
> content - i.e. columns), you should avoid adding any columns to the Scan, in
> which case coprocessor implementation for AggregateClient, will add
> FirstKeyOnlyFilter to the Scan, so to avoid loading unnecessary columns, so
> this should result in some speed up.
> 
> This is similar approach to what hbase shell 'count' implementation does,
> although reduction in overhead in that case is bigger, since data transfer
> from region server to client (shell) is minimized, whereas in case of
> coprocessor, data does not leave region server, so most of the improvement
> in that case should come from avoiding loading of unnecessary files. Not
> sure how this will apply to your particular case, given that data set per
> row seems to be rather small. Also, in case of AggregateClient you will
> benefit if/when your tables span multiple regions. Essentially, performance
> of this approach will 'degrade' as your table gets bigger, but only to the
> point when it splits, from which point it should be pretty constant. Having
> this in mind, and your type of data, you might consider pre-splitting your
> tables.
> 
> DISCLAIMER: this is mostly theoretical, since I'm not an expert in hbase
> internals :), so your best bet is to try it - I'm too lazy to verify impact
> my self ;)
> 
> Finally, if your case can tolerate eventual consistency of counters with
> actual number of rows, you can, as already suggested, have RowCounter map
> reduce run every once in a while, write the counter(s) back to hbase, and
> read those when you need to obtain the number of rows.
> 
> Regards,
> Vedad
> 
> 
> 


Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Hello,

We use an HBase coprocessor endpoint to process realtime data. But when I use
the coprocessorExec method to scan a table and pass startRow and endRow, it
always scans the whole table instead of returning the results between startRow
and endRow.

my code.

results = table.coprocessorExec(IEndPoint_SA.class, startrow, endrow,
    new Batch.Call<IEndPoint_SA, Hashtable>() {
        public Hashtable call(IEndPoint_SA instance) throws IOException {
            Hashtable s = null;
            try {
                s = instance.GetList();
            } catch (ParseException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            return s;
        }
    });



Best Regards / 商祺
郭伟 Guo Wei
-


RE: Speeding up the row count

2013-04-19 Thread Omkar Joshi
Hi Ted,

6 minutes is too long :(
Will this decrease to seconds if more nodes are added to the cluster?

I got this exception finally (I recall faintly something about increasing a
timeout parameter while querying, but I didn't want to increase it to a high
value):

Apr 19, 2013 1:05:43 PM 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
processExecs
WARNING: Error executing for row
java.util.concurrent.ExecutionException: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=10, exceptions:
Fri Apr 19 12:56:01 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1770 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:57:02 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1782 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:58:04 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1785 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:59:05 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1794 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:00:08 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1800 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:01:10 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1802 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:02:14 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1804 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:03:19 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1809 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:04:27 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1812 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:05:43 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1829 
remote=cldx-1140-1034/172.25.6.71:60020]

at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processExecs(HConnectionManager.java:1475)
at 
org.apache.hadoop.hbase.client.HTable.coprocessorExec(HTable.java:1236)
at

RE: Problem in filters

2013-04-19 Thread Omkar Joshi
Hi,

There was a small issue with the data (delimiters were messed up) - the
filters now seem to work correctly.
I'm now working on Hive+HBase integration; Phoenix will be taken up later.

Regards,
Omkar Joshi


-Original Message-
From: Ian Varley [mailto:ivar...@salesforce.com] 
Sent: Wednesday, April 17, 2013 9:46 PM
To: user@hbase.apache.org
Subject: Re: Problem in filters

Omkar,

Have you considered using Phoenix (https://github.com/forcedotcom/phoenix), a 
SQL skin over HBase to execute your SQL directly? That'll save you from 
learning all the nuances of HBase filters and give you as good or better 
performance.

Once you've downloaded and installed Phoenix, here's what you'd need to do:

// One time DDL statement
Connection conn = DriverManager.getConnection("jdbc:phoenix:your-zookeeper-quorum-host");
conn.createStatement().execute("CREATE VIEW ORDERS(\n" +
    // Not sure what the PK is, so I added this column
    "ORDER_DETAILS.ORDER_DETAILS_ID VARCHAR NOT NULL PRIMARY KEY,\n" +
    // If you have fixed length IDs, then use CHAR(xxx)
    "ORDER_DETAILS.CUSTOMER_ID VARCHAR,\n" +
    "ORDER_DETAILS.PRODUCT_ID VARCHAR,\n" +
    "ORDER_DETAILS.REQUEST_DATE DATE,\n" +
    "ORDER_DETAILS.PRODUCT_QUANTITY INTEGER,\n" +
    "ORDER_DETAILS.PRICE DECIMAL(10,2),\n" +
    // not sure on the type here, but this might map to an Enum
    "ORDER_DETAILS.PAYMENT_MODE CHAR(1)\n" +
    ")");

// Running the query:
Connection conn = DriverManager.getConnection("jdbc:phoenix:your-zookeeper-quorum-host");
PreparedStatement stmt = conn.prepareStatement(
    "SELECT ORDER_ID,CUSTOMER_ID,PRODUCT_ID,QUANTITY\n" +
    "FROM ORDERS WHERE QUANTITY >= ? and PRODUCT_ID=?");
stmt.setInt(1, 16);
stmt.setString(2, "P60337998");
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
    System.out.println("ORDER_ID=" + rs.getString("ORDER_ID") +
        ",CUSTOMER_ID=" + rs.getString("CUSTOMER_ID") +
        ",PRODUCT_ID=" + rs.getString("PRODUCT_ID") +
        ",QUANTITY=" + rs.getInt("QUANTITY"));
}

There are different trade-offs for the make up of the columns in your PK, 
depending on your access patterns. Getting this right could prevent full table 
scans and make your query execute much faster. Also, there are performance 
trade-offs for using a VIEW versus a TABLE.

Ian


On Apr 17, 2013, at 8:32 AM, Jean-Marc Spaggiari wrote:

Hi Omkar,

Using the shell, can you scan the first few rows from your table to make
sure it's stored in the expected format? Don't forget to limit the number
of rows retrieved.

JM


2013/4/17 Omkar Joshi 
mailto:omkar.jo...@lntinfotech.com>>

Hi Ted,

I tried using only productIdFilter without FilterList but still no output.

public void executeOrdersQuery() {
    /*
     * SELECT ORDER_ID,CUSTOMER_ID,PRODUCT_ID,QUANTITY FROM ORDERS WHERE
     * QUANTITY >=16 and PRODUCT_ID='P60337998'
     */
    String tableName = "ORDERS";

    String family = "ORDER_DETAILS";
    int quantity = 16;
    String productId = "P60337998";

    SingleColumnValueFilter quantityFilter = new SingleColumnValueFilter(
            Bytes.toBytes(family), Bytes.toBytes("PRODUCT_QUANTITY"),
            CompareFilter.CompareOp.GREATER_OR_EQUAL,
            Bytes.toBytes(quantity));

    SingleColumnValueFilter productIdFilter = new SingleColumnValueFilter(
            Bytes.toBytes(family), Bytes.toBytes("PRODUCT_ID"),
            CompareFilter.CompareOp.EQUAL,
            Bytes.toBytes(productId));

    FilterList filterList = new FilterList(
            FilterList.Operator.MUST_PASS_ALL);
    // filterList.addFilter(quantityFilter);
    filterList.addFilter(productIdFilter);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes(family), Bytes.toBytes("ORDER_ID"));
    scan.addColumn(Bytes.toBytes(family), Bytes.toBytes("CUSTOMER_ID"));
    scan.addColumn(Bytes.toBytes(family), Bytes.toBytes("PRODUCT_ID"));
    scan.addColumn(Bytes.toBytes(family), Bytes.toBytes("QUANTITY"));

    // scan.setFilter(filterList);
    scan.setFilter(productIdFilter);

    HTableInterface tbl = hTablePool.getTable(Bytes.toBytes(tableName));
    ResultScanner scanResults = null;
    try {
        scanResults = tbl.getScanner(scan);

        System.out.println("scanResults : ");

        for (Result result : scanResults) {
            System.out.println("The result is " + result);
        }

    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } finally {
        try {
            tbl.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread ramkrishna vasudevan
HBASE-6870 deals with it. It is not yet committed.  We can review the patch
and take it to closure.

Regards
Ram


On Fri, Apr 19, 2013 at 3:19 PM, GuoWei  wrote:

> Hello,
>
> We use HBase core processor endpoint  to process realtime data. But when I
> use coreprocessorExec method to scan table and pass startRow and endRow. It
> always scan all table instead of the result between the startRow and endRow.
>
> my code.
>
> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> new
> Batch.Call>() {
>   public Hashtable call(IEndPoint_SA
> instance)throws IOException{
>   Hashtable s = null;
> try {
> s=instance.GetList();
> } catch (ParseException e) {
> // TODO Auto-generated catch block
> e.printStackTrace();
> }
> return s;
>   }
> });
>
>
>
> Best Regards / 商祺
> 郭伟 Guo Wei
> -
>


Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Ted Yu
Which hbase version are you using ?

Thanks

On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:

> Hello,
> 
> We use HBase core processor endpoint  to process realtime data. But when I 
> use coreprocessorExec method to scan table and pass startRow and endRow. It 
> always scan all table instead of the result between the startRow and endRow. 
> 
> my code.
> 
> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
>new Batch.Call>() {
>  public Hashtable call(IEndPoint_SA 
> instance)throws IOException{
>  Hashtable s = null;
>try {
>s=instance.GetList();
>} catch (ParseException e) {
>// TODO Auto-generated catch block
>e.printStackTrace();
>}
>return s;
>  }
>});
> 
> 
> 
> Best Regards / 商祺
> 郭伟 Guo Wei
> -


Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Hi Nicholas,

Here is the failure scenario, I have dug up the logs.

A machine fails and stops accepting/transmitting traffic. The HMaster
starts the distributed split for 13 tasks. There are 12 region servers. 12
tasks succeed but the 13th one takes a looong time.

Zookeeper timeout is set to 30 seconds. Stale node timeout is 20 seconds.
Both patches are there.

a) Machine fails around 27:30
b) Master starts the split around 27:40 and submits the tasks. The one task
which fails seems to be the one which contains the WAL being currently
written to:

2013-04-19 00:27:44,325 INFO
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
hdfs://
ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141,
length=0

Basically this region server picks up the task but finds the length of this
file to be 0 and drops it. This happens again.

c) Finally another region server picks up the task but it ends up going to
the bad datanode (which should not happen because of the stale node timeout).
Unfortunately it hits the 45 retries and a connect timeout of 20 seconds
every time. This delays recovery significantly. Now I guess reducing the # of
retries to 1 is one possibility.
But then the namenode logs are very interesting.

d) Namenode seems to be in cyclic lease recovery loop until the node is
marked dead. There is this one last block which exhibits this.

2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
primaryNodeIndex=1,
replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
ReplicaUnderConstruction[10.156.192.106:50010|RBW],
ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started,
primary=10.156.192.106:50010
2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.internalReleaseLease: File
/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
has not been closed. Lease recovery is in progress. RecoveryId = 174413 for
block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY,
primaryNodeIndex=1,
replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
ReplicaUnderConstruction[10.156.192.106:50010|RBW],
ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}

I see this over and over again in the logs until the datanode is marked
dead. It seems to be cycling through the replicas for this WAL block and
trying to add it to the recovery list. I looked at the code and it says:

  // Cannot close file right now, since the last block requires
recovery.
  // This may potentially cause infinite loop in lease recovery
  // if there are no valid replicas on data-nodes.
  NameNode.stateChangeLog.warn(
"DIR* NameSystem.internalReleaseLease: " +
"File " + src + " has not been closed." +
   " Lease recovery is in progress. " +
"RecoveryId = " + blockRecoveryId + " for block " +
lastBlock);
  break;

Eventually for this block, we get

2013-04-19 00:41:20,736 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-
*5723958680970112840_174056*, newgenerationstamp=174413,
newlength=119148648, newtargets=[10.156.192.106:50010, 10.156.195.38:50010],
closeFile=true, deleteBlock=false)
2013-04-19 00:41:20,736 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block
(=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
not found
2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 35 on 8020, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization
from 10.156.192.106:53271: error: java.io.IOException: Block
(=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
not found

On the datanode side, I see a call to recover blocks - I see that a write
pipeline is there, which gets terminated with some socket timeouts...

00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode
at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls
recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056,
targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010],
newGenerationStamp=174413)

Not sure, but this looks like a case where data could be lost?

Varun


On Fri, Apr 19, 2013 at 12:38 AM, Nicolas Liochon  wrote:

> Hey Varun,
>
> Could you please share the logs and the configuration (hdfs / hbase
> settings + cluster description). What's the failure scenario?
> From an HDFS pov, HDFS 3703 does not change the dead node status. But these
> node will be given the lowest priority when reading.
>
>
> Cheers,
>
> Nicola

Re: Slow region server recoveries

2013-04-19 Thread Nicolas Liochon
Thanks for the detailed scenario and analysis. I'm going to have a look.
I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
timeouts), could you please send them directly to me?

Thanks,

Nicolas


On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma  wrote:

> Hi Nicholas,
>
> Here is the failure scenario, I have dug up the logs.
>
> A machine fails and stops accepting/transmitting traffic. The HMaster
> starts the distributed split for 13 tasks. There are 12 region servers. 12
> tasks succeed but the 13th one takes a looong time.
>
> Zookeeper timeout is set to 30 seconds. Stale node timeout is 20 seconds.
> Both patches are there.
>
> a) Machine fails around 27:30
> b) Master starts the split around 27:40 and submits the tasks. The one task
> which fails seems to be the one which contains the WAL being currently
> written to:
>
> 2013-04-19 00:27:44,325 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
> hdfs://
>
> ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> ,
> length=0
>
> Basically this region server picks up the task but finds the length of this
> file to be 0 and drops. This happens again
>
> c) Finally another region server picks up the task but it ends up going to
> the bad datanode which should not happen because of the stale node timeout)
> Unfortunately it hits the 45 retries and a connect timeout of 20 seconds
> every time. This delays recovery significantly. Now I guess reducing # of
> retries to 1 is one possibility.
> But then the namenode logs are very interesting.
>
> d) Namenode seems to be in cyclic lease recovery loop until the node is
> marked dead. There is this one last block which exhibits this.
>
> 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
> 5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
> primaryNodeIndex=1,
> replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started,
> primary=10.156.192.106:50010
> 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: File
>
> /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> has not been closed. Lease recovery is in progress. RecoveryId = 174413 for
> block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY,
> primaryNodeIndex=1,
> replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
>
> I see this over and over again in the logs until the datanode is marked
> dead. It seems to be cycling through the replicas for this WAL block and
> trying to add it to the recovery list. I looked at the code and it says:
>
>   // Cannot close file right now, since the last block requires
> recovery.
>   // This may potentially cause infinite loop in lease recovery
>   // if there are no valid replicas on data-nodes.
>   NameNode.stateChangeLog.warn(
> "DIR* NameSystem.internalReleaseLease: " +
> "File " + src + " has not been closed." +
>" Lease recovery is in progress. " +
> "RecoveryId = " + blockRecoveryId + " for block " +
> lastBlock);
>   break;
>
> Eventually for this block, we get
>
> 2013-04-19 00:41:20,736 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>
> commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-
> *5723958680970112840_174056*, newgenerationstamp=174413,
> newlength=119148648, newtargets=[10.156.192.106:50010, 10.156.195.38:50010
> ],
> closeFile=true, deleteBlock=false)
> 2013-04-19 00:41:20,736 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block
> (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
> not found
> 2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 35 on 8020, call
>
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization
> from 10.156.192.106:53271: error: java.io.IOException: Block
> (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
> not found
>
> On the datanode side, i see a call for recover blocks - I see that a write
> pipeline is there, which gets terminated with some socket timeouts...
>
> 00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode
> at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls
>
> recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056,
> targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010],
> newG

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei

We use HBase 0.94.1 in our production environment.


Best Regards / 商祺
郭伟 Guo Wei

On 2013-4-19, at 6:01 PM, Ted Yu wrote:

> Which hbase version are you using ?
> 
> Thanks
> 
> On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
> 
>> Hello,
>> 
>> We use HBase core processor endpoint  to process realtime data. But when I 
>> use coreprocessorExec method to scan table and pass startRow and endRow. It 
>> always scan all table instead of the result between the startRow and endRow. 
>> 
>> my code.
>> 
>> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
>>   new Batch.Call>() {
>> public Hashtable call(IEndPoint_SA 
>> instance)throws IOException{
>> Hashtable s = null;
>>   try {
>>   s=instance.GetList();
>>   } catch (ParseException e) {
>>   // TODO Auto-generated catch block
>>   e.printStackTrace();
>>   }
>>   return s;
>> }
>>   });
>> 
>> 
>> 
>> Best Regards / 商祺
>> 郭伟 Guo Wei
>> -
> 



Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Jean-Marc Spaggiari
Then https://issues.apache.org/jira/browse/HBASE-6870 is most probably
impacting you.

Take a look at the link. It's not yet fixed but it's coming. You might want
to upgrade to a release which will include this fix.

JM

2013/4/19 GuoWei 

>
> We use base 0.94.1 in our production environment.
>
>
> Best Regards / 商祺
> 郭伟 Guo Wei
>
> On 2013-4-19, at 6:01 PM, Ted Yu wrote:
>
> > Which hbase version are you using ?
> >
> > Thanks
> >
> > On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
> >
> >> Hello,
> >>
> >> We use HBase core processor endpoint  to process realtime data. But
> when I use coreprocessorExec method to scan table and pass startRow and
> endRow. It always scan all table instead of the result between the startRow
> and endRow.
> >>
> >> my code.
> >>
> >> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> >>   new Batch.Call>() {
> >> public Hashtable call(IEndPoint_SA
> instance)throws IOException{
> >> Hashtable s = null;
> >>   try {
> >>   s=instance.GetList();
> >>   } catch (ParseException e) {
> >>   // TODO Auto-generated catch block
> >>   e.printStackTrace();
> >>   }
> >>   return s;
> >> }
> >>   });
> >>
> >>
> >>
> >> Best Regards / 商祺
> >> 郭伟 Guo Wei
> >> -
> >
>
>


Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Hi, I can stop a region server by using the HBaseAdmin class but cannot start one.

How can I start a region server by using the HBase API?


The HRegionServer class has a startRegionServer method; can I use this class?
-- 
M. Nurettin ŞİMŞEK


Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Ted Yu
Please upgrade to 0.94.6.1 which is more stable. 

Cheers

On Apr 19, 2013, at 4:58 AM, GuoWei  wrote:

> 
> We use base 0.94.1 in our production environment.
> 
> 
> Best Regards / 商祺
> 郭伟 Guo Wei
> 
> On 2013-4-19, at 6:01 PM, Ted Yu wrote:
> 
>> Which hbase version are you using ?
>> 
>> Thanks
>> 
>> On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
>> 
>>> Hello,
>>> 
>>> We use HBase core processor endpoint  to process realtime data. But when I 
>>> use coreprocessorExec method to scan table and pass startRow and endRow. It 
>>> always scan all table instead of the result between the startRow and 
>>> endRow. 
>>> 
>>> my code.
>>> 
>>> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
>>>  new Batch.Call>() {
>>>public Hashtable call(IEndPoint_SA 
>>> instance)throws IOException{
>>>Hashtable s = null;
>>>  try {
>>>  s=instance.GetList();
>>>  } catch (ParseException e) {
>>>  // TODO Auto-generated catch block
>>>  e.printStackTrace();
>>>  }
>>>  return s;
>>>}
>>>  });
>>> 
>>> 
>>> 
>>> Best Regards / 商祺
>>> 郭伟 Guo Wei
>>> -
>> 
> 


Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
startRegionServer creates a new Thread, wrapping the passed-in
HRegionServer.

Can you use a script to start the region server?
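
For example, on the machine that should host the region server (a sketch,
assuming a standard install with the stock scripts):

    $HBASE_HOME/bin/hbase-daemon.sh start regionserver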

Cheers

On Fri, Apr 19, 2013 at 5:15 AM, Mehmet Simsek wrote:

> Hi, I can stop region server by using HBaseAdmin class but cannot start.
>
> How can I start region server by using Hbase API?
>
>
> HRegionServer class has startRegionServer method, Can I use this class?
> --
> M. Nurettin ŞİMŞEK
>


Re: Speeding up the row count

2013-04-19 Thread Ted Yu
The stack trace was from your HBase client. 

Can you check server log ?

Thanks

On Apr 19, 2013, at 2:55 AM, Omkar Joshi  wrote:

> Hi Ted,
> 
> 6 minutes is too long :(
> Will this decrease to seconds if more nodes are added in the cluster?
> 
> I got this exception finally(I recall faintly about increasing some timeout 
> parameter while querying but I didn't want to increase it to a high value) :
> 
> Apr 19, 2013 1:05:43 PM 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
> processExecs
> WARNING: Error executing for row
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=10, exceptions:
> Fri Apr 19 12:56:01 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1770 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 12:57:02 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1782 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 12:58:04 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1785 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 12:59:05 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1794 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:00:08 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1800 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:01:10 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1802 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:02:14 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1804 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:03:19 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1809 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:04:27 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1812 
> remote=cldx-1140-1034/172.25.6.71:60020]
> Fri Apr 19 13:05:43 IST 2013, 
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
> java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 6 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/0.0.0.0:1829 
> remote=cldx-1140-1034/172.25.6.71:60020]
> 
>at java.util.concurrent.FutureTask$Sync.innerGet

Re: Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Can I use a script to start a region server from a Java application running
on the Windows platform?


Re: Speeding up the row count

2013-04-19 Thread James Taylor
Phoenix will parallelize within a region:

SELECT count(1) FROM orders

I agree with Ted, though - even serially, 100,000 rows shouldn't take anywhere
near 6 mins. You say > 100,000 rows. Can you tell us what it's < ?

Thanks,
James

On Apr 19, 2013, at 2:37 AM, "Ted Yu"  wrote:

> Since there is only one region in your table, using aggregation coprocessor 
> has no advantage. 
> I think there may be some issue with your cluster - row count should finish 
> within 6 minutes.
> 
> Have you checked server logs ?
> 
> Thanks
> 
> On Apr 19, 2013, at 12:33 AM, Omkar Joshi  wrote:
> 
>> Hi,
>> 
>> I'm having a 2-node(VMs) Hadoop cluster atop which HBase is running in the 
>> distributed mode.
>> 
>> I'm having a table named ORDERS with >100,000 rows.
>> 
>> NOTE : Since my cluster is ultra-small, I didn't pre-split the table.
>> 
>> ORDERS
>> rowkey :ORDER_ID
>> 
>> column family : ORDER_DETAILS
>>   columns : CUSTOMER_ID
>>   PRODUCT_ID
>>   REQUEST_DATE
>>   PRODUCT_QUANTITY
>>   PRICE
>>   PAYMENT_MODE
>> 
>> The java client code to simply check the count of the records is :
>> 
>> public long getTableCount(String tableName, String columnFamilyName) {
>> 
>>   AggregationClient aggregationClient = new 
>> AggregationClient(config);
>>   Scan scan = new Scan();
>>   scan.addFamily(Bytes.toBytes(columnFamilyName));
>>   scan.setFilter(new FirstKeyOnlyFilter());
>> 
>>   long rowCount = 0;
>> 
>>   try {
>>   rowCount = 
>> aggregationClient.rowCount(Bytes.toBytes(tableName),
>>   null, scan);
>>   System.out.println("No. of rows in " + tableName + " 
>> is "
>>   + rowCount);
>>   } catch (Throwable e) {
>>   // TODO Auto-generated catch block
>>   e.printStackTrace();
>>   }
>> 
>>   return rowCount;
>>   }
>> 
>> It is running for more than 6 minutes now :(
>> 
>> What shall I do to speed up the execution to milliseconds(at least a couple 
>> of seconds)?
>> 
>> Regards,
>> Omkar Joshi
>> 
>> 
>> -Original Message-
>> From: Vedad Kirlic [mailto:kirl...@gmail.com]
>> Sent: Thursday, April 18, 2013 12:22 AM
>> To: user@hbase.apache.org
>> Subject: Re: Speeding up the row count
>> 
>> Hi Omkar,
>> 
>> If you are not interested in occurrences of specific column (e.g. name,
>> email ... ), and just want to get total number of rows (regardless of their
>> content - i.e. columns), you should avoid adding any columns to the Scan, in
>> which case coprocessor implementation for AggregateClient, will add
>> FirstKeyOnlyFilter to the Scan, so to avoid loading unnecessary columns, so
>> this should result in some speed up.
>> 
>> This is similar approach to what hbase shell 'count' implementation does,
>> although reduction in overhead in that case is bigger, since data transfer
>> from region server to client (shell) is minimized, whereas in case of
>> coprocessor, data does not leave region server, so most of the improvement
>> in that case should come from avoiding loading of unnecessary files. Not
>> sure how this will apply to your particular case, given that data set per
>> row seems to be rather small. Also, in case of AggregateClient you will
>> benefit if/when your tables span multiple regions. Essentially, performance
>> of this approach will 'degrade' as your table gets bigger, but only to the
>> point when it splits, from which point it should be pretty constant. Having
>> this in mind, and your type of data, you might consider pre-splitting your
>> tables.
>> 
>> DISCLAIMER: this is mostly theoretical, since I'm not an expert in hbase
>> internals :), so your best bet is to try it - I'm too lazy to verify impact
>> my self ;)
>> 
>> Finally, if your case can tolerate eventual consistency of counters with
>> actual number of rows, you can, as already suggested, have RowCounter map
>> reduce run every once in a while, write the counter(s) back to hbase, and
>> read those when you need to obtain the number of rows.
>> 
>> Regards,
>> Vedad
>> 
>> 
>> 

Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
By Java application I assume it is an HBase client.
Is your HBase cluster secure?

Have you thought about the security implications of allowing a client app to
start a region server?

Cheers

On Fri, Apr 19, 2013 at 7:52 AM, Mehmet Simsek wrote:

> Can I use script to start region server in java application starting in
> windows platform?
>


Re: zookeeper taking 15GB RAM

2013-04-19 Thread Rohit Kelkar
Hi, any inputs on this issue? Is there some periodic cleanup that we need
to do?

- Rohit Kelkar


On Thu, Apr 18, 2013 at 10:33 AM, Rohit Kelkar wrote:

> No. Just using the "bin/zkServer.sh start" command. Also each node has 48
> GB RAM
>
> - Rohit Kelkar
>
>
> On Thu, Apr 18, 2013 at 10:28 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi Rohit,
>>
>> How are you starting your ZK servers? Are you passing any Xm* parameters?
>>
>> JM
>>
>> 2013/4/18 Rohit Kelkar 
>>
>> > We have a 3 node cluster in production with the zookeeper instances
>> running
>> > on each node. I checked the memory usage for QuorumPeerMain using 'cat
>> > /proc/<pid>/status' and it shows that the VMData is ~16GB.
>> > The zookeeper is being used only by hbase. Is this usual for hbase to
>> load
>> > up the zookeeper memory so much or is it that we have missed out some
>> > important maintenance activity on zookeeper?
>> >
>> > - Rohit Kelkar
>> >
>>
>
>


Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Gary Helmling
As others mentioned, HBASE-6870 is about coprocessorExec() always scanning the
full .META. table to determine region locations. Is this what you mean, or
are you talking about your coprocessor always scanning your full user table?

If you want to limit the scan within regions in your user table, you'll
need to pass startRow and endRow as parameters to your instance.GetList()
method.  Then when you create the region scanner in your coprocessor code,
you'll need to set the start and end row yourself in order to limit the
rows scanned.
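
A rough sketch of the endpoint side (assumptions: IEndPoint_SA and GetList are
the original poster's own interface/method and the class below is purely
illustrative - only the row-range handling is the point):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Hashtable;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;

    public class EndPoint_SA extends BaseEndpointCoprocessor implements IEndPoint_SA {
        public Hashtable GetList(byte[] startRow, byte[] endRow) throws IOException {
            RegionCoprocessorEnvironment env =
                    (RegionCoprocessorEnvironment) getEnvironment();
            Scan scan = new Scan();
            scan.setStartRow(startRow);  // bound the per-region scan to the requested range
            scan.setStopRow(endRow);
            InternalScanner scanner = env.getRegion().getScanner(scan);
            Hashtable result = new Hashtable();
            try {
                List<KeyValue> kvs = new ArrayList<KeyValue>();
                boolean more;
                do {
                    kvs.clear();
                    more = scanner.next(kvs);
                    // ... aggregate kvs into result here ...
                } while (more);
            } finally {
                scanner.close();
            }
            return result;
        }
    }

The client then passes the same startRow/endRow both to coprocessorExec()
(which only picks the regions to call) and to GetList() (which trims the scan
inside each region).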


On Fri, Apr 19, 2013 at 5:59 AM, Ted Yu  wrote:

> Please upgrade to 0.94.6.1 which is more stable.
>
> Cheers
>
> On Apr 19, 2013, at 4:58 AM, GuoWei  wrote:
>
> >
> > We use base 0.94.1 in our production environment.
> >
> >
> > Best Regards / 商祺
> > 郭伟 Guo Wei
> >
> > On 2013-4-19, at 6:01 PM, Ted Yu wrote:
> >
> >> Which hbase version are you using ?
> >>
> >> Thanks
> >>
> >> On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
> >>
> >>> Hello,
> >>>
> >>> We use HBase core processor endpoint  to process realtime data. But
> when I use coreprocessorExec method to scan table and pass startRow and
> endRow. It always scan all table instead of the result between the startRow
> and endRow.
> >>>
> >>> my code.
> >>>
> >>> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> >>>  new Batch.Call>() {
> >>>public Hashtable call(IEndPoint_SA
> instance)throws IOException{
> >>>Hashtable s = null;
> >>>  try {
> >>>  s=instance.GetList();
> >>>  } catch (ParseException e) {
> >>>  // TODO Auto-generated catch block
> >>>  e.printStackTrace();
> >>>  }
> >>>  return s;
> >>>}
> >>>  });
> >>>
> >>>
> >>>
> >>> Best Regards / 商祺
> >>> 郭伟 Guo Wei
> >>> -
> >>
> >
>


Re: zookeeper taking 15GB RAM

2013-04-19 Thread Arpit Gupta
Take a look at this https://issues.apache.org/jira/browse/ZOOKEEPER-1670

When no -Xmx was set, we noticed that ZooKeeper could take up to 1/4 of the
memory available on the system with JDK 1.6.
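
If you want to cap it explicitly, one option (a sketch - the 1 GB value is
only an example) is to export a heap limit from conf/java.env, which the
zkServer.sh / zkEnv.sh scripts source at startup:

    # conf/java.env on each ZooKeeper node
    export JVMFLAGS="-Xmx1g"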

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Apr 19, 2013, at 9:15 AM, Rohit Kelkar  wrote:

> Hi, any inputs on this issue? Is there some periodic cleanup that we need
> to do?
> 
> - Rohit Kelkar
> 
> 
> On Thu, Apr 18, 2013 at 10:33 AM, Rohit Kelkar wrote:
> 
>> No. Just using the "bin/zkServer.sh start" command. Also each node has 48
>> GB RAM
>> 
>> - Rohit Kelkar
>> 
>> 
>> On Thu, Apr 18, 2013 at 10:28 AM, Jean-Marc Spaggiari <
>> jean-m...@spaggiari.org> wrote:
>> 
>>> Hi Rohit,
>>> 
>>> How are you starting your ZK servers? Are you passing any Xm* parameters?
>>> 
>>> JM
>>> 
>>> 2013/4/18 Rohit Kelkar 
>>> 
 We have a 3 node cluster in production with the zookeeper instances
>>> running
 on each node. I checked the memory usage for QuorumPeerMain using 'cat
 /proc//status' and it shows that the VMData is ~16GB.
 The zookeeper is being used only by hbase. Is this usual for hbase to
>>> load
 up the zookeeper memory so much or is it that we have missed out some
 important maintenance activity on zookeeper?
 
 - Rohit Kelkar
 
>>> 
>> 
>> 



Re: Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Security isn't necessary for us. We want to start the region server from a
Java application.

How can we do that?

Mehmet Şimşek

On 19 Nis 2013, at 19:08, Ted Yu  wrote:

> By java application I assume it is an HBase client.
> Is your HBase cluster secure ?
> 
> How you thought about security implication of allowing client app to start
> region server ?
> 
> Cheers
> 
> On Fri, Apr 19, 2013 at 7:52 AM, Mehmet Simsek 
> wrote:
> 
>> Can I use script to start region server in java application starting in
>> windows platform?
>> 


Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Is there a place to upload these logs ?


On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma  wrote:

> Hi Nicholas,
>
> Attached are the namenode, dn logs (of one of the healthy replicas of the
> WAL block) and the rs logs which got stuck doing the log split. Action
> begins at 2013-04-19 00:27*.
>
> Also, the rogue block is 5723958680970112840_174056. Its very interesting
> to trace this guy through the HDFS logs (dn and nn).
>
> Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also does
> the stale node stuff kick in for that state ?
>
> Thanks
> Varun
>
>
> On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon wrote:
>
>> Thanks for the detailed scenario and analysis. I'm going to have a look.
>> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
>> timeouts), could you please send them directly to me?
>>
>> Thanks,
>>
>> Nicolas
>>
>>
>> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
>> wrote:
>>
>> > Hi Nicholas,
>> >
>> > Here is the failure scenario, I have dug up the logs.
>> >
>> > A machine fails and stops accepting/transmitting traffic. The HMaster
>> > starts the distributed split for 13 tasks. There are 12 region servers.
>> 12
>> > tasks succeed but the 13th one takes a looong time.
>> >
>> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
>> seconds.
>> > Both patches are there.
>> >
>> > a) Machine fails around 27:30
>> > b) Master starts the split around 27:40 and submits the tasks. The one
>> task
>> > which fails seems to be the one which contains the WAL being currently
>> > written to:
>> >
>> > 2013-04-19 00:27:44,325 INFO
>> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
>> > hdfs://
>> >
>> >
>> ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
>> > ,
>> > length=0
>> >
>> > Basically this region server picks up the task but finds the length of
>> this
>> > file to be 0 and drops. This happens again
>> >
>> > c) Finally another region server picks up the task but it ends up going
>> to
>> > the bad datanode which should not happen because of the stale node
>> timeout)
>> > Unfortunately it hits the 45 retries and a connect timeout of 20 seconds
>> > every time. This delays recovery significantly. Now I guess reducing #
>> of
>> > retries to 1 is one possibility.
>> > But then the namenode logs are very interesting.
>> >
>> > d) Namenode seems to be in cyclic lease recovery loop until the node is
>> > marked dead. There is this one last block which exhibits this.
>> >
>> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
>> > 5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
>> > primaryNodeIndex=1,
>> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
>> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
>> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started,
>> > primary=10.156.192.106:50010
>> > 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR*
>> > NameSystem.internalReleaseLease: File
>> >
>> >
>> /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
>> > has not been closed. Lease recovery is in progress. RecoveryId = 174413
>> for
>> > block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY,
>> > primaryNodeIndex=1,
>> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
>> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
>> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
>> >
>> > I see this over and over again in the logs until the datanode is marked
>> > dead. It seems to be cycling through the replicas for this WAL block and
>> > trying to add it to the recovery list. I looked at the code and it says:
>> >
>> >   // Cannot close file right now, since the last block requires
>> > recovery.
>> >   // This may potentially cause infinite loop in lease recovery
>> >   // if there are no valid replicas on data-nodes.
>> >   NameNode.stateChangeLog.warn(
>> > "DIR* NameSystem.internalReleaseLease: " +
>> > "File " + src + " has not been closed." +
>> >" Lease recovery is in progress. " +
>> > "RecoveryId = " + blockRecoveryId + " for block " +
>> > lastBlock);
>> >   break;
>> >
>> > Eventually for this block, we get
>> >
>> > 2013-04-19 00:41:20,736 INFO
>> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> >
>> >
>> commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-
>> > *5723958680970112840_174056*, newgenerationstamp=174413,
>> > newlength=119148648, newtargets=[10.156.192.106:50010,
>> 10.156.195.38:50010
>> > ],
>> > closeFile=true, deleteBlock=false)
>> > 2013-04-19 00:41:20,736 ERROR
>> > org.apache.hadoop.security.UserGroupInformation:
>> PriviledgedActionException
>

Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
Can you tell us a bit more about your requirement ?

Looks like you want to control the cluster size (number of region servers
in particular). Once the requirement is outlined, we can think of formal
way to address it.

Thanks

On Fri, Apr 19, 2013 at 9:34 AM, Mehmet Simsek wrote:

> Security isn't necessary for us.We want to start region server in java
> application.
>
> How can we do?
>
> Mehmet Şimşek
>
> On 19 Nis 2013, at 19:08, Ted Yu  wrote:
>
> > By java application I assume it is an HBase client.
> > Is your HBase cluster secure ?
> >
> > How you thought about security implication of allowing client app to
> start
> > region server ?
> >
> > Cheers
> >
> > On Fri, Apr 19, 2013 at 7:52 AM, Mehmet Simsek  >wrote:
> >
> >> Can I use script to start region server in java application starting in
> >> windows platform?
> >>
>


Re: Slow region server recoveries

2013-04-19 Thread Ted Yu
Can you show snippet from DN log which mentioned UNDER_RECOVERY ?

Here is the criteria for stale node checking to kick in (from
https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
):

+   * Check if the datanode is in stale state. Here if
+   * the namenode has not received heartbeat msg from a
+   * datanode for more than staleInterval (default value is
+   * {@link DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
+   * the datanode will be treated as stale node.


On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma  wrote:

> Is there a place to upload these logs ?
>
>
> On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
> wrote:
>
> > Hi Nicholas,
> >
> > Attached are the namenode, dn logs (of one of the healthy replicas of the
> > WAL block) and the rs logs which got stuch doing the log split. Action
> > begins at 2013-04-19 00:27*.
> >
> > Also, the rogue block is 5723958680970112840_174056. Its very interesting
> > to trace this guy through the HDFS logs (dn and nn).
> >
> > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
> does
> > the stale node stuff kick in for that state ?
> >
> > Thanks
> > Varun
> >
> >
> > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon  >wrote:
> >
> >> Thanks for the detailed scenario and analysis. I'm going to have a look.
> >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
> >> timeouts), could you please send them directly to me?
> >>
> >> Thanks,
> >>
> >> Nicolas
> >>
> >>
> >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
> >> wrote:
> >>
> >> > Hi Nicholas,
> >> >
> >> > Here is the failure scenario, I have dug up the logs.
> >> >
> >> > A machine fails and stops accepting/transmitting traffic. The HMaster
> >> > starts the distributed split for 13 tasks. There are 12 region
> servers.
> >> 12
> >> > tasks succeed but the 13th one takes a looong time.
> >> >
> >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
> >> seconds.
> >> > Both patches are there.
> >> >
> >> > a) Machine fails around 27:30
> >> > b) Master starts the split around 27:40 and submits the tasks. The one
> >> task
> >> > which fails seems to be the one which contains the WAL being currently
> >> > written to:
> >> >
> >> > 2013-04-19 00:27:44,325 INFO
> >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
> >> > hdfs://
> >> >
> >> >
> >>
> ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> >> > ,
> >> > length=0
> >> >
> >> > Basically this region server picks up the task but finds the length of
> >> this
> >> > file to be 0 and drops. This happens again
> >> >
> >> > c) Finally another region server picks up the task but it ends up
> going
> >> to
> >> > the bad datanode which should not happen because of the stale node
> >> timeout)
> >> > Unfortunately it hits the 45 retries and a connect timeout of 20
> seconds
> >> > every time. This delays recovery significantly. Now I guess reducing #
> >> of
> >> > retries to 1 is one possibility.
> >> > But then the namenode logs are very interesting.
> >> >
> >> > d) Namenode seems to be in cyclic lease recovery loop until the node
> is
> >> > marked dead. There is this one last block which exhibits this.
> >> >
> >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
> >> > 5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
> >> > primaryNodeIndex=1,
> >> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> >> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> >> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started,
> >> > primary=10.156.192.106:50010
> >> > 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> >> > NameSystem.internalReleaseLease: File
> >> >
> >> >
> >>
> /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> >> > has not been closed. Lease recovery is in progress. RecoveryId =
> 174413
> >> for
> >> > block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY,
> >> > primaryNodeIndex=1,
> >> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> >> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> >> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
> >> >
> >> > I see this over and over again in the logs until the datanode is
> marked
> >> > dead. It seems to be cycling through the replicas for this WAL block
> and
> >> > trying to add it to the recovery list. I looked at the code and it
> says:
> >> >
> >> >   // Cannot close file right now, since the last block requires
> >> > recovery.
> >> >   // This may potentially cause infinite loop in lease recovery
> >> >   // if there are no valid replicas on data-nodes.
> >> >   NameNode.stateChangeLog.warn(
> >> >  

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Here is the snippet:
2013-04-19 00:27:38,337 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Recover RBW replica
BP-696828882-10.168.7.226-1364886167971:blk_40107897639761277_174072
2013-04-19 00:27:38,337 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Recovering ReplicaBeingWritten, blk_40107897639761277_174072, RBW
2013-04-19 00:28:11,471 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at
ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls
recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056,
targets=[*10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010*],
newGenerationStamp=174413)
2013-04-19 00:41:20,716 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
initReplicaRecovery: blk_-5723958680970112840_174056, recoveryId=174413,
replica=ReplicaBeingWritten, blk_-5723958680970112840_174056, RBW
2013-04-19 00:41:20,716 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
initReplicaRecovery: changing replica state for
blk_-5723958680970112840_174056 from RBW to RUR
2013-04-19 00:41:20,733 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
updateReplica:
BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056,
recoveryId=174413, length=119148648, replica=ReplicaUnderRecovery,
blk_-5723958680970112840_174056, RUR
2013-04-19 00:41:20,745 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks FAILED:
RecoveringBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056;
getBlockSize()=0; corrupt=false; offset=-1; locs=[10.156.194.94:50010,
10.156.192.106:50010, 10.156.195.38:50010]}
2013-04-19 00:41:23,733 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
initReplicaRecovery: blk_-5723958680970112840_174056, recoveryId=174418,
replica=FinalizedReplica, blk_-5723958680970112840_174413, FINALIZED
2013-04-19 00:41:23,733 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
initReplicaRecovery: changing replica state for
blk_-5723958680970112840_174056 from FINALIZED to RUR
2013-04-19 00:41:23,736 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
updateReplica:
BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174413,
recoveryId=174418, length=119148648, replica=ReplicaUnderRecovery,
blk_-5723958680970112840_174413, RUR

Block recovery takes a long time and eventually seems to fail during the
recoverBlock() call - all three datanodes (including the failed/stale one)
are there.



On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:

> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>
> Here is the criteria for stale node checking to kick in (from
>
> https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
> ):
>
> +   * Check if the datanode is in stale state. Here if
> +   * the namenode has not received heartbeat msg from a
> +   * datanode for more than staleInterval (default value is
> +   * {@link
> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
> +   * the datanode will be treated as stale node.
>
>
> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
> wrote:
>
> > Is there a place to upload these logs ?
> >
> >
> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
> > wrote:
> >
> > > Hi Nicholas,
> > >
> > > Attached are the namenode, dn logs (of one of the healthy replicas of
> the
> > > WAL block) and the rs logs which got stuch doing the log split. Action
> > > begins at 2013-04-19 00:27*.
> > >
> > > Also, the rogue block is 5723958680970112840_174056. Its very
> interesting
> > > to trace this guy through the HDFS logs (dn and nn).
> > >
> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
> > does
> > > the stale node stuff kick in for that state ?
> > >
> > > Thanks
> > > Varun
> > >
> > >
> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon  > >wrote:
> > >
> > >> Thanks for the detailed scenario and analysis. I'm going to have a
> look.
> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
> > >> timeouts), could you please send them directly to me?
> > >>
> > >> Thanks,
> > >>
> > >> Nicolas
> > >>
> > >>
> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
> > >> wrote:
> > >>
> > >> > Hi Nicholas,
> > >> >
> > >> > Here is the failure scenario, I have dug up the logs.
> > >> >
> > >> > A machine fails and stops accepting/transmitting traffic. The
> HMaster
> > >> > starts the distributed split for 13 tasks. There are 12 region
> > servers.
> > >> 12
> > >> > tasks succeed but the 13th one takes a looong time.
> > >> >
> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
> > >> seconds.
> > >> > Both patches are there.
> > >> >
> > >> > a) Machine fails around 27:30
> > >> > b) Master starts the split around 27:40 and submits the tasks. The
> one
> > >> 

Overwrite a row

2013-04-19 Thread Kristoffer Sjögren
Hi

Is it possible to completely overwrite/replace a row in a single _atomic_
action? Already existing columns and qualifiers should be removed if they
do not exist in the data inserted into the row.

The only way to do this is to first delete the row then insert new data in
its place, correct? Or is there an operation to do this?

Cheers,
-Kristoffer


Re: Speeding up the row count

2013-04-19 Thread lars hofhansl
You should expect to be able to scan about 1-2m small rows/s/core if everything 
is in cache.
Something is definitely wrong in your setup. Can you post your config files 
(HBase and HDFS) via pastebin?


-- Lars




 From: Omkar Joshi 
To: "user@hbase.apache.org"  
Sent: Friday, April 19, 2013 2:55 AM
Subject: RE: Speeding up the row count
 

Hi Ted,

6 minutes is too long :(
Will this decrease to seconds if more nodes are added in the cluster?

I got this exception finally (I faintly recall a timeout parameter that can be 
increased while querying, but I didn't want to increase it to a high value):

Apr 19, 2013 1:05:43 PM 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
processExecs
WARNING: Error executing for row
java.util.concurrent.ExecutionException: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=10, exceptions:
Fri Apr 19 12:56:01 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1770 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:57:02 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1782 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:58:04 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1785 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 12:59:05 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1794 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:00:08 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1800 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:01:10 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1802 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:02:14 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1804 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:03:19 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1809 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:04:27 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1812 
remote=cldx-1140-1034/172.25.6.71:60020]
Fri Apr 19 13:05:43 IST 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1d6e77, 
java.net.SocketTimeoutException: Call to cldx-1140-1034/172.25.6.71:60020 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/0.0.0.0:1829 
remote=cldx-1140-1034/172.

Re: zookeeper taking 15GB RAM

2013-04-19 Thread Rohit Kelkar
Thanks for the reply. I checked the default max heap size for Java on the
nodes and it turns out it's 16G. So now I have to start ZooKeeper with a
reasonable value for the heap size. What are the factors that would impact the
heap size of ZooKeeper? Is it more tables in HBase, or is it the number of
regions?

- Rohit Kelkar


On Fri, Apr 19, 2013 at 11:22 AM, Arpit Gupta  wrote:

> Take a look at this https://issues.apache.org/jira/browse/ZOOKEEPER-1670
>
> When no -Xmx was set, we noticed that ZooKeeper could take up to 1/4 of the
> memory available on the system with JDK 1.6.
>
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Apr 19, 2013, at 9:15 AM, Rohit Kelkar  wrote:
>
> > Hi, any inputs on this issue? Is there some periodic cleanup that we need
> > to do?
> >
> > - Rohit Kelkar
> >
> >
> > On Thu, Apr 18, 2013 at 10:33 AM, Rohit Kelkar  >wrote:
> >
> >> No. Just using the "bin/zkServer.sh start" command. Also each node has
> 48
> >> GB RAM
> >>
> >> - Rohit Kelkar
> >>
> >>
> >> On Thu, Apr 18, 2013 at 10:28 AM, Jean-Marc Spaggiari <
> >> jean-m...@spaggiari.org> wrote:
> >>
> >>> Hi Rohit,
> >>>
> >>> How are you starting your ZK servers? Are you passing any Xm*
> parameters?
> >>>
> >>> JM
> >>>
> >>> 2013/4/18 Rohit Kelkar 
> >>>
>  We have a 3 node cluster in production with the zookeeper instances
> >>> running
>  on each node. I checked the memory usage for QuorumPeerMain using 'cat
>  /proc//status' and it shows that the VMData is ~16GB.
>  The zookeeper is being used only by hbase. Is this usual for hbase to
> >>> load
>  up the zookeeper memory so much or is it that we have missed out some
>  important maintenance activity on zookeeper?
> 
>  - Rohit Kelkar
> 
> >>>
> >>
> >>
>
>


Re: Overwrite a row

2013-04-19 Thread Ted Yu
What is the maximum number of versions you allow for the underlying
table ?

Thanks

On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren wrote:

> Hi
>
> Is it possible to completely overwrite/replace a row in a single _atomic_
> action? Already existing columns and qualifiers should be removed if they
> do not exist in the data inserted into the row.
>
> The only way to do this is to first delete the row then insert new data in
> its place, correct? Or is there an operation to do this?
>
> Cheers,
> -Kristoffer
>


Re: Overwrite a row

2013-04-19 Thread Kristoffer Sjögren
What would you suggest? I want the operation to be atomic.


On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu  wrote:

> What is the maximum number of versions do you allow for the underlying
> table ?
>
> Thanks
>
> On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren  >wrote:
>
> > Hi
> >
> > Is it possible to completely overwrite/replace a row in a single _atomic_
> > action? Already existing columns and qualifiers should be removed if they
> > do not exist in the data inserted into the row.
> >
> > The only way to do this is to first delete the row then insert new data
> in
> > its place, correct? Or is there an operation to do this?
> >
> > Cheers,
> > -Kristoffer
> >
>


Re: Inconsistent performance numbers with increased nodes

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Just a question, Alex: why are you using OpenJDK? The first recommendation
for a Hadoop cluster is to use the Java SDK from Oracle, precisely because
OpenJDK has some performance issues that should be fixed in upcoming
releases; I encourage you to use Java 1.6 from Oracle.

- What is the replication factor in your cluster? (default: 3)
- What is your HDFS block size? (default: 64 MB; a good value is
128 MB or 256 MB depending on your cluster load)



2013/4/19 Alex O'Ree 

> Marcos
>
> - Java version - 1.6 OpenJDK x64, latest version in the CentOS repo
> - JVM tuning configuration, I think that we just changed the max ram
> to close to 4GB
> - Hadoop JT, DN, NN configuration, 1 JT, 10/12 DN, 1 NN. No security, no
> ssl
> - Network topology, star
> - Network speed for the cluster, emulated 4G cellular
> - Hardware properties for all nodes in the cluster - 2 core, 2.2Ghz, 4GB
> ram
> - Which platform are you using for the benchmark? The benchmark was
> the basic word count sample app, using the wikipedia export as the
> data set.
>
> Here's the result set I'm looking at and i'm just giving bogus values
> to make the point
> 10 DN cluster,
> 10 minutes, consistently
>
> 12 DN cluster,
> 10m, 15m, 10m, 15m, 15m, 10m, 10m
>
> Basically, I expected the result set for the 12 DN cluster to be
> consistent; however, the data set isn't. Since there's a high
> correlation between the lowest values in the 12 DN data with the
> average values in the 10 DN cluster, I'm asserting that Hadoop may
> have just talked to 10 DNs instead of all 12.
>
> This is for a paper that I plan on publishing shortly containing
> emulated network conditions for a number of different network types.
>
> On Fri, Apr 19, 2013 at 3:26 PM, Marcos Luis Ortiz Valmaseda
>  wrote:
> > Regards, Alex.
> > We need more information to be able to get you a good answer:
> > - Java version
> > - JVM tuning configuration
> > - Hadoop JT, DN, NN configuration
> > - Network topology
> > - Network speed for the cluster
> > - Hardware properties for all nodes in the cluster
> >
> > Hadoop is an actual scalable system, where you can add more nodes and the
> > performance should be better, but there are some configurations which can
> > downgrade its performance.
> >
> > Another things is:
> > Which platform are you using for the benchmark?
> > There is an amazing platform developed by Jason Dai from Intel called
> > Hibench, which is great for this kind of work.[1][2]
> >
> > With all this information, I think that we can help you to find the root
> > causes behind the performance of the cluster.
> >
> > [1] https://github.com/intel-hadoop/HiBench
> > [2]
> >
> http://hadoopsummit.org/amsterdam-blog/meet-the-presenters-jason-dai-of-intel/
> >
> >
> >
> > 2013/4/19 Alex O'Ree 
> >>
> >> Hi I'm running a 10 data node cluster and was experimenting with
> >> adding additional nodes to it. I've done some performance bench
> >> marking with 10 nodes and have compared them to 12 nodes and I've
> >> found some rather interesting and inconsistent results. The behavior
> >> I'm seeing is that during some of the 12 node bench runs, I'm actually
> >> seeing two different performance levels, one set at a different level
> >> than 10 nodes, and another at exactly the performance of a 10 node
> >> cluster. I've eliminated any possibility of networking problems or
> >> problems related to a specific machine. Before switching to a 12 node
> >> cluster, the initial cluster was destroyed, rebuilt and the dataset
> >> was added in. This should have yielded an evenly balanced cluster
> >> (confirmed through the web app)
> >>
> >> So my question is, is this an expected behavior or is something else
> >> going on here that I'm not aware of. For reference, I'm using 1.0.8 on
> >> CentOS 6.3 x64
> >
> >
> >
> >
> > --
> > Marcos Ortiz Valmaseda,
> > Data-Driven Product Manager at PDVSA
> > Blog: http://dataddict.wordpress.com/
> > LinkedIn: http://www.linkedin.com/in/marcosluis2186
> > Twitter: @marcosluis2186
>



-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 


Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Hi Ted,

I had a long offline discussion with Nicholas on this. Looks like the last
block, which was still being written to, took an enormous amount of time to
recover. Here's what happened.
a) Master split tasks and region servers process them
b) Region server tries to recover lease for each WAL log - most cases are
noop since they are already rolled over/finalized
c) The last file lease recovery takes some time since the crashing server
was writing to it and had a lease on it - but basically we have the lease 1
minute after the server was lost
d) Now we start the recovery for this but we end up hitting the stale data
node which is puzzling.

It seems that we did not hit the stale datanode when we were trying to
recover the finalized WAL blocks with trivial lease recovery. However, for
the final block, we hit the stale datanode. Any clue why this might be
happening ?

Varun


On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:

> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>
> Here is the criteria for stale node checking to kick in (from
>
> https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
> ):
>
> +   * Check if the datanode is in stale state. Here if
> +   * the namenode has not received heartbeat msg from a
> +   * datanode for more than staleInterval (default value is
> +   * {@link
> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
> +   * the datanode will be treated as stale node.
>
>
> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
> wrote:
>
> > Is there a place to upload these logs ?
> >
> >
> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
> > wrote:
> >
> > > Hi Nicholas,
> > >
> > > Attached are the namenode, dn logs (of one of the healthy replicas of
> the
> > > WAL block) and the rs logs which got stuch doing the log split. Action
> > > begins at 2013-04-19 00:27*.
> > >
> > > Also, the rogue block is 5723958680970112840_174056. Its very
> interesting
> > > to trace this guy through the HDFS logs (dn and nn).
> > >
> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
> > does
> > > the stale node stuff kick in for that state ?
> > >
> > > Thanks
> > > Varun
> > >
> > >
> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon  > >wrote:
> > >
> > >> Thanks for the detailed scenario and analysis. I'm going to have a
> look.
> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
> > >> timeouts), could you please send them directly to me?
> > >>
> > >> Thanks,
> > >>
> > >> Nicolas
> > >>
> > >>
> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
> > >> wrote:
> > >>
> > >> > Hi Nicholas,
> > >> >
> > >> > Here is the failure scenario, I have dug up the logs.
> > >> >
> > >> > A machine fails and stops accepting/transmitting traffic. The
> HMaster
> > >> > starts the distributed split for 13 tasks. There are 12 region
> > servers.
> > >> 12
> > >> > tasks succeed but the 13th one takes a looong time.
> > >> >
> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
> > >> seconds.
> > >> > Both patches are there.
> > >> >
> > >> > a) Machine fails around 27:30
> > >> > b) Master starts the split around 27:40 and submits the tasks. The
> one
> > >> task
> > >> > which fails seems to be the one which contains the WAL being
> currently
> > >> > written to:
> > >> >
> > >> > 2013-04-19 00:27:44,325 INFO
> > >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting
> hlog:
> > >> > hdfs://
> > >> >
> > >> >
> > >>
> >
> ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> > >> > ,
> > >> > length=0
> > >> >
> > >> > Basically this region server picks up the task but finds the length
> of
> > >> this
> > >> > file to be 0 and drops. This happens again
> > >> >
> > >> > c) Finally another region server picks up the task but it ends up
> > going
> > >> to
> > >> > the bad datanode which should not happen because of the stale node
> > >> timeout)
> > >> > Unfortunately it hits the 45 retries and a connect timeout of 20
> > seconds
> > >> > every time. This delays recovery significantly. Now I guess
> reducing #
> > >> of
> > >> > retries to 1 is one possibility.
> > >> > But then the namenode logs are very interesting.
> > >> >
> > >> > d) Namenode seems to be in cyclic lease recovery loop until the node
> > is
> > >> > marked dead. There is this one last block which exhibits this.
> > >> >
> > >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
> > >> > 5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
> > >> > primaryNodeIndex=1,
> > >> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> > >> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> > >> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery
> started,
> > >> > primary=10.156.192.106:50010
> > >> > 2013-04-19 00:28:09,744 WA

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
This is 0.94.3 hbase...


On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma  wrote:

> Hi Ted,
>
> I had a long offline discussion with nicholas on this. Looks like the last
> block which was still being written too, took an enormous time to recover.
> Here's what happened.
> a) Master split tasks and region servers process them
> b) Region server tries to recover lease for each WAL log - most cases are
> noop since they are already rolled over/finalized
> c) The last file lease recovery takes some time since the crashing server
> was writing to it and had a lease on it - but basically we have the lease 1
> minute after the server was lost
> d) Now we start the recovery for this but we end up hitting the stale data
> node which is puzzling.
>
> It seems that we did not hit the stale datanode when we were trying to
> recover the finalized WAL blocks with trivial lease recovery. However, for
> the final block, we hit the stale datanode. Any clue why this might be
> happening ?
>
> Varun
>
>
> On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:
>
>> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>>
>> Here is the criteria for stale node checking to kick in (from
>>
>> https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
>> ):
>>
>> +   * Check if the datanode is in stale state. Here if
>> +   * the namenode has not received heartbeat msg from a
>> +   * datanode for more than staleInterval (default value is
>> +   * {@link
>> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
>> +   * the datanode will be treated as stale node.
>>
>>
>> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
>> wrote:
>>
>> > Is there a place to upload these logs ?
>> >
>> >
>> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
>> > wrote:
>> >
>> > > Hi Nicholas,
>> > >
>> > > Attached are the namenode, dn logs (of one of the healthy replicas of
>> the
>> > > WAL block) and the rs logs which got stuch doing the log split. Action
>> > > begins at 2013-04-19 00:27*.
>> > >
>> > > Also, the rogue block is 5723958680970112840_174056. Its very
>> interesting
>> > > to trace this guy through the HDFS logs (dn and nn).
>> > >
>> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
>> > does
>> > > the stale node stuff kick in for that state ?
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> > >
>> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon > > >wrote:
>> > >
>> > >> Thanks for the detailed scenario and analysis. I'm going to have a
>> look.
>> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
>> > >> timeouts), could you please send them directly to me?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Nicolas
>> > >>
>> > >>
>> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
>> > >> wrote:
>> > >>
>> > >> > Hi Nicholas,
>> > >> >
>> > >> > Here is the failure scenario, I have dug up the logs.
>> > >> >
>> > >> > A machine fails and stops accepting/transmitting traffic. The
>> HMaster
>> > >> > starts the distributed split for 13 tasks. There are 12 region
>> > servers.
>> > >> 12
>> > >> > tasks succeed but the 13th one takes a looong time.
>> > >> >
>> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
>> > >> seconds.
>> > >> > Both patches are there.
>> > >> >
>> > >> > a) Machine fails around 27:30
>> > >> > b) Master starts the split around 27:40 and submits the tasks. The
>> one
>> > >> task
>> > >> > which fails seems to be the one which contains the WAL being
>> currently
>> > >> > written to:
>> > >> >
>> > >> > 2013-04-19 00:27:44,325 INFO
>> > >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting
>> hlog:
>> > >> > hdfs://
>> > >> >
>> > >> >
>> > >>
>> >
>> ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
>> > >> > ,
>> > >> > length=0
>> > >> >
>> > >> > Basically this region server picks up the task but finds the
>> length of
>> > >> this
>> > >> > file to be 0 and drops. This happens again
>> > >> >
>> > >> > c) Finally another region server picks up the task but it ends up
>> > going
>> > >> to
>> > >> > the bad datanode which should not happen because of the stale node
>> > >> timeout)
>> > >> > Unfortunately it hits the 45 retries and a connect timeout of 20
>> > seconds
>> > >> > every time. This delays recovery significantly. Now I guess
>> reducing #
>> > >> of
>> > >> > retries to 1 is one possibility.
>> > >> > But then the namenode logs are very interesting.
>> > >> >
>> > >> > d) Namenode seems to be in cyclic lease recovery loop until the
>> node
>> > is
>> > >> > marked dead. There is this one last block which exhibits this.
>> > >> >
>> > >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*
>> > >> > 5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
>> > >> > primaryNodeIndex=1,
>> > >> > replicas=[ReplicaUnderConstruc

Slow region server recoveries due to lease recovery going to stale data node

2013-04-19 Thread Ted Yu
I think the issue would be more appropriate for hdfs-dev@ mailing list.

Putting user@hbase as Bcc.

-- Forwarded message --
From: Varun Sharma 
Date: Fri, Apr 19, 2013 at 1:10 PM
Subject: Re: Slow region server recoveries
To: user@hbase.apache.org


This is 0.94.3 hbase...


On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma  wrote:

> Hi Ted,
>
> I had a long offline discussion with nicholas on this. Looks like the last
> block which was still being written too, took an enormous time to recover.
> Here's what happened.
> a) Master split tasks and region servers process them
> b) Region server tries to recover lease for each WAL log - most cases are
> noop since they are already rolled over/finalized
> c) The last file lease recovery takes some time since the crashing server
> was writing to it and had a lease on it - but basically we have the lease
1
> minute after the server was lost
> d) Now we start the recovery for this but we end up hitting the stale data
> node which is puzzling.
>
> It seems that we did not hit the stale datanode when we were trying to
> recover the finalized WAL blocks with trivial lease recovery. However, for
> the final block, we hit the stale datanode. Any clue why this might be
> happening ?
>
> Varun
>
>
> On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:
>
>> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>>
>> Here is the criteria for stale node checking to kick in (from
>>
>>
https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
>> ):
>>
>> +   * Check if the datanode is in stale state. Here if
>> +   * the namenode has not received heartbeat msg from a
>> +   * datanode for more than staleInterval (default value is
>> +   * {@link
>> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
>> +   * the datanode will be treated as stale node.
>>
>>
>> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
>> wrote:
>>
>> > Is there a place to upload these logs ?
>> >
>> >
>> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
>> > wrote:
>> >
>> > > Hi Nicholas,
>> > >
>> > > Attached are the namenode, dn logs (of one of the healthy replicas of
>> the
>> > > WAL block) and the rs logs which got stuch doing the log split.
Action
>> > > begins at 2013-04-19 00:27*.
>> > >
>> > > Also, the rogue block is 5723958680970112840_174056. Its very
>> interesting
>> > > to trace this guy through the HDFS logs (dn and nn).
>> > >
>> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
>> > does
>> > > the stale node stuff kick in for that state ?
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> > >
>> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon > > >wrote:
>> > >
>> > >> Thanks for the detailed scenario and analysis. I'm going to have a
>> look.
>> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
>> > >> timeouts), could you please send them directly to me?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Nicolas
>> > >>
>> > >>
>> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
>> > >> wrote:
>> > >>
>> > >> > Hi Nicholas,
>> > >> >
>> > >> > Here is the failure scenario, I have dug up the logs.
>> > >> >
>> > >> > A machine fails and stops accepting/transmitting traffic. The
>> HMaster
>> > >> > starts the distributed split for 13 tasks. There are 12 region
>> > servers.
>> > >> 12
>> > >> > tasks succeed but the 13th one takes a looong time.
>> > >> >
>> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
>> > >> seconds.
>> > >> > Both patches are there.
>> > >> >
>> > >> > a) Machine fails around 27:30
>> > >> > b) Master starts the split around 27:40 and submits the tasks. The
>> one
>> > >> task
>> > >> > which fails seems to be the one which contains the WAL being
>> currently
>> > >> > written to:
>> > >> >
>> > >> > 2013-04-19 00:27:44,325 INFO
>> > >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting
>> hlog:
>> > >> > hdfs://
>> > >> >
>> > >> >
>> > >>
>> >
>>
ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
>> > >> > ,
>> > >> > length=0
>> > >> >
>> > >> > Basically this region server picks up the task but finds the
>> length of
>> > >> this
>> > >> > file to be 0 and drops. This happens again
>> > >> >
>> > >> > c) Finally another region server picks up the task but it ends up
>> > going
>> > >> to
>> > >> > the bad datanode which should not happen because of the stale node
>> > >> timeout)
>> > >> > Unfortunately it hits the 45 retries and a connect timeout of 20
>> > seconds
>> > >> > every time. This delays recovery significantly. Now I guess
>> reducing #
>> > >> of
>> > >> > retries to 1 is one possibility.
>> > >> > But then the namenode logs are very interesting.
>> > >> >
>> > >> > d) Namenode seems to be in cyclic lease recovery loop until the
>> node
>> > is
>> > >> > marked dead. Th

Re: Overwrite a row

2013-04-19 Thread Ted Yu
If the maximum number of versions is set to 1 for your table, you would
already have what you wanted.

Normally max versions being 1 is not desired, that was why I asked about
your use case.

Cheers

On Fri, Apr 19, 2013 at 12:44 PM, Kristoffer Sjögren wrote:

> What would you suggest? I want the operation to be atomic.
>
>
> On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu  wrote:
>
> > What is the maximum number of versions do you allow for the underlying
> > table ?
> >
> > Thanks
> >
> > On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren  > >wrote:
> >
> > > Hi
> > >
> > > Is it possible to completely overwrite/replace a row in a single
> _atomic_
> > > action? Already existing columns and qualifiers should be removed if
> they
> > > do not exist in the data inserted into the row.
> > >
> > > The only way to do this is to first delete the row then insert new data
> > in
> > > its place, correct? Or is there an operation to do this?
> > >
> > > Cheers,
> > > -Kristoffer
> > >
> >
>


RefGuide schema design examples

2013-04-19 Thread Doug Meil
Hi folks,

I reorganized the Schema Design case studies 2 weeks ago and consolidated them
here, plus added several cases that come up often on the dist-list.

http://hbase.apache.org/book.html#schema.casestudies

Comments/suggestions welcome.  Thanks!


Doug Meil
Chief Software Architect, Explorys
doug.m...@explorysmedical.com




Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Thanks  a lot.


Best Regards / 商祺
郭伟 Guo Wei


> Please upgrade to 0.94.6.1 which is more stable. 
> 
> Cheers
> 
> On Apr 19, 2013, at 4:58 AM, GuoWei  wrote:
> 
>> 
>> We use base 0.94.1 in our production environment.
>> 
>> 
>> Best Regards / 商祺
>> 郭伟 Guo Wei
>> 
>> On 2013-4-19, at 6:01 PM, Ted Yu  wrote:
>> 
>>> Which hbase version are you using ?
>>> 
>>> Thanks
>>> 
>>> On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
>>> 
 Hello,
 
 We use HBase core processor endpoint  to process realtime data. But when I 
 use coreprocessorExec method to scan table and pass startRow and endRow. 
 It always scan all table instead of the result between the startRow and 
 endRow. 
 
 my code.
 
 results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
 new Batch.Call>() {
   public Hashtable call(IEndPoint_SA 
 instance)throws IOException{
   Hashtable s = null;
 try {
 s=instance.GetList();
 } catch (ParseException e) {
 // TODO Auto-generated catch block
 e.printStackTrace();
 }
 return s;
   }
 });
 
 
 
 Best Regards / 商祺
 郭伟 Guo Wei
 -
>>> 
>> 
> 



Re: RefGuide schema design examples

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Wow, great work, Doug.


2013/4/19 Doug Meil 

> Hi folks,
>
> I reorganized the Schema Design case studies 2 weeks ago and consolidated
> them into here, plus added several cases common on the dist-list.
>
> http://hbase.apache.org/book.html#schema.casestudies
>
> Comments/suggestions welcome.  Thanks!
>
>
> Doug Meil
> Chief Software Architect, Explorys
> doug.m...@explorysmedical.com
>
>
>


-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 


Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
Actually I do see it in the 0.94 JavaDocs (
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations)
),
so maybe it was added in 0.94.6 even though the JIRA says fixed in 0.95.
I haven't used it though, but it seems that's what you're looking for.
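
For what it's worth, a rough sketch of how that could look for a full replace
(untested; the table, family and qualifier names are made up, and the explicit
timestamps are just one way to keep the delete marker from hiding the new put):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowMutations;
import org.apache.hadoop.hbase.util.Bytes;

public class ReplaceRowSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");      // table name assumed
        byte[] row = Bytes.toBytes("row1");
        long now = System.currentTimeMillis();

        RowMutations rm = new RowMutations(row);

        // Wipe everything written up to (now - 1); the older timestamp keeps
        // this delete marker from masking the new put below (RowLock arg left null).
        rm.add(new Delete(row, now - 1, null));

        // The full replacement content, written at 'now'.
        Put put = new Put(row, now);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q1"), Bytes.toBytes("v1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q2"), Bytes.toBytes("v2"));
        rm.add(put);

        // Delete + put are applied atomically on the region server.
        table.mutateRow(rm);
        table.close();
    }
}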

Sorry for confusion.

Mohamed


On Fri, Apr 19, 2013 at 4:35 PM, Mohamed Ibrahim wrote:

> It seems that 0.95 is not released yet, mutateRow won't be a solution for
> now. I saw it in the downloads and I thought it was released.
>
>
> On Fri, Apr 19, 2013 at 4:18 PM, Mohamed Ibrahim wrote:
>
>> Just noticed you want to delete as well. I think that's supported since
>> 0.95 in mutateRow (
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations)
>>  ).
>> You can do multiple puts and deletes and they will be performed atomically.
>> So you can remove qualifiers and put new ones.
>>
>> Mohamed
>>
>>
>> On Fri, Apr 19, 2013 at 3:44 PM, Kristoffer Sjögren wrote:
>>
>>> What would you suggest? I want the operation to be atomic.
>>>
>>>
>>> On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu  wrote:
>>>
>>> > What is the maximum number of versions do you allow for the underlying
>>> > table ?
>>> >
>>> > Thanks
>>> >
>>> > On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren >> > >wrote:
>>> >
>>> > > Hi
>>> > >
>>> > > Is it possible to completely overwrite/replace a row in a single
>>> _atomic_
>>> > > action? Already existing columns and qualifiers should be removed if
>>> they
>>> > > do not exist in the data inserted into the row.
>>> > >
>>> > > The only way to do this is to first delete the row then insert new
>>> data
>>> > in
>>> > > its place, correct? Or is there an operation to do this?
>>> > >
>>> > > Cheers,
>>> > > -Kristoffer
>>> > >
>>> >
>>>
>>
>>
>


Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
It seems that 0.95 is not released yet, mutateRow won't be a solution for
now. I saw it in the downloads and I thought it was released.


On Fri, Apr 19, 2013 at 4:18 PM, Mohamed Ibrahim wrote:

> Just noticed you want to delete as well. I think that's supported since
> 0.95 in mutateRow (
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations)
>  ).
> You can do multiple puts and deletes and they will be performed atomically.
> So you can remove qualifiers and put new ones.
>
> Mohamed
>
>
> On Fri, Apr 19, 2013 at 3:44 PM, Kristoffer Sjögren wrote:
>
>> What would you suggest? I want the operation to be atomic.
>>
>>
>> On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu  wrote:
>>
>> > What is the maximum number of versions do you allow for the underlying
>> > table ?
>> >
>> > Thanks
>> >
>> > On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren > > >wrote:
>> >
>> > > Hi
>> > >
>> > > Is it possible to completely overwrite/replace a row in a single
>> _atomic_
>> > > action? Already existing columns and qualifiers should be removed if
>> they
>> > > do not exist in the data inserted into the row.
>> > >
>> > > The only way to do this is to first delete the row then insert new
>> data
>> > in
>> > > its place, correct? Or is there an operation to do this?
>> > >
>> > > Cheers,
>> > > -Kristoffer
>> > >
>> >
>>
>
>


Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
Hello Kristoffer,

HBase row mutations are atomic ( http://hbase.apache.org/acid-semantics.html ),
and that includes put. So when you overwrite a row it is not possible for
another process to read half old / half new data. It will either read
all old or all new data if the put succeeds. It is also not possible for a
put to fail in the middle, leaving a partly modified row.

Best,
Mohamed


On Fri, Apr 19, 2013 at 1:53 PM, Kristoffer Sjögren wrote:

> Hi
>
> Is it possible to completely overwrite/replace a row in a single _atomic_
> action? Already existing columns and qualifiers should be removed if they
> do not exist in the data inserted into the row.
>
> The only way to do this is to first delete the row then insert new data in
> its place, correct? Or is there an operation to do this?
>
> Cheers,
> -Kristoffer
>


Re: Overwrite a row

2013-04-19 Thread Ted Yu
I don't know the details of Kristoffer's schema.
If all the column qualifiers are known a priori, mutateRow() should serve
his needs.

HBase allows an arbitrary number of columns in a column family. If the schema
is dynamic, mutateRow() wouldn't suffice.
If the column qualifiers are known but the row is very wide (and a few
columns are updated per call), performance would degrade.

Just some factors to consider.

Cheers

On Fri, Apr 19, 2013 at 1:41 PM, Mohamed Ibrahim wrote:

> Actually I do see it in the 0.94 JavaDocs (
>
> http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations)
> ),
> so may be it was added in 0.94.6 even though the jira says fixed in 0.95 .
> I haven't used it though, but it seems that's what you're looking for.
>
> Sorry for confusion.
>
> Mohamed
>
>
> On Fri, Apr 19, 2013 at 4:35 PM, Mohamed Ibrahim  >wrote:
>
> > It seems that 0.95 is not released yet, mutateRow won't be a solution for
> > now. I saw it in the downloads and I thought it was released.
> >
> >
> > On Fri, Apr 19, 2013 at 4:18 PM, Mohamed Ibrahim  >wrote:
> >
> >> Just noticed you want to delete as well. I think that's supported since
> >> 0.95 in mutateRow (
> >>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations)).
> >> You can do multiple puts and deletes and they will be performed
> atomically.
> >> So you can remove qualifiers and put new ones.
> >>
> >> Mohamed
> >>
> >>
> >> On Fri, Apr 19, 2013 at 3:44 PM, Kristoffer Sjögren  >wrote:
> >>
> >>> What would you suggest? I want the operation to be atomic.
> >>>
> >>>
> >>> On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu  wrote:
> >>>
> >>> > What is the maximum number of versions do you allow for the
> underlying
> >>> > table ?
> >>> >
> >>> > Thanks
> >>> >
> >>> > On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren <
> sto...@gmail.com
> >>> > >wrote:
> >>> >
> >>> > > Hi
> >>> > >
> >>> > > Is it possible to completely overwrite/replace a row in a single
> >>> _atomic_
> >>> > > action? Already existing columns and qualifiers should be removed
> if
> >>> they
> >>> > > do not exist in the data inserted into the row.
> >>> > >
> >>> > > The only way to do this is to first delete the row then insert new
> >>> data
> >>> > in
> >>> > > its place, correct? Or is there an operation to do this?
> >>> > >
> >>> > > Cheers,
> >>> > > -Kristoffer
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>


Re: RefGuide schema design examples

2013-04-19 Thread Viral Bajaria
+1!


On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda <
marcosluis2...@gmail.com> wrote:

> Wow, great work, Doug.
>
>
> 2013/4/19 Doug Meil 
>
> > Hi folks,
> >
> > I reorganized the Schema Design case studies 2 weeks ago and consolidated
> > them into here, plus added several cases common on the dist-list.
> >
> > http://hbase.apache.org/book.html#schema.casestudies
> >
> > Comments/suggestions welcome.  Thanks!
> >
> >
> > Doug Meil
> > Chief Software Architect, Explorys
> > doug.m...@explorysmedical.com
> >
> >
> >
>
>
> --
> Marcos Ortiz Valmaseda,
> *Data-Driven Product Manager* at PDVSA
> *Blog*: http://dataddict.wordpress.com/
> *LinkedIn: *http://www.linkedin.com/in/marcosluis2186
> *Twitter*: @marcosluis2186 
>


Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Yes, I just want to limit the scan within the table by row key.
And I don't pass the startRow and endRow to the GetList method;
I just pass the startRow and endRow to the coprocessorExec method.
My Code is as below:


> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> new Batch.Call>() {
>   public Hashtable call(IEndPoint_SA
>> instance)throws IOException{
>   Hashtable s = null;
> try {
> s=instance.GetList();
> } catch (ParseException e) {
> // TODO Auto-generated catch block
> e.printStackTrace();
> }
> return s;
>   }
> });
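
So, per Gary's suggestion below, I guess the endpoint side would have to look
roughly like this (rough sketch, not tested -- the GetList signature change and
the scanner setup are the point, the actual aggregation is elided; in real code
the class would also implement the IEndPoint_SA protocol interface):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

public class EndPointSASketch extends BaseEndpointCoprocessor {

    public Hashtable<String, Integer> GetList(byte[] startRow, byte[] stopRow)
            throws IOException {
        // Limit the scan to the range the client asked for.
        Scan scan = new Scan();
        scan.setStartRow(startRow);
        scan.setStopRow(stopRow);

        RegionCoprocessorEnvironment env =
                (RegionCoprocessorEnvironment) getEnvironment();
        InternalScanner scanner = env.getRegion().getScanner(scan);

        Hashtable<String, Integer> result = new Hashtable<String, Integer>();
        try {
            List<KeyValue> kvs = new ArrayList<KeyValue>();
            boolean more;
            do {
                kvs.clear();
                more = scanner.next(kvs);
                // ... aggregate kvs into result here ...
            } while (more);
        } finally {
            scanner.close();
        }
        return result;
    }
}

On the client side the Batch.Call would then invoke instance.GetList(startrow,
endrow) with the same range that is passed to coprocessorExec.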

Thanks  a lot.



Best Regards / 商祺
郭伟 Guo Wei

On 2013-4-20, at 12:17 AM, Gary Helmling  wrote:

> As others mention HBASE-6870 is about coprocessorExec() always scanning the
> full .META. table to determine region locations.  Is this what you mean or
> are you talking about your coprocessor always scanning your full user table?
> 
> If you want to limit the scan within regions in your user table, you'll
> need to pass startRow and endRow as parameters to your instance.GetList()
> method.  Then when you create the region scanner in your coprocessor code,
> you'll need to set the start and end row yourself in order to limit the
> rows scanned.
> 
> 
> On Fri, Apr 19, 2013 at 5:59 AM, Ted Yu  wrote:
> 
>> Please upgrade to 0.94.6.1 which is more stable.
>> 
>> Cheers
>> 
>> On Apr 19, 2013, at 4:58 AM, GuoWei  wrote:
>> 
>>> 
>>> We use base 0.94.1 in our production environment.
>>> 
>>> 
>>> Best Regards / 商祺
>>> 郭伟 Guo Wei
>>> 
>>> On 2013-4-19, at 6:01 PM, Ted Yu  wrote:
>>> 
 Which hbase version are you using ?
 
 Thanks
 
 On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
 
> Hello,
> 
> We use HBase core processor endpoint  to process realtime data. But
>> when I use coreprocessorExec method to scan table and pass startRow and
>> endRow. It always scan all table instead of the result between the startRow
>> and endRow.
> 
> my code.
> 
> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> new Batch.Call>() {
>   public Hashtable call(IEndPoint_SA
>> instance)throws IOException{
>   Hashtable s = null;
> try {
> s=instance.GetList();
> } catch (ParseException e) {
> // TODO Auto-generated catch block
> e.printStackTrace();
> }
> return s;
>   }
> });
> 
> 
> 
> Best Regards / 商祺
> 郭伟 Guo Wei
> -
 
>>> 
>>