Re: Lease exception

2017-01-11 Thread Rajeshkumar J
This is the log I got:
2017-01-05 11:41:49,629 DEBUG
[B.defaultRpcServer.handler=15,queue=0,port=16020] ipc.RpcServer:
B.defaultRpcServer.handler=15,queue=0,port=16020: callId: 3 service:
ClientService methodName: Scan size: 23 connection: xx.xx.xx.xx:x
org.apache.hadoop.hbase.regionserver.LeaseException: lease '706' does not
exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)
2017-01-05 11:41:49,629 TRACE
[B.defaultRpcServer.handler=18,queue=0,port=16020] ipc.RpcServer: callId: 2
service: ClientService methodName: Scan size: 29 connection:
xx.xx.xx.xx:x param: scanner_id: 706 number_of_rows: 2147483647
close_scanner: false next_call_seq: 0 client_handles_partials: true
client_handles_heartbeats: true connection: xx.xx.xx.xx:x, response
scanner_id: 706 more_results: true stale: false more_results_in_region:
false queueTime: 1 processingTime: 60136 totalTime: 60137

I have an HBase scanner timeout of 60000 ms, but here the total time
(60137 ms) is greater than that, so I am getting the lease exception. Can
anyone suggest a way to find out why the scan takes this long?

Thanks

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang  wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2491
>
> There is a TTL for scanners on the server side, to prevent scanners from
> leaking when a client fails to close them. The TTL is configured by
> hbase.client.scanner.timeout.period on the server and is refreshed each
> time a scan RPC request arrives. The TTLs of all scanners are managed by
> Lease. Your error happens when the server closes a scanner whose lease
> has already expired. So I think you can try to increase
> hbase.client.scanner.timeout.period on the server, or decrease
> hbase.client.scanner.timeout.period on the client, to prevent the
> scanner from expiring before its scan is done.
> hbase.client.scanner.timeout.period is read on both the client and the
> server, so the two sides may disagree if you change only one of them.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1 has
> some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J :
>
> > Can you please explain what causes this lease exception, and is there
> > any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang  wrote:
> >
> > > In fact, on the client side the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the deprecated
> > > hbase.regionserver.lease.period.
> > >
> > > The code that throws LeaseException in your stack trace was removed by
> > > HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your
> > > client can also be upgraded to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired on the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J  >:
> > >
> > > > Also, there is a solution I found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the scanner will be closed,
> > > > > eventually throwing this exception. Correct me if I am wrong, as I
> > > > > don't have a clear understanding of this issue.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Y

Re: Lease exception

2016-12-26 Thread Rajeshkumar J
Also, how do I change the property hbase.client.scanner.timeout.period on
the client side? I only know how to change it in hbase-site.xml.
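
A minimal sketch of one way to do that from code, assuming a standard
HBase 1.x client (the 120000 ms value is a made-up example, not a
recommendation): setting the property on the client's Configuration before
the Connection is created overrides whatever hbase-site.xml on the
classpath says.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientScannerTimeout {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Client-side scanner timeout: 120 s instead of the usual 60 s.
        conf.setInt("hbase.client.scanner.timeout.period", 120000);
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            // Scans made through tables obtained from this connection
            // will use the timeout set above.
        }
    }
}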

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang  wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2491
>
> There is a TTL for scanners on the server side, to prevent scanners from
> leaking when a client fails to close them. The TTL is configured by
> hbase.client.scanner.timeout.period on the server and is refreshed each
> time a scan RPC request arrives. The TTLs of all scanners are managed by
> Lease. Your error happens when the server closes a scanner whose lease
> has already expired. So I think you can try to increase
> hbase.client.scanner.timeout.period on the server, or decrease
> hbase.client.scanner.timeout.period on the client, to prevent the
> scanner from expiring before its scan is done.
> hbase.client.scanner.timeout.period is read on both the client and the
> server, so the two sides may disagree if you change only one of them.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1 has
> some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J :
>
> > Can you please explain what causes this lease exception, and is there
> > any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang  wrote:
> >
> > > In fact, on the client side the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the deprecated
> > > hbase.regionserver.lease.period.
> > >
> > > The code that throws LeaseException in your stack trace was removed by
> > > HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your
> > > client can also be upgraded to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired on the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J  >:
> > >
> > > > Also, there is a solution I found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the scanner will be closed,
> > > > > eventually throwing this exception. Correct me if I am wrong, as I
> > > > > don't have a clear understanding of this issue.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu 
> wrote:
> > > > >
> > > > >> Which hbase release are you using ?
> > > > >>
> > > > >> There is heartbeat support when scanning.
> > > > >> Looks like the version you use doesn't have this support.
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> >   Thanks for the reply. I have properties as below
> > > > >> >
> > > > >> > 
> > > > >> > <property>
> > > > >> >   <name>hbase.regionserver.lease.period</name>
> > > > >> >   <value>90</value>
> > > > >> > </property>
> > > > >> > <property>
> > > > >> >   <name>hbase.rpc.timeout</name>
> > > > >> >   <value>90>/value>
> > > > >> > </property>
> > > > >> >
> > > > >> >
> > > > >> > Correct me if I am wrong.
> > > > >> >
> > > > >> > I know hbase.regionserver.lease.period, which says how long a
> > > > >> > scanner lives between calls to scanner.next().
> > > > >> >
> > > > >> >

Re: Lease exception

2016-12-26 Thread Rajeshkumar J
Sorry for the delay. I didn't get the lease concept here: is it specific
to HBase, or is it like a lease in Hadoop?

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang  wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2491
>
> There is a TTL for scanners on the server side, to prevent scanners from
> leaking when a client fails to close them. The TTL is configured by
> hbase.client.scanner.timeout.period on the server and is refreshed each
> time a scan RPC request arrives. The TTLs of all scanners are managed by
> Lease. Your error happens when the server closes a scanner whose lease
> has already expired. So I think you can try to increase
> hbase.client.scanner.timeout.period on the server, or decrease
> hbase.client.scanner.timeout.period on the client, to prevent the
> scanner from expiring before its scan is done.
> hbase.client.scanner.timeout.period is read on both the client and the
> server, so the two sides may disagree if you change only one of them.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1 has
> some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J :
>
> > Can you please explain what causes this lease exception, and is there
> > any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang  wrote:
> >
> > > In fact, on the client side the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the deprecated
> > > hbase.regionserver.lease.period.
> > >
> > > The code that throws LeaseException in your stack trace was removed by
> > > HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your
> > > client can also be upgraded to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired on the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J  >:
> > >
> > > > Also, there is a solution I found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the scanner will be closed,
> > > > > eventually throwing this exception. Correct me if I am wrong, as I
> > > > > don't have a clear understanding of this issue.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu 
> wrote:
> > > > >
> > > > >> Which hbase release are you using ?
> > > > >>
> > > > >> There is heartbeat support when scanning.
> > > > >> Looks like the version you use doesn't have this support.
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> >   Thanks for the reply. I have properties as below
> > > > >> >
> > > > >> > 
> > > > >> > <property>
> > > > >> >   <name>hbase.regionserver.lease.period</name>
> > > > >> >   <value>90</value>
> > > > >> > </property>
> > > > >> > <property>
> > > > >> >   <name>hbase.rpc.timeout</name>
> > > > >> >   <value>90>/value>
> > > > >> > </property>
> > > > >> >
> > > > >> >
> > > > >> > Correct me if I am wrong.
> > > > >> >
> > > > >> > I know hbase.regionserver.lease.period, which says how long a
> > > > >> > scanner lives between calls to scanner.next().
> > > > >> >
> > > > >> > As far as I understand when scanner.next() is called it will fetch

Re: Lease exception

2016-12-22 Thread Phil Yang
https://github.com/apache/hbase/blob/rel/1.1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2491

There is a TTL for scanners on the server side, to prevent scanners from
leaking when a client fails to close them. The TTL is configured by
hbase.client.scanner.timeout.period on the server and is refreshed each
time a scan RPC request arrives. The TTLs of all scanners are managed by
Lease. Your error happens when the server closes a scanner whose lease has
already expired. So I think you can try to increase
hbase.client.scanner.timeout.period on the server, or decrease
hbase.client.scanner.timeout.period on the client, to prevent the scanner
from expiring before its scan is done. hbase.client.scanner.timeout.period
is read on both the client and the server, so the two sides may disagree
if you change only one of them.
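
A toy sketch of that TTL/lease bookkeeping, for illustration only (this is
invented example code, not HBase's actual Leases class):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ScannerLeases {
    private final long ttlMillis; // e.g. hbase.client.scanner.timeout.period
    private final Map<Long, Long> deadline = new ConcurrentHashMap<>();

    ScannerLeases(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Called when a scanner is opened and again on every scan RPC,
    // pushing the deadline back: this is the "refresh" described above.
    void touch(long scannerId) {
        deadline.put(scannerId, System.currentTimeMillis() + ttlMillis);
    }

    // Called periodically: a scanner whose deadline has passed is dropped.
    // A later cancel for the same id then finds no lease, which is exactly
    // the "lease '706' does not exist" error in the log above.
    void expireLapsed() {
        long now = System.currentTimeMillis();
        deadline.values().removeIf(d -> d < now);
    }

    boolean cancel(long scannerId) {
        return deadline.remove(scannerId) != null;
    }
}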

BTW, I still suggest that you upgrade your cluster and client; 1.1.1 has
some data-loss bugs in scanning.

Thanks,
Phil


2016-12-22 17:26 GMT+08:00 Rajeshkumar J :

> Can you please explain what causes this lease exception, and is there
> any way to solve it in the current version?
>
> Thanks
>
> On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang  wrote:
>
> > In fact, on the client side the RPC timeout of a scan request is also
> > hbase.client.scanner.timeout.period, which replaces the deprecated
> > hbase.regionserver.lease.period.
> >
> > The code that throws LeaseException in your stack trace was removed by
> > HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your
> > client can also be upgraded to 1.1.7, which will ignore
> > UnknownScannerException and retry when the lease has expired on the
> > server.
> >
> > Thanks,
> > Phil
> >
> >
> > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J :
> >
> > > Also, there is a solution I found in the HBase user guide:
> > > hbase.rpc.timeout must be greater than
> > > hbase.client.scanner.timeout.period.
> > > How do these two properties play a part in the above exception? Can
> > > anyone explain?
> > >
> > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > rajeshkumarit8...@gmail.com>
> > > wrote:
> > >
> > > > I am using HBase version 1.1.1.
> > > > Also, I didn't understand something here. Whenever scanner.next() is
> > > > called, it needs to return rows (based on the caching value) within
> > > > the lease period, or else the scanner will be closed, eventually
> > > > throwing this exception. Correct me if I am wrong, as I don't have a
> > > > clear understanding of this issue.
> > > >
> > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu  wrote:
> > > >
> > > >> Which hbase release are you using ?
> > > >>
> > > >> There is heartbeat support when scanning.
> > > >> Looks like the version you use doesn't have this support.
> > > >>
> > > >> Cheers
> > > >>
> > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > rajeshkumarit8...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> >   Thanks for the reply. I have properties as below
> > > >> >
> > > >> > 
> > > >> > <property>
> > > >> >   <name>hbase.regionserver.lease.period</name>
> > > >> >   <value>90</value>
> > > >> > </property>
> > > >> > <property>
> > > >> >   <name>hbase.rpc.timeout</name>
> > > >> >   <value>90>/value>
> > > >> > </property>
> > > >> >
> > > >> >
> > > >> > Correct me if I am wrong.
> > > >> >
> > > >> > I know hbase.regionserver.lease.period, which says how long a
> > > >> > scanner lives between calls to scanner.next().
> > > >> >
> > > >> > As far as I understand, when scanner.next() is called it will
> > > >> > fetch the number of rows given by hbase.client.scanner.caching.
> > > >> > When this fetching process takes more than the lease period, it
> > > >> > will close the scanner object. Is that why this exception is
> > > >> > occurring?
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Rajeshkumar J
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> > > >> richardstar...@outlook.com
> > > >> >> wrote:
> > &g

Re: Lease exception

2016-12-22 Thread Rajeshkumar J
Can you please explain what causes this lease exception, and is there any
way to solve it in the current version?

Thanks

On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang  wrote:

> In fact, on the client side the RPC timeout of a scan request is also
> hbase.client.scanner.timeout.period, which replaces the deprecated
> hbase.regionserver.lease.period.
>
> The code that throws LeaseException in your stack trace was removed by
> HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your
> client can also be upgraded to 1.1.7, which will ignore
> UnknownScannerException and retry when the lease has expired on the
> server.
>
> Thanks,
> Phil
>
>
> 2016-12-22 16:51 GMT+08:00 Rajeshkumar J :
>
> > Also, there is a solution I found in the HBase user guide:
> > hbase.rpc.timeout must be greater than
> > hbase.client.scanner.timeout.period.
> > How do these two properties play a part in the above exception? Can
> > anyone explain?
> >
> > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com>
> > wrote:
> >
> > > I am using HBase version 1.1.1.
> > > Also, I didn't understand something here. Whenever scanner.next() is
> > > called, it needs to return rows (based on the caching value) within
> > > the lease period, or else the scanner will be closed, eventually
> > > throwing this exception. Correct me if I am wrong, as I don't have a
> > > clear understanding of this issue.
> > >
> > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu  wrote:
> > >
> > >> Which hbase release are you using ?
> > >>
> > >> There is heartbeat support when scanning.
> > >> Looks like the version you use doesn't have this support.
> > >>
> > >> Cheers
> > >>
> > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> >   Thanks for the reply. I have properties as below
> > >> >
> > >> > 
> > >> > <property>
> > >> >   <name>hbase.regionserver.lease.period</name>
> > >> >   <value>90</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.rpc.timeout</name>
> > >> >   <value>90>/value>
> > >> > </property>
> > >> >
> > >> >
> > >> > Correct me if I am wrong.
> > >> >
> > >> > I know hbase.regionserver.lease.period, which says how long a
> > >> > scanner lives between calls to scanner.next().
> > >> >
> > >> > As far as I understand, when scanner.next() is called it will fetch
> > >> > the number of rows given by hbase.client.scanner.caching. When this
> > >> > fetching process takes more than the lease period, it will close
> > >> > the scanner object. Is that why this exception is occurring?
> > >> >
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Rajeshkumar J
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> > >> richardstar...@outlook.com
> > >> >> wrote:
> > >> >
> > >> >> It means your lease on a region server has expired during a call to
> > >> >> resultscanner.next(). This happens on a slow call to next(). You
> can
> > >> either
> > >> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> > >> >> hbase.regionserver.lease.period.
> > >> >>
> > >> >> https://richardstartin.com
> > >> >>
> > >> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >> I have faced the below issue in our production cluster:
> > >> >>
> > >> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> > >> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease
> '166881'
> > >> does
> > >> >> not exist
> > >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> > >> >> removeLease(Leases.java:221)
> > >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> > >> >> cancelLease(Leases.java:206)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > >> >> scan(RSRpcServices.java:2491)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> > >> ClientService$2.
> > >> >> callBlockingMethod(ClientProtos.java:32205)
> > >> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> > >> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
> > >> utor.java:130)
> > >> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > java:107)
> > >> >> at java.lang.Thread.run(Thread.java:744)
> > >> >>
> > >> >>
> > >> >> Can anyone explain what a lease exception is?
> > >> >>
> > >> >> Thanks,
> > >> >> Rajeshkumar J
> > >> >>
> > >>
> > >
> > >
> >
>


Re: Lease exception

2016-12-22 Thread Phil Yang
In fact, on the client side the RPC timeout of a scan request is also
hbase.client.scanner.timeout.period, which replaces the deprecated
hbase.regionserver.lease.period.

The code that throws LeaseException in your stack trace was removed by
HBASE-16604; maybe you can try upgrading your cluster to 1.1.7? Your client
can also be upgraded to 1.1.7, which will ignore UnknownScannerException
and retry when the lease has expired on the server.
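
To make that concrete, a hand-written sketch of such a retry around a scan
(this is illustrative example code, not the actual 1.1.7 client internals;
the table name is made up, and a real implementation would also skip the
duplicate first row after resuming):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.UnknownScannerException;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ResumableScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            byte[] lastRow = null;
            boolean done = false;
            while (!done) {
                Scan scan = new Scan();
                if (lastRow != null) {
                    scan.setStartRow(lastRow); // resume; note this re-reads lastRow
                }
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r; (r = scanner.next()) != null; ) {
                        lastRow = r.getRow();
                        // process r here
                    }
                    done = true;
                } catch (UnknownScannerException e) {
                    // The server expired our lease between next() calls;
                    // loop around and reopen the scanner from lastRow.
                }
            }
        }
    }
}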

Thanks,
Phil


2016-12-22 16:51 GMT+08:00 Rajeshkumar J :

> Also, there is a solution I found in the HBase user guide:
> hbase.rpc.timeout must be greater than
> hbase.client.scanner.timeout.period.
> How do these two properties play a part in the above exception? Can
> anyone explain?
>
> On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> rajeshkumarit8...@gmail.com>
> wrote:
>
> > I am using HBase version 1.1.1.
> > Also, I didn't understand something here. Whenever scanner.next() is
> > called, it needs to return rows (based on the caching value) within the
> > lease period, or else the scanner will be closed, eventually throwing
> > this exception. Correct me if I am wrong, as I don't have a clear
> > understanding of this issue.
> >
> > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu  wrote:
> >
> >> Which hbase release are you using ?
> >>
> >> There is heartbeat support when scanning.
> >> Looks like the version you use doesn't have this support.
> >>
> >> Cheers
> >>
> >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> rajeshkumarit8...@gmail.com>
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> >   Thanks for the reply. I have properties as below
> >> >
> >> > 
> >> > <property>
> >> >   <name>hbase.regionserver.lease.period</name>
> >> >   <value>90</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.rpc.timeout</name>
> >> >   <value>90>/value>
> >> > </property>
> >> >
> >> >
> >> > Correct me if I am wrong.
> >> >
> >> > I know hbase.regionserver.lease.period, which says how long a
> >> > scanner lives between calls to scanner.next().
> >> >
> >> > As far as I understand, when scanner.next() is called it will fetch
> >> > the number of rows given by hbase.client.scanner.caching. When this
> >> > fetching process takes more than the lease period, it will close
> >> > the scanner object. Is that why this exception is occurring?
> >> > Thanks,
> >> >
> >> > Rajeshkumar J
> >> >
> >> >
> >> >
> >> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> >> richardstar...@outlook.com
> >> >> wrote:
> >> >
> >> >> It means your lease on a region server has expired during a call to
> >> >> resultscanner.next(). This happens on a slow call to next(). You can
> >> either
> >> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> >> >> hbase.regionserver.lease.period.
> >> >>
> >> >> https://richardstartin.com
> >> >>
> >> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I have faced the below issue in our production cluster:
> >> >>
> >> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> >> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
> >> does
> >> >> not exist
> >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> >> removeLease(Leases.java:221)
> >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> >> cancelLease(Leases.java:206)
> >> >> at
> >> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >> >> scan(RSRpcServices.java:2491)
> >> >> at
> >> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> >> ClientService$2.
> >> >> callBlockingMethod(ClientProtos.java:32205)
> >> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> >> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> >> >> at
> >> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
> >> utor.java:130)
> >> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> java:107)
> >> >> at java.lang.Thread.run(Thread.java:744)
> >> >>
> >> >>
> >> >> Can anyone explain what a lease exception is?
> >> >>
> >> >> Thanks,
> >> >> Rajeshkumar J
> >> >>
> >>
> >
> >
>


Re: Lease exception

2016-12-22 Thread Rajeshkumar J
Also, there is a solution I found in the HBase user guide:
hbase.rpc.timeout must be greater than hbase.client.scanner.timeout.period.
How do these two properties play a part in the above exception? Can anyone
explain?

On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J 
wrote:

> I am using HBase version 1.1.1.
> Also, I didn't understand something here. Whenever scanner.next() is
> called, it needs to return rows (based on the caching value) within the
> lease period, or else the scanner will be closed, eventually throwing
> this exception. Correct me if I am wrong, as I don't have a clear
> understanding of this issue.
>
> On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu  wrote:
>
>> Which hbase release are you using ?
>>
>> There is heartbeat support when scanning.
>> Looks like the version you use doesn't have this support.
>>
>> Cheers
>>
>> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J 
>> wrote:
>> >
>> > Hi,
>> >
>> >   Thanks for the reply. I have properties as below
>> >
>> > 
>> > <property>
>> >   <name>hbase.regionserver.lease.period</name>
>> >   <value>90</value>
>> > </property>
>> > <property>
>> >   <name>hbase.rpc.timeout</name>
>> >   <value>90>/value>
>> > </property>
>> >
>> >
>> > Correct me if I am wrong.
>> >
>> > I know hbase.regionserver.lease.period, which says how long a scanner
>> > lives between calls to scanner.next().
>> >
>> > As far as I understand, when scanner.next() is called it will fetch
>> > the number of rows given by hbase.client.scanner.caching. When this
>> > fetching process takes more than the lease period, it will close the
>> > scanner object. Is that why this exception is occurring?
>> >
>> >
>> > Thanks,
>> >
>> > Rajeshkumar J
>> >
>> >
>> >
>> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
>> richardstar...@outlook.com
>> >> wrote:
>> >
>> >> It means your lease on a region server has expired during a call to
>> >> resultscanner.next(). This happens on a slow call to next(). You can
>> either
>> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> >> hbase.regionserver.lease.period.
>> >>
>> >> https://richardstartin.com
>> >>
>> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have faced the below issue in our production cluster:
>> >>
>> >> org.apache.hadoop.hbase.regionserver.LeaseException:
>> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
>> does
>> >> not exist
>> >> at org.apache.hadoop.hbase.regionserver.Leases.
>> >> removeLease(Leases.java:221)
>> >> at org.apache.hadoop.hbase.regionserver.Leases.
>> >> cancelLease(Leases.java:206)
>> >> at
>> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> >> scan(RSRpcServices.java:2491)
>> >> at
>> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
>> ClientService$2.
>> >> callBlockingMethod(ClientProtos.java:32205)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> >> at
>> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
>> utor.java:130)
>> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> >> at java.lang.Thread.run(Thread.java:744)
>> >>
>> >>
>> >> Can anyone explain what a lease exception is?
>> >>
>> >> Thanks,
>> >> Rajeshkumar J
>> >>
>>
>
>


Re: Lease exception

2016-12-21 Thread Rajeshkumar J
I am using HBase version 1.1.1.
Also, I didn't understand something here. Whenever scanner.next() is
called, it needs to return rows (based on the caching value) within the
lease period, or else the scanner will be closed, eventually throwing this
exception. Correct me if I am wrong, as I don't have a clear understanding
of this issue.
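
For reference, a minimal sketch of the loop being discussed, assuming the
HBase 1.x client API (the table name and caching value are made-up
examples):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanLoop {
    // Assumes an already-created Connection (see ConnectionFactory).
    static void scanAll(Connection connection) throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("mytable"))) {
            Scan scan = new Scan();
            scan.setCaching(100); // rows fetched per next() RPC
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    // Each batch of 100 rows is one RPC to the region
                    // server, and the server renews the scanner's lease on
                    // every such RPC. The lease only lapses if the gap
                    // between RPCs (e.g. slow client-side processing)
                    // exceeds the configured timeout.
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}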

On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu  wrote:

> Which hbase release are you using ?
>
> There is heartbeat support when scanning.
> Looks like the version you use doesn't have this support.
>
> Cheers
>
> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J 
> wrote:
> >
> > Hi,
> >
> >   Thanks for the reply. I have properties as below
> >
> > 
> > <property>
> >   <name>hbase.regionserver.lease.period</name>
> >   <value>90</value>
> > </property>
> > <property>
> >   <name>hbase.rpc.timeout</name>
> >   <value>90>/value>
> > </property>
> >
> >
> > Correct me if I am wrong.
> >
> > I know hbase.regionserver.lease.period, which says how long a scanner
> > lives between calls to scanner.next().
> >
> > As far as I understand, when scanner.next() is called it will fetch
> > the number of rows given by hbase.client.scanner.caching. When this
> > fetching process takes more than the lease period, it will close the
> > scanner object. Is that why this exception is occurring?
> >
> >
> > Thanks,
> >
> > Rajeshkumar J
> >
> >
> >
> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> richardstar...@outlook.com
> >> wrote:
> >
> >> It means your lease on a region server has expired during a call to
> >> resultscanner.next(). This happens on a slow call to next(). You can
> either
> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> >> hbase.regionserver.lease.period.
> >>
> >> https://richardstartin.com
> >>
> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I have faced the below issue in our production cluster:
> >>
> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
> does
> >> not exist
> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> removeLease(Leases.java:221)
> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> cancelLease(Leases.java:206)
> >> at
> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >> scan(RSRpcServices.java:2491)
> >> at
> >> org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.
> >> callBlockingMethod(ClientProtos.java:32205)
> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> >> at
> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:130)
> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >> at java.lang.Thread.run(Thread.java:744)
> >>
> >>
> >> Can anyone explain what a lease exception is?
> >>
> >> Thanks,
> >> Rajeshkumar J
> >>
>


Re: Lease exception

2016-12-21 Thread Ted Yu
Which hbase release are you using ?

There is heartbeat support when scanning. 
Looks like the version you use doesn't have this support. 

Cheers

> On Dec 21, 2016, at 4:02 AM, Rajeshkumar J  
> wrote:
> 
> Hi,
> 
>   Thanks for the reply. I have properties as below
> 
> 
> <property>
>   <name>hbase.regionserver.lease.period</name>
>   <value>90</value>
> </property>
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>90>/value>
> </property>
> 
> 
> Correct me if I am wrong.
>
> I know hbase.regionserver.lease.period, which says how long a scanner
> lives between calls to scanner.next().
>
> As far as I understand, when scanner.next() is called it will fetch the
> number of rows given by hbase.client.scanner.caching. When this fetching
> process takes more than the lease period, it will close the scanner
> object. Is that why this exception is occurring?
> 
> 
> Thanks,
> 
> Rajeshkumar J
> 
> 
> 
> On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <richardstar...@outlook.com> wrote:
> 
>> It means your lease on a region server has expired during a call to
>> resultscanner.next(). This happens on a slow call to next(). You can either
>> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> hbase.regionserver.lease.period.
>> 
>> https://richardstartin.com
>> 
>> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I have faced the below issue in our production cluster:
>> 
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
>> not exist
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> removeLease(Leases.java:221)
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> cancelLease(Leases.java:206)
>> at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> scan(RSRpcServices.java:2491)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
>> callBlockingMethod(ClientProtos.java:32205)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:744)
>> 
>> 
>> Can anyone explain what a lease exception is?
>> 
>> Thanks,
>> Rajeshkumar J
>> 


Re: Lease exception

2016-12-21 Thread Richard Startin
If your client caching is set to a large value, you will need to do a long
scan occasionally, and the RPC itself will be expensive in terms of IO. So
it's worth looking at hbase.client.scanner.caching to see if it is too
large. If you're scanning the whole table, check that you aren't churning
the block cache.
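
A minimal sketch of turning both of those knobs on a scan, assuming the
HBase 1.x client API (the values are made-up examples, not
recommendations):

import org.apache.hadoop.hbase.client.Scan;

public class ScanTuning {
    static Scan buildScan() {
        Scan scan = new Scan();
        // Fewer rows per next() RPC keeps each call short, so the lease is
        // renewed well within the configured timeout.
        scan.setCaching(100);
        // For full-table scans, don't evict hot data from the block cache.
        scan.setCacheBlocks(false);
        return scan;
    }
}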

The XML below looks wrong; was that copied verbatim from your site file?

https://richardstartin.com

> On 21 Dec 2016, at 12:02, Rajeshkumar J  wrote:
> 
> Hi,
> 
>   Thanks for the reply. I have properties as below
> 
> 
> <property>
>   <name>hbase.regionserver.lease.period</name>
>   <value>90</value>
> </property>
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>90>/value>
> </property>
> 
> 
> Correct me if I am wrong.
>
> I know hbase.regionserver.lease.period, which says how long a scanner
> lives between calls to scanner.next().
>
> As far as I understand, when scanner.next() is called it will fetch the
> number of rows given by hbase.client.scanner.caching. When this fetching
> process takes more than the lease period, it will close the scanner
> object. Is that why this exception is occurring?
> 
> 
> Thanks,
> 
> Rajeshkumar J
> 
> 
> 
> On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <richardstar...@outlook.com> wrote:
> 
>> It means your lease on a region server has expired during a call to
>> resultscanner.next(). This happens on a slow call to next(). You can either
>> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> hbase.regionserver.lease.period.
>> 
>> https://richardstartin.com
>> 
>> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I have faced the below issue in our production cluster:
>> 
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
>> not exist
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> removeLease(Leases.java:221)
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> cancelLease(Leases.java:206)
>> at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> scan(RSRpcServices.java:2491)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
>> callBlockingMethod(ClientProtos.java:32205)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:744)
>> 
>> 
>> Can anyone explain what a lease exception is?
>> 
>> Thanks,
>> Rajeshkumar J
>> 


Re: Lease exception

2016-12-21 Thread Rajeshkumar J
Hi,

   Thanks for the reply. I have properties as below


<property>
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>90>/value>
</property>


Correct me if I am wrong.

I know hbase.regionserver.lease.period, which says how long a scanner
lives between calls to scanner.next().

As far as I understand, when scanner.next() is called it will fetch the
number of rows given by hbase.client.scanner.caching. When this fetching
process takes more than the lease period, it will close the scanner
object. Is that why this exception is occurring?


Thanks,

Rajeshkumar J



On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin  wrote:

> It means your lease on a region server has expired during a call to
> resultscanner.next(). This happens on a slow call to next(). You can either
> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> hbase.regionserver.lease.period.
>
> https://richardstartin.com
>
> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
>
> Hi,
>
>   I have faced the below issue in our production cluster:
>
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
> not exist
> at org.apache.hadoop.hbase.regionserver.Leases.
> removeLease(Leases.java:221)
> at org.apache.hadoop.hbase.regionserver.Leases.
> cancelLease(Leases.java:206)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> scan(RSRpcServices.java:2491)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
> callBlockingMethod(ClientProtos.java:32205)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:744)
>
>
> Can anyone explain what a lease exception is?
>
> Thanks,
> Rajeshkumar J
>


Re: Lease exception

2016-12-21 Thread Richard Startin
It means your lease on a region server has expired during a call to
ResultScanner.next(). This happens on a slow call to next(). You can either
embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
hbase.regionserver.lease.period.
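
For illustration, an hbase-site.xml fragment that satisfies that rule
might look like this (the values are made-up examples, assuming both are
in milliseconds):

<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>90000</value>
</property>

Here hbase.rpc.timeout (90 s) exceeds the lease period (60 s), per the
rule above.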

https://richardstartin.com

On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:

Hi,

  I have faced the below issue in our production cluster:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)


Can anyone explain what a lease exception is?

Thanks,
Rajeshkumar J


Lease exception

2016-12-21 Thread Rajeshkumar J
Hi,

   I have faced the below issue in our production cluster:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)


Can anyone explain what a lease exception is?

Thanks,
Rajeshkumar J


Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Ted Yu
HBase refguide has some explanation on internals w.r.t. versions:
http://hbase.apache.org/book.html#versions

bq. why HBase has versioning

This came from Bigtable. See the paragraph on page 3 of osdi paper:
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

The example use case from the above paper was to store 3 versions (i.e.
timestamps) of contents column. The timestamps are

bq. the times at which these page versions were actually crawled.

Cheers


On Sat, Apr 12, 2014 at 2:14 PM, Michael Segel wrote:

> You do realize that it is an internal feature and that the public API can
> change to not present access to it.
> However, that wouldn't be a good idea because you would want to be able to
> change it and in some cases review the versions of a cell.  How else do you
> describe versioning which is unique to HBase and/or other specific
> databases, yet temporal modeling is not?
>
> In fact, if memory serves... going back to 2009-10, IIRC, the 'old API' vs
> the 'new API' for Hadoop, where the 'new API' had a subset of the exposed
> classes / methods of the old API? (It was an attempt to simplify the
> API...) So again, APIs can change.
>
> The point is that you should be modeling your data on time if it is time
> sensitive data. Using versioning bypasses this with bad consequences.
>
> By all means keep abusing the cell's versioning.
> Just don't complain about poor performance and your HBase tossing
> exceptions left and right. I mean I can't stop you from mixing booze, coke
> and meth. All I can do is tell you that its not a good idea and not
> recommended.
>
> If you want a good definition of why HBase has versioning... go ask StAck,
> Ted, Nick or one of the committers since they are more familiar with the
> internal workings of HBase than I. When you get a good answer, then have
> the online HBase book updated.
>
> -Mike
>
> PS... if you want a really good example of why not to use versioning to
> store temporal data...
> What happens if you're storing 100 versions of a cell and you find out
> that you have a duplicate entry with the wrong timestamp and you want to
> delete that one version.
> How do you do that? Going from memory, and I could very well be wrong, but
> the tombstone marker is on the cell, not the version, right?
>
> If it is on the version, what happens to the versions of the cell that are
> older than the tombstone marker?
> Sorry, its been a while since I've been intimate with HBase. Doing a bit
> of other things at the moment, and I'm already overtaxing my last remaining
> living brain cell.  ;-)
>
>
> On Apr 12, 2014, at 9:14 PM, Brian Jeltema  wrote:
>
> > I don't want to be argumentative here, but by definition it's not an
> internal feature because it's part of the
> > public API. We use versioning in a way that makes me somewhat
> uncomfortable, but it's been quite
> > useful. I'd like to see a clear explanation of why it exists and what
> use cases it was intended to support.
> >
> > Brian
> >
> >> Since you asked...
> >>
> >> Simplest answer... your schema should not rely upon internal features of
> the system.  Since you are tracking your data along the lines of a temporal
> attribute it should be part of the schema. In terms of a good design, by
> making it a part of the schema, you're defining that the data has a
> temporal property/attribute.
> >>
> >> Cell versioning is an internal feature of HBase. Its there for a reason.
> >> Perhaps one of the committers should expand on why its there.  (When I
> asked this earlier, never got an answer. )
> >>
> >>
> >> Longer answer... review how HBase stores the rows, including the versions
> of the cell.
> >> You're putting an unnecessary stress on the system.
> >>
> >> Its just not Zen... ;-)
> >>
> >> The reason I'm a bit short on this topic is that its an issue that
> keeps coming up, over and over again because some idiot keeps looking to
> take a shortcut without understanding the implications of their decision.
> Just like salting the key. (Note:  prepending a truncated hash isn't the
> same as using a salt.  Salting has a specific meaning and the salt is
> orthogonal to the underlying key. Any relationship between the salt and the
> key is purely random luck.)
> >>
> >> Does that help?
> >> (BTW, this should be part of any schema design talk... yet somehow I
> think its not covered... )
> >>
> >> -Mike
> >>
> >> PS. Its not weird that the cell versions are checked. It makes perfect
> sense.
> >>
> >> On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz 
> wrote:
> >>
> >>> Well, it was just an example of why I could keep a thousand versions
> >>> of a cell. I didn't know that HBase was checking each version when I
> >>> do a scan; it's a little weird when the data is sorted.
> >>>
> >>> You got my attention with your comment that it's better to store data
> >>> over time with new columns than with versions. Why is it better?
> >>> Versions look very convenient for that

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
You do realize that it is an internal feature and that the public API can 
change to not present access to it.
However, that wouldn’t be a good idea because you would want to be able to 
change it and in some cases review the versions of a cell.  How else do you 
describe versioning which is unique to HBase and/or other specific databases, 
yet temporal modeling is not? 

In fact, if memory serves… going back to 2009-10, IIRC, the ‘old API’ vs the
‘new API’ for Hadoop, where the ‘new API’ had a subset of the exposed classes /
methods of the old API? (It was an attempt to simplify the API…) So again,
APIs can change.

The point is that you should be modeling your data on time if it is time 
sensitive data. Using versioning bypasses this with bad consequences. 

By all means keep abusing the cell’s versioning. 
Just don’t complain about poor performance and your HBase tossing exceptions 
left and right. I mean I can’t stop you from mixing booze, coke and meth. All I 
can do is tell you that its not a good idea and not recommended. 

If you want a good definition of why HBase has versioning… go ask StAck, Ted, 
Nick or one of the committers since they are more familiar with the internal 
workings of HBase than I. When you get a good answer, then have the online 
HBase book updated.

-Mike

PS… if you want a really good example of why not to use versioning to store 
temporal data… 
What happens if you’re storing 100 versions of a cell and you find out that you 
have a duplicate entry with the wrong timestamp and you want to delete that one 
version.
How do you do that? Going from memory, and I could very well be wrong, but the 
tombstone marker is on the cell, not the version, right? 

If it is on the version, what happens to the versions of the cell that are 
older than the tombstone marker?
Sorry, its been a while since I’ve been intimate with HBase. Doing a bit of 
other things at the moment, and I’m already overtaxing my last remaining living 
brain cell.  ;-) 
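
For concreteness, a sketch of the timestamp-in-the-column-name approach
described above, assuming a 1.x-era HBase client API (the table, family,
and metric names are made-up examples):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TemporalColumns {
    // Store each reading under its own qualifier, e.g. "temp:1397300400000",
    // instead of piling versions onto a single cell.
    static void storeReading(Connection conn, byte[] row, long ts, double value)
            throws IOException {
        try (Table table = conn.getTable(TableName.valueOf("readings"))) {
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("d"),          // column family
                          Bytes.toBytes("temp:" + ts), // time in the qualifier
                          Bytes.toBytes(value));
            table.put(put);
        }
    }
}

This makes the temporal attribute part of the schema, so deleting or
re-reading one point in time is an ordinary column operation rather than a
version-level one.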


On Apr 12, 2014, at 9:14 PM, Brian Jeltema  wrote:

> I don't want to be argumentative here, but by definition it's not an internal
> feature because it's part of the
> public API. We use versioning in a way that makes me somewhat uncomfortable, 
> but it's been quite
> useful. I'd like to see a clear explanation of why it exists and what use 
> cases it was intended to support.
> 
> Brian
> 
>> Since you asked… 
>> 
>> Simplest answer… your schema should not rely upon internal features of the 
>> system.  Since you are tracking your data along the lines of a temporal 
>> attribute it should be part of the schema. In terms of a good design, by 
>> making it a part of the schema, you’re defining that the data has a temporal 
>> property/attribute. 
>> 
>> Cell versioning is an internal feature of HBase. Its there for a reason. 
>> Perhaps one of the committers should expand on why its there.  (When I asked 
>> this earlier, never got an answer. ) 
>> 
>> 
>> Longer answer… review how HBase stores the rows, including the versions of 
>> the cell. 
>> You’re putting an unnecessary stress on the system. 
>> 
>> Its just not Zen… ;-) 
>> 
>> The reason I’m a bit short on this topic is that its an issue that keeps 
>> coming up, over and over again because some idiot keeps looking to take a 
>> shortcut without understanding the implications of their decision. Just like 
>> salting the key. (Note:  prepending a truncated hash isn’t the same as using 
>> a salt.  Salting has a specific meaning and the salt is orthogonal to the 
>> underlying key. Any relationship between the salt and the key is purely 
>> random luck.) 
>> 
>> Does that help? 
>> (BTW, this should be part of any schema design talk… yet somehow I think its 
>> not covered… ) 
>> 
>> -Mike
>> 
>> PS. Its not weird that the cell versions are checked. It makes perfect 
>> sense. 
>> 
>> On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz  wrote:
>> 
>>> Well, it was just an example of why I could keep a thousand versions of a
>>> cell. I didn't know that HBase was checking each version when I do a scan;
>>> it's a little weird when the data is sorted.
>>>
>>> You got my attention with your comment that it's better to store data over
>>> time with new columns than with versions. Why is it better?
>>> Versions look very convenient for that use case. So, does a rowkey with
>>> 3600 columns work better than a rowkey with a column with 3600 versions?
>>> What's the reason for avoiding massive use of versions?
>>> 
>>> 
>>> 2014-04-12 15:07 GMT+02:00 Michael Segel :
>>> 
 Silly question...
 
 Why does the idea of using versioning to capture temporal changes to data
 keep being propagated?
 
 Seriously this issue keeps popping up...
 
 If you want to capture data over time... use a timestamp as part of the
 column name.  Don't abuse the cell's version.
 
 
 
 On Apr 11, 2014, at 11:03 AM, gortiz  wrote:
 
> Yes, I have tried w

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Brian Jeltema
I don't want to be argumentative here, but by definition it's not an internal
feature because it's part of the
public API. We use versioning in a way that makes me somewhat uncomfortable, 
but it's been quite
useful. I'd like to see a clear explanation of why it exists and what use cases 
it was intended to support.

Brian

> Since you asked… 
> 
> Simplest answer… your schema should not rely upon internal features of the 
> system.  Since you are tracking your data along the lines of a temporal 
> attribute it should be part of the schema. In terms of a good design, by 
> making it a part of the schema, you’re defining that the data has a temporal 
> property/attribute. 
> 
> Cell versioning is an internal feature of HBase. Its there for a reason. 
> Perhaps one of the committers should expand on why its there.  (When I asked 
> this earlier, never got an answer. ) 
> 
> 
> Longer answer… review how HBase stores the rows, including the versions of 
> the cell. 
> You’re putting an unnecessary stress on the system. 
> 
> Its just not Zen… ;-) 
> 
> The reason I’m a bit short on this topic is that its an issue that keeps 
> coming up, over and over again because some idiot keeps looking to take a 
> shortcut without understanding the implications of their decision. Just like 
> salting the key. (Note:  prepending a truncated hash isn’t the same as using 
> a salt.  Salting has a specific meaning and the salt is orthogonal to the 
> underlying key. Any relationship between the salt and the key is purely 
> random luck.) 
> 
> Does that help? 
> (BTW, this should be part of any schema design talk… yet somehow I think its 
> not covered… ) 
> 
> -Mike
> 
> PS. Its not weird that the cell versions are checked. It makes perfect sense. 
> 
> On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz  wrote:
> 
>> Well, it was just an example of why I could keep a thousand versions of a
>> cell. I didn't know that HBase was checking each version when I do a scan;
>> it's a little weird when the data is sorted.
>>
>> You got my attention with your comment that it's better to store data over
>> time with new columns than with versions. Why is it better?
>> Versions look very convenient for that use case. So, does a rowkey with
>> 3600 columns work better than a rowkey with a column with 3600 versions?
>> What's the reason for avoiding massive use of versions?
>> 
>> 
>> 2014-04-12 15:07 GMT+02:00 Michael Segel :
>> 
>>> Silly question...
>>> 
>>> Why does the idea of using versioning to capture temporal changes to data
>>> keep being propagated?
>>> 
>>> Seriously this issue keeps popping up...
>>> 
>>> If you want to capture data over time... use a timestamp as part of the
>>> column name.  Don't abuse the cell's version.
>>> 
>>> 
>>> 
>>> On Apr 11, 2014, at 11:03 AM, gortiz  wrote:
>>> 
 Yes, I have tried with two different values for that value of versions,
>>> 1000 and maximum value for integers.
 
 But, I want to keep those versions. I don't want to keep just 3
>>> versions. Imagine that I want to record a new version each minute and store
>>> a day, those are 1440 versions.
 
 Why is HBase going to read all the versions? I thought that if you don't
>>> indicate any versions it just reads the newest and skips the rest. It
>>> doesn't make too much sense to read all of them if the data is sorted,
>>> plus the newest version is stored at the top.
 
 
 On 11/04/14 11:54, Anoop John wrote:
> What is the max version setting u have done for ur table cf?  When u set
> some value, HBase has to keep all those versions.  During a scan it
>>> will
> read all those versions. In 94 version the default value for the max
> versions is 3.  I guess you have set some bigger value.   If u have not,
> mind testing after a major compaction?
> 
> -Anoop-
> 
> On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> 
>> The last test I have done is to reduce the number of versions to 100.
>> So, right now, I have 100 rows with 100 versions each one.
>> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
>> 100row-1000versions + blockcache-> 80s.
>> 100row-1000versions + No blockcache-> 70s.
>> 
>> 100row-*100*versions + blockcache-> 7.3s.
>> 100row-*100*versions + No blockcache-> 6.1s.
>> 
>> What's the reason for this? I guess HBase is smart enough not to
>> consider old versions, so it just checks the newest. But I reduced the
>> size (in versions) 10 times and I got a 10x gain in performance.
>> 
>> The filter is scan 'filters', {FILTER => "ValueFilter(=,
>> 'binary:5')",STARTROW => '10100101',
>> STOPROW => '60100201'}
>> 
>> 
>> 
>> On 11/04/14 09:04, gortiz wrote:
>> 
>>> Well, I guessed that, but it doesn't make too much sense because it's
>>> so slow. I only have right now 100 rows with 1000 

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
Since you asked… 

Simplest answer… your schema should not rely upon internal features of the 
system.  Since you are tracking your data along the lines of a temporal 
attribute it should be part of the schema. In terms of a good design, by making 
it a part of the schema, you’re defining that the data has a temporal 
property/attribute. 

Cell versioning is an internal feature of HBase. Its there for a reason. 
Perhaps one of the committers should expand on why its there.  (When I asked 
this earlier, never got an answer. ) 


Longer answer… review how HBase stores the rows, including the versions of the 
cell. 
You’re putting an unnecessary stress on the system. 

Its just not Zen… ;-) 

The reason I’m a bit short on this topic is that its an issue that keeps coming 
up, over and over again because some idiot keeps looking to take a shortcut 
without understanding the implications of their decision. Just like salting the 
key. (Note:  prepending a truncated hash isn’t the same as using a salt.  
Salting has a specific meaning and the salt is orthogonal to the underlying 
key. Any relationship between the salt and the key is purely random luck.) 

Does that help? 
(BTW, this should be part of any schema design talk… yet somehow I think its 
not covered… ) 

-Mike

PS. Its not weird that the cell versions are checked. It makes perfect sense. 

On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz  wrote:

> Well, it was just an example of why I could keep a thousand versions of a
> cell. I didn't know that HBase was checking each version when I do a scan;
> it's a little weird when the data is sorted.
> 
> You got my attention with your comment that it's better to store data over
> time with new columns than with versions. Why is it better?
> Versions look very convenient for that use case. So, does a rowkey with
> 3600 columns work better than a rowkey with a column with 3600 versions?
> What's the reason for avoiding massive use of versions?
> 
> 
> 2014-04-12 15:07 GMT+02:00 Michael Segel :
> 
>> Silly question...
>> 
>> Why does the idea of using versioning to capture temporal changes to data
>> keep being propagated?
>> 
>> Seriously this issue keeps popping up...
>> 
>> If you want to capture data over time... use a timestamp as part of the
>> column name.  Don't abuse the cell's version.
>> 
>> 
>> 
>> On Apr 11, 2014, at 11:03 AM, gortiz  wrote:
>> 
>>> Yes, I have tried with two different values for that value of versions,
>> 1000 and maximum value for integers.
>>> 
>>> But, I want to keep those versions. I don't want to keep just 3
>> versions. Imagine that I want to record a new version each minute and store
>> a day, those are 1440 versions.
>>> 
>>> Why is HBase going to read all the versions?? , I thought, if you don't
>> indicate any versions it's just read the newest and skip the rest. It
>> doesn't make too much sense to read all of them if data is sorted, plus the
>> newest version is stored in the top.
>>> 
>>> 
>>> On 11/04/14 11:54, Anoop John wrote:
 What is the max version setting u have done for ur table cf?  When u set
 some a value, HBase has to keep all those versions.  During a scan it
>> will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?
 
 -Anoop-
 
 On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
 
> Last test I have done it's to reduce the number of versions to 100.
> So, right now, I have 100 rows with 100 versions each one.
> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> 100row-1000versions + blockcache-> 80s.
> 100row-1000versions + No blockcache-> 70s.
> 
> 100row-*100*versions + blockcache-> 7.3s.
> 100row-*100*versions + No blockcache-> 6.1s.
> 
> What's the reasons of this? I guess HBase is enough smart for not
>> consider
> old versions, so, it just checks the newest. But, I reduce 10 times the
> size (in versions) and I got a 10x of performance.
> 
> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')",STARTROW => '10100101',
> STOPROW => '60100201'}
> 
> 
> 
> On 11/04/14 09:04, gortiz wrote:
> 
>> Well, I guessed that, what it doesn't make too much sense because
>> it's so
>> slow. I only have right now 100 rows with 1000 versions each row.
>> I have checked the size of the dataset and each row is about 700Kbytes
>> (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
>> x
>> 700Kbytes = 70Mb, since it just check the newest version. How can it
>> spend
>> too many time checking this quantity of data?
>> 
>> I'm generating again the dataset with a bigger blocksize (previously
>> was
>> 64Kb, now, it's going to be 1Mb). I could try tuning the scanning and
>> batching parameters, but I don't think they're going to affect too much.

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Guillermo Ortiz
Well, it was just an example of why I might keep a thousand versions of a cell.
I didn't know that HBase was checking each version when I do a scan; it's a
little weird when data is sorted.

You got my attention with your comment that it's better to store data over
time with new columns than with versions. Why is it better?
Versions look very convenient for that use case. So, does a rowkey with 3600
columns work better than a rowkey with one column with 3600 versions? What's
the reason for avoiding a massive use of versions?


2014-04-12 15:07 GMT+02:00 Michael Segel :

> Silly question...
>
> Why does the idea of using versioning to capture temporal changes to data
> keep being propagated?
>
> Seriously this issue keeps popping up...
>
> If you want to capture data over time... use a timestamp as part of the
> column name.  Don't abuse the cell's version.
>
>
>
> On Apr 11, 2014, at 11:03 AM, gortiz  wrote:
>
> > Yes, I have tried with two different values for that value of versions,
> 1000 and maximum value for integers.
> >
> > But, I want to keep those versions. I don't want to keep just 3
> versions. Imagine that I want to record a new version each minute and store
> a day, those are 1440 versions.
> >
> > Why is HBase going to read all the versions?? , I thought, if you don't
> indicate any versions it's just read the newest and skip the rest. It
> doesn't make too much sense to read all of them if data is sorted, plus the
> newest version is stored in the top.
> >
> >
> > On 11/04/14 11:54, Anoop John wrote:
> >> What is the max version setting u have done for ur table cf?  When u set
> >> some a value, HBase has to keep all those versions.  During a scan it
> will
> >> read all those versions. In 94 version the default value for the max
> >> versions is 3.  I guess you have set some bigger value.   If u have not,
> >> mind testing after a major compaction?
> >>
> >> -Anoop-
> >>
> >> On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> >>
> >>> Last test I have done it's to reduce the number of versions to 100.
> >>> So, right now, I have 100 rows with 100 versions each one.
> >>> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> >>> 100row-1000versions + blockcache-> 80s.
> >>> 100row-1000versions + No blockcache-> 70s.
> >>>
> >>> 100row-*100*versions + blockcache-> 7.3s.
> >>> 100row-*100*versions + No blockcache-> 6.1s.
> >>>
> >>> What's the reasons of this? I guess HBase is enough smart for not
> consider
> >>> old versions, so, it just checks the newest. But, I reduce 10 times the
> >>> size (in versions) and I got a 10x of performance.
> >>>
> >>> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> >>> 'binary:5')",STARTROW => '10100101',
> >>> STOPROW => '60100201'}
> >>>
> >>>
> >>>
> >>> On 11/04/14 09:04, gortiz wrote:
> >>>
>  Well, I guessed that, what it doesn't make too much sense because
> it's so
>  slow. I only have right now 100 rows with 1000 versions each row.
>  I have checked the size of the dataset and each row is about 700Kbytes
>  (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
> x
>  700Kbytes = 70Mb, since it just check the newest version. How can it
> spend
>  too many time checking this quantity of data?
> 
>  I'm generating again the dataset with a bigger blocksize (previously
> was
>  64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
>  baching parameters, but I don't think they're going to affect too
> much.
> 
>  Another test I want to do, it's generate the same dataset with just
>  100versions, It should spend around the same time, right? Or am I
> wrong?
> 
>  On 10/04/14 18:08, Ted Yu wrote:
> 
> > It should be newest version of each value.
> >
> > Cheers
> >
> >
> > On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
> >
> > Another little question is, when the filter I'm using, Do I check
> all the
> >> versions? or just the newest? Because, I'm wondering if when I do a
> scan
> >> over all the table, I look for the value "5" in all the dataset or
> I'm
> >> just
> >> looking for in one newest version of each value.
> >>
> >>
> >> On 10/04/14 16:52, gortiz wrote:
> >>
> >> I was trying to check the behaviour of HBase. The cluster is a
> group of
> >>> old computers, one master, five slaves, each one with 2Gb, so,
> 12gb in
> >>> total.
> >>> The table has a column family with 1000 columns and each column
> with
> >>> 100
> >>> versions.
> >>> There's another column faimily with four columns an one image of
> 100kb.
> >>>   (I've tried without this column family as well.)
> >>> The table is partitioned manually in all the slaves, so data are
> >>> balanced
> >>> in the cluster.
> >>>
> >>> I'm executing this sentence *scan 'table1', {FILTER =>
> "Va

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
Silly question… 

Why does the idea of using versioning to capture temporal changes to data keep 
being propagated? 

Seriously this issue keeps popping up… 

If you want to capture data over time… use a timestamp as part of the column 
name.  Don’t abuse the cell’s version.
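
A minimal sketch of that column-name approach, assuming the 0.94-era client
API (the table, family, and qualifier names here are made up for
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TemporalColumns {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "readings");
        long ts = System.currentTimeMillis();
        Put put = new Put(Bytes.toBytes("sensor-42"));
        // The timestamp lives in the column qualifier ("temp:<millis>"),
        // not in the cell version, so each reading is its own column and
        // the schema itself says the data is temporal.
        put.add(Bytes.toBytes("d"), Bytes.toBytes("temp:" + ts), Bytes.toBytes("21.5"));
        table.put(put);
        table.close();
    }
}

A time-window read then becomes a column-range question rather than a
version-count question.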



On Apr 11, 2014, at 11:03 AM, gortiz  wrote:

> Yes, I have tried with two different values for that value of versions, 1000 
> and maximum value for integers.
> 
> But, I want to keep those versions. I don't want to keep just 3 versions. 
> Imagine that I want to record a new version each minute and store a day, 
> those are 1440 versions.
> 
> Why is HBase going to read all the versions?? , I thought, if you don't 
> indicate any versions it's just read the newest and skip the rest. It doesn't 
> make too much sense to read all of them if data is sorted, plus the newest 
> version is stored in the top.
> 
> 
> On 11/04/14 11:54, Anoop John wrote:
>> What is the max version setting u have done for ur table cf?  When u set
>> some a value, HBase has to keep all those versions.  During a scan it will
>> read all those versions. In 94 version the default value for the max
>> versions is 3.  I guess you have set some bigger value.   If u have not,
>> mind testing after a major compaction?
>> 
>> -Anoop-
>> 
>> On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
>> 
>>> Last test I have done it's to reduce the number of versions to 100.
>>> So, right now, I have 100 rows with 100 versions each one.
>>> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
>>> 100row-1000versions + blockcache-> 80s.
>>> 100row-1000versions + No blockcache-> 70s.
>>> 
>>> 100row-*100*versions + blockcache-> 7.3s.
>>> 100row-*100*versions + No blockcache-> 6.1s.
>>> 
>>> What's the reasons of this? I guess HBase is enough smart for not consider
>>> old versions, so, it just checks the newest. But, I reduce 10 times the
>>> size (in versions) and I got a 10x of performance.
>>> 
>>> The filter is scan 'filters', {FILTER => "ValueFilter(=,
>>> 'binary:5')",STARTROW => '10100101',
>>> STOPROW => '60100201'}
>>> 
>>> 
>>> 
>>> On 11/04/14 09:04, gortiz wrote:
>>> 
 Well, I guessed that, what it doesn't make too much sense because it's so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
 700Kbytes = 70Mb, since it just check the newest version. How can it spend
 too many time checking this quantity of data?
 
 I'm generating again the dataset with a bigger blocksize (previously was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too much.
 
 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I wrong?
 
 On 10/04/14 18:08, Ted Yu wrote:
 
> It should be newest version of each value.
> 
> Cheers
> 
> 
> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
> 
> Another little question is, when the filter I'm using, Do I check all the
>> versions? or just the newest? Because, I'm wondering if when I do a scan
>> over all the table, I look for the value "5" in all the dataset or I'm
>> just
>> looking for in one newest version of each value.
>> 
>> 
>> On 10/04/14 16:52, gortiz wrote:
>> 
>> I was trying to check the behaviour of HBase. The cluster is a group of
>>> old computers, one master, five slaves, each one with 2Gb, so, 12gb in
>>> total.
>>> The table has a column family with 1000 columns and each column with
>>> 100
>>> versions.
>>> There's another column faimily with four columns an one image of 100kb.
>>>   (I've tried without this column family as well.)
>>> The table is partitioned manually in all the slaves, so data are
>>> balanced
>>> in the cluster.
>>> 
>>> I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
>>> 'binary:5')"* in HBase 0.94.6
>>> My time for lease and rpc is three minutes.
>>> Since, it's a full scan of the table, I have been playing with the
>>> BLOCKCACHE as well (just disable and enable, not about the size of
>>> it). I
>>> thought that it was going to have too much calls to the GC. I'm not
>>> sure
>>> about this point.
>>> 
>>> I know that it's not the best way to use HBase, it's just a test. I
>>> think
>>> that it's not working because the hardware isn't enough, although, I
>>> would
>>> like to try some kind of tunning to improve it.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 10/04/14 14:21, Ted Yu wrote:
>>> 
> >>> Can you give us a bit more information:

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
Okay, thank you, I'll check it this Monday. I didn't know that Scan checks
all the versions.
So, I was reading each column and each version, although it only showed me
the newest version because I didn't indicate anything in the VERSIONS
attribute. It makes sense that it takes so long.


2014-04-11 16:57 GMT+02:00 Ted Yu :

> In your previous example:
> scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}
>
> there was no expression w.r.t. timestamp. See the following javadoc from
> Scan.java:
>
>  * To only retrieve columns within a specific range of version timestamps,
>
>  * execute {@link #setTimeRange(long, long) setTimeRange}.
>
>  * 
>
>  * To only retrieve columns with a specific timestamp, execute
>
>  * {@link #setTimeStamp(long) setTimestamp}.
>
> You can use one of the above methods to make your scan more selective.
>
>
> ValueFilter#filterKeyValue(Cell) doesn't utilize advanced feature of
> ReturnCode. You can refer to:
>
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html
>
> You can take a look at SingleColumnValueFilter#filterKeyValue() for example
> of how various ReturnCode's are used to speed up scan.
>
> Cheers
>
>
> On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz  >wrote:
>
> > I read something interesting about it in HBase TDG.
> >
> > Page 344:
> > The StoreScanner class combines the store files and memstore that the
> > Store instance
> > contains. It is also where the exclusion happens, based on the Bloom
> > filter, or the timestamp. If you are asking for versions that are not
> more
> > than 30 minutes old, for example, you can skip all storage files that are
> > older than one hour: they will not contain anything of interest. See "Key
> > Design" on page 357 for details on the exclusion, and how to make use of
> > it.
> >
> > So, I guess that it doesn't have to read all the HFiles?? But, I don't
> know
> > if HBase really uses the timestamp of each row or the date of the file. I
> > guess when I execute the scan, it reads everything, but, I don't know
> why.
> > I think there's something else that I don't see so that everything works
> to
> > me.
> >
> >
> > 2014-04-11 13:05 GMT+02:00 gortiz :
> >
> > > Sorry, I didn't get it why it should read all the timestamps and not
> just
> > > the newest it they're sorted and you didn't specific any timestamp in
> > your
> > > filter.
> > >
> > >
> > >
> > > On 11/04/14 12:13, Anoop John wrote:
> > >
> > >> In the storage layer (HFiles in HDFS) all versions of a particular
> cell
> > >> will be staying together.  (Yes it has to be lexicographically ordered
> > >> KVs). So during a scan we will have to read all the version data.  At
> > this
> > >> storage layer it doesn't know the versions stuff etc.
> > >>
> > >> -Anoop-
> > >>
> > >> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
> > >>
> > >>  Yes, I have tried with two different values for that value of
> versions,
> > >>> 1000 and maximum value for integers.
> > >>>
> > >>> But, I want to keep those versions. I don't want to keep just 3
> > versions.
> > >>> Imagine that I want to record a new version each minute and store a
> > day,
> > >>> those are 1440 versions.
> > >>>
> > >>> Why is HBase going to read all the versions?? , I thought, if you
> don't
> > >>> indicate any versions it's just read the newest and skip the rest. It
> > >>> doesn't make too much sense to read all of them if data is sorted,
> plus
> > >>> the
> > >>> newest version is stored in the top.
> > >>>
> > >>>
> > >>>
> > >>> On 11/04/14 11:54, Anoop John wrote:
> > >>>
> > >>>What is the max version setting u have done for ur table cf?
>  When u
> >  set
> >  some a value, HBase has to keep all those versions.  During a scan
> it
> >  will
> >  read all those versions. In 94 version the default value for the max
> >  versions is 3.  I guess you have set some bigger value.   If u have
> > not,
> >  mind testing after a major compaction?
> > 
> >  -Anoop-
> > 
> >  On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> > 
> >    Last test I have done it's to reduce the number of versions to
> 100.
> > 
> > > So, right now, I have 100 rows with 100 versions each one.
> > > Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> > > 100row-1000versions + blockcache-> 80s.
> > > 100row-1000versions + No blockcache-> 70s.
> > >
> > > 100row-*100*versions + blockcache-> 7.3s.
> > > 100row-*100*versions + No blockcache-> 6.1s.
> > >
> > > What's the reasons of this? I guess HBase is enough smart for not
> > > consider
> > > old versions, so, it just checks the newest. But, I reduce 10 times
> > the
> > > size (in versions) and I got a 10x of performance.
> > >
> > > The filter is scan 'filters', {FILTER => "ValueFilter(=,
> > > 'binary:5')",STARTROW =>
> > > '10100101',
> > > STOPROW => '60100201'}

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Ted Yu
In your previous example:
scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

there was no expression w.r.t. timestamp. See the following javadoc from
Scan.java:

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.
 *
 * To only retrieve columns with a specific timestamp, execute
 * {@link #setTimeStamp(long) setTimestamp}.

You can use one of the above methods to make your scan more selective.


ValueFilter#filterKeyValue(Cell) doesn't utilize the advanced features of
ReturnCode. You can refer to:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html

You can take a look at SingleColumnValueFilter#filterKeyValue() for an example
of how the various ReturnCodes are used to speed up a scan.

Cheers
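
Putting those two hints together, a hedged sketch of a more selective version
of that shell scan, assuming the 0.94 client API ("table1" and the 30-minute
window are examples only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class SelectiveScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1");
        Scan scan = new Scan();
        scan.setFilter(new ValueFilter(CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes("5"))));
        scan.setMaxVersions(1); // the default, made explicit: newest version only
        long now = System.currentTimeMillis();
        // Restricting the time range is what lets the scanner exclude whole
        // store files that cannot contain qualifying cells.
        scan.setTimeRange(now - 30 * 60 * 1000L, now);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}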


On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz wrote:

> I read something interesting about it in HBase TDG.
>
> Page 344:
> The StoreScanner class combines the store files and memstore that the
> Store instance
> contains. It is also where the exclusion happens, based on the Bloom
> filter, or the timestamp. If you are asking for versions that are not more
> than 30 minutes old, for example, you can skip all storage files that are
> older than one hour: they will not contain anything of interest. See "Key
> Design" on page 357 for details on the exclusion, and how to make use of
> it.
>
> So, I guess that it doesn't have to read all the HFiles?? But, I don't know
> if HBase really uses the timestamp of each row or the date of the file. I
> guess when I execute the scan, it reads everything, but, I don't know why.
> I think there's something else that I don't see so that everything works to
> me.
>
>
> 2014-04-11 13:05 GMT+02:00 gortiz :
>
> > Sorry, I didn't get it why it should read all the timestamps and not just
> > the newest it they're sorted and you didn't specific any timestamp in
> your
> > filter.
> >
> >
> >
> > On 11/04/14 12:13, Anoop John wrote:
> >
> >> In the storage layer (HFiles in HDFS) all versions of a particular cell
> >> will be staying together.  (Yes it has to be lexicographically ordered
> >> KVs). So during a scan we will have to read all the version data.  At
> this
> >> storage layer it doesn't know the versions stuff etc.
> >>
> >> -Anoop-
> >>
> >> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
> >>
> >>  Yes, I have tried with two different values for that value of versions,
> >>> 1000 and maximum value for integers.
> >>>
> >>> But, I want to keep those versions. I don't want to keep just 3
> versions.
> >>> Imagine that I want to record a new version each minute and store a
> day,
> >>> those are 1440 versions.
> >>>
> >>> Why is HBase going to read all the versions?? , I thought, if you don't
> >>> indicate any versions it's just read the newest and skip the rest. It
> >>> doesn't make too much sense to read all of them if data is sorted, plus
> >>> the
> >>> newest version is stored in the top.
> >>>
> >>>
> >>>
> >>> On 11/04/14 11:54, Anoop John wrote:
> >>>
> >>>What is the max version setting u have done for ur table cf?  When u
>  set
>  some a value, HBase has to keep all those versions.  During a scan it
>  will
>  read all those versions. In 94 version the default value for the max
>  versions is 3.  I guess you have set some bigger value.   If u have
> not,
>  mind testing after a major compaction?
> 
>  -Anoop-
> 
>  On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> 
>    Last test I have done it's to reduce the number of versions to 100.
> 
> > So, right now, I have 100 rows with 100 versions each one.
> > Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> > 100row-1000versions + blockcache-> 80s.
> > 100row-1000versions + No blockcache-> 70s.
> >
> > 100row-*100*versions + blockcache-> 7.3s.
> > 100row-*100*versions + No blockcache-> 6.1s.
> >
> > What's the reasons of this? I guess HBase is enough smart for not
> > consider
> > old versions, so, it just checks the newest. But, I reduce 10 times
> the
> > size (in versions) and I got a 10x of performance.
> >
> > The filter is scan 'filters', {FILTER => "ValueFilter(=,
> > 'binary:5')",STARTROW => '10100101',
> > STOPROW => '60100201'}
> >
> >
> >
> > On 11/04/14 09:04, gortiz wrote:
> >
> >   Well, I guessed that, what it doesn't make too much sense because
> > it's
> >
> >> so
> >> slow. I only have right now 100 rows with 1000 versions each row.
> >> I have checked the size of the dataset and each row is about
> 700Kbytes
> >> (around 7Gb, 100rowsx1000versions). So, it should only check 100
> rows
> >> x
> >> 700Kbytes = 70Mb, since it just check the newest version. How can it
> >> spend
> >> too many time checking this quantity of data?
> 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
I read something interesting about it in HBase TDG.

Page 344:
The StoreScanner class combines the store files and memstore that the
Store instance
contains. It is also where the exclusion happens, based on the Bloom
filter, or the timestamp. If you are asking for versions that are not more
than 30 minutes old, for example, you can skip all storage files that are
older than one hour: they will not contain anything of interest. See "Key
Design" on page 357 for details on the exclusion, and how to make use of
it.

So, I guess that it doesn't have to read all the HFiles? But I don't know
whether HBase really uses the timestamp of each row or the date of the file. I
guess that when I execute the scan, it reads everything, but I don't know why.
I think there's something else that I'm not seeing, because it isn't working
that way for me.


2014-04-11 13:05 GMT+02:00 gortiz :

> Sorry, I didn't get it why it should read all the timestamps and not just
> the newest it they're sorted and you didn't specific any timestamp in your
> filter.
>
>
>
> On 11/04/14 12:13, Anoop John wrote:
>
>> In the storage layer (HFiles in HDFS) all versions of a particular cell
>> will be staying together.  (Yes it has to be lexicographically ordered
>> KVs). So during a scan we will have to read all the version data.  At this
>> storage layer it doesn't know the versions stuff etc.
>>
>> -Anoop-
>>
>> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
>>
>>  Yes, I have tried with two different values for that value of versions,
>>> 1000 and maximum value for integers.
>>>
>>> But, I want to keep those versions. I don't want to keep just 3 versions.
>>> Imagine that I want to record a new version each minute and store a day,
>>> those are 1440 versions.
>>>
>>> Why is HBase going to read all the versions?? , I thought, if you don't
>>> indicate any versions it's just read the newest and skip the rest. It
>>> doesn't make too much sense to read all of them if data is sorted, plus
>>> the
>>> newest version is stored in the top.
>>>
>>>
>>>
>>> On 11/04/14 11:54, Anoop John wrote:
>>>
>>>What is the max version setting u have done for ur table cf?  When u
 set
 some a value, HBase has to keep all those versions.  During a scan it
 will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?

 -Anoop-

 On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

   Last test I have done it's to reduce the number of versions to 100.

> So, right now, I have 100 rows with 100 versions each one.
> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> 100row-1000versions + blockcache-> 80s.
> 100row-1000versions + No blockcache-> 70s.
>
> 100row-*100*versions + blockcache-> 7.3s.
> 100row-*100*versions + No blockcache-> 6.1s.
>
> What's the reasons of this? I guess HBase is enough smart for not
> consider
> old versions, so, it just checks the newest. But, I reduce 10 times the
> size (in versions) and I got a 10x of performance.
>
> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')",STARTROW => '10100101',
> STOPROW => '60100201'}
>
>
>
> On 11/04/14 09:04, gortiz wrote:
>
>   Well, I guessed that, what it doesn't make too much sense because
> it's
>
>> so
>> slow. I only have right now 100 rows with 1000 versions each row.
>> I have checked the size of the dataset and each row is about 700Kbytes
>> (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
>> x
>> 700Kbytes = 70Mb, since it just check the newest version. How can it
>> spend
>> too many time checking this quantity of data?
>>
>> I'm generating again the dataset with a bigger blocksize (previously
>> was
>> 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
>> baching parameters, but I don't think they're going to affect too
>> much.
>>
>> Another test I want to do, it's generate the same dataset with just
>> 100versions, It should spend around the same time, right? Or am I
>> wrong?
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>   It should be newest version of each value.
>>
>>> Cheers
>>>
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>>>
>>> Another little question is, when the filter I'm using, Do I check all
>>> the
>>>
>>>versions? or just the newest? Because, I'm wondering if when I do
 a
 scan
 over all the table, I look for the value "5" in all the dataset or
 I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase.

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Sorry, I didn't get why it should read all the timestamps and not 
just the newest if they're sorted and you didn't specify any timestamp 
in your filter.



On 11/04/14 12:13, Anoop John wrote:

In the storage layer (HFiles in HDFS) all versions of a particular cell
will be staying together.  (Yes it has to be lexicographically ordered
KVs). So during a scan we will have to read all the version data.  At this
storage layer it doesn't know the versions stuff etc.

-Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:


Yes, I have tried with two different values for that value of versions,
1000 and maximum value for integers.

But, I want to keep those versions. I don't want to keep just 3 versions.
Imagine that I want to record a new version each minute and store a day,
those are 1440 versions.

Why is HBase going to read all the versions?? , I thought, if you don't
indicate any versions it's just read the newest and skip the rest. It
doesn't make too much sense to read all of them if data is sorted, plus the
newest version is stored in the top.



On 11/04/14 11:54, Anoop John wrote:


  What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

  Last test I have done it's to reduce the number of versions to 100.

So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reasons of this? I guess HBase is enough smart for not
consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER => "ValueFilter(=,
'binary:5')",STARTROW => '10100101',
STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:

  Well, I guessed that, what it doesn't make too much sense because it's

so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it
spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check all
the


  versions? or just the newest? Because, I'm wondering if when I do a
scan
over all the table, I look for the value "5" in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group
of


old computers, one master, five slaves, each one with 2Gb, so, 12gb
in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column faimily with four columns an one image of
100kb.
(I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER =>
"ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:


HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

I got this error when I execute a full scan with filters about a
table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.

regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
In the storage layer (HFiles in HDFS), all versions of a particular cell
are stored together. (Yes, the KVs have to be lexicographically ordered.)
So during a scan we will have to read all the version data; at this
storage layer nothing is known about version semantics.

-Anoop-
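
This is also where the ReturnCode hint mentioned earlier comes in: a filter
can tell the scanner how far to seek instead of having every KV evaluated.
A hedged sketch against the 0.94 filter API (an illustrative class, not
actual project code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;

public class NewestVersionOnlyFilter extends FilterBase {
    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
        // Versions of a column are laid out newest-first, so the first KV
        // seen for a column is its newest version: include it, then ask the
        // scanner to seek straight to the next column, skipping the older
        // versions instead of reading and rejecting each one.
        return ReturnCode.INCLUDE_AND_NEXT_COL;
    }

    // Stateless filter: nothing to serialize for the Writable contract.
    public void write(DataOutput out) throws IOException {}
    public void readFields(DataInput in) throws IOException {}
}

A custom filter has to be on the region servers' classpath, though, so for a
quick test the built-in setTimeRange/setMaxVersions route is simpler.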

On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:

> Yes, I have tried with two different values for that value of versions,
> 1000 and maximum value for integers.
>
> But, I want to keep those versions. I don't want to keep just 3 versions.
> Imagine that I want to record a new version each minute and store a day,
> those are 1440 versions.
>
> Why is HBase going to read all the versions?? , I thought, if you don't
> indicate any versions it's just read the newest and skip the rest. It
> doesn't make too much sense to read all of them if data is sorted, plus the
> newest version is stored in the top.
>
>
>
> On 11/04/14 11:54, Anoop John wrote:
>
>>  What is the max version setting u have done for ur table cf?  When u set
>> some a value, HBase has to keep all those versions.  During a scan it will
>> read all those versions. In 94 version the default value for the max
>> versions is 3.  I guess you have set some bigger value.   If u have not,
>> mind testing after a major compaction?
>>
>> -Anoop-
>>
>> On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
>>
>>  Last test I have done it's to reduce the number of versions to 100.
>>> So, right now, I have 100 rows with 100 versions each one.
>>> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
>>> 100row-1000versions + blockcache-> 80s.
>>> 100row-1000versions + No blockcache-> 70s.
>>>
>>> 100row-*100*versions + blockcache-> 7.3s.
>>> 100row-*100*versions + No blockcache-> 6.1s.
>>>
>>> What's the reasons of this? I guess HBase is enough smart for not
>>> consider
>>> old versions, so, it just checks the newest. But, I reduce 10 times the
>>> size (in versions) and I got a 10x of performance.
>>>
>>> The filter is scan 'filters', {FILTER => "ValueFilter(=,
>>> 'binary:5')",STARTROW => '10100101',
>>> STOPROW => '60100201'}
>>>
>>>
>>>
>>> On 11/04/14 09:04, gortiz wrote:
>>>
>>>  Well, I guessed that, what it doesn't make too much sense because it's
 so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
 700Kbytes = 70Mb, since it just check the newest version. How can it
 spend
 too many time checking this quantity of data?

 I'm generating again the dataset with a bigger blocksize (previously was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too much.

 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I wrong?

 On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.
>
> Cheers
>
>
> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>
> Another little question is, when the filter I'm using, Do I check all
> the
>
>>  versions? or just the newest? Because, I'm wondering if when I do a
>> scan
>> over all the table, I look for the value "5" in all the dataset or I'm
>> just
>> looking for in one newest version of each value.
>>
>>
>> On 10/04/14 16:52, gortiz wrote:
>>
>> I was trying to check the behaviour of HBase. The cluster is a group
>> of
>>
>>> old computers, one master, five slaves, each one with 2Gb, so, 12gb
>>> in
>>> total.
>>> The table has a column family with 1000 columns and each column with
>>> 100
>>> versions.
>>> There's another column faimily with four columns an one image of
>>> 100kb.
>>>(I've tried without this column family as well.)
>>> The table is partitioned manually in all the slaves, so data are
>>> balanced
>>> in the cluster.
>>>
>>> I'm executing this sentence *scan 'table1', {FILTER =>
>>> "ValueFilter(=,
>>> 'binary:5')"* in HBase 0.94.6
>>> My time for lease and rpc is three minutes.
>>> Since, it's a full scan of the table, I have been playing with the
>>> BLOCKCACHE as well (just disable and enable, not about the size of
>>> it). I
>>> thought that it was going to have too much calls to the GC. I'm not
>>> sure
>>> about this point.
>>>
>>> I know that it's not the best way to use HBase, it's just a test. I
>>> think
>>> that it's not working because the hardware isn't enough, although, I
>>> would
>>> like to try some kind of tunning to improve it.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
> >>> On 10/04/14 14:21, Ted Yu wrote:

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Yes, I have tried two different values for max versions: 1000 and the 
maximum integer value.


But I want to keep those versions. I don't want to keep just 3 
versions. Imagine that I want to record a new version each minute and 
store a day's worth; those are 1440 versions.


Why is HBase going to read all the versions? I thought that if you don't 
indicate any versions, it just reads the newest and skips the rest. It 
doesn't make too much sense to read all of them if data is sorted, plus 
the newest version is stored at the top.



On 11/04/14 11:54, Anoop John wrote:

What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:


Last test I have done it's to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reasons of this? I guess HBase is enough smart for not consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER => "ValueFilter(=,
'binary:5')",STARTROW => '10100101',
STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:


Well, I guessed that, what it doesn't make too much sense because it's so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:


It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check all the

versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value "5" in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column faimily with four columns an one image of 100kb.
   (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

   I got this error when I execute a full scan with filters about a
table.


Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
  at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)


  at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
What max versions setting have you used for your table's column family? When
you set such a value, HBase has to keep all those versions, and during a scan
it will read all of them. In 0.94 the default value for max versions is 3. I
guess you have set some bigger value. If you have not, would you mind testing
after a major compaction?

-Anoop-
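
A sketch of that test using the 0.94 admin API (the table and family names
are examples; on 0.94 it is safest to disable the table before a schema
change):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TrimVersions {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HColumnDescriptor cf = new HColumnDescriptor("cf1");
        cf.setMaxVersions(3); // back to the 0.94 default
        admin.disableTable("table1");
        admin.modifyColumn("table1", cf);
        admin.enableTable("table1");
        // Asynchronous: versions beyond the new limit are only physically
        // dropped as each region finishes its major compaction.
        admin.majorCompact("table1");
        admin.close();
    }
}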

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

> Last test I have done it's to reduce the number of versions to 100.
> So, right now, I have 100 rows with 100 versions each one.
> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> 100row-1000versions + blockcache-> 80s.
> 100row-1000versions + No blockcache-> 70s.
>
> 100row-*100*versions + blockcache-> 7.3s.
> 100row-*100*versions + No blockcache-> 6.1s.
>
> What's the reasons of this? I guess HBase is enough smart for not consider
> old versions, so, it just checks the newest. But, I reduce 10 times the
> size (in versions) and I got a 10x of performance.
>
> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')",STARTROW => '10100101',
> STOPROW => '60100201'}
>
>
>
> On 11/04/14 09:04, gortiz wrote:
>
>> Well, I guessed that, what it doesn't make too much sense because it's so
>> slow. I only have right now 100 rows with 1000 versions each row.
>> I have checked the size of the dataset and each row is about 700Kbytes
>> (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
>> 700Kbytes = 70Mb, since it just check the newest version. How can it spend
>> too many time checking this quantity of data?
>>
>> I'm generating again the dataset with a bigger blocksize (previously was
>> 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
>> baching parameters, but I don't think they're going to affect too much.
>>
>> Another test I want to do, it's generate the same dataset with just
>> 100versions, It should spend around the same time, right? Or am I wrong?
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>> It should be newest version of each value.
>>>
>>> Cheers
>>>
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>>>
>>> Another little question is, when the filter I'm using, Do I check all the
 versions? or just the newest? Because, I'm wondering if when I do a scan
 over all the table, I look for the value "5" in all the dataset or I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase. The cluster is a group of
> old computers, one master, five slaves, each one with 2Gb, so, 12gb in
> total.
> The table has a column family with 1000 columns and each column with
> 100
> versions.
> There's another column faimily with four columns an one image of 100kb.
>   (I've tried without this column family as well.)
> The table is partitioned manually in all the slaves, so data are
> balanced
> in the cluster.
>
> I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
> 'binary:5')"* in HBase 0.94.6
> My time for lease and rpc is three minutes.
> Since, it's a full scan of the table, I have been playing with the
> BLOCKCACHE as well (just disable and enable, not about the size of
> it). I
> thought that it was going to have too much calls to the GC. I'm not
> sure
> about this point.
>
> I know that it's not the best way to use HBase, it's just a test. I
> think
> that it's not working because the hardware isn't enough, although, I
> would
> like to try some kind of tunning to improve it.
>
>
>
>
>
>
>
>
> On 10/04/14 14:21, Ted Yu wrote:
>
> Can you give us a bit more information:
>>
>> HBase release you're running
>> What filters are used for the scan
>>
>> Thanks
>>
>> On Apr 10, 2014, at 2:36 AM, gortiz  wrote:
>>
>>   I got this error when I execute a full scan with filters about a
>> table.
>>
>>> Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
>>> regionserver.LeaseException:
>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>> '-4165751462641113359' does not exist
>>>  at 
>>> org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>
>>>
>>>  at org.apache.hadoop.hbase.regionserver.HRegionServer.
>>> next(HRegionServer.java:2482)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(
>>> NativeMethodAccessorImpl.java:39)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>> DelegatingMethodAccessorImpl.java:25)
>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>>  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
>>> WritableRpcEngine.java:320)

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz

The last test I have done is to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each.
Times (I got the same times for block sizes of 64Kb and 1Mb):
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reason for this? I guessed HBase was smart enough not to 
consider old versions and just check the newest. But I reduced the 
size (in versions) by 10x and got a 10x performance improvement.


The filter is scan 'filters', {FILTER => "ValueFilter(=, 
'binary:5')",STARTROW => '10100101', 
STOPROW => '60100201'}
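
For what it's worth, timings like these are easy to reproduce from a small
client program; a minimal sketch, assuming the 0.94 API, the same 'filters'
table, and the row keys copied from the shell command above (the block cache
toggle matches the two rows in each pair):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTimer {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "filters");
        Scan scan = new Scan(Bytes.toBytes("10100101"), Bytes.toBytes("60100201"));
        scan.setCacheBlocks(false); // flip to true for the "+ blockcache" runs
        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        long rows = 0;
        for (Result r : scanner) {
            rows++; // just count; any per-row work would skew the timing
        }
        scanner.close();
        table.close();
        System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}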



On 11/04/14 09:04, gortiz wrote:
Well, I guessed that, what it doesn't make too much sense because it's 
so slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes 
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows 
x 700Kbytes = 70Mb, since it just check the newest version. How can it 
spend too many time checking this quantity of data?


I'm generating again the dataset with a bigger blocksize (previously 
was 64Kb, now, it's going to be 1Mb). I could try tunning the scanning 
and baching parameters, but I don't think they're going to affect too 
much.


Another test I want to do, it's generate the same dataset with just 
100versions, It should spend around the same time, right? Or am I wrong?


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check 
all the
versions? or just the newest? Because, I'm wondering if when I do a 
scan
over all the table, I look for the value "5" in all the dataset or 
I'm just

looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a 
group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column 
with 100

versions.
There's another column faimily with four columns an one image of 
100kb.

  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are 
balanced

in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of 
it). I
thought that it was going to have too much calls to the GC. I'm not 
sure

about this point.

I know that it's not the best way to use HBase, it's just a test. I 
think
that it's not working because the hardware isn't enough, although, 
I would

like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

  I got this error when I execute a full scan with filters about a 
table.
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:

org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231) 



 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have 
been
checking the logs from GC, HMaster and some RegionServers and I 
didn't see
anything weird. I tried as well to try with a couple of caching 
values.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_







--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Well, I guessed that, but it doesn't make too much sense because it's 
so slow. I only have 100 rows right now, with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes 
(around 7Gb, 100 rows x 1000 versions). So, it should only check 100 rows x 
700Kbytes = 70Mb, since it just checks the newest version. How can it 
spend so much time checking this quantity of data?


I'm generating the dataset again with a bigger block size (previously it 
was 64Kb; now it's going to be 1Mb). I could try tuning the scanner 
caching and batching parameters, but I don't think they're going to affect 
too much.


Another test I want to do is to generate the same dataset with just 
100 versions. It should take around the same time, right? Or am I wrong?


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:


Another little question is, when the filter I'm using, Do I check all the
versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value "5" in all the dataset or I'm just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:


I was trying to check the behaviour of HBase. The cluster is a group of
old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with 100
versions.
There's another column faimily with four columns an one image of 100kb.
  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of it). I
thought that it was going to have too much calls to the GC. I'm not sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I think
that it's not working because the hardware isn't enough, although, I would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

  I got this error when I execute a full scan with filters about a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)

 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have been
checking the logs from GC, HMaster and some RegionServers and I didn't see
anything weird. I tried as well to try with a couple of caching values.




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread Ted Yu
It should be the newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

> Another little question is, when the filter I'm using, Do I check all the
> versions? or just the newest? Because, I'm wondering if when I do a scan
> over all the table, I look for the value "5" in all the dataset or I'm just
> looking for in one newest version of each value.
>
>
> On 10/04/14 16:52, gortiz wrote:
>
>> I was trying to check the behaviour of HBase. The cluster is a group of
>> old computers, one master, five slaves, each one with 2Gb, so, 12gb in
>> total.
>> The table has a column family with 1000 columns and each column with 100
>> versions.
>> There's another column faimily with four columns an one image of 100kb.
>>  (I've tried without this column family as well.)
>> The table is partitioned manually in all the slaves, so data are balanced
>> in the cluster.
>>
>> I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
>> 'binary:5')"* in HBase 0.94.6
>> My time for lease and rpc is three minutes.
>> Since, it's a full scan of the table, I have been playing with the
>> BLOCKCACHE as well (just disable and enable, not about the size of it). I
>> thought that it was going to have too much calls to the GC. I'm not sure
>> about this point.
>>
>> I know that it's not the best way to use HBase, it's just a test. I think
>> that it's not working because the hardware isn't enough, although, I would
>> like to try some kind of tunning to improve it.
>>
>>
>>
>>
>>
>>
>>
>>
>> On 10/04/14 14:21, Ted Yu wrote:
>>
>>> Can you give us a bit more information:
>>>
>>> HBase release you're running
>>> What filters are used for the scan
>>>
>>> Thanks
>>>
>>> On Apr 10, 2014, at 2:36 AM, gortiz  wrote:
>>>
>>>  I got this error when I execute a full scan with filters over a table.

 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hbase.regionserver.LeaseException:
 org.apache.hadoop.hbase.regionserver.LeaseException: lease
 '-4165751462641113359' does not exist
 at 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)

 at org.apache.hadoop.hbase.regionserver.HRegionServer.
 next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
 NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
 DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
 WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
 HBaseServer.java:1428)

 I have read about increasing the lease time and RPC time, but it's not
 working. What else could I try? The table isn't too big. I have been
 checking the logs from GC, HMaster and some RegionServers and I didn't see
 anything weird. I also tried a couple of caching values.

>>>
>>
>>
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
>
>


Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
Another little question: with the filter I'm using, does it check all
the versions or just the newest? Because I'm wondering whether, when I
scan over the whole table, I'm looking for the value "5" in the whole
dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:
I was trying to check the behaviour of HBase. The cluster is a group
of old computers: one master and five slaves, each with 2 GB, so 12 GB
in total.
The table has a column family with 1000 columns and each column with
100 versions.
There's another column family with four columns and one image of
100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are
balanced in the cluster.

I'm executing this command: scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"} in HBase 0.94.6.

My time for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disabling and enabling it, not changing its
size). I thought it was going to cause too many calls to the GC. I'm
not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I
think it's not working because the hardware isn't enough, although
I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:


I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException: 
org.apache.hadoop.hbase.regionserver.LeaseException: lease 
'-4165751462641113359' does not exist
at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231) 

at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)


I have read about increasing the lease time and RPC time, but it's not
working. What else could I try? The table isn't too big. I have
been checking the logs from GC, HMaster and some RegionServers and I
didn't see anything weird. I also tried a couple of caching values.






--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
I was trying to check the behaviour of HBase. The cluster is a group of
old computers: one master and five slaves, each with 2 GB, so 12 GB in
total.
The table has a column family with 1000 columns and each column with 100
versions.
There's another column family with four columns and one image of 100 KB.
(I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are
balanced in the cluster.


I'm executing this command: scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"} in HBase 0.94.6.

My time for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disabling and enabling it, not changing its size).
I thought it was going to cause too many calls to the GC. I'm not
sure about this point.
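
[A hedged sketch, not from the thread: for a one-off full table scan, a
common tuning step is to stop the scan from churning the block cache, which
can also reduce GC pressure; in the 0.94 client this is a single flag on the
Scan.]

import org.apache.hadoop.hbase.client.Scan;

public class FullScanConfig {
  // Sketch: configure a full scan so it does not evict hot data from the block cache.
  static Scan newFullScan() {
    Scan scan = new Scan();
    scan.setCacheBlocks(false);  // blocks read by this scan are not retained in the cache
    return scan;
  }
}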


I know that it's not the best way to use HBase; it's just a test. I
think it's not working because the hardware isn't enough, although
I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:


I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException: 
org.apache.hadoop.hbase.regionserver.LeaseException: lease 
'-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working.
What else could I try? The table isn't too big. I have been checking the logs
from GC, HMaster and some RegionServers and I didn't see anything weird. I
also tried a couple of caching values.



--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread Ted Yu
Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

> I got this error when I execute a full scan with filters over a table.
> 
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hbase.regionserver.LeaseException: 
> org.apache.hadoop.hbase.regionserver.LeaseException: lease 
> '-4165751462641113359' does not exist
>at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
> 
> I have read about increasing the lease time and RPC time, but it's not
> working. What else could I try? The table isn't too big. I have been
> checking the logs from GC, HMaster and some RegionServers and I didn't see
> anything weird. I also tried a couple of caching values.


Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException: 
org.apache.hadoop.hbase.regionserver.LeaseException: lease 
'-4165751462641113359' does not exist
at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)


I have read about increasing the lease time and RPC time, but it's not
working. What else could I try? The table isn't too big. I have been
checking the logs from GC, HMaster and some RegionServers and I didn't
see anything weird. I also tried a couple of caching values.
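
[A hedged sketch, not from the thread: on a 0.94-era cluster, "increasing the
lease time and RPC time" means raising these two properties in hbase-site.xml
on the region servers and restarting them. The 600000 ms (ten-minute) values
below are illustrative only, not recommendations.]

<!-- hbase-site.xml on each region server; values are illustrative -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>600000</value> <!-- scanner lease period, in ms -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- keep at least as large as the lease period -->
</property>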


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Any ideas? Anyone?


On Wed, Aug 28, 2013 at 9:36 AM, Ameya Kanitkar  wrote:

> Thanks for your response.
>
> I checked namenode logs and I find following:
>
> 2013-08-28 15:25:24,025 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover
> lease [Lease.  Holder:
> DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
> pendingcreates: 1],
> src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
> from client
> DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
> 2013-08-28 15:25:24,025 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
> lease=[Lease.  Holder:
> DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
> pendingcreates: 1],
> src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
> 2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK*
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file
> closed.
>
> There are LeaseException errors on the namenode as well:
> http://pastebin.com/4feVcL1F Not sure why it's happening.
>
> I do not think I am ending up with any timeouts, as my jobs fail within a
> couple of minutes, while all my timeouts are 10+ minutes.
> Not sure why the above would happen.
>
> Ameya
>
>
>
> On Wed, Aug 28, 2013 at 9:00 AM, Ted Yu  wrote:
>
>> From the log you posted on pastebin, I see the following.
>> Can you check the namenode log to see what went wrong?
>>
>>
>>1. Caused by:
>>org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>> on
>>
>>  
>> /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514
>>File does not exist. [Lease.  Holder:
>>
>>  
>> DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25,
>>pendingcreates: 1]
>>
>>
>>
>> On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar 
>> wrote:
>>
>> > Hi All,
>> >
>> > We have a very heavy MapReduce job that goes over an entire table
>> > with 1TB+ of data in HBase and exports all data (similar to the Export
>> > job, but with some additional custom code built in) to HDFS.
>> >
>> > However, this job is not very stable, and oftentimes we get the
>> > following error and the job fails:
>> >
>> > org.apache.hadoop.hbase.regionserver.LeaseException:
>> > org.apache.hadoop.hbase.regionserver.LeaseException: lease
>> > '-4456594242606811626' does not exist
>> > at
>> > org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
>> > at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at
>> >
>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>> > at
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
>> >
>> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> > Method)
>> > at
>> >
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> > at
>> >
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> > at java.lang.reflect.Constructor.newInstance(Constructor.
>> >
>> >
>> > Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb
>> >
>> > We have changed following settings in HBase to counter this problem
>> > but issue persists:
>> >
>> > <property>
>> >   <name>hbase.regionserver.lease.period</name>
>> >   <value>90</value>
>> > </property>
>> >
>> > <property>
>> >   <name>hbase.rpc.timeout</name>
>> >   <value>90</value>
>> > </property>
>> >
>> >
>> > We also reduced the number of mappers per RS to less than the
>> > available CPUs on the box.
>> >
>> > We also observed that once the problem happens, it happens multiple
>> > times on the same RS. All other regions are unaffected, but a
>> > different RS observes this problem on different days. There is no
>> > particular region causing this either.
>> >
>> > We are running: 0.94.2 with cdh4.2.0
>> >
>> > Any ideas?
>> >
>> >
>> > Ameya
>> >
>>
>
>


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Thanks for your response.

I checked namenode logs and I find following:

2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover
lease [Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
from client
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
lease=[Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK*
internalReleaseLease: All existing blocks are COMPLETE, lease removed, file
closed.

There are LeaseException errors on the namenode as well:
http://pastebin.com/4feVcL1F Not sure why it's happening.

I do not think I am ending up with any timeouts, as my jobs fail within a
couple of minutes, while all my timeouts are 10+ minutes.
Not sure why the above would happen.

Ameya



On Wed, Aug 28, 2013 at 9:00 AM, Ted Yu  wrote:

> From the log you posted on pastebin, I see the following.
> Can you check the namenode log to see what went wrong?
>
>
>1. Caused by:
>org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> on
>
>  
> /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514
>File does not exist. [Lease.  Holder:
>
>  
> DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25,
>pendingcreates: 1]
>
>
>
> On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar  wrote:
>
> > Hi All,
> >
> > We have a very heavy MapReduce job that goes over an entire table with
> > 1TB+ of data in HBase and exports all data (similar to the Export job,
> > but with some additional custom code built in) to HDFS.
> >
> > However, this job is not very stable, and oftentimes we get the
> > following error and the job fails:
> >
> > org.apache.hadoop.hbase.regionserver.LeaseException:
> > org.apache.hadoop.hbase.regionserver.LeaseException: lease
> > '-4456594242606811626' does not exist
> > at
> > org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
> > at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
> >
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> > at
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > at java.lang.reflect.Constructor.newInstance(Constructor.
> >
> >
> > Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb
> >
> > We have changed following settings in HBase to counter this problem
> > but issue persists:
> >
> > <property>
> >   <name>hbase.regionserver.lease.period</name>
> >   <value>90</value>
> > </property>
> >
> > <property>
> >   <name>hbase.rpc.timeout</name>
> >   <value>90</value>
> > </property>
> >
> >
> > We also reduced the number of mappers per RS to less than the available
> > CPUs on the box.
> >
> > We also observed that once the problem happens, it happens multiple
> > times on the same RS. All other regions are unaffected, but a different
> > RS observes this problem on different days. There is no particular
> > region causing this either.
> >
> > We are running: 0.94.2 with cdh4.2.0
> >
> > Any ideas?
> >
> >
> > Ameya
> >
>


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ted Yu
From the log you posted on pastebin, I see the following.
Can you check the namenode log to see what went wrong?


   1. Caused by:
   org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
   
/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514
   File does not exist. [Lease.  Holder:
   
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25,
   pendingcreates: 1]



On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar  wrote:

> Hi All,
>
> We have a very heavy MapReduce job that goes over an entire table with
> 1TB+ of data in HBase and exports all data (similar to the Export job,
> but with some additional custom code built in) to HDFS.
>
> However, this job is not very stable, and oftentimes we get the
> following error and the job fails:
>
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease
> '-4456594242606811626' does not exist
> at
> org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
> at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.
>
>
> Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb
>
> We have changed following settings in HBase to counter this problem
> but issue persists:
>
> <property>
>   <name>hbase.regionserver.lease.period</name>
>   <value>90</value>
> </property>
>
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>90</value>
> </property>
>
>
> We also reduced the number of mappers per RS to less than the available
> CPUs on the box.
>
> We also observed that once the problem happens, it happens multiple times
> on the same RS. All other regions are unaffected, but a different RS
> observes this problem on different days. There is no particular region
> causing this either.
>
> We are running: 0.94.2 with cdh4.2.0
>
> Any ideas?
>
>
> Ameya
>


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Dhaval Shah
A couple of things:
- Can you check the resources on the region server for which you get the lease
exception? It seems like the server is being heavily thrashed.
- What are your values for scan.setCaching and scan.setBatch?

The "lease does not exist" exception generally happens when the client goes
back to the region server after the lease has expired (in your case, 90). If
your setCaching is really high, for example, the client gets enough data in one
call to scanner.next and keeps processing it for > 90 ms, and when it
eventually goes back to the region server, the lease there has already expired.
Setting your setCaching value lower might help in this case, as in the sketch
below.
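
[A hedged sketch, not from the thread: the idea is to size each scanner.next()
batch so the client's per-batch processing stays well under the lease period.
The table name and the caching/batch values are illustrative, not
recommendations.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class LeaseFriendlyScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // hypothetical table name
    Scan scan = new Scan();
    // Fewer rows per RPC: each scanner.next() call returns quickly, so the
    // time spent processing one batch stays well under the lease period.
    scan.setCaching(100);
    // Cap the number of columns returned per Result, for very wide rows.
    scan.setBatch(1000);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        process(row);  // keep per-batch work shorter than the lease period
      }
    } finally {
      scanner.close();
      table.close();
    }
  }

  private static void process(Result row) {
    // placeholder for the job's per-row work
  }
}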

Regards,
Dhaval



From: Ameya Kanitkar 
To: user@hbase.apache.org 
Sent: Wednesday, 28 August 2013 11:00 AM
Subject: Lease Exception Errors When Running Heavy Map Reduce Job


Hi All,

We have a very heavy MapReduce job that goes over an entire table with
1TB+ of data in HBase and exports all data (similar to the Export job,
but with some additional custom code built in) to HDFS.

However, this job is not very stable, and oftentimes we get the
following error and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4456594242606811626' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.


Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed following settings in HBase to counter this problem
but issue persists:



<property>
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>

<property>
  <name>hbase.rpc.timeout</name>
  <value>90</value>
</property>


We also reduced the number of mappers per RS to less than the available CPUs
on the box.

We also observed that once the problem happens, it happens multiple times
on the same RS. All other regions are unaffected, but a different RS
observes this problem on different days. There is no particular region
causing this either.

We are running: 0.94.2 with cdh4.2.0

Any ideas?


Ameya 


Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Hi All,

We have a very heavy MapReduce job that goes over an entire table with
1TB+ of data in HBase and exports all data (similar to the Export job,
but with some additional custom code built in) to HDFS.

However, this job is not very stable, and oftentimes we get the
following error and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4456594242606811626' does not exist
at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.


Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed following settings in HBase to counter this problem
but issue persists:



<property>
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>

<property>
  <name>hbase.rpc.timeout</name>
  <value>90</value>
</property>


We also reduced the number of mappers per RS to less than the available CPUs
on the box.

We also observed that once the problem happens, it happens multiple times
on the same RS. All other regions are unaffected, but a different RS
observes this problem on different days. There is no particular region
causing this either.

We are running: 0.94.2 with cdh4.2.0

Any ideas?


Ameya