long running client - missing regions

2015-09-24 Thread Wojciech Indyk
Hello!
I have a problem with a long-running HBase client. I am using
HBase 0.98.6-cdh5.3.1.
After a few days of the application running I hit what looks like a bug, with
the trace below. As I debugged the issue I saw that the client tries to reach
a region that no longer exists. The region existed in the past, but has
recently been merged or moved. The client (?) does not refresh its cached
region information for the record and because of this cannot get the existing
record. Restarting the client application helps. Is this a bug, or am I
missing something in the configuration?

java.net.SocketTimeoutException: callTimeout=100, callDuration=2307:
row 'abcd' on table 'test:REC-DOMAIN-ARTICLE_ID
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:141)
~[hbase-client-0.98.6-cdh5.3.1.jar:?]
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:781)
~[hbase-client-0.98.6-cdh5.3.1.jar:?]
at 
pl.com.agora.bigdata.recommendation_service.db.hbase.HBaseRecommendationDao.getRecommendations(HBaseRecommendationDao.java:52)
~[HBaseRecommendationDao.class:?]
at 
pl.com.agora.bigdata.recommendation_service.logic.RecommendationEngine.fetchRecommendations(RecommendationEngine.java:552)
[RecommendationEngine.class:?]
at 
pl.com.agora.bigdata.recommendation_service.logic.RecommendationEngine.getRecommendations(RecommendationEngine.java:177)
[RecommendationEngine.class:?]
at 
pl.com.agora.bigdata.recommendation_service.services.RecommendationService.getRecommendation(RecommendationService.java:84)
[RecommendationService.class:?]
at sun.reflect.GeneratedMethodAccessor202.invoke(Unknown Source) ~[?:?]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_25]
at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
at 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
[spring-web-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
[spring-web-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:777)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:706)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:943)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:877)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:966)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at 
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:868)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:644)
[servlet-api.jar:?]
at 
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:842)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
[servlet-api.jar:?]
at 
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:842)
[spring-webmvc-4.1.5.RELEASE.jar:4.1.5.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
[servlet-api.jar:?]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
[catalina.jar:8.0.20]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
[catalina.jar:8.0.20]
at 
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
[tomcat-websocket.jar:8.0.20]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
[catalina.jar:8.0.20]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
[catalina.jar:8.0.20]
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
[catalina.jar:8.0.20]
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
[catalina.jar:8.0.20]
at 
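
A minimal sketch of one possible client-side mitigation, assuming the application can reach its HConnection: the 0.98 client API exposes clearRegionCache, which drops cached region locations so the next request re-reads hbase:meta instead of retrying the stale (merged/moved) region. The table name matches the one in the trace; this is not a confirmed fix for the root cause, just an illustration of the API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class RegionCacheResetSketch {
  // Drops the cached region locations for one table so the client re-locates
  // regions from hbase:meta on the next call (0.98 client API).
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HConnection connection = HConnectionManager.createConnection(conf);
    try {
      connection.clearRegionCache(TableName.valueOf("test", "REC-DOMAIN-ARTICLE_ID"));
    } finally {
      connection.close();
    }
  }
}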

Re: Large number of column qualifiers

2015-09-24 Thread Gaurav Agarwal
After spending more time on this I realised that my understanding, and hence
my question, was invalid.
I am still trying to get more information regarding the problem and will
update the thread once I have a better handle on it.

Apologies for the confusion.

On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Am not sure whether you have tried it. the scan API has got an API called
> 'batching'. Did you try it?  So per row if there are more columns you can
> still limit the amount of data being sent to the client. I think the main
> issue you are facing is that the qualifiers getting returned are more in
> number and so the client is not able to accept them?
>
> 'Short.MAX_VALUE which is 32,767 bytes.'
> This comment applies for the qualifier length ie. the name that you specify
> for the qualifier not on the number of qualifiers.
>
> Regards
> Ram
>
> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John  wrote:
>
> > >>I have Column Family with very large number of column qualifiers (>
> > 50,000). Each column qualifier is 8 bytes long.
> >
> > When u say u have 50K qualifiers in a CF, means u will have those many
> > cells coming under that CF per row.  So am not getting what is the
> > qualifier length limit as such coming. Per qualifier, you will have a
> diff
> > cell and its qualifier.
> >
> > -Anoop-
> >
> >
> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <
> vladrodio...@gmail.com
> > >
> > wrote:
> >
> > > Yes, the comment is incorrect.
> > >
> > > hbase.client.keyvalue.maxsize controls max key-value size, but its
> > > unlimited in a master (I was wrong about 1MB, this is probably for
> older
> > > versions of HBase)
> > >
> > >
> > > -Vlad
> > >
> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal 
> > wrote:
> > >
> > > > Thanks Vlad. Could you please point me the KV size setting (default
> > 1MB)?
> > > > Just to make sure that I understand correct, are you suggesting that
> > the
> > > > following comment is incorrect in Cell.java?
> > > >
> > > >  /**
> > > >* Contiguous raw bytes that may start at any index in the
> containing
> > > > array. Max length is
> > > >* Short.MAX_VALUE which is 32,767 bytes.
> > > >* @return The array containing the qualifier bytes.
> > > >*/
> > > >   byte[] getQualifierArray();
> > > >
> > > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal 
> > > wrote:
> > > >
> > > > > Thanks Vlad. Could you please point me the KV size setting (default
> > > 1MB)?
> > > > > Just to make sure that I understand correct - the following comment
> > is
> > > > > incorrect in Cell.java:
> > > > >
> > > > >  /**
> > > > >* Contiguous raw bytes that may start at any index in the
> > containing
> > > > > array. Max length is
> > > > >* Short.MAX_VALUE which is 32,767 bytes.
> > > > >* @return The array containing the qualifier bytes.
> > > > >*/
> > > > >   byte[] getQualifierArray();
> > > > >
> > > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov <
> > > > > vladrodio...@gmail.com> wrote:
> > > > >
> > > > >> Check KeyValue class (Cell's implementation). getQualifierArray()
> > > > returns
> > > > >> kv's backing array. There is no SHORT limit on a size of this
> array,
> > > but
> > > > >> there are other limits in  HBase - maximum KV size, for example,
> > which
> > > > is
> > > > >> configurable, but, by default, is 1MB. Having 50K qualifiers is a
> > bad
> > > > >> idea.
> > > > >> Consider redesigning your data model and use rowkey instead.
> > > > >>
> > > > >> -Vlad
> > > > >>
> > > > >> On Wed, Sep 23, 2015 at 10:24 AM, Ted Yu 
> > wrote:
> > > > >>
> > > > >> > Please take a look at HBASE-11544 which is in hbase 1.1
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > On Wed, Sep 23, 2015 at 10:18 AM, Gaurav Agarwal <
> > gau...@arkin.net>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hi All,
> > > > >> > >
> > > > >> > > I have Column Family with very large number of column
> qualifiers
> > > (>
> > > > >> > > 50,000). Each column qualifier is 8 bytes long. The problem is
> > the
> > > > >> when I
> > > > >> > > do a scan operation to fetch some rows, the client side Cell
> > > object
> > > > >> does
> > > > >> > > not have enough space allocated in it to hold all the
> > > > columnQaulifiers
> > > > >> > for
> > > > >> > > a given row and hence I cannot read all the columns back for a
> > > given
> > > > >> row.
> > > > >> > >
> > > > >> > > Please see the code snippet that I am using:
> > > > >> > >
> > > > >> > >  final ResultScanner rs = htable.getScanner(scan);
> > > > >> > >  for (Result row = rs.next(); row != null; row = rs.next()) {
> > > > >> > > final Cell[] cells = row.rawCells();
> > > > >> > > if (cells != null) {
> > > > >> > > for (final Cell cell : cells) {
> > > > >> > > final long c = Bytes.toLong(
> > > > >> > > 

Re: Large number of column qualifiers

2015-09-24 Thread ramkrishna vasudevan
Hi

In the version that you are using, the default caching was 1000 (I believe;
I would need to check the old code). So in that case it was trying to fetch
1000 rows, each row with 20K columns. Now, when you say that the client was
missing rows, did you check the server logs?

Did you get any OutOfOrderScannerException?  There is a setting called
'client.rpc.timeout' which can be increased in your case - provided your
caching and batching are adjusted.

In the current trunk code there is no default caching value (unless
specified); the server tries to fetch 2MB of data and that is sent back to
the client.
In any case I would suggest checking your server logs for any exceptions.
Increase the timeout property and adjust your caching and batching to fetch
the data.  If the client is still missing rows then we need the logs to
analyse things.  Ted's mail referring to
https://issues.apache.org/jira/browse/HBASE-11544 will give an idea of the
general behaviour of scans and how it affects scanning bigger and wider
rows.

Regards
Ram
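
A minimal sketch of the caching/batching knobs discussed above, assuming a plain 0.98/1.x client Scan; the concrete values are only illustrative and should be tuned to the row width.

import org.apache.hadoop.hbase.client.Scan;

public class WideRowScanSketch {
  // Builds a Scan tuned for very wide rows: few rows per RPC, and each wide
  // row chopped into batches of cells rather than returned as one huge Result.
  static Scan buildScan() {
    Scan scan = new Scan();
    scan.setCaching(1);    // number of rows fetched per RPC
    scan.setBatch(5000);   // max cells returned per Result for a single row
    return scan;
  }
}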


On Thu, Sep 24, 2015 at 2:32 PM, Gaurav Agarwal  wrote:

> Hi,
>
> The problem that I am actually facing is that when doing a scan over rows
> where each row has very large number of cells (large number of columns),
> the scan API seems to be transparently dropping data - in my case I noticed
> that entire row of data was missing in few cases.
>
> On suggestions from Ram(above), I tried doing *scan.setCaching(1)* and
> optionally,* scan.setBatch(5000)* and the problem got resolved (at least
> for now).  So this indicates that the client (cannot be server I hope) was
> dropping the cells if the number (or maybe bytes) of cells became quite
> large across number of rows cached. Note that in my case, the number of
> bytes per cell is close to 30B (including qualifier,value and timestamp)
> and each row key is close to 20B.
>
> I am not clear what setting controls the maximum number/bytes of cells that
> can be received by the client before this problem surfaces. Can someone
> please point me these settings/code?
>
> On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal  wrote:
>
> > After spending more time I realised that my understanding and my question
> > (was invalid).
> > I am still trying to get more information regarding the problem and will
> > update the thread once I have a better handle on the problem.
> >
> > Apologies for the confusion..
> >
> > On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> >> Am not sure whether you have tried it. the scan API has got an API
> called
> >> 'batching'. Did you try it?  So per row if there are more columns you
> can
> >> still limit the amount of data being sent to the client. I think the
> main
> >> issue you are facing is that the qualifiers getting returned are more in
> >> number and so the client is not able to accept them?
> >>
> >> 'Short.MAX_VALUE which is 32,767 bytes.'
> >> This comment applies for the qualifier length ie. the name that you
> >> specify
> >> for the qualifier not on the number of qualifiers.
> >>
> >> Regards
> >> Ram
> >>
> >> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John 
> >> wrote:
> >>
> >> > >>I have Column Family with very large number of column qualifiers (>
> >> > 50,000). Each column qualifier is 8 bytes long.
> >> >
> >> > When u say u have 50K qualifiers in a CF, means u will have those
> many
> >> > cells coming under that CF per row.  So am not getting what is the
> >> > qualifier length limit as such coming. Per qualifier, you will have a
> >> diff
> >> > cell and its qualifier.
> >> >
> >> > -Anoop-
> >> >
> >> >
> >> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <
> >> vladrodio...@gmail.com
> >> > >
> >> > wrote:
> >> >
> >> > > Yes, the comment is incorrect.
> >> > >
> >> > > hbase.client.keyvalue.maxsize controls max key-value size, but its
> >> > > unlimited in a master (I was wrong about 1MB, this is probably for
> >> older
> >> > > versions of HBase)
> >> > >
> >> > >
> >> > > -Vlad
> >> > >
> >> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal 
> >> > wrote:
> >> > >
> >> > > > Thanks Vlad. Could you please point me the KV size setting
> (default
> >> > 1MB)?
> >> > > > Just to make sure that I understand correct, are you suggesting
> that
> >> > the
> >> > > > following comment is incorrect in Cell.java?
> >> > > >
> >> > > >  /**
> >> > > >* Contiguous raw bytes that may start at any index in the
> >> containing
> >> > > > array. Max length is
> >> > > >* Short.MAX_VALUE which is 32,767 bytes.
> >> > > >* @return The array containing the qualifier bytes.
> >> > > >*/
> >> > > >   byte[] getQualifierArray();
> >> > > >
> >> > > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal <
> gau...@arkin.net>
> >> > > wrote:
> >> > > >
> >> > > > > Thanks Vlad. Could you please point me the KV size setting
> >> (default
> >> 

Re: HBase Filter Problem

2015-09-24 Thread ramkrishna vasudevan
Just trying to understand more:
you have a combination of PrefixFilter and SingleColumnValueFilter -
now, the column you have specified in the SingleColumnValueFilter - is it
the only column that you have in your table?  Or are there many other
columns, one of which was used in the SingleColumnValueFilter?

The idea of FirstKeyOnlyFilter is just to skip to the next row on getting
the first column of that row.  Maybe the combination of these two is
causing some issues.

Regards
Ram

On Wed, Sep 23, 2015 at 2:31 PM, donhoff_h <165612...@qq.com> wrote:

> Hi,
>
> There are 90 Million records in the table. And I use the the MUST_PASS_ALL
> for all my filters.  When I use PrefixFilter + SingleColumnValueFilter, it
> returned fast. So I supposed that the combination of PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFilter should be fast. But the fact
> is just in contrast. Do you know the reason that cause it?
>
> Thanks!
>
>
>
> -- Original Message --
> From: "Fulin Sun";;
> Sent: 2015-09-23 (Wednesday) 4:53 PM
> To: "HBase User";
>
> Subject: Re: HBase Filter Problem
>
>
>
> Hi , there
>
> How many rows are there in the hbase table ? You want to achive the
> default FilterList.Operator.MUST_PASS_ALL or
> you just want to use or conditions for these filters ?
>
> I think the reason is that this kind of filter list just go more scan work
> and lower performance.
>
> Best,
> Sun.
>
>
>
>
> CertusNet
>
> From: donhoff_h
> Sent: 2015-09-23 16:33
> To: user
> Subject: HBase Filter Problem
> Hi,
>
> I wrote a program which function is to extract some data from a HBase
> table. According to business requirements I had to use the PrefixFilter and
> the SingleColumnValueFilter to filter the data.  The program ran very fast
> and returned in 1 sec.
>
> Considering I just need the rowkey of each record in my final result, I
> tried to improve my program by using the PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFitler. To my surprise the program
> ran very slow this time. It run about 20min and still not finished. So I
> had to kill it.
>
> Does anybody know the reason that cause my program run such slow?  Since I
> set the PrefixFilter as the first filter in the FilterList object, I think
> the program should ran fast.
>
> Many Thanks!
>


Re: Large number of column qualifiers

2015-09-24 Thread Gaurav Agarwal
Hi,

The problem that I am actually facing is that when doing a scan over rows
where each row has a very large number of cells (a large number of columns),
the scan API seems to be transparently dropping data - in my case I noticed
that an entire row of data was missing in a few cases.

On suggestions from Ram (above), I tried doing *scan.setCaching(1)* and,
optionally, *scan.setBatch(5000)*, and the problem got resolved (at least
for now).  So this indicates that the client (it cannot be the server, I
hope) was dropping cells if the number (or maybe bytes) of cells became
quite large across the number of rows cached. Note that in my case the
number of bytes per cell is close to 30B (including qualifier, value and
timestamp) and each row key is close to 20B.

I am not clear which setting controls the maximum number/bytes of cells that
can be received by the client before this problem surfaces. Can someone
please point me to these settings/code?

On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal  wrote:

> After spending more time I realised that my understanding and my question
> (was invalid).
> I am still trying to get more information regarding the problem and will
> update the thread once I have a better handle on the problem.
>
> Apologies for the confusion..
>
> On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
>> Am not sure whether you have tried it. the scan API has got an API called
>> 'batching'. Did you try it?  So per row if there are more columns you can
>> still limit the amount of data being sent to the client. I think the main
>> issue you are facing is that the qualifiers getting returned are more in
>> number and so the client is not able to accept them?
>>
>> 'Short.MAX_VALUE which is 32,767 bytes.'
>> This comment applies for the qualifier length ie. the name that you
>> specify
>> for the qualifier not on the number of qualifiers.
>>
>> Regards
>> Ram
>>
>> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John 
>> wrote:
>>
>> > >>I have Column Family with very large number of column qualifiers (>
>> > 50,000). Each column qualifier is 8 bytes long.
>> >
>> > When u say u have 50K qualifiers in a CF, means u will have those many
>> > cells coming under that CF per row.  So am not getting what is the
>> > qualifier length limit as such coming. Per qualifier, you will have a
>> diff
>> > cell and its qualifier.
>> >
>> > -Anoop-
>> >
>> >
>> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <
>> vladrodio...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Yes, the comment is incorrect.
>> > >
>> > > hbase.client.keyvalue.maxsize controls max key-value size, but its
>> > > unlimited in a master (I was wrong about 1MB, this is probably for
>> older
>> > > versions of HBase)
>> > >
>> > >
>> > > -Vlad
>> > >
>> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal 
>> > wrote:
>> > >
>> > > > Thanks Vlad. Could you please point me the KV size setting (default
>> > 1MB)?
>> > > > Just to make sure that I understand correct, are you suggesting that
>> > the
>> > > > following comment is incorrect in Cell.java?
>> > > >
>> > > >  /**
>> > > >* Contiguous raw bytes that may start at any index in the
>> containing
>> > > > array. Max length is
>> > > >* Short.MAX_VALUE which is 32,767 bytes.
>> > > >* @return The array containing the qualifier bytes.
>> > > >*/
>> > > >   byte[] getQualifierArray();
>> > > >
>> > > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal 
>> > > wrote:
>> > > >
>> > > > > Thanks Vlad. Could you please point me the KV size setting
>> (default
>> > > 1MB)?
>> > > > > Just to make sure that I understand correct - the following
>> comment
>> > is
>> > > > > incorrect in Cell.java:
>> > > > >
>> > > > >  /**
>> > > > >* Contiguous raw bytes that may start at any index in the
>> > containing
>> > > > > array. Max length is
>> > > > >* Short.MAX_VALUE which is 32,767 bytes.
>> > > > >* @return The array containing the qualifier bytes.
>> > > > >*/
>> > > > >   byte[] getQualifierArray();
>> > > > >
>> > > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov <
>> > > > > vladrodio...@gmail.com> wrote:
>> > > > >
>> > > > >> Check KeyValue class (Cell's implementation). getQualifierArray()
>> > > > returns
>> > > > >> kv's backing array. There is no SHORT limit on a size of this
>> array,
>> > > but
>> > > > >> there are other limits in  HBase - maximum KV size, for example,
>> > which
>> > > > is
>> > > > >> configurable, but, by default, is 1MB. Having 50K qualifiers is a
>> > bad
>> > > > >> idea.
>> > > > >> Consider redesigning your data model and use rowkey instead.
>> > > > >>
>> > > > >> -Vlad
>> > > > >>
>> > > > >> On Wed, Sep 23, 2015 at 10:24 AM, Ted Yu 
>> > wrote:
>> > > > >>
>> > > > >> > Please take a look at HBASE-11544 which is in hbase 1.1
>> > > > >> >
>> > > > >> > Cheers
>> > > > >> >
>> > > > >> > On Wed, 

Re: Connection closes prematurely

2015-09-24 Thread Ted Yu
22:28:10,233 DEBUG org.apache.hadoop.hbase.util.ByteStringer
 - Failed to classload HBaseZeroCopyByteString:
java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString
cannot access its superclass com.google.protobuf.LiteralByteString

Can you check the classpath?
HBaseZeroCopyByteString is in the hbase-protocol module.

Were you running against hadoop-1 or hadoop-2?

Cheers

On Thu, Sep 24, 2015 at 5:20 AM, Lydia Ickler 
wrote:

> Hi all,
>
> I am trying to get the HBaseReadExample from Apache Flink to run. I have
> filled a table with the HBaseWriteExample (that works great) and purposely
> split it over 3 regions.
> Now when I try to read from it the first split seems to be scanned (170
> rows) fine and after that the Connections of Zookeeper and RCP are suddenly
> closed down.
>
> Does anyone has an idea why this is happening?
>
> Best regards,
> Lydia
>
>
> 22:28:10,178 DEBUG org.apache.flink.runtime.operators.DataSourceTask
>- Opening input split Locatable Split (2) at [grips5:60020]:
> DataSource (at createInput(ExecutionEnvironment.java:502)
> (org.apache.flink.HBaseReadExample$1)) (1/1)
> 22:28:10,178 INFO  org.apache.flink.addons.hbase.TableInputFormat
>   - opening split [2|[grips5:60020]||-]
> 22:28:10,189 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply
> sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null
> finished:false header:: 3,4  replyHeader:: 3,51539607639,0  request::
> '/hbase/meta-region-server,F  response::
> #0001a726567696f6e7365727665723a363030$
> 22:28:10,202 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply
> sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null
> finished:false header:: 4,4  replyHeader:: 4,51539607639,0  request::
> '/hbase/meta-region-server,F  response::
> #0001a726567696f6e7365727665723a363030$
> 22:28:10,211 DEBUG LocalActorRefProvider(akka://flink) - resolve of
> path sequence [/temp/$b] failed
> 22:28:10,233 DEBUG org.apache.hadoop.hbase.util.ByteStringer
>- Failed to classload HBaseZeroCopyByteString:
> java.lang.IllegalAccessError: class
> com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass
> com.google.protobuf.LiteralByteString
> 22:28:10,358 DEBUG org.apache.hadoop.ipc.RpcClient - Use SIMPLE
> authentication for service ClientService, sasl=false
> 22:28:10,370 DEBUG org.apache.hadoop.ipc.RpcClient - Connecting to
> grips1/130.73.20.14:60020
> 22:28:10,380 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips1/130.73.20.14:60020 from hduser:
> starting, connections 1
> 22:28:10,394 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips1/130.73.20.14:60020 from hduser: got
> response header call_id: 0, totalSize: 469 bytes
> 22:28:10,397 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips1/130.73.20.14:60020 from hduser: wrote
> request header call_id: 0 method_name: "Get" request_param: true
> 22:28:10,413 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply
> sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null
> finished:false header:: 5,4  replyHeader:: 5,51539607639,0  request::
> '/hbase/meta-region-server,F  response::
> #0001a726567696f6e7365727665723a363030$
> 22:28:10,424 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips1/130.73.20.14:60020 from hduser: wrote
> request header call_id: 1 method_name: "Scan" request_param: true priority:
> 100
> 22:28:10,426 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips1/130.73.20.14:60020 from hduser: got
> response header call_id: 1 cell_block_meta { length: 480 }, totalSize: 497
> bytes
> 22:28:10,432 DEBUG org.apache.hadoop.hbase.client.ClientSmallScanner
>- Finished with small scan at {ENCODED => 1588230740, NAME =>
> 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> 22:28:10,434 DEBUG org.apache.hadoop.ipc.RpcClient - Use SIMPLE
> authentication for service ClientService, sasl=false
> 22:28:10,434 DEBUG org.apache.hadoop.ipc.RpcClient - Connecting to
> grips5/130.73.20.16:60020
> 22:28:10,435 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips5/130.73.20.16:60020 from hduser: wrote
> request header call_id: 2 method_name: "Scan" request_param: true
> 22:28:10,436 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips5/130.73.20.16:60020 from hduser:
> starting, connections 2
> 22:28:10,437 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client
> (2145423150) connection to grips5/130.73.20.16:60020 from hduser: got
> response header call_id: 2, totalSize: 12 bytes
> 22:28:10,438 DEBUG org.apache.flink.runtime.operators.DataSourceTask
>- Starting to read 

Re: Exporting a snapshot to external cluster

2015-09-24 Thread Serega Sheypak
I have no idea; some guys try to use "curl" to determine the active NN.
My suggestion is different: you should put the remote NN HA configuration in
hdfs-site.xml.
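
A minimal sketch of what that client-side HA configuration amounts to, expressed programmatically here for illustration; in practice these entries live in the client's hdfs-site.xml, and the nameservice name and hosts below are placeholders. With this in place, ExportSnapshot can be pointed at hdfs://remotecluster/hbase and the client resolves the active NN itself.

import org.apache.hadoop.conf.Configuration;

public class RemoteHaClientConfSketch {
  // Standard HDFS HA client settings for a remote nameservice;
  // "remotecluster", "nn1"/"nn2" and the hostnames are placeholders.
  static Configuration withRemoteNameservice(Configuration conf) {
    conf.set("dfs.nameservices", "remotecluster");
    conf.set("dfs.ha.namenodes.remotecluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.remotecluster.nn1", "remote-nn1:8020");
    conf.set("dfs.namenode.rpc-address.remotecluster.nn2", "remote-nn2:8020");
    conf.set("dfs.client.failover.proxy.provider.remotecluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    return conf;
  }
}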

2015-09-24 14:33 GMT+02:00 Akmal Abbasov :

> > add remote cluster HA configuration to your "local" hdfs client
> > configuration
> I am using the following command in script
> $HBASE_PATH/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot
> -snapshot snapshot-name -copy-to hdfs://remote_hbase_master/hbase
> 
> In this case how I can know which namenode is active?
>
> Thanks!
>
> > On 23 Sep 2015, at 12:14, Serega Sheypak 
> wrote:
> >
> >> 1. to know which of the HDFS namenode is active
> > add remote cluster HA configuration to your "local" hdfs client
> > configuration
> >
> >> Afaik, it should be done through zookeeper, but through which API it
> will
> > be more convenient?
> > no,no,no
> > use hdfs-site.xml configuration.
> > You need to add configuration for remote NN HA and your local hdfs client
> > would correctly resolve active NN.
> >
> > 2015-09-23 11:32 GMT+02:00 Akmal Abbasov :
> >
> >> Hi all,
> >> I would like to know the best practice when exporting a snapshot to
> remote
> >> hbase cluster with ha configuration.
> >> My assumption is:
> >> 1. to know which of the HDFS namenode is active
> >> 2. export snapshot to active namenode
> >>
> >> Since I need to do this programmatically what is the best way to know
> >> which namenode is active?
> >> Afaik, it should be done through zookeeper, but through which API it
> will
> >> be more convenient?
> >>
> >> Thanks.
>
>


Connection closes prematurely

2015-09-24 Thread Lydia Ickler
Hi all,

I am trying to get the HBaseReadExample from Apache Flink to run. I have filled 
a table with the HBaseWriteExample (that works great) and purposely split it 
over 3 regions.
Now when I try to read from it, the first split seems to be scanned fine (170
rows), and after that the ZooKeeper and RPC connections are suddenly closed.

Does anyone have an idea why this is happening?

Best regards,
Lydia 


22:28:10,178 DEBUG org.apache.flink.runtime.operators.DataSourceTask
 - Opening input split Locatable Split (2) at [grips5:60020]:  DataSource (at 
createInput(ExecutionEnvironment.java:502) 
(org.apache.flink.HBaseReadExample$1)) (1/1)
22:28:10,178 INFO  org.apache.flink.addons.hbase.TableInputFormat   
 - opening split [2|[grips5:60020]||-]
22:28:10,189 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply 
sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null 
finished:false header:: 3,4  replyHeader:: 3,51539607639,0  request:: 
'/hbase/meta-region-server,F  response:: 
#0001a726567696f6e7365727665723a363030$
22:28:10,202 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply 
sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null 
finished:false header:: 4,4  replyHeader:: 4,51539607639,0  request:: 
'/hbase/meta-region-server,F  response:: 
#0001a726567696f6e7365727665723a363030$
22:28:10,211 DEBUG LocalActorRefProvider(akka://flink) - resolve of path 
sequence [/temp/$b] failed
22:28:10,233 DEBUG org.apache.hadoop.hbase.util.ByteStringer
 - Failed to classload HBaseZeroCopyByteString: java.lang.IllegalAccessError: 
class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass 
com.google.protobuf.LiteralByteString
22:28:10,358 DEBUG org.apache.hadoop.ipc.RpcClient - Use SIMPLE 
authentication for service ClientService, sasl=false
22:28:10,370 DEBUG org.apache.hadoop.ipc.RpcClient - Connecting to 
grips1/130.73.20.14:60020
22:28:10,380 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips1/130.73.20.14:60020 from hduser: starting, 
connections 1
22:28:10,394 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips1/130.73.20.14:60020 from hduser: got response 
header call_id: 0, totalSize: 469 bytes
22:28:10,397 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips1/130.73.20.14:60020 from hduser: wrote request 
header call_id: 0 method_name: "Get" request_param: true
22:28:10,413 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply 
sessionid:0x24ff6a96ecd000a, packet:: clientPath:null serverPath:null 
finished:false header:: 5,4  replyHeader:: 5,51539607639,0  request:: 
'/hbase/meta-region-server,F  response:: 
#0001a726567696f6e7365727665723a363030$
22:28:10,424 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips1/130.73.20.14:60020 from hduser: wrote request 
header call_id: 1 method_name: "Scan" request_param: true priority: 100
22:28:10,426 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips1/130.73.20.14:60020 from hduser: got response 
header call_id: 1 cell_block_meta { length: 480 }, totalSize: 497 bytes
22:28:10,432 DEBUG org.apache.hadoop.hbase.client.ClientSmallScanner
 - Finished with small scan at {ENCODED => 1588230740, NAME => 'hbase:meta,,1', 
STARTKEY => '', ENDKEY => ''}
22:28:10,434 DEBUG org.apache.hadoop.ipc.RpcClient - Use SIMPLE 
authentication for service ClientService, sasl=false
22:28:10,434 DEBUG org.apache.hadoop.ipc.RpcClient - Connecting to 
grips5/130.73.20.16:60020
22:28:10,435 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips5/130.73.20.16:60020 from hduser: wrote request 
header call_id: 2 method_name: "Scan" request_param: true
22:28:10,436 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips5/130.73.20.16:60020 from hduser: starting, 
connections 2
22:28:10,437 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips5/130.73.20.16:60020 from hduser: got response 
header call_id: 2, totalSize: 12 bytes
22:28:10,438 DEBUG org.apache.flink.runtime.operators.DataSourceTask
 - Starting to read input from split Locatable Split (2) at [grips5:60020]:  
DataSource (at createInput(ExecutionEnvironment.java:502) 
(org.apache.flink.HBaseReadExample$1)) (1/1)
22:28:10,438 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips5/130.73.20.16:60020 from hduser: wrote request 
header call_id: 3 method_name: "Scan" request_param: true
22:28:10,457 DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client 
(2145423150) connection to grips5/130.73.20.16:60020 from hduser: got response 
header call_id: 3 

Re: Exporting a snapshot to external cluster

2015-09-24 Thread Akmal Abbasov
> add remote cluster HA configuration to your "local" hdfs client
> configuration
I am using the following command in a script:
$HBASE_PATH/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
snapshot-name -copy-to hdfs://remote_hbase_master/hbase 

In this case, how can I know which namenode is active?

Thanks!

> On 23 Sep 2015, at 12:14, Serega Sheypak  wrote:
> 
>> 1. to know which of the HDFS namenode is active
> add remote cluster HA configuration to your "local" hdfs client
> configuration
> 
>> Afaik, it should be done through zookeeper, but through which API it will
> be more convenient?
> no,no,no
> use hdfs-site.xml configuration.
> You need to add configuration for remote NN HA and your local hdfs client
> would correctly resolve active NN.
> 
> 2015-09-23 11:32 GMT+02:00 Akmal Abbasov :
> 
>> Hi all,
>> I would like to know the best practice when exporting a snapshot to remote
>> hbase cluster with ha configuration.
>> My assumption is:
>> 1. to know which of the HDFS namenode is active
>> 2. export snapshot to active namenode
>> 
>> Since I need to do this programmatically what is the best way to know
>> which namenode is active?
>> Afaik, it should be done through zookeeper, but through which API it will
>> be more convenient?
>> 
>> Thanks.



Re: Large number of column qualifiers

2015-09-24 Thread Ted Yu
Gaurav:
Please also check GC activities on the client side.

Here is the reason I brought this to your attention:
HBASE-14177 Full GC on client may lead to missing scan results

Cheers

On Thu, Sep 24, 2015 at 2:13 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Hi
>
> In the version that you were using by default the caching was 1000 ( I
> believe) need to see the old code.  So in that case it was trying to fetch
> 1000 rows and each row with 20k cols.  Now when you are saying that the
> client was missing rows, did you check the server logs?
>
> Did you get any OutOfOrderScannerException?  There is something called
> 'client.rpc.timeout' which can be increased in your case - but provided
> your caching and batching is adjusted.
>
> In the current trunk code - there is no default caching value (unless
> specified), the server tries to fetch 2MB of data and that is sent back to
> the client.
> In any case I would suggest to check your server logs for any Exceptions.
> Increase the timeout property and adjust your caching and batching to fetch
> the data.  If still the client is missing out on rows then we need the logs
> and analyse things.  Ted's mail referring to
> https://issues.apache.org/jira/browse/HBASE-11544 will give an idea of the
> general behaviour with scans and how it affects scanning bigger and wider
> rows.
>
> Regards
> Ram
>
>
> On Thu, Sep 24, 2015 at 2:32 PM, Gaurav Agarwal  wrote:
>
> > Hi,
> >
> > The problem that I am actually facing is that when doing a scan over rows
> > where each row has very large number of cells (large number of columns),
> > the scan API seems to be transparently dropping data - in my case I
> noticed
> > that entire row of data was missing in few cases.
> >
> > On suggestions from Ram(above), I tried doing *scan.setCaching(1)* and
> > optionally,* scan.setBatch(5000)* and the problem got resolved (at least
> > for now).  So this indicates that the client (cannot be server I hope)
> was
> > dropping the cells if the number (or maybe bytes) of cells became quite
> > large across number of rows cached. Note that in my case, the number of
> > bytes per cell is close to 30B (including qualifier,value and timestamp)
> > and each row key is close to 20B.
> >
> > I am not clear what setting controls the maximum number/bytes of cells
> that
> > can be received by the client before this problem surfaces. Can someone
> > please point me these settings/code?
> >
> > On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal 
> wrote:
> >
> > > After spending more time I realised that my understanding and my
> question
> > > (was invalid).
> > > I am still trying to get more information regarding the problem and
> will
> > > update the thread once I have a better handle on the problem.
> > >
> > > Apologies for the confusion..
> > >
> > > On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > >> Am not sure whether you have tried it. the scan API has got an API
> > called
> > >> 'batching'. Did you try it?  So per row if there are more columns you
> > can
> > >> still limit the amount of data being sent to the client. I think the
> > main
> > >> issue you are facing is that the qualifiers getting returned are more
> in
> > >> number and so the client is not able to accept them?
> > >>
> > >> 'Short.MAX_VALUE which is 32,767 bytes.'
> > >> This comment applies for the qualifier length ie. the name that you
> > >> specify
> > >> for the qualifier not on the number of qualifiers.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John 
> > >> wrote:
> > >>
> > >> > >>I have Column Family with very large number of column qualifiers
> (>
> > >> > 50,000). Each column qualifier is 8 bytes long.
> > >> >
> > >> > When u say u have 50K qualifiers in a CF, means u will have those
> > many
> > >> > cells coming under that CF per row.  So am not getting what is the
> > >> > qualifier length limit as such coming. Per qualifier, you will have
> a
> > >> diff
> > >> > cell and its qualifier.
> > >> >
> > >> > -Anoop-
> > >> >
> > >> >
> > >> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <
> > >> vladrodio...@gmail.com
> > >> > >
> > >> > wrote:
> > >> >
> > >> > > Yes, the comment is incorrect.
> > >> > >
> > >> > > hbase.client.keyvalue.maxsize controls max key-value size, but its
> > >> > > unlimited in a master (I was wrong about 1MB, this is probably for
> > >> older
> > >> > > versions of HBase)
> > >> > >
> > >> > >
> > >> > > -Vlad
> > >> > >
> > >> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal <
> gau...@arkin.net>
> > >> > wrote:
> > >> > >
> > >> > > > Thanks Vlad. Could you please point me the KV size setting
> > (default
> > >> > 1MB)?
> > >> > > > Just to make sure that I understand correct, are you suggesting
> > that
> > >> > the
> > >> > > > following comment is incorrect in Cell.java?
> > 

Re: Exporting a snapshot to external cluster

2015-09-24 Thread Anil Gupta
Hi Akmal,

It will be better if you use the nameservice value; then you will not need to 
worry about which NN is active. I believe you can find that property in Hadoop's 
core-site.xml file. 

Sent from my iPhone

On Sep 24, 2015, at 7:23 AM, Akmal Abbasov  wrote:

>> My suggestion is different. You should put remote NN HA configuration in
>> hdfs-site.xml.
> ok, in case I’ll put it, still how I can determine which of those 2 namenodes 
> is active?
> 
>> On 24 Sep 2015, at 15:56, Serega Sheypak  wrote:
>> 
>> Have no Idea, some guys try to use "curl" to determine active NN.
>> My suggestion is different. You should put remote NN HA configuration in
>> hdfs-site.xml.
>> 
>> 2015-09-24 14:33 GMT+02:00 Akmal Abbasov :
>> 
 add remote cluster HA configuration to your "local" hdfs client
 configuration
>>> I am using the following command in script
>>> $HBASE_PATH/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot
>>> -snapshot snapshot-name -copy-to hdfs://remote_hbase_master/hbase
>>> 
>>> In this case how I can know which namenode is active?
>>> 
>>> Thanks!
>>> 
> On 23 Sep 2015, at 12:14, Serega Sheypak 
 wrote:
 
> 1. to know which of the HDFS namenode is active
 add remote cluster HA configuration to your "local" hdfs client
 configuration
 
> Afaik, it should be done through zookeeper, but through which API it
>>> will
 be more convenient?
 no,no,no
 use hdfs-site.xml configuration.
 You need to add configuration for remote NN HA and your local hdfs client
 would correctly resolve active NN.
 
 2015-09-23 11:32 GMT+02:00 Akmal Abbasov :
 
> Hi all,
> I would like to know the best practice when exporting a snapshot to
>>> remote
> hbase cluster with ha configuration.
> My assumption is:
> 1. to know which of the HDFS namenode is active
> 2. export snapshot to active namenode
> 
> Since I need to do this programmatically what is the best way to know
> which namenode is active?
> Afaik, it should be done through zookeeper, but through which API it
>>> will
> be more convenient?
> 
> Thanks.
> 


what can I do with create table configuration ?

2015-09-24 Thread Jeesoo Shin
Hello,

I tried to change hbase.hregion.memstore.flush.size for a table but it
didn't work.
(I just wanted to see whether I can set a different memstore size for each table)
create 't1', {NAME => 'cf', CONFIGURATION =>
{'hbase.hregion.memstore.flush.size' => '1048576'}}

What can I set with CONFIGURATION?
Any document listing it?

TIA.


Re: what can I do with create table configuration ?

2015-09-24 Thread Ted Yu
Which release of HBase do you use?

I used a command similar to yours and I got:

hbase(main):005:0> describe 't3'
Table t3 is ENABLED
t3
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_
DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true', CONFIGURATION => {'hbase.hregion.memstore.flush.size'
=> '1048576'}}

I use 1.2.0-SNAPSHOT

FYI
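
For reference, a minimal sketch of the same thing through the Java admin API (the table and family names are just examples). HTableDescriptor and HColumnDescriptor both expose setConfiguration for per-table / per-family overrides; whether a given property, such as the flush size, actually takes effect may depend on the HBase version, as this thread suggests.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PerTableFlushSizeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1"));
      desc.addFamily(new HColumnDescriptor("cf"));
      // Per-table override, stored in the table descriptor just like the
      // shell's CONFIGURATION map.
      desc.setConfiguration("hbase.hregion.memstore.flush.size", "1048576");
      admin.createTable(desc);
    } finally {
      admin.close();
    }
  }
}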

On Thu, Sep 24, 2015 at 6:05 AM, Jeesoo Shin  wrote:

> Hello,
>
> I tried to change hbase.hregion.memstore.flush.size for a table but it
> didn't work.
> (just wanted to see if I can set different memstore size for each table)
> create 't1', {NAME => 'cf', CONFIGURATION =>
> {'hbase.hregion.memstore.flush.size' => '1048576'}}
>
> What can I set with CONFIGURATION?
> Any document listing it?
>
> TIA.
>


Prefetching Indexes of HFiles

2015-09-24 Thread Anthony Nguyen
Hi all,

Is there a better way to warm HFile indexes other than scanning through my
datasets? I did see PREFETCH_BLOCKS_ON_OPEN, but the warning that it "is
not a good idea if the data to be preloaded will not fit into the
blockcache" makes me wary. Why would this be a bad idea?

Thanks!


Re: Prefetching Indexes of HFiles

2015-09-24 Thread Ted Yu
I think the option you mentioned is for data blocks.

As for index blocks, please refer to HFile V2 design doc.
Also see this:
https://issues.apache.org/jira/browse/HBASE-3857?focusedCommentId=13031489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13031489

Cheers

On Thu, Sep 24, 2015 at 1:16 PM, Anthony Nguyen  wrote:

> Hi all,
>
> Is there a better way to warm HFile indexes other than scanning through my
> datasets? I did see PREFETCH_BLOCKS_ON_OPEN, but the warning that it "is
> not a good idea if the data to be preloaded will not fit into the
> blockcache" makes me wary. Why would this be a bad idea?
>
> Thanks!
>


Re: Timeouts on snapshot restore

2015-09-24 Thread Ted Yu
bq. Excluding datanode RS-1:50010

Was RS-1 the only datanode to be excluded in that timeframe?
Have you run fsck to see if HDFS is healthy?

Cheers

On Thu, Sep 24, 2015 at 7:47 PM, Alexandre Normand <
alexandre.norm...@opower.com> wrote:

> Hi Ted,
> We'll be upgrading to cdh5 in the coming months but we're unfortunately
> stuck on 0.94.6 at the moment.
>
> The RS logs were empty around the time of the failed snapshot restore
> operation, but the following errors were in the master log.  The node
> 'RS-1' is the only node indicated in the logs. These errors occurred
> throughout the duration of the snapshot_restore operation.
>
> Sep 24, 9:51:41.655 PM INFO org.apache.hadoop.hdfs.DFSClient
> Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as [RS-1]:50010
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1040)
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
> Sep 24, 9:51:41.664 PM INFO org.apache.hadoop.hdfs.DFSClient
> Abandoning
> BP-1769805853-namenode1-1354129919031:blk_8350736734896383334_327644360
> Sep 24, 9:51:41.678 PM INFO org.apache.hadoop.hdfs.DFSClient
> Excluding datanode RS-1:50010
> Sep 24, 9:52:58.954 PM INFO org.apache.hadoop.hdfs.DFSClient
> Exception in createBlockOutputStream
> java.net.SocketTimeoutException: 75000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/hmaster1:59726
> remote=/RS-1:50010]
> at
>
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at
>
> org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1106)
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1040)
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
> Sep 24, 9:52:58.956 PM INFO org.apache.hadoop.hdfs.DFSClient
> Abandoning
> BP-1769805853-namenode1-1354129919031:blk_-6817802178798905477_327644519
> Sep 24, 9:52:59.011 PM INFO org.apache.hadoop.hdfs.DFSClient
> Excluding datanode RS-1:50010
> Sep 24, 9:54:22.963 PM WARN
> org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector
> Timer already marked completed, ignoring!
> Sep 24, 9:54:22.964 PM ERROR
> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler
> Failed taking snapshot { ss=** table=** type=DISABLED } due to
> exception:org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout
> elapsed! Source:Timeout caused Foreign Exception Start:1443145459635,
> End:1443146059635, diff:60, max:60 ms
>
> Thanks!
>
> On Thu, Sep 24, 2015 at 6:34 PM, Ted Yu  wrote:
>
> > 0.94.6 is really old. There have been quite a few bug fixes /
> improvements
> > to snapshot feature since its release.
> >
> > The error happens when SnapshotDescription corresponding to
> > kiji.prod.table.site.DI...
> > was not found by ProcedureCoordinator.
> >
> > bq. does the timeout necessarily mean that the restore failed or could it
> > be still running asynchronously
> >
> > Can you check region server logs around the time TimeoutException was
> > thrown to see which server was the straggler ?
> >
> > Thanks
> >
> > On Thu, Sep 24, 2015 at 5:13 PM, Alexandre Normand <
> > alexandre.norm...@gmail.com> wrote:
> >
> > > Hey,
> > > We're trying to restore a snapshot of a relatively big table (20TB)
> using
> > > hbase 0.94.6-cdh4.5.0 and we're getting timeouts doing so. We increased
> > the
> > > timeout configurations(hbase.snapshot.master.timeoutMillis,
> > > hbase.snapshot.region.timeout, hbase.snapshot.master.timeout.millis) to
> > 10
> > > minutes but we're still experiencing the timeouts. Here's the error and
> > > stack trace (table name obfuscated just because):
> > >
> > > ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException:
> > > org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot {
> > > ss*-1443136710408 table= type=FLUSH } had an error.
> > > kiji.prod.table.site.DI-1019-1443136710408 not found in proclist []
> > > at
> > >
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:360)
> > > at
> > >
> org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2075)
> > > at 

Re: Timeouts on snapshot restore

2015-09-24 Thread Alexandre Normand
Hi Ted,
We'll be upgrading to cdh5 in the coming months but we're unfortunately
stuck on 0.94.6 at the moment.

The RS logs were empty around the time of the failed snapshot restore
operation, but the following errors were in the master log.  The node
'RS-1' is the only node indicated in the logs. These errors occurred
throughout the duration of the snapshot_restore operation.

Sep 24, 9:51:41.655 PM INFO org.apache.hadoop.hdfs.DFSClient
Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as [RS-1]:50010
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1040)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
Sep 24, 9:51:41.664 PM INFO org.apache.hadoop.hdfs.DFSClient
Abandoning
BP-1769805853-namenode1-1354129919031:blk_8350736734896383334_327644360
Sep 24, 9:51:41.678 PM INFO org.apache.hadoop.hdfs.DFSClient
Excluding datanode RS-1:50010
Sep 24, 9:52:58.954 PM INFO org.apache.hadoop.hdfs.DFSClient
Exception in createBlockOutputStream
java.net.SocketTimeoutException: 75000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/hmaster1:59726
remote=/RS-1:50010]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1106)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1040)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
Sep 24, 9:52:58.956 PM INFO org.apache.hadoop.hdfs.DFSClient
Abandoning
BP-1769805853-namenode1-1354129919031:blk_-6817802178798905477_327644519
Sep 24, 9:52:59.011 PM INFO org.apache.hadoop.hdfs.DFSClient
Excluding datanode RS-1:50010
Sep 24, 9:54:22.963 PM WARN
org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector
Timer already marked completed, ignoring!
Sep 24, 9:54:22.964 PM ERROR
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler
Failed taking snapshot { ss=** table=** type=DISABLED } due to
exception:org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout
elapsed! Source:Timeout caused Foreign Exception Start:1443145459635,
End:1443146059635, diff:60, max:60 ms

Thanks!

On Thu, Sep 24, 2015 at 6:34 PM, Ted Yu  wrote:

> 0.94.6 is really old. There have been quite a few bug fixes / improvements
> to snapshot feature since its release.
>
> The error happens when SnapshotDescription corresponding to
> kiji.prod.table.site.DI...
> was not found by ProcedureCoordinator.
>
> bq. does the timeout necessarily mean that the restore failed or could it
> be still running asynchronously
>
> Can you check region server logs around the time TimeoutException was
> thrown to see which server was the straggler ?
>
> Thanks
>
> On Thu, Sep 24, 2015 at 5:13 PM, Alexandre Normand <
> alexandre.norm...@gmail.com> wrote:
>
> > Hey,
> > We're trying to restore a snapshot of a relatively big table (20TB) using
> > hbase 0.94.6-cdh4.5.0 and we're getting timeouts doing so. We increased
> the
> > timeout configurations(hbase.snapshot.master.timeoutMillis,
> > hbase.snapshot.region.timeout, hbase.snapshot.master.timeout.millis) to
> 10
> > minutes but we're still experiencing the timeouts. Here's the error and
> > stack trace (table name obfuscated just because):
> >
> > ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException:
> > org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot {
> > ss*-1443136710408 table= type=FLUSH } had an error.
> > kiji.prod.table.site.DI-1019-1443136710408 not found in proclist []
> > at
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:360)
> > at
> > org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2075)
> > at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
> > Caused by: 

Timeouts on snapshot restore

2015-09-24 Thread Alexandre Normand
Hey,
We're trying to restore a snapshot of a relatively big table (20TB) using
hbase 0.94.6-cdh4.5.0 and we're getting timeouts doing so. We increased the
timeout configurations(hbase.snapshot.master.timeoutMillis,
hbase.snapshot.region.timeout, hbase.snapshot.master.timeout.millis) to 10
minutes but we're still experiencing the timeouts. Here's the error and
stack trace (table name obfuscated just because):

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException:
org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot {
ss*-1443136710408 table= type=FLUSH } had an error.
kiji.prod.table.site.DI-1019-1443136710408 not found in proclist []
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:360)
at 
org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2075)
at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException via
timer-java.util.Timer@8ad0d5c:org.apache.hadoop.hbase.errorhandling.TimeoutException:
Timeout elapsed! Source:Timeout caused Foreign Exception
Start:1443136713121, End:1443137313121, diff:60, max:60 ms
at 
org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
at 
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:285)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:350)
... 6 more
Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException:
Timeout elapsed! Source:Timeout caused Foreign Exception
Start:1443136713121, End:1443137313121, diff:60, max:60 ms
at 
org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:68)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)


We could increase the timeout again, but we'd like to solicit some feedback
before trying that. First, does the timeout necessarily mean that the
restore failed, or could it still be running asynchronously and eventually
complete? Second, what is involved in a snapshot restore that would help
inform what timeout value is appropriate for this operation?

Thanks!

-- 
Alex
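
A minimal sketch of the timeout overrides mentioned above, expressed programmatically; the property names are the ones tried in this thread and the 600000 ms (10 minute) values are only illustrative. In a real deployment they would go into hbase-site.xml on the master and region servers rather than be set in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SnapshotTimeoutSketch {
  // Raises the snapshot-related timeouts named in this thread (0.94-era keys).
  static Configuration withSnapshotTimeouts() {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.snapshot.master.timeoutMillis", 600000L);
    conf.setLong("hbase.snapshot.region.timeout", 600000L);
    conf.setLong("hbase.snapshot.master.timeout.millis", 600000L);
    return conf;
  }
}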


Re: HBase Filter Problem

2015-09-24 Thread donhoff_h
Hi,

There are many other columns, and one such column is used in the 
SingleColumnValueFilter. My intention is to first use the PrefixFilter to narrow 
the data scope, then use the SingleColumnValueFilter to choose the correct 
record, and finally use the FirstKeyOnlyFilter to get just one KV and extract 
the rowkey from it. But the result shows that the SingleColumnValueFilter does 
not seem to work together with the FirstKeyOnlyFilter, so I want to understand 
the mechanism that causes this.
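
A minimal sketch of the filter combination described above, assuming the 0.98/1.x client API; the family, qualifier, prefix and value are placeholders. It only shows how the FilterList is built, not why the FirstKeyOnlyFilter interacts badly with the SingleColumnValueFilter.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterComboSketch {
  // Builds the PrefixFilter + SingleColumnValueFilter + FirstKeyOnlyFilter
  // combination with MUST_PASS_ALL semantics, in the order described above.
  static Scan buildScan() {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new PrefixFilter(Bytes.toBytes("rowkey-prefix")));
    filters.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("col"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("value")));
    filters.addFilter(new FirstKeyOnlyFilter());
    Scan scan = new Scan();
    scan.setFilter(filters);
    return scan;
  }
}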



-- Original Message --
From: "ramkrishna vasudevan";;
Sent: 2015-09-24 (Thursday) 3:21
To: "user@hbase.apache.org"; 

Subject: Re: HBase Filter Problem



Just trying to understand  more,
you are having a combination of PRefixFilter and SingleColumnValueFilter -
now the column you have specified in the SingleColumnValueFilter -  is it
the only column that you have in your table?  Or is there many other
columns and one such column was used in the SingleColumnValueFilter?

The idea of FirstKeyOnlyFilter is just to skip to the next row on getting
the first ever column in that row.  May be the combination of these two is
causing some issues.

Regards
Ram

On Wed, Sep 23, 2015 at 2:31 PM, donhoff_h <165612...@qq.com> wrote:

> Hi,
>
> There are 90 Million records in the table. And I use the the MUST_PASS_ALL
> for all my filters.  When I use PrefixFilter + SingleColumnValueFilter, it
> returned fast. So I supposed that the combination of PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFilter should be fast. But the fact
> is just in contrast. Do you know the reason that cause it?
>
> Thanks!
>
>
>
> -- Original Message --
> From: "Fulin Sun";;
> Sent: 2015-09-23 (Wednesday) 4:53 PM
> To: "HBase User";
>
> Subject: Re: HBase Filter Problem
>
>
>
> Hi , there
>
> How many rows are there in the hbase table ? You want to achive the
> default FilterList.Operator.MUST_PASS_ALL or
> you just want to use or conditions for these filters ?
>
> I think the reason is that this kind of filter list just go more scan work
> and lower performance.
>
> Best,
> Sun.
>
>
>
>
> CertusNet
>
> From: donhoff_h
> Sent: 2015-09-23 16:33
> To: user
> Subject: HBase Filter Problem
> Hi,
>
> I wrote a program which function is to extract some data from a HBase
> table. According to business requirements I had to use the PrefixFilter and
> the SingleColumnValueFilter to filter the data.  The program ran very fast
> and returned in 1 sec.
>
> Considering I just need the rowkey of each record in my final result, I
> tried to improve my program by using the PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFitler. To my surprise the program
> ran very slow this time. It run about 20min and still not finished. So I
> had to kill it.
>
> Does anybody know the reason that cause my program run such slow?  Since I
> set the PrefixFilter as the first filter in the FilterList object, I think
> the program should ran fast.
>
> Many Thanks!
>