Re: ABORTING region server and following HBase cluster "crash"

2018-09-10 Thread Batyrshin Alexander
After the update, the Master web interface shows that every region server is now on 
1.4.7 and there are no RITs.

The cluster recovered only after we restarted all region servers 4 times...

> On 11 Sep 2018, at 04:08, Josh Elser  wrote:
> 
> Did you update the HBase jars on all RegionServers?
> 
> Make sure that you have all of the Regions assigned (no RITs). There could be 
> a pretty simple explanation as to why the index can't be written to.
> 
> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
>> Correct me if I'm wrong, but it looks like the following situation is possible 
>> when region servers A and B each host both index and primary table regions:
>> A and B are under write load on a table with indexes
>> A crashes
>> B fails an index update because A is down, so B starts aborting
>> After restart, A tries to rebuild the index from the WAL, but B is aborting at 
>> that moment, so A starts aborting too
>> From this moment nothing happens (0 requests to region servers), and A and B 
>> show as unresponsive in the Master status web interface
>>> On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>> 
>>> After update we still can't recover HBase cluster. Our region servers 
>>> ABORTING over and over:
>>> 
>>> prod003:
>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=92,queue=2,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536446665703: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=77,queue=7,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536446665703: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:52:19 prod003 hbase[1440]: 2018-09-09 02:52:19,224 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=82,queue=2,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536446665703: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:52:28 prod003 hbase[1440]: 2018-09-09 02:52:28,922 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=94,queue=4,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536446665703: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:55:02 prod003 hbase[957]: 2018-09-09 02:55:02,096 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=95,queue=5,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536450772841: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:55:18 prod003 hbase[957]: 2018-09-09 02:55:18,793 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=97,queue=7,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod003,60020,1536450772841: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> 
>>> prod004:
>>> Sep 09 02:52:13 prod004 hbase[4890]: 2018-09-09 02:52:13,541 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=83,queue=3,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod004,60020,1536446387325: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:52:50 prod004 hbase[4890]: 2018-09-09 02:52:50,264 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=75,queue=5,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod004,60020,1536446387325: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:53:40 prod004 hbase[4890]: 2018-09-09 02:53:40,709 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=66,queue=6,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod004,60020,1536446387325: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:54:00 prod004 hbase[4890]: 2018-09-09 02:54:00,060 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=89,queue=9,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod004,60020,1536446387325: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> 
>>> prod005:
>>> Sep 09 02:52:50 prod005 hbase[3772]: 2018-09-09 02:52:50,661 FATAL 
>>> [RpcServer.default.FPBQ.Fifo.handler=65,queue=5,port=60020] 
>>> regionserver.HRegionServer: ABORTING region server 
>>> prod005,60020,153644649: Could not update the index table, killing 
>>> server region because couldn't write to an index table
>>> Sep 09 02:53:27 prod005 hbase[3772]: 2018-09-09 02:53:27,542 FATAL 
>>> 

Re: ABORTING region server and following HBase cluster "crash"

2018-09-10 Thread Josh Elser

Did you update the HBase jars on all RegionServers?

Make sure that you have all of the Regions assigned (no RITs). There 
could be a pretty simple explanation as to why the index can't be 
written to.


Re: Read-Write data to/from Phoenix 4.13 or 4.14 with Spark SQL Dataframe 2.1.0

2018-09-10 Thread Josh Elser
Lots of details missing here about how you're trying to submit these 
Spark jobs, but let me try to explain how things work now:


Phoenix provides spark (for Spark 1) and spark2 jars. These JARs provide the 
Spark integration *on top of* the phoenix-client.jar. You want to include both 
the phoenix-client and the relevant phoenix-spark jar when you submit your 
application.


This should be how things are meant to work with Phoenix 4.13 and 4.14. 
If this doesn't help you, please give us some more specifics about the 
commands you run and the output you get. Thanks!
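For what it's worth, once both jars are on the classpath, Spark 2 usage goes through the DataSource API. A minimal PySpark sketch (the table name and ZooKeeper quorum below are made-up placeholders, not values from this thread):

```python
# Hedged sketch: thin wrappers around the phoenix-spark Spark 2 data source.
# The format string and the "table"/"zkUrl" options follow documented
# phoenix-spark usage; everything else here is illustrative.

PHOENIX_FORMAT = "org.apache.phoenix.spark"

def phoenix_options(table, zk_url):
    # Options the phoenix-spark DataSource expects.
    return {"table": table, "zkUrl": zk_url}

def read_phoenix(spark, table, zk_url):
    # e.g. df = read_phoenix(spark, "TABLE1", "zkhost:2181")
    return (spark.read
                 .format(PHOENIX_FORMAT)
                 .options(**phoenix_options(table, zk_url))
                 .load())

def write_phoenix(df, table, zk_url):
    # phoenix-spark upserts rows; it expects SaveMode.Overwrite.
    (df.write
       .format(PHOENIX_FORMAT)
       .options(**phoenix_options(table, zk_url))
       .mode("overwrite")
       .save())
```

Submit with both jars on the classpath, e.g. --jars phoenix-client.jar,phoenix-spark2.jar (exact paths depend on your install).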


On 9/10/18 6:20 AM, lkyaes wrote:

Hello !

I wonder if there is any way to get Phoenix 4.13 or 4.14 working with Spark 2.1.0.


In production we used Spark SQL DataFrames to load data from and write data to 
HBase via Apache Phoenix (Spark 1.6 and Phoenix 4.7), and it worked well.


After the upgrade we face issues with loading and writing; it is no longer possible.


Our environment:

· Cloudera 5.11.2
· HBase 1.2
· Spark 2.1.0 (parcel, compatible with Cloudera 5.11.2)
· APACHE_PHOENIX 4.14.0-cdh5.11.2.p0.3 (we tested 4.13 as well)

We read/write data from Python (the PySpark library), but the same errors also 
appear when writing in Scala.


*Read data from Phoenix 4.13 with Spark 2.1.0 error:*

Py4JJavaError: An error occurred while calling o213.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame

*Read data from Phoenix 4.14 with Spark 2.1.0 error:*

Py4JJavaError: An error occurred while calling o89.load. : 
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoSuchMethodError: 
com.lmax.disruptor.dsl.Disruptor.<init>(Lcom/lmax/disruptor/EventFactory;ILjava/util/concurrent/ThreadFactory;Lcom/lmax/disruptor/dsl/ProducerType;Lcom/lmax/disruptor/WaitStrategy;)V


(Changing the Disruptor jar version did not solve the issue.)

*Insert data to Phoenix 4.14 with Spark 2.1.0 error:*

Py4JJavaError: An error occurred while calling o186.save. 
: java.lang.AbstractMethodError: 
org.apache.phoenix.spark.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;



Actually, we are aware that Spark 2 fails to read and write Phoenix because Spark 
changed the DataFrame API; together with a Scala version change, the resulting 
JAR isn't binary compatible with Spark versions < 2.0.


The DataFrame class is missing from Spark 2, and this issue was fixed once by a 
patch for Phoenix version 4.10: https://issues.apache.org/jira/browse/PHOENIX-


Unfortunately this patch is not suitable for our environment. Could you please 
comment on whether other versions of Phoenix have such a fix?


How to read/write data from Phoenix 4.13/or 4.14 using Spark2?

Regards, and hoping for your help,
Liubov Kyaes
Data Engineer
ir.ee 




Salting based on partial rowkeys

2018-09-10 Thread Gerald Sangudi
Hello folks,

We have a requirement for salting based on partial, rather than full,
rowkeys. My colleague Mike Polcari has identified the requirement and
proposed an approach.

I found an already-open JIRA ticket for the same issue:
https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
details from the proposal.

The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .

The benefit at issue is that users gain more control over partitioning, and
this can be used to push some additional aggregations and hash joins down
to region servers.
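To make the idea concrete, here is a rough Python sketch (hypothetical helper
names, not Phoenix code) of salting on a subset of the rowkey columns: the salt
byte is derived only from the chosen columns, so all rows that agree on those
columns land in the same bucket, which is what lets aggregations and hash joins
keyed on them stay on a single region server.

```python
# Illustrative only: compute a salt byte from a chosen subset of rowkey
# columns instead of the full rowkey (Phoenix today hashes the full key).

def salt_byte(salt_cols, n_buckets):
    # Deterministic hash over just the salt columns.
    h = 0
    for col in salt_cols:
        for b in col:
            h = (h * 31 + b) & 0x7FFFFFFF
    return h % n_buckets

def salted_rowkey(key_cols, salt_col_idx, n_buckets):
    # Prepend the salt byte to the composite key. Rows that agree on the
    # salt columns always share a bucket, regardless of the other columns.
    salt = salt_byte([key_cols[i] for i in salt_col_idx], n_buckets)
    return bytes([salt]) + b"\x00".join(key_cols)
```

With SALT_BUCKETS(col1) = 8 semantics, for example, every row with the same
col1 value would get the same leading byte and therefore the same bucket.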

I would appreciate any go-ahead / thoughts / guidance / objections /
feedback. I'd like to be sure that the concept at least is not
objectionable. We would like to work on this and submit a patch down the
road. I'll also add a note to the JIRA ticket.

Thanks,
Gerald


Re: ABORTING region server and following HBase cluster "crash"

2018-09-10 Thread Jaanai Zhang
The root cause cannot be determined from the log information alone. The index
might have been corrupted, and it seems the servers keep aborting because of the
index handler failure policy.


   Yun Zhang
   Best regards!



