Re: Encountering BufferUnderflowException when querying from Phoenix

2018-10-19 Thread William Shen
Dug into one of the rows that was having a similar problem, throwing
IllegalArgumentException [1] instead of BufferUnderflowException, but both
appear to be a data issue where the varchar array is stored in an unexpected
format in HBase.

The row looks like:
*A_VARCHAR_OF_170_CHARS*\x00\x00\x00\x80\x01\x00\x00\x02\xAD\x00\x00\x00

I could not make sense of it based on the 4.13 encoding (hence Phoenix
throwing an exception), and I also looked back at 4.8, but it doesn't seem to
match the old format either... Does anyone recognize the hex encoding by any
chance, or is this some sort of data corruption?
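
In case anyone wants to poke at the raw bytes themselves, here is a rough
sketch of how the stored cell can be dumped straight out of HBase for a
suspect row. The table name, the "0" column family, the "D" qualifier, and
the row-key argument are all placeholders to swap in for your own schema; the
row key is the Phoenix-encoded key, e.g. copied from an HBase shell scan of
the physical table.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DumpRawArrayCell {
    public static void main(String[] args) throws Exception {
        // args[0] is the Phoenix-encoded row key in HBase shell escaping,
        // e.g. copied from a scan of the physical table.
        byte[] rowKey = Bytes.toBytesBinary(args[0]);
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("SCHEMA.TABLE"))) {
            Result result = table.get(new Get(rowKey));
            // "0" / "D" stand in for the column family and the qualifier
            // backing the VARCHAR ARRAY column.
            byte[] raw = result.getValue(Bytes.toBytes("0"), Bytes.toBytes("D"));
            // Prints the \xNN-escaped form, like the row pasted above.
            System.out.println(Bytes.toStringBinary(raw));
        }
    }
}

Comparing that dump for the bad row against a healthy row of the same shape
should make it easier to see where the trailer diverges from the expected
array format.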

Thanks,

- Will


[1]

java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at 
org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1025)
at 
org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
at 
org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
at 
org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
at 
org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
at 
org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)
at sqlline.Rows$Row.<init>(Rows.java:183)
at sqlline.BufferedRows.<init>(BufferedRows.java:38)
at sqlline.SqlLine.print(SqlLine.java:1660)
at sqlline.Commands.execute(Commands.java:833)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:813)
at sqlline.SqlLine.begin(SqlLine.java:686)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:291)


On Wed, Oct 17, 2018 at 3:21 PM William Shen wrote:

> Thanks, Jaanai.
>
> At first we thought it was a data issue too, but when we restored the table
> from a snapshot to a separate schema on the same cluster to triage, the
> exception no longer happens... Does that give any further clue as to what
> the issue might've been?
>
> 0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D  FROM SCHEMA.TABLE
>  where A = 13100423;
>
> java.nio.BufferUnderflowException
>
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>
> at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>
> at
> org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)
>
> at
> org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
>
> at
> org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
>
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
>
> at
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
>
> at
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)
>
> at sqlline.Rows$Row.<init>(Rows.java:183)
>
> at sqlline.BufferedRows.<init>(BufferedRows.java:38)
>
> at sqlline.SqlLine.print(SqlLine.java:1660)
>
> at sqlline.Commands.execute(Commands.java:833)
>
> at sqlline.Commands.sql(Commands.java:732)
>
> at sqlline.SqlLine.dispatch(SqlLine.java:813)
>
> at sqlline.SqlLine.begin(SqlLine.java:686)
>
> at sqlline.SqlLine.start(SqlLine.java:398)
>
> at sqlline.SqlLine.main(SqlLine.java:291)
>
>
>
> 0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D  FROM SCHEMA.CORRUPTION
> where A = 13100423;
>
> +-----------+--------+----+-------------+
> |     A     |   B    | C  |      D      |
> +-----------+--------+----+-------------+
> | 13100423  | 5159   | 7  | ['female']  |
> +-----------+--------+----+-------------+
>
> 1 row selected (1.76 seconds)
>
> On Sun, Oct 14, 2018 at 8:39 PM Jaanai Zhang wrote:
>
>> It looks like a bug where the length being retrieved is greater than what
>> remains in the ByteBuffer. Maybe there is a problem with the position of
>> the ByteBuffer or with the length of the target byte array.
>>
>> 
>>Jaanai Zhang
>>Best regards!
>>
>>
>>
>> William Shen wrote on Fri, Oct 12, 2018 at 11:53 PM:
>>
>>> Hi all,
>>>
>>> We are running Phoenix 4.13, and periodically we would encounter the
>>> following exception when querying from Phoenix in our staging environment.
>>> Initially, we thought we had some incompatible client version connecting
>>> and creating data corruption, but after ensuring that we are only
>>> connecting with 4.13 clients, we still see this issue come up from time to
>>> time. So far, fortunately, since it is in staging, we are able to identify
>>> and delete the data to restore service.
>>>
>>> However, we would like to ask for guidance on what else we could look for
>>> to identify the cause of this exception. Could this perhaps be caused by
>>> something other than data corruption?
>>>
>>> Thanks in advance!
>>>
>>> The exception looks like:
>>>
>>> 18/10/12 15:45:58 WARN scheduler.TaskSetManager: Lost task 32.2 in stage
>>> 14.0 (TID 1275, ...datanode..., executor 82):
>>> java.nio.BufferUnderflowException
>>>
>>> at java.nio.HeapByteBuffer.

Re: Encountering BufferUnderflowException when querying from Phoenix

2018-10-17 Thread William Shen
Thanks, Jaanai.

At first we thought it was a data issue too, but when we restored the table
from a snapshot to a separate schema on the same cluster to triage, the
exception no longer happens... Does that give any further clue as to what the
issue might've been?
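
For anyone curious what the restore looked like: roughly a plain snapshot
clone along the lines of the sketch below (HBase Admin API; the snapshot name
and the destination table name are made up for illustration, and you would
still need the Phoenix DDL for the new schema on top of the cloned physical
table).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CloneForTriage {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Snapshot the live physical table...
            admin.snapshot("TABLE_TRIAGE_SNAP", TableName.valueOf("SCHEMA.TABLE"));
            // ...and clone it to a separate physical table on the same cluster.
            admin.cloneSnapshot("TABLE_TRIAGE_SNAP", TableName.valueOf("SCHEMA2.TABLE"));
        }
    }
}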

0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D  FROM SCHEMA.TABLE
 where A = 13100423;

java.nio.BufferUnderflowException

at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)

at java.nio.ByteBuffer.get(ByteBuffer.java:715)

at
org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)

at
org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)

at
org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)

at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)

at
org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)

at
org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)

at sqlline.Rows$Row.<init>(Rows.java:183)

at sqlline.BufferedRows.<init>(BufferedRows.java:38)

at sqlline.SqlLine.print(SqlLine.java:1660)

at sqlline.Commands.execute(Commands.java:833)

at sqlline.Commands.sql(Commands.java:732)

at sqlline.SqlLine.dispatch(SqlLine.java:813)

at sqlline.SqlLine.begin(SqlLine.java:686)

at sqlline.SqlLine.start(SqlLine.java:398)

at sqlline.SqlLine.main(SqlLine.java:291)



0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D  FROM SCHEMA.CORRUPTION
where A = 13100423;

+-----------+--------+----+-------------+
|     A     |   B    | C  |      D      |
+-----------+--------+----+-------------+
| 13100423  | 5159   | 7  | ['female']  |
+-----------+--------+----+-------------+

1 row selected (1.76 seconds)

On Sun, Oct 14, 2018 at 8:39 PM Jaanai Zhang  wrote:

> It looks like a bug where the length being retrieved is greater than what
> remains in the ByteBuffer. Maybe there is a problem with the position of the
> ByteBuffer or with the length of the target byte array.
>
> 
>Jaanai Zhang
>Best regards!
>
>
>
> William Shen wrote on Fri, Oct 12, 2018 at 11:53 PM:
>
>> Hi all,
>>
>> We are running Phoenix 4.13, and periodically we would encounter the
>> following exception when querying from Phoenix in our staging environment.
>> Initially, we thought we had some incompatible client version connecting
>> and creating data corruption, but after ensuring that we are only
>> connecting with 4.13 clients, we still see this issue come up from time to
>> time. So far, fortunately, since it is in staging, we are able to identify
>> and delete the data to restore service.
>>
>> However, we would like to ask for guidance on what else we could look for
>> to identify the cause of this exception. Could this perhaps be caused by
>> something other than data corruption?
>>
>> Thanks in advance!
>>
>> The exception looks like:
>>
>> 18/10/12 15:45:58 WARN scheduler.TaskSetManager: Lost task 32.2 in stage
>> 14.0 (TID 1275, ...datanode..., executor 82):
>> java.nio.BufferUnderflowException
>>
>> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>>
>> at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>>
>> at
>> org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)
>>
>> at
>> org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
>>
>> at
>> org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
>>
>> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
>>
>> at
>> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
>>
>> at
>> org.apache.phoenix.jdbc.PhoenixResultSet.getObject(PhoenixResultSet.java:525)
>>
>> at
>> org.apache.phoenix.spark.PhoenixRecordWritable$$anonfun$readFields$1.apply$mcVI$sp(PhoenixRecordWritable.scala:96)
>>
>> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>
>> at
>> org.apache.phoenix.spark.PhoenixRecordWritable.readFields(PhoenixRecordWritable.scala:93)
>>
>> at
>> org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:168)
>>
>> at
>> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:174)
>>
>> at
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>
>> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1596)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>>
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>>
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>>
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
>>
>> at
>> java.uti

Re: Encountering BufferUnderflowException when querying from Phoenix

2018-10-14 Thread Jaanai Zhang
It looks like a bug where the length being retrieved is greater than what
remains in the ByteBuffer. Maybe there is a problem with the position of the
ByteBuffer or with the length of the target byte array.
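
That matches the java.nio contract: a relative bulk get throws
BufferUnderflowException whenever the requested length exceeds what remains
between the buffer's position and its limit. A minimal illustration (plain
java.nio, nothing Phoenix-specific):

import java.nio.ByteBuffer;

public class UnderflowDemo {
    public static void main(String[] args) {
        // Stand-in for the serialized array value read back from HBase.
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        buffer.position(2);           // only 2 bytes remain past this position
        byte[] element = new byte[4]; // but 4 bytes are requested
        buffer.get(element);          // throws java.nio.BufferUnderflowException
    }
}

So either a bad position (computed from the array's offset section) or a bad
element length (the size of the target byte array) would surface exactly like
the trace in the quoted message below.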


   Jaanai Zhang
   Best regards!



William Shen wrote on Fri, Oct 12, 2018 at 11:53 PM:

> Hi all,
>
> We are running Phoenix 4.13, and periodically we would encounter the
> following exception when querying from Phoenix in our staging environment.
> Initially, we thought we had some incompatible client version connecting
> and creating data corruption, but after ensuring that we are only
> connecting with 4.13 clients, we still see this issue come up from time to
> time. So far, fortunately, since it is in staging, we are able to identify
> and delete the data to restore service.
>
> However, we would like to ask for guidance on what else we could look for
> to identify the cause of this exception. Could this perhaps be caused by
> something other than data corruption?
>
> Thanks in advance!
>
> The exception looks like:
>
> 18/10/12 15:45:58 WARN scheduler.TaskSetManager: Lost task 32.2 in stage
> 14.0 (TID 1275, ...datanode..., executor 82):
> java.nio.BufferUnderflowException
>
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>
> at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>
> at
> org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)
>
> at
> org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
>
> at
> org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
>
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
>
> at
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
>
> at
> org.apache.phoenix.jdbc.PhoenixResultSet.getObject(PhoenixResultSet.java:525)
>
> at
> org.apache.phoenix.spark.PhoenixRecordWritable$$anonfun$readFields$1.apply$mcVI$sp(PhoenixRecordWritable.scala:96)
>
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>
> at
> org.apache.phoenix.spark.PhoenixRecordWritable.readFields(PhoenixRecordWritable.scala:93)
>
> at
> org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:168)
>
> at
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:174)
>
> at
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>
> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1596)
>
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:748)
>
>
>


Encountering BufferUnderflowException when querying from Phoenix

2018-10-12 Thread William Shen
Hi all,

We are running Phoenix 4.13, and periodically we would encounter the
following exception when querying from Phoenix in our staging environment.
Initially, we thought we had some incompatible client version connecting
and creating data corruption, but after ensuring that we are only
connecting with 4.13 clients, we still see this issue come up from time to
time. So far, fortunately, since it is in staging, we are able to identify
and delete the data to restore service.

However, we would like to ask for guidance on what else we could look for to
identify the cause of this exception. Could this perhaps be caused by
something other than data corruption?
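
To make the failure mode concrete, here is a sketch over plain JDBC of the
kind of scan that would pinpoint the offending rows. The connection URL, the
table name, and the column names (A as the primary key, D as the VARCHAR
ARRAY column) are placeholders; the idea is just to force each row's array
column to deserialize on the client and log the primary key of any row that
throws.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FindBadArrayRows {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:journalnode");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT A, D FROM SCHEMA.TABLE")) {
            while (rs.next()) {
                long pk = rs.getLong(1);
                try {
                    rs.getString(2); // forces client-side decoding of the array column
                } catch (RuntimeException e) {
                    // BufferUnderflowException and IllegalArgumentException are both unchecked
                    System.out.println("Row with A = " + pk + " failed to decode: " + e);
                }
            }
        }
    }
}

That at least narrows it down to specific rows whose stored bytes can then be
pulled out of HBase and compared against a healthy row.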

Thanks in advance!

The exception looks like:

18/10/12 15:45:58 WARN scheduler.TaskSetManager: Lost task 32.2 in stage
14.0 (TID 1275, ...datanode..., executor 82):
java.nio.BufferUnderflowException

at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)

at java.nio.ByteBuffer.get(ByteBuffer.java:715)

at
org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)

at
org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)

at
org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)

at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)

at
org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)

at
org.apache.phoenix.jdbc.PhoenixResultSet.getObject(PhoenixResultSet.java:525)

at
org.apache.phoenix.spark.PhoenixRecordWritable$$anonfun$readFields$1.apply$mcVI$sp(PhoenixRecordWritable.scala:96)

at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

at
org.apache.phoenix.spark.PhoenixRecordWritable.readFields(PhoenixRecordWritable.scala:93)

at
org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:168)

at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:174)

at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1596)

at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)

at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spark.scheduler.Task.run(Task.scala:89)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)