Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Saif Addin
Thanks, we'll try the Spark connector then. We thought it didn't support the
newest Spark versions.
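
For reference, a rough sketch of a connector-based write (the table name,
ZooKeeper quorum, and column names below are placeholders, not taken from this
thread):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("phoenix-write").getOrCreate()
val df = spark.createDataFrame(Seq(("k1", "v1", "v2", "v3")))
  .toDF("PK", "COLUMN1", "COLUMN2", "COLUMN3")

// The phoenix-spark module handles Phoenix's own serialization (salt bytes included).
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)            // phoenix-spark expects Overwrite for writes
  .option("table", "MYTABLE")
  .option("zkUrl", "zk-host:2181")
  .save()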



Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Jaanai Zhang
It seems the column data is missing the mapping information from the schema. If
you want to write to the HBase table this way, you can create the HBase table
first and then map it with Phoenix.
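
For example, a minimal sketch of such a mapping (the ZooKeeper quorum, table,
column family, and column names below are placeholders): once the HBase table
exists, a Phoenix view over it can be created via JDBC so Phoenix knows the schema.

import java.sql.DriverManager

// Placeholders: adjust the quorum, table, column family and column names.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
conn.createStatement().execute(
  """CREATE VIEW "MYTABLE" (
    |  "pk" VARCHAR PRIMARY KEY,
    |  "cf"."column1" VARCHAR,
    |  "cf"."column2" VARCHAR,
    |  "cf"."column3" VARCHAR
    |)""".stripMargin)
conn.close()

The column family and qualifier names in the view have to match the raw bytes
written from Spark, otherwise those columns come back empty.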


   Jaanai Zhang
   Best regards!





Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Thomas D'Silva
Is there a reason you didn't use the spark-connector to serialize your data?



Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Saif Addin
Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
table, and the key column now shows correctly.

However, the problem persists: the rest of the columns show up completely
empty in Phoenix (they appear correctly in HBase). We'll be looking into
this, but if you have any further advice, it's appreciated.

Saif



Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Josh Elser
Reminder: Using Phoenix internals forces you to understand exactly how 
the version of Phoenix that you're using serializes data. Is there a 
reason you're not using SQL to interact with Phoenix?
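
For reference, a minimal sketch of that SQL route over JDBC, letting Phoenix do
the serialization itself (the URL, table, and column names are placeholders):

import java.sql.DriverManager

// Placeholders: adjust the ZooKeeper quorum and the table/column names.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
val ps = conn.prepareStatement(
  "UPSERT INTO MYTABLE (PK, COLUMN1, COLUMN2, COLUMN3) VALUES (?, ?, ?, ?)")
ps.setString(1, "some-key")
ps.setString(2, "v1")
ps.setString(3, "v2")
ps.setString(4, "v3")
ps.executeUpdate()
conn.commit()   // Phoenix buffers mutations until commit
conn.close()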


It sounds to me like Phoenix is expecting more data at the head of your
rowkey. Maybe a salt bucket that you've defined on the table but aren't
writing?
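
If that is the case, a hedged sketch of what a raw writer would have to do for
a table declared with SALT_BUCKETS = nBuckets (SaltingUtil and the bucket count
are assumptions about the setup, not details from this thread):

import org.apache.phoenix.schema.SaltingUtil
import org.apache.phoenix.schema.types.PVarchar

// Prepend the salt byte Phoenix expects at the head of the rowkey; without it,
// Phoenix treats the first byte of the value as the salt and the string loses
// its first character on read.
def saltedRowKey(key: String, nBuckets: Int): Array[Byte] = {
  val keyBytes = PVarchar.INSTANCE.toBytes(key)
  val salted = new Array[Byte](keyBytes.length + 1)
  salted(0) = SaltingUtil.getSaltingByte(keyBytes, 0, keyBytes.length, nBuckets)
  System.arraycopy(keyBytes, 0, salted, 1, keyBytes.length)
  salted
}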




Re: Issue in upgrading phoenix : java.lang.ArrayIndexOutOfBoundsException: SYSTEM:CATALOG 63

2018-09-12 Thread Thomas D'Silva
Can you attach the schema of your table, and the explain plan for select *
from mytable?
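
(If it helps, a rough sketch of pulling the plan over JDBC; the URL and table
name are placeholders:)

import java.sql.DriverManager

// Placeholders: adjust the ZooKeeper quorum and the table name.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
val rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM \"myTable\"")
while (rs.next()) println(rs.getString(1))  // each row is one line of the plan
conn.close()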

On Tue, Sep 11, 2018 at 10:24 PM, Tanvi Bhandari wrote:

> " mapped hbase tables to phoenix and created them explicitly from phoenix
> sqlline client. I first created schema corresponding to namespace and then
> tables." By this statement, I meant the same. I re-created my tables
> since I had the DDLs with me.
>
> After that I tried getting the count of records in my table which gave me
> 8 records (expected result). - *select count(*) from "myTable"*;
> But when I performed the *select * from "myTable";* it is not returning
> any result.
>
> On Wed, Sep 12, 2018 at 1:55 AM Thomas D'Silva wrote:
>
>> Since you dropped all the system tables, all the Phoenix metadata was
>> lost. If you have the DDL statements used to create your tables, you can
>> try rerunning them.
>>
>> On Tue, Sep 11, 2018 at 9:32 AM, Tanvi Bhandari wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> I am trying to upgrade the Phoenix binaries in my setup from phoenix-4.6
>>> (where the schema concept was optional) to phoenix-4.14 (where a schema is
>>> required).
>>>
>>> Earlier, I had the phoenix-4.6-hbase-1.1 binaries. When I run
>>> phoenix-4.14-hbase-1.3 on the same data, HBase comes up fine, but when I try
>>> to connect to Phoenix using the sqlline client, I get the following error on
>>> the *console*:
>>>
>>>
>>>
>>> 18/09/07 04:22:48 WARN ipc.CoprocessorRpcChannel: Call failed on IOException
>>> org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM:CATALOG: 63
>>>     at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:120)
>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3572)
>>>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16422)
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
>>>     at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:517)
>>>     at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
>>>     at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1046)
>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:587)
>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.loadTable(MetaDataEndpointImpl.java:1305)
>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3568)
>>>     ... 10 more
>>>
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>>>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:326)
>>>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1629)
>>>     at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:104)
>>>     at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:94)
>>>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
>>>     at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.

Missing content in phoenix after writing from Spark

2018-09-12 Thread Saif Addin
Hi all,

We're trying to write tables with all string columns from Spark.
We are not using the Spark connector; instead, we are writing byte arrays
directly from RDDs.

The process works fine: HBase receives the data correctly, and the content
is consistent.

However, when reading the table from Phoenix, we notice that the first
character of the strings is missing. This sounds like a byte-encoding issue,
but we're at a loss. We're using PVarchar to generate the bytes.

Here's the snippet of code creating the RDD:

val tdd = pdd.flatMap(x => {
  val rowKey = PVarchar.INSTANCE.toBytes(x._1)
  for(i <- 0 until cols.length) yield {
    other stuff for other columns ...
    ...
    (rowKey, (column1, column2, column3))
  }
})

...

We then create the following output to be written to HBase:

val output = tdd.map(x => {
  val rowKeyByte: Array[Byte] = x._1
  val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)

  // HBase's KeyValue constructor interprets these four arguments as
  // (row, family, qualifier, value)
  val kv = new KeyValue(rowKeyByte,
    PVarchar.INSTANCE.toBytes(column1),
    PVarchar.INSTANCE.toBytes(column2),
    PVarchar.INSTANCE.toBytes(column3)
  )

  (immutableRowKey, kv)
})

By the way, we are using *KryoSerializer* in order to be able to serialize
all the classes needed for HBase (KeyValue, ImmutableBytesWritable, etc.).
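
(For context, a minimal sketch of that configuration; the registered classes
are just the ones from the snippet above:)

import org.apache.spark.SparkConf
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

// Sketch only: switch Spark to Kryo and register the HBase classes we ship.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[KeyValue], classOf[ImmutableBytesWritable]))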

The key of this table is the one missing data when queried from Phoenix, so
we guess something is wrong with the byte serialization.

Any ideas? Appreciated!
Saif