Secondary Indexes - Missing Data in Phoenix

2019-07-24 Thread Alexander Lytchier
Hi,

We are currently using Cloudera as a package manager for our Hadoop Cluster 
with Phoenix 4.7.0 (CLABS_PHOENIX) and HBase 1.2.0-cdh5.7.6. Phoenix 4.7.0 
appears to be the latest version supported 
(http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/) even though 
it’s old.

The table in question has a binary row key, pk BINARY(30): 1 byte for salting, 
8 bytes for a timestamp (Long), 20 bytes for a hash of other record fields, 
plus 1 extra byte kept in reserve in case the schema needs updating in future 
(not sure if relevant). We are currently facing performance issues and are 
attempting to mitigate them by adding secondary indexes.
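
As an illustration of that 30-byte layout, here is a minimal Python sketch. It is not the code that writes the table: the SHA-1 hash, the bucket count, the way the salt byte is derived, and the name build_row_key are all assumptions made for the example. Only the field widths (1 + 8 + 20 + 1) come from the description above.

```python
import hashlib
import struct

SALT_BUCKETS = 4  # hypothetical bucket count, not taken from the thread


def build_row_key(timestamp_ms: int, record_fields: bytes) -> bytes:
    """Assemble a 30-byte key: salt (1) + timestamp (8) + hash (20) + spare (1)."""
    digest = hashlib.sha1(record_fields).digest()   # 20 bytes; SHA-1 is an assumption
    timestamp = struct.pack(">q", timestamp_ms)     # 8-byte big-endian long
    salt = digest[0] % SALT_BUCKETS                 # placeholder salt, not Phoenix's own hashing
    return bytes([salt]) + timestamp + digest + b"\x00"  # trailing spare byte


key = build_row_key(1523412360000, b"example-record")
print(len(key), key.hex())
```

Packing the timestamp big-endian keeps keys within one salt bucket sorted by time, which is why the sketch uses ">q" rather than the platform's native byte order.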

When generating a local index synchronously with the following command:

CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");

I can see that the resulting index table in Phoenix is populated; in HBase I 
can see the row key of the index table, and queries work as expected:

\x00\x171545413\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00
  column=cf:cf:type, timestamp=1563954319353, value=1545413

However, for the case where the index is created asynchronously, and then 
populated using the IndexTool, with the following commands:

CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;

sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar \
  /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar \
  org.apache.phoenix.mapreduce.index.IndexTool --data-table "MyTable" \
  --index-table INDEX_TABLE --output-path hdfs://nameservice1/

I get the following row-key in HBase:

\x00\x00\x00\x00\x00\x00\x column=cf:cf:type, timestamp=1563954000238, 
value=1545413
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
151545413\x00\x00\x
00\x00\x01b\xB2s\xDB@\x1B\
x94\xFA\xD4\x14c\x0Bd$\x82
\xAD\xE6\xB3\xDF\x06\xC9\x
07@\xB9\xAE\x00

It has 32 additional zero bytes (\x00). Why is there a difference, and is one 
expected? What’s more, the index table in Phoenix is empty (I guess it’s not 
able to read the underlying HBase index table with that key?), so any queries 
that use the local index in Phoenix return no rows.
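
To make the comparison concrete, the two keys can be decoded and split at the indexed value. The following is a minimal Python sketch, assuming the wrapped HBase shell lines above join into one key each; parse_hbase_key and split_at_value are hypothetical helper names, and the meaning of the prefix bytes is not established here.

```python
import re
import struct


def parse_hbase_key(text: str) -> bytes:
    """Turn HBase shell output (\\xHH escapes plus printable ASCII) into raw bytes."""
    return re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )


def split_at_value(key: bytes, value: bytes = b"1545413"):
    """Split a key into the bytes before the indexed value and everything from it on."""
    idx = key.index(value)
    return key[:idx], key[idx:]


# Part shared by both keys: separator byte followed by the 30-byte data-table PK.
TAIL = (r"\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0B"
        r"d$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00")

sync_key = parse_hbase_key(r"\x00\x171545413" + TAIL)
# The async key as printed: a run of 32 zero bytes, then \x15, then the value.
async_key = parse_hbase_key(r"\x00" * 32 + r"\x151545413" + TAIL)

sync_prefix, sync_rest = split_at_value(sync_key)
async_prefix, async_rest = split_at_value(async_key)

print("sync prefix :", sync_prefix.hex())
print("async prefix:", async_prefix.hex())
print("remainder identical:", sync_rest == async_rest)

# After the 7-byte value and 1 separator byte comes the 30-byte PK;
# bytes 1..8 of the PK are the big-endian timestamp.
pk = sync_rest[8:]
timestamp_ms = struct.unpack(">q", pk[1:9])[0]
print("pk length:", len(pk), "timestamp (ms):", timestamp_ms)
```

Both keys carry the same indexed value and the same 30-byte primary key; they differ only in the bytes that come before the value, which is where the run of zero bytes sits.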

Do you have any suggestions? We must use the async method to populate the index 
table on production because of the massive amounts of data, but if Phoenix is 
not able to read the index table it cannot be used for queries.

Is it possible this issue has been fixed in a newer version?

Thanks


Re: Secondary Indexes - Missing Data in Phoenix

2019-07-25 Thread Josh Elser
Local indexes are stored in the same table as the data. They are "local" 
to the data.


I would not be surprised if you are running into issues because you are 
using such an old version of Phoenix.


Re: Secondary Indexes - Missing Data in Phoenix

2019-07-25 Thread Alexander Lytchier
Thanks for the reply.

We will attempt to update to Phoenix 4.14.X and re-try adding secondary indexes.

Can you help to clarify “local indexes are stored in the same table as the 
data”? When a local index is created in Phoenix, I observe that a new table, 
_LOCAL_IDX_TABLE_NAME, is created in HBase. It was my assumption that this is 
where the columns for the index table are stored, along with the PK values. 
Moreover, using EXPLAIN in Phoenix, I can see that it will attempt to SCAN OVER 
_LOCAL_IDX_TABLE_NAME when my query is using the index.


Re: Secondary Indexes - Missing Data in Phoenix

2019-07-29 Thread Aleksandr Saraseka
Hello Alex.
Please refer to this JIRA: https://issues.apache.org/jira/browse/PHOENIX-1734
Since v4.8, a local index is just a shadow column family (CF) within the data table.


-- 
Aleksandr Saraseka
DBA
380997600401
 *•*  asaras...@eztexting.com  *•*  eztexting.com