[ 
https://issues.apache.org/jira/browse/PHOENIX-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265085#comment-14265085
 ] 

Maryann Xue edited comment on PHOENIX-1570 at 1/5/15 9:01 PM:
--------------------------------------------------------------

Thanks again [~bdifn] for your analysis, which was largely correct. I'll 
restate the problem to hopefully make it even clearer.
Everything worked correctly except for the final expression evaluation in the 
column projectors. I think the root cause lies in how the ExpressionCompiler 
handles LocalIndexDataColumnRef, which compiles a column reference into a 
ProjectedColumnExpression. The ProjectedColumnExpression object holds a 
KeyValueSchema object that evolves during compilation, which means the schema 
can be different at different times. The "columns" (here, equivalent to the 
schema) passed to ProjectedColumnExpression by LocalIndexDataColumnRef are not 
"final" but rather a "snapshot" (LocalIndexDataColumnRef.java:70).
Luckily, in a lot of cases this won't produce wrong results. But in 
[~bdifn]'s case, the field count of the schema just crossed the boundary 
between fixed-length and variable-length. The expressions for the earlier 
columns (from "a" to "r") still treated the schema as a fixed-length one and 
thus read the wrong bit information. 
[~bdifn]'s fix may not be a complete solution to the problem. A complete 
solution, on the other hand, might require adding a whole new pass over the AST 
before expression compilation (incl. group-by, order-by as well as select) in 
order to get a final schema for the local index. I can think of a workaround 
though, which is to remove the "fixed-length" optimization from ValueBitSet and 
make everything "variable-length". What do you think, [~jamestaylor]?

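To make the failure mode concrete, here is a toy Java sketch of a bit set whose 
serialized layout depends on the schema's field count. It is NOT Phoenix's 
ValueBitSet: the 16-field threshold, the byte layout, and all class and method 
names are assumptions invented for illustration. It only shows how a reader 
holding a stale "snapshot" field count can pick a different layout than the 
writer that used the final schema.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

public class SchemaSnapshotDemo {
    // Hypothetical boundary between the two layouts (not Phoenix's constant).
    static final int FIXED_MAX_FIELDS = 16;

    static boolean isVarLength(int fieldCount) {
        return fieldCount > FIXED_MAX_FIELDS;
    }

    // Serialize the "value present" bits using the layout implied by fieldCount.
    static byte[] encode(BitSet present, int fieldCount) {
        if (!isVarLength(fieldCount)) {
            // Fixed layout: a single 2-byte mask.
            short mask = 0;
            for (int i = 0; i < fieldCount; i++) {
                if (present.get(i)) mask |= (1 << i);
            }
            return ByteBuffer.allocate(2).putShort(mask).array();
        }
        // Variable layout: 2-byte word count followed by that many longs.
        int nLongs = (fieldCount + 63) / 64;
        long[] words = present.toLongArray(); // may be shorter than nLongs
        ByteBuffer buf = ByteBuffer.allocate(2 + 8 * nLongs);
        buf.putShort((short) nLongs);
        for (int i = 0; i < nLongs; i++) {
            buf.putLong(i < words.length ? words[i] : 0L);
        }
        return buf.array();
    }

    // Deserialize using the layout implied by the READER's idea of fieldCount.
    static BitSet decode(byte[] bytes, int fieldCount) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        if (!isVarLength(fieldCount)) {
            short mask = buf.getShort();
            BitSet bits = new BitSet();
            for (int i = 0; i < fieldCount; i++) {
                if ((mask & (1 << i)) != 0) bits.set(i);
            }
            return bits;
        }
        int nLongs = buf.getShort();
        long[] words = new long[nLongs];
        for (int i = 0; i < nLongs; i++) {
            words[i] = buf.getLong();
        }
        return BitSet.valueOf(words);
    }

    // Write with all writerFieldCount fields present, then count how many
    // fields a reader believing in readerFieldCount sees as present.
    static int fieldsSeen(int writerFieldCount, int readerFieldCount) {
        BitSet present = new BitSet();
        present.set(0, writerFieldCount);
        byte[] bytes = encode(present, writerFieldCount);
        return decode(bytes, readerFieldCount).cardinality();
    }

    public static void main(String[] args) {
        // Reader and writer agree on the final schema.
        System.out.println(fieldsSeen(18, 18)); // 18
        // Stale snapshot, but both counts are on the fixed side: still fine.
        System.out.println(fieldsSeen(12, 10)); // 10
        // Stale snapshot on the other side of the boundary: the reader takes
        // the word-count prefix for the mask and sees only one field.
        System.out.println(fieldsSeen(18, 10)); // 1
    }
}
```

In this toy model, the workaround of dropping the fixed-length optimization 
corresponds to making isVarLength always return true, so the layout no longer 
depends on which snapshot of the field count the reader happens to hold.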

was (Author: maryannxue):
Thanks again [~bdifn] for your analysis, which was mostly very correct. I'll 
just try to re-state the problem, to hopefully make it even clearer.
Everything else worked perfectly right except for the final expression 
evaluation in column projectors. Think the root cause lies in the 
ExpressionCompiler for LocalIndexDataColumnRef, which compiles column reference 
into ProjectedColumnExpression. The ProjectedColumnExpression object holds an 
KeyValueSchema object that evolves as the compilation goes, which means the 
schema is different when compiling "a" and "s". The "columns" passed to 
ProjectedColumnExpression by LocalIndexDataColumnRef is not "final" but just a 
"snapshot" (LocalIndexDataColumnRef.java:70).
Luckily though, in a lot of cases this won't produce wrong results. But in 
[~bdifn]'s case, the field count of the schema just crossed the boundary 
between non-variable-length and variable-length. The interpretation for earlier 
columns (from "a" to "r") still took the schema as a non-variable-length one 
and thus got the wrong bit information. 
I think the fix may not be a complete solution for the problem. But on the 
other hand, it might require adding a whole new pass on the AST before 
expression compilation (incl. group-by, order-by as well as select) in order to 
get a final schema for local index. I can think of a walk-around though, which 
is to remove "non-variable-length" optimization from ValueBitSet and make 
everything "variable-length". What do you think, [~jamestaylor]?

> Data missing when using local index
> -----------------------------------
>
>                 Key: PHOENIX-1570
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1570
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.2.1, 4.2.2
>         Environment: ubuntu 
> HBase 0.98.7
> Hadoop 2.5.1
> OS: ubuntu
>            Reporter: wuchengzhi
>            Priority: Critical
>
> 1. Create a table with the schema below:
> CREATE TABLE IF NOT EXISTS Miss_data_table(
> a BIGINT NOT NULL,
> b VARCHAR,
> c INTEGER,
> d INTEGER,
> e INTEGER,
> f INTEGER,
> g VARCHAR,
> h VARCHAR,
> i INTEGER,
> j VARCHAR,
> k INTEGER,
> l VARCHAR,
> m VARCHAR,
> n INTEGER,
> o INTEGER,
> p VARCHAR,
> q VARCHAR,
> r INTEGER,
> s BIGINT,
> t VARCHAR CONSTRAINT pk PRIMARY KEY(a))
> 2. Create a local index on the table for column q:
> create local index idx_q on Miss_data_table (q);
> 3. Upsert data into the table:
> upsert into Miss_data_table
> values(96660688,'hello/TEST-0',156,-1,-1,0,'2013-02-14 18:34:05.0','TEST-1',0,'495839182',0,'50','',0,0,'1818378','102218',0,26,'20141201')
> 4. Execute the queries:
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t from Miss_data_table where q = '102218';
> +----------+--------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+--------+------+------+----------+
> | A        | B            | C    | D    | E    | F    | G    | H    | I    | J    | K    | L    | M    | N    | O    | P    | Q      | R    | S    | T        |
> +----------+--------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+--------+------+------+----------+
> | 96660688 | hello/TEST-0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 102218 | NULL | 26   | 20141201 |
> +----------+--------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+--------+------+------+----------+
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t from Miss_data_table where a=96660688;
> +----------+--------------+------+------+------+------+-----------------------+--------+------+-----------+------+------+------+------+------+---------+--------+------+------+----------+
> | A        | B            | C    | D    | E    | F    | G                     | H      | I    | J         | K    | L    | M    | N    | O    | P       | Q      | R    | S    | T        |
> +----------+--------------+------+------+------+------+-----------------------+--------+------+-----------+------+------+------+------+------+---------+--------+------+------+----------+
> | 96660688 | hello/TEST-0 | 156  | -1   | -1   | 0    | 2013-02-14 18:34:05.0 | TEST-1 | 0    | 495839182 | 0    | 50   | NULL | 0    | 0    | 1818378 | 102218 | 0    | 26   | 20141201 |
> +----------+--------------+------+------+------+------+-----------------------+--------+------+-----------+------+------+------+------+------+---------+--------+------+------+----------+
> // Explain the query plan; it shows that the data is fetched through the local index.
> explain select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t from Miss_data_table where q = '102218';
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST.MISS_DATA_TABLE [-32768,'102218'] |
> | CLIENT MERGE SORT                        |
> +------------------------------------------+


