[ https://issues.apache.org/jira/browse/PHOENIX-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211238#comment-15211238 ]

Biju Nair commented on PHOENIX-2783:
------------------------------------

Hi [~sergey.soldatov], since the change is in the {{DDL}} code path and the
number of columns in an index will not be large, it may be worth trading the
small performance difference identified in your experiment for cleaner code
using {{HashSet}}. But looking further into the
[code|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java],
I am not sure the proposed change is the correct one, for two reasons.
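For reference, a {{HashSet}}-based duplicate check could look roughly like the sketch below. It is a minimal illustration, not the code from either attached patch: the class and {{findDuplicate}} are hypothetical names, and it assumes column names have already been normalized (e.g. unquoted identifiers upper-cased) before they are compared.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not Phoenix code.
class DuplicateColumnCheck {

    // Returns the first name that appears more than once, or null if all are unique.
    // Assumes the names are already normalized (e.g. unquoted identifiers upper-cased).
    static String findDuplicate(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Column names as written in the reproduction: on x (t2) include (t1, t3, t3).
        List<String> indexColumns = Arrays.asList("T2", "T1", "T3", "T3");
        System.out.println(findDuplicate(indexColumns)); // prints T3
    }
}
```

The single pass with {{Set.add}} keeps the check to a few lines, which is the readability argument for {{HashSet}} given how small column lists in DDL are.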

1. [createIndex|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1117]
in turn gets converted into a call to
[createTable|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1320],
which also includes logic similar to the proposed fix for identifying
duplicate columns. Adding the check in both places duplicates code and logic,
which can make future maintenance difficult.
2. There are other scenarios where duplicate columns in DDL statements produce
undesired results, i.e. the scope of this issue is not confined to index
creation. The following are two that I came across.
a) In table creation, when two columns with the same name but different types
are used in the DDL, {{sqlline.py}} reports an error, but the table is still
created and left in an unusable state, similar to the index creation issue
reported in this JIRA ticket.
{noformat}
0: jdbc:phoenix:vm1:2181:/hbase> create table tbl2 (i integer not null primary key, i integer, i varchar);
Error: ERROR 514 (42892): A duplicate column name was detected in the object definition or ALTER TABLE statement. columnName=TBL2.I (state=42892,code=514)
{noformat}
b) Again in table creation, if the DDL has two columns with the same name and
type, the table is created successfully and is usable. However, for users
coming from a SQL database background, this is not the expected behavior.
{noformat}
0: jdbc:phoenix:vm1:2181:/hbase> create table tbl1 (i integer not null primary key, i integer);
No rows affected (0.632 seconds)
0: jdbc:phoenix:vm1:2181:/hbase> upsert into tbl1 values (1, 2);
1 row affected (0.057 seconds)
0: jdbc:phoenix:vm1:2181:/hbase> select * from tbl1;
+------------+------------+
|     I      |     I      |
+------------+------------+
| 1          | 2          |
+------------+------------+
1 row selected (0.084 seconds)
{noformat}
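Both a) and b) suggest the check has to key on the column name alone, ignoring the declared type. A minimal sketch of that idea follows; {{ColumnSpec}} and {{checkNoDuplicateNames}} are hypothetical stand-ins rather than Phoenix's own parse-tree classes, and column families are ignored for simplicity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DdlDuplicateCheck {

    // Hypothetical, simplified stand-in for a parsed column definition.
    static final class ColumnSpec {
        final String name;
        final String type;

        ColumnSpec(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Rejects a repeated column name whatever its declared type, so both
    // "I INTEGER, I VARCHAR" and "I INTEGER, I INTEGER" fail.
    static void checkNoDuplicateNames(String table, List<ColumnSpec> columns) {
        Set<String> names = new HashSet<>();
        for (ColumnSpec c : columns) {
            // Only the name goes into the set; the type is deliberately ignored.
            if (!names.add(c.name)) {
                throw new IllegalArgumentException(
                        "Duplicate column name: " + table + "." + c.name);
            }
        }
    }

    static void report(String table, List<ColumnSpec> columns) {
        try {
            checkNoDuplicateNames(table, columns);
            System.out.println(table + " accepted");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // tbl1: (i integer not null primary key, i integer)
        report("TBL1", Arrays.asList(
                new ColumnSpec("I", "INTEGER"), new ColumnSpec("I", "INTEGER")));
        // prints: Duplicate column name: TBL1.I

        // tbl2: (i integer not null primary key, i integer, i varchar)
        report("TBL2", Arrays.asList(
                new ColumnSpec("I", "INTEGER"), new ColumnSpec("I", "INTEGER"),
                new ColumnSpec("I", "VARCHAR")));
        // prints: Duplicate column name: TBL2.I
    }
}
```

Keying the set on name and type together would accept the {{tbl1}} case, which is why the type is left out of the key in this sketch.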
Since the issue affects both index and table creation, it may be better to
put the duplicate-checking logic at the start of the
[createTableInternal|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1502]
method. I made a quick change to test this, and it handles at least these
three scenarios. I will attach the change as a patch file to help the
discussion. It would be good to get feedback from project members like
[~jamestaylor], who know more about the history of the code, on whether we
are approaching this in the right direction.
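The ordering argument behind checking at the start of createTableInternal can be shown with a toy model. In this sketch a plain {{HashMap}} stands in for the catalog, and the class and method names are hypothetical; it is not Phoenix's {{MetaDataClient}} API. The only point it makes is that validating before the metadata write leaves nothing behind when the DDL is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: a HashMap stands in for the catalog.
class CatalogOrderingDemo {

    final Map<String, List<String>> catalog = new HashMap<>();

    static void checkNoDuplicates(String table, List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String c : columns) {
            if (!seen.add(c)) {
                throw new IllegalArgumentException("Duplicate column " + table + "." + c);
            }
        }
    }

    // Mirrors the reported behavior: metadata is written first, then validated.
    void createWriteThenCheck(String table, List<String> columns) {
        catalog.put(table, new ArrayList<>(columns));
        checkNoDuplicates(table, columns);
    }

    // Mirrors the proposal: validate first, so a rejected DDL writes nothing.
    void createCheckThenWrite(String table, List<String> columns) {
        checkNoDuplicates(table, columns);
        catalog.put(table, new ArrayList<>(columns));
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("I", "I");
        CatalogOrderingDemo demo = new CatalogOrderingDemo();
        try {
            demo.createWriteThenCheck("TBL_A", cols);
        } catch (IllegalArgumentException expected) {
            // rejected, but only after the catalog entry was written
        }
        try {
            demo.createCheckThenWrite("TBL_B", cols);
        } catch (IllegalArgumentException expected) {
            // rejected before anything was written
        }
        System.out.println(demo.catalog.containsKey("TBL_A")); // prints true
        System.out.println(demo.catalog.containsKey("TBL_B")); // prints false
    }
}
```

In the real code path the write goes to the server-side catalog rather than a local map, but the same ordering concern applies.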

> Creating secondary index with duplicated columns makes the catalog corrupted
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-2783
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2783
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>         Attachments: PHOENIX-2783-1.patch, PHOENIX-2783-2.patch
>
>
> Simple example
> {noformat}
> create table x (t1 varchar primary key, t2 varchar, t3 varchar);
> create index idx on x (t2) include (t1,t3,t3);
> {noformat}
> causes an exception saying that a duplicate column was detected, but the
> client updates the catalog before throwing it, which leaves the catalog in an
> unusable state. All subsequent attempts to use table x fail with an
> ArrayIndexOutOfBoundsException. This problem was discussed on the user list
> recently.
> The cause of the problem is that the check for duplicate columns happens in
> PTableImpl after MetaDataClient completes the server-side createTable.
> A simple fix is to add a similar check in MetaDataClient before
> createTable is called.
> Perhaps someone can suggest a more elegant way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
