[ https://issues.apache.org/jira/browse/PHOENIX-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211238#comment-15211238 ]
Biju Nair commented on PHOENIX-2783:
------------------------------------

Hi [~sergey.soldatov], since the change is in the {{DDL}} code path and the number of columns in an index will not be large, it may be worth trading the small performance difference identified in your experiment for cleaner code using {{HashSet}}. But looking further into the [code|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java], I am not sure whether the proposed change is the correct one, for two reasons.

1. [createIndex|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1117] is in turn converted into a call to [createTable|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1320], which already contains logic similar to the proposed fix for detecting duplicate columns. Adding the check in both places duplicates that logic and can make future maintenance difficult.
2. There are other scenarios where duplicate columns in DDL statements produce undesired results, i.e. the scope of this issue is not confined to index creation. The following are two I came across.

a) In table creation, when columns with the same name but different types are used in the DDL, {{sqlline.py}} reports an error, but the table is still created and left in an unusable state, similar to the index creation issue reported in this JIRA ticket.
{noformat}
0: jdbc:phoenix:vm1:2181:/hbase> create table tbl2 (i integer not null primary key, i integer, i varchar);
Error: ERROR 514 (42892): A duplicate column name was detected in the object definition or ALTER TABLE statement. columnName=TBL2.I (state=42892,code=514)
{noformat}

b) Again in table creation, if the DDL has two columns with the same name and the same type, table creation succeeds and the table is usable. However, users coming from a SQL database background would not expect this behavior.
{noformat}
0: jdbc:phoenix:vm1:2181:/hbase> create table tbl1 (i integer not null primary key, i integer);
No rows affected (0.632 seconds)
0: jdbc:phoenix:vm1:2181:/hbase> upsert into tbl1 values (1, 2);
1 row affected (0.057 seconds)
0: jdbc:phoenix:vm1:2181:/hbase> select * from tbl1;
+------------+------------+
|     I      |     I      |
+------------+------------+
| 1          | 2          |
+------------+------------+
1 row selected (0.084 seconds)
{noformat}

Since the issue affects both index and table creation, it may be better to place the duplicate-checking logic at the start of the [createTableInternal|https://github.com/apache/phoenix/blob/dbc9ee9dfe9e168c45ad279f8478c59f0882240c/phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L1502] method. I made a quick change to test this, and it handles at least these three scenarios. I will attach the change as a patch file to help with the discussion. It would be good to get feedback from project members like [~jamestaylor], who know more about the history of the code, on whether we are approaching this in the right direction.
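The following is a minimal sketch of the {{HashSet}}-based duplicate check discussed above. It assumes the check runs once on the full column list at the start of a shared code path such as {{createTableInternal}}, before anything is written to the catalog. The class and method names ({{DuplicateColumnCheck}}, {{findFirstDuplicate}}, {{validateColumns}}) are hypothetical and are not Phoenix API. The sketch also assumes column names are already normalized (unquoted identifiers upper-cased) and ignores column families, which a real check would need to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Phoenix's MetaDataClient API: a HashSet-based
// duplicate-column check intended to run before any catalog mutation.
final class DuplicateColumnCheck {

    private DuplicateColumnCheck() {}

    // Returns the first column name that appears more than once, or null if all are unique.
    // Assumes names are already normalized; column families are ignored for simplicity.
    static String findFirstDuplicate(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            // Set.add returns false when the name is already present.
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }

    // Throws before anything is written if the DDL repeats a column name.
    static void validateColumns(String tableName, List<String> columnNames) {
        String duplicate = findFirstDuplicate(columnNames);
        if (duplicate != null) {
            // Phoenix reports ERROR 514 (42892); IllegalArgumentException stands in for it here.
            throw new IllegalArgumentException(
                    "Duplicate column name detected: columnName=" + tableName + "." + duplicate);
        }
    }

    public static void main(String[] args) {
        // create table tbl1 (i integer not null primary key, i integer)
        System.out.println(findFirstDuplicate(Arrays.asList("I", "I")));

        // create index idx on x (t2) include (t1,t3,t3): indexed and included columns combined
        List<String> indexColumns = new ArrayList<>(Arrays.asList("T2"));
        indexColumns.addAll(Arrays.asList("T1", "T3", "T3"));
        System.out.println(findFirstDuplicate(indexColumns));

        // A DDL without duplicates passes validation silently.
        validateColumns("X", Arrays.asList("T1", "T2", "T3"));
    }
}
```

Because the check compares names only, it catches scenario a) (same name, different types), scenario b) (same name, same type), and the index case from the issue description, where the indexed and included columns form one list.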
> Creating secondary index with duplicated columns makes the catalog corrupted
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-2783
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2783
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>         Attachments: PHOENIX-2783-1.patch, PHOENIX-2783-2.patch
>
>
> Simple example
> {noformat}
> create table x (t1 varchar primary key, t2 varchar, t3 varchar);
> create index idx on x (t2) include (t1,t3,t3);
> {noformat}
> causes an exception saying that a duplicated column was detected, but the client
> updates the catalog before throwing it, which makes the table unusable. All subsequent
> attempts to use table x cause an ArrayIndexOutOfBounds exception. This problem
> was discussed on the user list recently.
> The cause of the problem is that the check for duplicated columns happens in
> PTableImpl after MetaDataClient completes the server-side createTable.
> The simple way to fix it is to add a similar check in MetaDataClient before
> createTable is called.
> Can someone suggest a more elegant way to fix it?
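The ordering problem described in the issue (the catalog is updated before the duplicate check throws) can be illustrated with a toy in-memory model. {{ToyCatalog}} and its methods are hypothetical stand-ins, not Phoenix classes. A {{HashMap}} plays the role of SYSTEM.CATALOG, and the only difference between the two create methods is whether validation runs before or after the write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Phoenix code: a HashMap stands in for SYSTEM.CATALOG.
final class ToyCatalog {
    private final Map<String, List<String>> tables = new HashMap<>();

    boolean contains(String tableName) {
        return tables.containsKey(tableName);
    }

    // Rejects a column list that repeats any name.
    private static void checkNoDuplicates(List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String column : columns) {
            if (!seen.add(column)) {
                throw new IllegalArgumentException("duplicate column " + column);
            }
        }
    }

    // Mirrors the reported bug: the entry is stored first, then validation throws.
    void createCheckAfterWrite(String tableName, List<String> columns) {
        tables.put(tableName, new ArrayList<>(columns));
        checkNoDuplicates(columns); // too late: the bad entry is already in the catalog
    }

    // Proposed order: validate first, so a rejected DDL leaves the catalog untouched.
    void createCheckBeforeWrite(String tableName, List<String> columns) {
        checkNoDuplicates(columns);
        tables.put(tableName, new ArrayList<>(columns));
    }

    public static void main(String[] args) {
        // Columns of: create index idx on x (t2) include (t1,t3,t3)
        List<String> idxColumns = Arrays.asList("T2", "T1", "T3", "T3");

        ToyCatalog buggy = new ToyCatalog();
        try {
            buggy.createCheckAfterWrite("IDX", idxColumns);
        } catch (IllegalArgumentException e) {
            System.out.println("after-write: " + e.getMessage()
                    + ", entry left behind = " + buggy.contains("IDX"));
        }

        ToyCatalog fixed = new ToyCatalog();
        try {
            fixed.createCheckBeforeWrite("IDX", idxColumns);
        } catch (IllegalArgumentException e) {
            System.out.println("before-write: " + e.getMessage()
                    + ", entry left behind = " + fixed.contains("IDX"));
        }
    }
}
```

With the check moved ahead of the write, a rejected DDL leaves no partial entry behind. Both proposed fixes aim for this property, whether the check lives in MetaDataClient before createTable or at the start of createTableInternal.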