[ https://issues.apache.org/jira/browse/PHOENIX-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-2521:
----------------------------------
    Summary: Support duplicate rows in CSV Bulk Loader  (was: Index rows are not updated when the index key updated using bulk loader )

> Support duplicate rows in CSV Bulk Loader
> -----------------------------------------
>
>                 Key: PHOENIX-2521
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2521
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>            Reporter: Afshin Moazami
>
> I found out that the MapReduce CSV bulk load tool doesn't behave the same as
> UPSERTs. Is it by design or a bug?
> Here are the queries for creating the table and the index:
> {code}
> CREATE TABLE mySchema.mainTable (
> id varchar NOT NULL,
> name varchar,
> address varchar
> CONSTRAINT pk PRIMARY KEY (id));
> {code}
> {code}
> CREATE INDEX myIndex
> ON mySchema.mainTable (name, id)
> INCLUDE (address);
> {code}
> If I execute two upserts where the second one updates the name (which is the
> key for the index), everything works fine (the record is updated in both
> the table and the index table):
> {code}UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'john', 'Montreal');{code}
> {code}UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'jack', 'Montreal');{code}
> {code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'jack';{code}
> ==> one record
> {code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'john';{code}
> ==> zero records
> But if I load the data into the main table using
> org.apache.phoenix.mapreduce.CsvBulkLoadTool, it behaves differently.
> The main table will be updated, but the new record will be appended to the
> index table:
> HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol-1.1.2.jar:/etc/hbase/conf \
>   hadoop jar \
>   /usr/lib/hbase/phoenix-4.5.2-HBase-1.1-bin/phoenix-4.5.2-HBase-1.1-client.jar \
>   org.apache.phoenix.mapreduce.CsvBulkLoadTool -d',' -s mySchema -t mainTable \
>   -i /tmp/input.txt
> input.txt:
> 2,tomas,montreal
> 2,george,montreal
> (I have tried it both with and without -it and got the same result.)
> {code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'tomas'{code}
> ==> one record;
> {code}SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'george'{code}
> ==> one record;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
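
The difference described in the report can be modeled in a few lines of Python. This is only a toy sketch, not Phoenix code: the dictionaries and the functions `upsert` and `bulk_load` are hypothetical names. It assumes that the bulk loader writes an index entry for every input row without looking up and removing the entry for the previous value, which would be consistent with the query results shown in the report but is not confirmed there.

```python
# Toy model: the data table is keyed by id, the index by (name, id),
# mirroring PRIMARY KEY (id) and CREATE INDEX ... (name, id) INCLUDE (address).

def upsert(data, index, row_id, name, address):
    """Model of an UPSERT: the stale index entry is removed before writing."""
    old = data.get(row_id)
    if old is not None:
        old_name = old[0]
        index.pop((old_name, row_id), None)  # drop the index row for the old name
    data[row_id] = (name, address)
    index[(name, row_id)] = address


def bulk_load(data, index, rows):
    """Assumed bulk-load behavior: every input row is written blindly."""
    for row_id, name, address in rows:
        data[row_id] = (name, address)      # last row wins in the data table
        index[(name, row_id)] = address     # old index row is never removed


# Two upserts on the same id: only the latest name remains in the index.
data, index = {}, {}
upsert(data, index, "1", "john", "Montreal")
upsert(data, index, "1", "jack", "Montreal")
upsert_index_keys = sorted(index)

# Two CSV rows with the same id: both names remain in the index.
bulk_data, bulk_index = {}, {}
bulk_load(bulk_data, bulk_index, [("2", "tomas", "montreal"),
                                  ("2", "george", "montreal")])
bulk_index_keys = sorted(bulk_index)

print(upsert_index_keys)
print(bulk_index_keys)
```

In this model the data table ends up with a single row per id in both cases, while only the bulk-load path leaves an index entry for the superseded name, matching the 'tomas' and 'george' lookups each returning one record.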