[ https://issues.apache.org/jira/browse/HIVE-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166880#comment-13166880 ]
Hudson commented on HIVE-2520:
------------------------------

Integrated in Hive-trunk-h0.21 #1137 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1137/])
    HIVE-2520 left semi join will duplicate data
(binlijin via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212738
Files :
* /hive/trunk/data/files/sales.txt
* /hive/trunk/data/files/things.txt
* /hive/trunk/data/files/things2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/leftsemijoin.q
* /hive/trunk/ql/src/test/results/clientpositive/leftsemijoin.q.out

> left semi join will duplicate data
> ----------------------------------
>
>                 Key: HIVE-2520
>                 URL: https://issues.apache.org/jira/browse/HIVE-2520
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.7.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>              Labels: patch
>         Attachments: HIVE-2520.D717.1.patch, hive-2520.2.patch, hive-2520.patch
>
>
> CREATE TABLE sales (name STRING, id INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> CREATE TABLE things (id INT, name STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> The 'sales' table has its data in one file, sales.txt, and the data is:
> Joe 2
> Hank 2
> The 'things' table has its data in two files, things.txt and things2.txt.
> The content of things.txt is:
> 2 Tie
> The content of things2.txt is:
> 2 Tie
> SELECT * FROM sales LEFT SEMI JOIN things ON (sales.id = things.id);
> will output:
> Joe 2
> Joe 2
> Hank 2
> Hank 2
> so the result is wrong: each matching 'sales' row is emitted twice.
> In CommonJoinOperator, a left semi join should use
> " genObject(null, 0, new IntermediateObject(new ArrayList[numAliases], 0), true); "
> to generate data, but it currently uses " genUniqueJoinObject(0, 0); ".
> This patch will solve this problem.

--
This message is automatically generated by JIRA.
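
For reference, the semantics the report expects from LEFT SEMI JOIN can be sketched outside Hive. The snippet below is only an illustration in Python, not the code in CommonJoinOperator: the function name left_semi_join and the tuple layout are assumptions made for this example. It shows that a left row should be emitted at most once, however many right rows share its key.

```python
def left_semi_join(left_rows, right_rows, left_key, right_key):
    """Return each left row at most once if any right row has the same key.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    # Collect the distinct non-NULL join keys on the right side.
    right_keys = {right_key(r) for r in right_rows if right_key(r) is not None}
    # A left row qualifies on the existence of a match, not on the number
    # of matches, so duplicates on the right cannot multiply the output.
    return [row for row in left_rows
            if left_key(row) is not None and left_key(row) in right_keys]


# Data from the report: 'things' holds the same row twice
# (once from things.txt, once from things2.txt).
sales = [("Joe", 2), ("Hank", 2)]
things = [(2, "Tie"), (2, "Tie")]

result = left_semi_join(sales, things,
                        left_key=lambda s: s[1],
                        right_key=lambda t: t[0])
print(result)  # [('Joe', 2), ('Hank', 2)]
```

With the data from the report this yields two rows, one per 'sales' row, rather than the four rows the buggy query printed.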