Hi all,

I have a question about TestSkewedJoin#testSkewedJoinKeyPartition:

    @Test
    public void testSkewedJoinKeyPartition() throws IOException {
        String outputDir = "testSkewedJoinKeyPartition";
        try {
            Util.deleteFile(cluster, outputDir);
        } catch (Exception e) {
            // it is ok if the directory does not exist
        }

        pigServer.registerQuery("A = LOAD '" + INPUT_FILE1 + "' as (id, name, n);");
        pigServer.registerQuery("B = LOAD '" + INPUT_FILE2 + "' as (id, name);");
        pigServer.registerQuery("E = join A by id, B by id using 'skewed' parallel 7;");
        pigServer.store("E", outputDir);

        int[][] lineCount = new int[3][7];
        FileStatus[] outputFiles = fs.listStatus(new Path(outputDir),
                Util.getSuccessMarkerPathFilter());

        // check how many times a key appears in each part- file
        for (int i = 0; i < 7; i++) {
            String filename = outputFiles[i].getPath().toString();
            Util.copyFromClusterToLocal(cluster, filename, OUTPUT_DIR + "/" + i);
            BufferedReader reader = new BufferedReader(new FileReader(OUTPUT_DIR + "/" + i));
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\t");
                int key = Integer.parseInt(cols[0]) / 100 - 1;
                lineCount[key][i]++;
            }
            reader.close();
        }

        int fc = 0;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 7; j++) {
                if (lineCount[i][j] > 0) {
                    fc++;
                }
            }
        }

        // at least one key should be a skewed key
        // check that at least one key appears in more than 1 part- file
        assertTrue(fc > 3);
    }

When I run this unit test, the results are in OUTPUT_DIR/0 through OUTPUT_DIR/6 (because the parallel number is 7), and one key appears in more than one part file.

But when I run the script from the command line, the results are only in OUTPUT_DIR/part-00002, OUTPUT_DIR/part-00004 and OUTPUT_DIR/part-00006. The other part-0000x files are empty, and each key appears in only one part file.

    A = LOAD './SkewedJoinInput1.txt' as (id, name, n);
    B = LOAD './SkewedJoinInput2.txt' as (id, name);
    E = join A by id, B by id using 'skewed' parallel 7;
    store E into './testSkewedJoin.out';

Why do I get different results when running in the unit test environment and when running the script directly from the command line?
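
To compare the two runs outside the test harness, the per-key check could be sketched as a small standalone program. This is only a rough sketch and not part of Pig: the class name PartFileKeyCount and the default path are made up for illustration. It assumes the output directory has been copied to the local filesystem, that the part files are named part-*, and it uses the raw first column as the key instead of the id/100 - 1 grouping in the test.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Pig: reports which part files each join key lands in.
public class PartFileKeyCount {

    // Maps each key (first tab-separated column) to the names of the part files containing it.
    static Map<String, Set<String>> keysToPartFiles(Path outputDir) throws IOException {
        Map<String, Set<String>> result = new TreeMap<>();
        // Assumption: output files are named part-*; _SUCCESS and .crc files are skipped by the glob.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                try (BufferedReader reader =
                         Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (line.isEmpty()) {
                            continue; // ignore blank lines
                        }
                        String key = line.split("\t")[0];
                        result.computeIfAbsent(key, k -> new TreeSet<>())
                              .add(part.getFileName().toString());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the store statement in the script above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "./testSkewedJoin.out");
        keysToPartFiles(dir).forEach((key, files) ->
            System.out.println(key + " -> " + files.size() + " part file(s): " + files));
    }
}
```

A key that was split by the skewed join should show up with more than one part file in this listing.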

I would appreciate any suggestions.

Best Regards,
Kelly Zhang / Zhang, Liyun