----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43875/#review120456 -----------------------------------------------------------
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 100) <https://reviews.apache.org/r/43875/#comment181910> In general, wildcards are not preferred in import. src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 555) <https://reviews.apache.org/r/43875/#comment181911> Please create a JIRA for XML format support. src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 74) <https://reviews.apache.org/r/43875/#comment181912> Shouldn't this be return op.getName() ? src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 130) <https://reviews.apache.org/r/43875/#comment181913> Nit : naming of variable sparkp. Should follow camel case. May be sparkInnnerOp? - Pallavi Rao On Feb. 23, 2016, 12:29 p.m., prateek vaishnav wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/43875/ > ----------------------------------------------------------- > > (Updated Feb. 23, 2016, 12:29 p.m.) > > > Review request for pig and Pallavi Rao. > > > Repository: pig-git > > > Description > ------- > > https://issues.apache.org/jira/browse/PIG-4807 > > Following test cases have been fixed - > 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE > 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph > 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF > > 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. > This leads UDFContext.getUDFProperties() to return NULL. > > public Properties getUDFProperties(Class c) { > UDFContextKey k = generateKey(c, null); > Properties p = udfConfs.get(k); > if (p == null) { > p = new Properties(); > udfConfs.put(k, p); > } > return p; > } > > Here, udfConfs remains empty even when it was set while processing the pig > query. > udf configuration in jobConf is getting lost while running the job. > In the code udf configuration is meant to be saved by serializing them in > jobConf. > > Currently, serialization is done before loading configuration in jobConf. > It is done in 'newJobConf(PigContext pigContext)' > It needs to be done after loading configuration. > > JobConf jobConf = SparkUtil.newJobConf(pigContext); > configureLoader(physicalPlan, op, jobConf); > UDFContext.getUDFContext().serialize(jobConf); > > 2 was failing because of pig-spark not supporting 'explain' in dot format. I > have added the DotSparkPrinter to fix the same. > > 3 was failing because instead of UDFSortComparator, SortConveter class was > using SortComparator. > > JavaPairRDD<Tuple, Object> sorted = r.sortByKey( > sortOperator.new SortComparator(), true); > > It should be using mComparator stored in POSort class. I have changed it to > following > > JavaPairRDD<Tuple, Object> sorted = r.sortByKey( > sortOperator.getMComparator(), true); > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java > a759857 > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java > b74977d > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java > 90cff23 > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java > f54f8fc > > src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java > e69de29 > test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c > > Diff: https://reviews.apache.org/r/43875/diff/ > > > Testing > ------- > > Successfully ran TestEvalPipelineLocal in spark/mr/local mode. > > > Thanks, > > prateek vaishnav > >
