[ https://issues.apache.org/jira/browse/PHOENIX-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258181#comment-16258181 ]
Pedro Boado edited comment on PHOENIX-4372 at 11/19/17 3:07 PM:
----------------------------------------------------------------

OK, I'll do my best. After pulling the latest changes from 4.x-HBase-1.2 and running the ITs I'm getting only 5 failures in the {{NeedsOwnMiniClusterTest}} category.

Two of them ( {{RegexBulkLoadToolIT}} and {{CsvBulkLoadToolIT}} ) fail with return value -1 and this error:

{code}
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
	... 11 more
{code}

I've tracked the error down to {{org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner}}. In {{setConf()}} it fails this check:

{code}
if (splitPoints.length != job.getNumReduceTasks() - 1) {
    throw new IOException("Wrong number of partitions in keyset");
{code}

{{splitPoints.length}} is 2 ( which is what it should be ) but {{job.getNumReduceTasks()}} is 1, so the check compares 2 against 0 and throws. The trigger is this dependency difference ( Cloudera's HBase is compiled with Hadoop Profile 1.1 ):

{code}
[INFO] +- org.apache.hbase:hbase-common:jar:1.2.0-cdh5.11.2:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2:compile
{code}

and, more specifically, CDH's implementation of the {{LocalJobRunner.Job}} class, which has a hard restriction limiting the number of reducers to 1:

{code:java}
// method LocalJobRunner.Job.run()
TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(jobId,
    this.localFs, LocalJobRunner.this.conf, this.systemJobDir);
int numReduceTasks = this.job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    numReduceTasks = 1;
    this.job.setNumReduceTasks(1);
}
{code}

By excluding {{org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2}}, the tests fall back to using {{org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.6.0-cdh5.11.2}}, which is also available on the classpath and does not impose this limitation in the LocalJobRunner. I'll be rerunning all the ITs to check the impact of removing this library. In terms of packaging a parcel, neither of these libraries will be included ( they're already available on the CDH classpath ).
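For reference, here is a minimal sketch of how that exclusion could look in the pom. The coordinates are taken from the dependency tree above; whether the exclusion really belongs on {{hbase-common}} ( rather than on another HBase artifact that also drags in {{hadoop-core}} ), and the hard-coded version, are assumptions rather than the actual change.

{code:xml}
<!-- Sketch only: drop the MR1 hadoop-core jar that hbase-common pulls in
     transitively, so the tests pick up hadoop-mapreduce-client-common from
     the classpath instead. The enclosing dependency and version are
     assumptions based on the dependency tree above. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-common</artifactId>
  <version>1.2.0-cdh5.11.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

With {{hadoop-core}} off the test classpath, the LocalJobRunner used by the ITs no longer forces the job back to a single reducer, so the {{TotalOrderPartitioner}} invariant ( {{splitPoints.length == numReduceTasks - 1}} ) can hold.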
was (Author: pboado):
OK, I'll do my best. After pulling the latest changes from 4.x-HBase-1.2 and running the ITs I'm getting only 5 failures in the {{NeedsOwnMiniClusterTest}} category.

Two of them ( {{RegexBulkLoadToolIT}} and {{CsvBulkLoadToolIT}} ) fail with return value -1 and this error:

{code}
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
	... 11 more
{code}

I've tracked the error down to {{org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner}}. In {{setConf()}} it fails this check:

{code}
if (splitPoints.length != job.getNumReduceTasks() - 1) {
    throw new IOException("Wrong number of partitions in keyset");
{code}

{{splitPoints.length}} is 2 ( which is what it should be ) but {{job.getNumReduceTasks()}} is 1. The trigger is this dependency difference ( Cloudera's HBase is compiled with Hadoop Profile 1.1 ):

{code}
[INFO] +- org.apache.hbase:hbase-common:jar:1.2.0-cdh5.11.2:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2:compile
{code}

and, more specifically, CDH's implementation of the {{LocalJobRunner.Job}} class, which has a hard restriction limiting the number of reducers to 1:

{code:java}
// method LocalJobRunner.Job.run()
TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(jobId,
    this.localFs, LocalJobRunner.this.conf, this.systemJobDir);
int numReduceTasks = this.job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    numReduceTasks = 1;
    this.job.setNumReduceTasks(1);
}
{code}

It looks like this makes it impossible for any MR test to run properly in CDH. The restriction does not exist in the "real" MapReduce engine ( I've run the CSVBulkImport tool in CDH before ) but only in the LocalJobRunner. The only reasonable option I see at the moment is disabling these tests for the CDH compilation. What do you think, guys?

I'm still working on the other errors.


> Distribution of Apache Phoenix 4.13 for CDH 5.11.2
> --------------------------------------------------
>
>                 Key: PHOENIX-4372
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4372
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 4.13.0
>            Reporter: Pedro Boado
>            Priority: Minor
>              Labels: cdh
>         Attachments: PHOENIX-4372-v2.patch, PHOENIX-4372.patch
>
>
> Changes required on top of branch 4.13-HBase-1.2 for creating a parcel of Apache Phoenix 4.13.0 for CDH 5.11.2.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)