[ https://issues.apache.org/jira/browse/PHOENIX-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258181#comment-16258181 ]

Pedro Boado edited comment on PHOENIX-4372 at 11/19/17 1:45 PM:
----------------------------------------------------------------

OK, I'll do my best. After pulling the latest changes from 4.x-HBase-1.2 and running the ITs, I'm getting only 5 failures, all in the {{NeedsOwnMiniClusterTest}} category.

Two of them ( {{RegexBulkLoadToolIT}} and {{CsvBulkLoadToolIT}} ) are failing with return value -1 and this error:

{code}
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
        at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Wrong number of partitions in keyset
        at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
        ... 11 more
{code}

I've tracked the error down to {{org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner}}. Its {{setConf()}} method is failing on this check:

{code}
if (splitPoints.length != job.getNumReduceTasks() - 1) {
    throw new IOException("Wrong number of partitions in keyset");
}
{code}

The {{splitPoints.length}} value is 2 (which is what it should be), but {{job.getNumReduceTasks()}} is 1. This is caused by the following dependency difference (Cloudera's HBase is compiled with the Hadoop 1.1 profile):

{code}
[INFO] +- org.apache.hbase:hbase-common:jar:1.2.0-cdh5.11.2:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2:compile
{code}
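
To double-check which MapReduce implementation the tests actually load at runtime (the compile-time tree above may differ from the runtime classpath), something like this quick diagnostic could be dropped into a test. Just a sketch on my side; the reflective lookup is only there to avoid class-visibility differences between Hadoop versions:

{code:java}
import org.apache.hadoop.util.VersionInfo;

// Diagnostic sketch (not part of the patch): print which Hadoop build is on the
// classpath and which jar provides LocalJobRunner, to confirm whether the
// MR1 hadoop-core artifact is the one actually being loaded.
public class WhichHadoop {
    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("Hadoop version : " + VersionInfo.getVersion());
        Class<?> runner = Class.forName("org.apache.hadoop.mapred.LocalJobRunner");
        System.out.println("LocalJobRunner : "
                + runner.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}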
 
The reduce-task count of 1 comes from CDH's implementation of the {{LocalJobRunner.Job}} class, which has a hard restriction limiting the number of reducers to 1:

{code:java}
// method LocalJobRunner.Job.run()
TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(
        jobId, this.localFs, LocalJobRunner.this.conf, this.systemJobDir);
int numReduceTasks = this.job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    numReduceTasks = 1;
    this.job.setNumReduceTasks(1);
}
{code}
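
Putting the two together, here is a minimal sketch with the observed values hard-coded, purely to show how the clamp produces the exception in the logs above (this is an illustration, not Phoenix or Hadoop code):

{code:java}
// TotalOrderPartitioner expects one reducer per key range, i.e. numReduceTasks
// must equal splitPoints + 1, but CDH's local runner clamps the job to 1 reducer.
public class PartitionMismatchSketch {
    public static void main(String[] args) throws java.io.IOException {
        int splitPoints = 2;      // entries written to the partitions file (observed)
        int numReduceTasks = 1;   // after CDH's LocalJobRunner.Job.run() clamps it
        if (splitPoints != numReduceTasks - 1) {
            // this is the exception surfacing in the IT output above
            throw new java.io.IOException("Wrong number of partitions in keyset");
        }
    }
}
{code}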

It looks like this makes it impossible for any MR test that relies on more than one reducer to run properly against CDH. The restriction does not exist in the "real" MapReduce engine (I've run the CSVBulkImport tool before in CDH), only in the {{LocalJobRunner}}.

The only reasonable option I see at the moment is disabling these tests for the CDH compilation. What do you think, guys?
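
If that's acceptable, one lightweight option might be a JUnit assumption in the affected ITs rather than (or in addition to) a Maven exclusion. Just a sketch: the class name is made up, and detecting the CDH build through the Hadoop version string is an assumption on my side:

{code:java}
import org.apache.hadoop.util.VersionInfo;
import org.junit.Assume;
import org.junit.BeforeClass;

// Sketch of one possible way to skip the bulk-load ITs on the CDH build: an
// assumption in a @BeforeClass hook marks all tests in the class as skipped.
// A Maven profile excluding the two ITs would achieve the same at build level.
public class CsvBulkLoadToolITSkipSketch {
    @BeforeClass
    public static void skipWhenLocalRunnerIsLimitedToOneReducer() {
        Assume.assumeFalse(
                "CDH's LocalJobRunner clamps MR jobs to a single reducer",
                VersionInfo.getVersion().toLowerCase().contains("cdh"));
    }
}
{code}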

I'm still working on the other errors.



> Distribution of Apache Phoenix 4.13 for CDH 5.11.2
> --------------------------------------------------
>
>                 Key: PHOENIX-4372
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4372
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 4.13.0
>            Reporter: Pedro Boado
>            Priority: Minor
>              Labels: cdh
>         Attachments: PHOENIX-4372-v2.patch, PHOENIX-4372.patch
>
>
> Changes required on top of branch 4.13-HBase-1.2 for creating a parcel of 
> Apache Phoenix 4.13.0 for CDH 5.11.2 . 


