[ https://issues.apache.org/jira/browse/PHOENIX-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258181#comment-16258181 ]
Pedro Boado edited comment on PHOENIX-4372 at 11/19/17 3:07 PM:
----------------------------------------------------------------

OK, I'll do my best. After pulling the latest changes from 4.x-HBase-1.2 and running the ITs I'm getting only 5 failures in the {{NeedsOwnMiniClusterTest}} category.

Two of them ( {{RegexBulkLoadToolIT}} and {{CsvBulkLoadToolIT}} ) fail with return value -1 and this error:

{code}
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
	... 11 more
{code}

I've tracked the error down to {{org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner}}. In {{setConf()}} it fails this check:

{code}
if (splitPoints.length != job.getNumReduceTasks() - 1) {
    throw new IOException("Wrong number of partitions in keyset");
{code}

{{splitPoints.length}} is 2 ( which is what it should be ) but {{job.getNumReduceTasks()}} is 1, so the check compares 2 against 0 and throws. The trigger is this dependency difference ( Cloudera's HBase is compiled with Hadoop Profile 1.1 ):

{code}
[INFO] +- org.apache.hbase:hbase-common:jar:1.2.0-cdh5.11.2:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2:compile
{code}

and, more specifically, CDH's implementation of the {{LocalJobRunner.Job}} class, which has a hard restriction limiting the number of reducers to 1:

{code:java}
// method LocalJobRunner.Job.run()
TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(jobId,
    this.localFs, LocalJobRunner.this.conf, this.systemJobDir);
int numReduceTasks = this.job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    numReduceTasks = 1;
    this.job.setNumReduceTasks(1);
}
{code}

By excluding {{org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2}}, the tests fall back to using {{org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.6.0-cdh5.11.2}}, which is also available on the classpath and does not impose this limitation in the LocalJobRunner. I'll be rerunning all the ITs to check the impact of removing this library. In terms of packaging a parcel, neither of these libraries will be included ( they're already available on the CDH classpath ).
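For reference, here is a minimal sketch of how that exclusion could look in the pom. The coordinates are taken from the dependency tree above; whether the exclusion really belongs on {{hbase-common}} ( rather than on another HBase artifact that also drags in {{hadoop-core}} ), and the hard-coded version, are assumptions rather than the actual change.

{code:xml}
<!-- Sketch only: drop the MR1 hadoop-core jar that hbase-common pulls in
     transitively, so the tests pick up hadoop-mapreduce-client-common from
     the classpath instead. The enclosing dependency and version are
     assumptions based on the dependency tree above. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-common</artifactId>
  <version>1.2.0-cdh5.11.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

With {{hadoop-core}} off the test classpath, the LocalJobRunner used by the ITs no longer forces the job back to a single reducer, so the {{TotalOrderPartitioner}} invariant ( {{splitPoints.length == numReduceTasks - 1}} ) can hold.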
was (Author: pboado):
OK, I'll do my best. After pulling the latest changes from 4.x-HBase-1.2 and running the ITs I'm getting only 5 failures in the {{NeedsOwnMiniClusterTest}} category.

Two of them ( {{RegexBulkLoadToolIT}} and {{CsvBulkLoadToolIT}} ) fail with return value -1 and this error:

{code}
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
	... 11 more
{code}

I've tracked the error down to {{org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner}}. In {{setConf()}} it fails this check:

{code}
if (splitPoints.length != job.getNumReduceTasks() - 1) {
    throw new IOException("Wrong number of partitions in keyset");
{code}

{{splitPoints.length}} is 2 ( which is what it should be ) but {{job.getNumReduceTasks()}} is 1. The trigger is this dependency difference ( Cloudera's HBase is compiled with Hadoop Profile 1.1 ):

{code}
[INFO] +- org.apache.hbase:hbase-common:jar:1.2.0-cdh5.11.2:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:2.6.0-mr1-cdh5.11.2:compile
{code}

and, more specifically, CDH's implementation of the {{LocalJobRunner.Job}} class, which has a hard restriction limiting the number of reducers to 1:

{code:java}
// method LocalJobRunner.Job.run()
TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(jobId,
    this.localFs, LocalJobRunner.this.conf, this.systemJobDir);
int numReduceTasks = this.job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    numReduceTasks = 1;
    this.job.setNumReduceTasks(1);
}
{code}

It looks like this makes it impossible for any MR test to run properly in CDH. The restriction does not exist in the "real" MapReduce engine ( I've run the CSVBulkImport tool in CDH before ) but only in the LocalJobRunner. The only reasonable option I see at the moment is disabling these tests for the CDH compilation. What do you think, guys?

I'm still working on the other errors.


> Distribution of Apache Phoenix 4.13 for CDH 5.11.2
> --------------------------------------------------
>
>                 Key: PHOENIX-4372
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4372
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 4.13.0
>            Reporter: Pedro Boado
>            Priority: Minor
>              Labels: cdh
>         Attachments: PHOENIX-4372-v2.patch, PHOENIX-4372.patch
>
>
> Changes required on top of branch 4.13-HBase-1.2 for creating a parcel of Apache Phoenix 4.13.0 for CDH 5.11.2.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)