[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638956#comment-16638956 ]
Satish Subhashrao Saley commented on PIG-5342:
----------------------------------------------
Could you please amend the commit? The BloomFilterPartitioner class wasn't
committed, so the build fails:
{code:java}
[echo] *** Building Main Sources ***
[echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
[echo] *** Else, you will only be warned about deprecations ***
[echo] *** Hadoop version used: 2 ; HBase version used: 1 ; Spark version used: 2 ***
[javac] Compiling 1106 source files to /Users/saley/src/pig/build/classes
[javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:113: error: cannot find symbol
[javac] import org.apache.pig.backend.hadoop.executionengine.tez.runtime.BloomFilterPartitioner;
[javac] ^
[javac] symbol: class BloomFilterPartitioner
[javac] location: package org.apache.pig.backend.hadoop.executionengine.tez.runtime
[javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:1495: error: cannot find symbol
[javac] edge.partitionerClass = BloomFilterPartitioner.class;
[javac] ^
[javac] symbol: class BloomFilterPartitioner
[javac] location: class TezCompiler
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors
{code}
> Add setting to turn off bloom join combiner
> -------------------------------------------
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch,
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch,
> PIG-5342-8.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off the combiner for
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Previously, the keys were the bloom filter index and the values were the
> join key. Combining involved doing a distinct on the bag of values, which
> has memory issues for more than 10 million records. That needs to be flipped
> and a distinct combiner used to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right
> outer joins with the smaller dataset on the right. Replicated join only
> supports left outer join.
>
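To illustrate point 2 of the description, here is a minimal, self-contained sketch of the two combiner layouts. It is not Pig's implementation: the class name BloomCombinerSketch, both method names, and the in-memory lists standing in for the shuffle are hypothetical, and the sort on (join key, bloom filter index) is an assumption about what "flipped" means in practice. It only shows why a distinct over sorted records needs to remember one record at a time, while a distinct over a bag of values needs a set as large as the bag.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Pig.
class BloomCombinerSketch {

    // Old layout: key = bloom filter index, value = join key.
    // Deduplicating needs a set that holds every distinct join key
    // of one index in memory at the same time.
    static Map<Integer, Set<String>> distinctPerIndex(List<Map.Entry<Integer, String>> records) {
        Map<Integer, Set<String>> out = new TreeMap<>();
        for (Map.Entry<Integer, String> r : records) {
            out.computeIfAbsent(r.getKey(), k -> new HashSet<>()).add(r.getValue());
        }
        return out;
    }

    // Flipped layout (assumed): key = join key, value = bloom filter index.
    // Once the records are sorted, duplicates are adjacent, so the distinct
    // step only compares each record with the previous one.
    static List<Map.Entry<String, Integer>> distinctFlipped(List<Map.Entry<Integer, String>> records) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
        for (Map.Entry<Integer, String> r : records) {
            sorted.add(new SimpleImmutableEntry<>(r.getValue(), r.getKey()));
        }
        // Stand-in for the framework's sort phase.
        sorted.sort((a, b) -> {
            int c = a.getKey().compareTo(b.getKey());
            return c != 0 ? c : Integer.compare(a.getValue(), b.getValue());
        });
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Map.Entry<String, Integer> previous = null;
        for (Map.Entry<String, Integer> r : sorted) {
            if (!r.equals(previous)) {
                out.add(r);
            }
            previous = r;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        records.add(new SimpleImmutableEntry<>(0, "a"));
        records.add(new SimpleImmutableEntry<>(1, "b"));
        records.add(new SimpleImmutableEntry<>(0, "a"));
        records.add(new SimpleImmutableEntry<>(0, "c"));
        records.add(new SimpleImmutableEntry<>(1, "b"));

        System.out.println(distinctPerIndex(records));
        System.out.println(distinctFlipped(records));
    }
}
```

Both methods yield the same distinct (index, join key) pairs. The difference is the working set: the first grows with the number of distinct join keys per index, the second stays constant once the input is sorted.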