Hi,
Please find attached the patch that fixes PIG-671. It uses the approach
mentioned in the previous email.
I could not find any test cases for COUNT.java, and "ant test" just hung
when I ran it.
Output:
grunt> a = load 'test.txt';
grunt> x = foreach a generate COUNT(a.$0,a.$0);
grunt> dump x;
2011-03-08 14:45:03,408 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.
Details at logfile:
/Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
grunt> b = group a all;
grunt> x = foreach b generate COUNT(a.$0,a.$0);
grunt> dump x;
2011-03-08 14:45:19,668 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.
Details at logfile:
/Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
grunt> quit
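To illustrate why the two-argument call is now rejected up front, here is a minimal sketch of the idea. It is not Pig's actual matching code: ArgMatchSketch, ArgType, matches and checkArityAtRuntime are hypothetical names made up for this example, and the real front end also tries implicit casts, which this sketch ignores. It only contrasts a runtime size check in exec() with matching the call against a declared single-bag signature.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for front-end signature matching; NOT Pig's actual classes. */
final class ArgMatchSketch {

    /** Hypothetical type tags; Pig itself uses byte constants in DataType. */
    enum ArgType { BAG, CHARARRAY, LONG }

    /** Declared signatures, playing the role of the list from getArgToFuncMapping(). */
    static final List<List<ArgType>> COUNT_SIGNATURES =
            Collections.singletonList(Collections.singletonList(ArgType.BAG));

    /** True if exactly one declared signature equals the supplied argument types. */
    static boolean matches(List<List<ArgType>> declared, List<ArgType> supplied) {
        int hits = 0;
        for (List<ArgType> signature : declared) {
            if (signature.equals(supplied)) {
                hits++;
            }
        }
        return hits == 1;
    }

    /** The alternative from the thread: a size check that only fires at runtime. */
    static void checkArityAtRuntime(List<?> inputTuple) {
        if (inputTuple.size() != 1) {
            throw new IllegalArgumentException(
                    "COUNT expects exactly one argument, got " + inputTuple.size());
        }
    }

    public static void main(String[] args) {
        // One bag argument: accepted.
        System.out.println(matches(COUNT_SIGNATURES, Arrays.asList(ArgType.BAG)));
        // Two bag arguments: rejected before any job would be launched.
        System.out.println(matches(COUNT_SIGNATURES,
                Arrays.asList(ArgType.BAG, ArgType.BAG)));
    }
}
```

With the declared signature in place the mismatch is caught while the script is being compiled, so no map job has to fail to report it.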
Regards,
Deepak
On Tue, Mar 8, 2011 at 12:12 PM, deepak kumar v <[email protected]> wrote:
> Hi Dmitriy,
>
> I was looking at SUBSTRING.java, and that (getArgToFuncMapping) is exactly
> what I am trying now with COUNT.
> I am waiting for the build to complete so that I can test my changes before
> I post this option.
>
> Regards,
> Deepak
>
>
> On Tue, Mar 8, 2011 at 11:56 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Actually I think if you just implement getArgToFuncMapping for COUNT,
>> where you only return a mapping for a single bag argument, pig will notice
>> that the wrong number of args is supplied during the compilation phase and
>> no runtime exceptions will be required.
>>
>> I haven't checked how well the funcSpec mapping works with Bags, that's
>> something to experiment with.
>>
>> D
>>
>>
>> On Mon, Mar 7, 2011 at 9:55 PM, deepak kumar v <[email protected]> wrote:
>>
>>> Hi Pig Developers,
>>> This is my first dive into open source contribution and I hope to dive
>>> deep.
>>>
>>> I was going through https://issues.apache.org/jira/browse/PIG-671 and
>>> observed the following with COUNT.java:
>>>
>>> COUNT.exec() always retrieves the first item from the input tuple, which it
>>> assumes is a bag, and counts the number of items in that bag.
>>> Even if we pass multiple arguments to COUNT(), it will always pick the
>>> first argument.
>>>
>>> There are a few ways we could handle this:
>>> a) Leave it as is, because it returns the correct result for counting the
>>> number of items in the first argument.
>>> OR
>>> b) Check the size of the input tuple in COUNT.exec() and, if it is not 1,
>>> throw an ExecException or an IllegalArgumentException (the latter might be
>>> the correct one), which will cause the map job to fail.
>>>
>>> Let me know how we should go about it.
>>>
>>>
>>> Regards,
>>> Deepak
>>>
>>
>>
>
Index: src/org/apache/pig/builtin/COUNT.java
===================================================================
--- src/org/apache/pig/builtin/COUNT.java (revision 1079171)
+++ src/org/apache/pig/builtin/COUNT.java (working copy)
@@ -20,10 +20,13 @@
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
+import java.util.List;
+import java.util.ArrayList;
import org.apache.pig.Accumulator;
import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
+import org.apache.pig.FuncSpec;
import org.apache.pig.PigException;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
@@ -31,6 +34,7 @@
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
+import org.apache.pig.impl.logicalLayer.FrontendException;
/**
* Generates the count of the number of values in a bag. This count does not
@@ -149,6 +153,18 @@
return new Schema(new Schema.FieldSchema(null, DataType.LONG));
}
+    /* (non-Javadoc)
+     * @see org.apache.pig.EvalFunc#getArgToFuncMapping()
+     */
+    @Override
+    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
+        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
+        Schema s = new Schema();
+        s.add(new Schema.FieldSchema(null, DataType.BAG));
+        funcList.add(new FuncSpec(this.getClass().getName(), s));
+        return funcList;
+    }
+
/* Accumulator interface implementation */
private long intermediateCount = 0L;