> On Aug. 26, 2014, 11:05 p.m., Jinfeng Ni wrote: > > exec/java-exec/src/main/codegen/templates/RegrTypeFunctions.java, line 120 > > <https://reviews.apache.org/r/25012/diff/2/?file=667996#file667996line120> > > > > How do we handle the case where count = 0? > > Yash Sharma wrote: > Returning 0 for zero count/ all null records > > Jinfeng Ni wrote: > Returning 0 seems not right. In stead, it is expected to get null. > > Just found Postgres had description of the those aggregate function : > > http://www.postgresql.org/docs/9.1/static/functions-aggregate.html > > " > In all cases, null is returned if the computation is meaningless, for > example when N is zero. > " > > On the other side, I realized that the existing aggregation avg() > actually also does not perform correctly. When count = 0, it will return NaN, > in stead of null. > > For the overflow issue, it might be fine to leave it as is now. > > Jinfeng Ni wrote: > We probably should address the issue of null vs NaN vs 0, together for > other existing aggregation functions.
One way of doing it is having the output as Nullable always. And we do not set the value if count is zero. This can be done in all the aggr functions. Thoughts? > On Aug. 26, 2014, 11:05 p.m., Jinfeng Ni wrote: > > exec/java-exec/src/test/resources/regr_input.json, line 300 > > <https://reviews.apache.org/r/25012/diff/2/?file=667999#file667999line300> > > > > I'm not sure what other people will think about this input data, > > containing 300 rows. I feel it's not easy to manually compute the result, > > and verify whether the result is correct. For unit test, I think we had > > better keep the data as simple as possilbe, while cover as wide scenarios > > as possible. > > > > Also, the unit test does not cover the case of some paris contain null > > values, and the case all the pairs contain nulls. > > Yash Sharma wrote: > Actually wanted to test against more records. Have moved to smaller > testcase. Enhanced data with broader test scenarios. > > Jinfeng Ni wrote: > You use JSON format as data source. The columns from JSON file will be > processed as nullable columns. This means the tests will test : > regr_avgx( NullableBitInt, NullableBigInt). > > Can you add test with regr_avgx(BitInt, NullableBigInt)? > > My guess the code might be hit function resolution error for this new > case. In your code, you only have function implementation when both sides are > non-nullable, or both sides are nulllable. But the mixed cases do not have > implementation. I'm not sure if impliciat cast will kick in, for the > aggregation function. > > Yash Sharma wrote: > Will this testcase reproduce similar scenario: > { > "ref" : "`EXPR$3`", > "expr" : "regr_avgx(`C`, `F`) " > } > > Here 'C' has all non-null cols & 'F' has mixed null/non-null cols. > > Jinfeng Ni wrote: > If 'C' and 'F' come from JOSN file, they both are treated as nullable > columns. > > Can you try to test using SQL directly, in stead of using physical plan? > > For instance: > > Q1 : Select regr_avgx(n_nationkey, employee_id) from > cp.`tpch/nation.parquet` t1, cp.`employee.json` t2 where t1.n_nationkey = > t2.employee_id ; > > The query itself is not a meanfuling one. However, it will get > "n_nationkey" from parquet file as non-nullable column, and "employee_id" > from json file as nullable column. This way, we can test the function > resolution for the case where one input is nulllabe, the other input is > non-nullable. I am only testing physical plan directly since Sqlline does not recognize regr_avgx/regr_avgy yet. There might be changes required on th eoptiq side. Is there any other way of testing the same? - Yash ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25012/#review51602 ----------------------------------------------------------- On Aug. 28, 2014, 5:13 p.m., Yash Sharma wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/25012/ > ----------------------------------------------------------- > > (Updated Aug. 28, 2014, 5:13 p.m.) > > > Review request for drill, Jinfeng Ni and Mehant Baid. > > > Repository: drill-git > > > Description > ------- > > Stats function regr_avgx, regr_avgy > > > Diffs > ----- > > exec/java-exec/src/main/codegen/config.fmpp ff6135d > exec/java-exec/src/main/codegen/data/RegrTypes.tdd PRE-CREATION > exec/java-exec/src/main/codegen/templates/RegrTypeFunctions.java > PRE-CREATION > > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunction.java > 5e57dc7 > exec/java-exec/src/test/resources/functions/test_regression.json > PRE-CREATION > exec/java-exec/src/test/resources/regr_input.json PRE-CREATION > > Diff: https://reviews.apache.org/r/25012/diff/ > > > Testing > ------- > > Done. > $mvn test -Dtest=TestAggregateFunction#testRegr > > > Thanks, > > Yash Sharma > >
