Wenhai Li has posted comments on this change.

Change subject: RangeGenerator aggfunc for the numeric/asciiString datatype based on parallel streaming histogram.
......................................................................

Patch Set 21:

(21 comments)

Hi, Yingyi and Preston. I didn't know you can't see the comments without publishing. :( Hope it's not too late.

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.1.ddl.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.1.ddl.aql:

Line 20:
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.2.update.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.2.update.aql:

Line 17:  * under the License.
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.3.query.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/global-rg/global-rg.3.query.aql:

Line 19: use dataverse test;
> rename this file to global-rg.1.query.aql
Done

Line 22: for $x in [1.0, 2.0, double("3.0"), 3.1, 3.2, 3.3, 3.4]
> WS
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.1.ddl.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.1.ddl.aql:

Line 20:
> remove this file.
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.2.update.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.2.update.aql:

Line 16:  * specific language governing permissions and limitations
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.3.query.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/local-rg/local-rg.3.query.aql:

Line 13:  * software distributed under the License is distributed on an
> rename this file to local-rg.1.query.aql
Done

Line 22: for $x in [1.0, 2.0, double("3.0"), 3.1, 3.2, 3.3, 3.4]
> WS
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.1.ddl.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.1.ddl.aql:

Line 8:  * with the License.  You may obtain a copy of the License at
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.2.update.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.2.update.aql:

Line 17:  * under the License.
> remove this file.
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.3.query.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double-null/rg-double-null.3.query.aql:

Line 19: use dataverse test;
> rename this file to rg-double-null.1.query.aql
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.1.ddl.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.1.ddl.aql:

Line 21: create dataverse test;
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.2.update.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.2.update.aql:

Line 12:  * Unless required by applicable law or agreed to in writing,
> remove this file
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.3.query.aql
File asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg-double/rg-double.3.query.aql:

Line 20: set partitions '2'
> rename this file to rg-double.3.query.aql.
Done

Line 22: for $x in [1.0, 2.0, double("3.0"), 3.1, 3.2, 3.3, 3.4]
> WS
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/AsterixBuiltinFunctions.java
File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/AsterixBuiltinFunctions.java:

Line 225:             "ceiling", 1);
> code style doesn't seem right.
Done

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalRangeGeneratorAggregateFunction.java
File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalRangeGeneratorAggregateFunction.java:

Line 71: IAType listedItemType = ((AOrderedListType) inRecType).getItemType();
How can we get the listedItemType without the inRecType?

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalRangeGeneratorAggregateFunction.java
File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/LocalRangeGeneratorAggregateFunction.java:

Line 47: public class LocalRangeGeneratorAggregateFunction extends AbstractRangeGeneratorAggregateFunction {
> Can't we use open lists for local/intermediate aggregate output?
Great. How about the final output of the global generator?

https://asterix-gerrit.ics.uci.edu/#/c/806/21/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/RangeGeneratorAggregateFunction.java
File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/RangeGeneratorAggregateFunction.java:

Line 45: public void step(IFrameTupleReference tuple) throws AlgebricksException {
> why no implementation?
Currently, the local/global pair of functions is enough to construct and merge the histogram in parallel. Do we need another round to add single-step construction to this class?

https://asterix-gerrit.ics.uci.edu/#/c/806/21/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/GenericStreamingHistogram.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/GenericStreamingHistogram.java:

Line 38:
> move the class to org/apache/asterix/runtime/aggregates/std/range?
Questions about the comments related to the Hyracks histogram:

1. To accommodate the balancing pre-computation involved in all sorts of order-dependent operations, and potentially the job division for a future AsterixDB sort-merge join and the matrix segmentation needed by some mining/graph workloads, do we need to provide the fundamental algorithms in the Hyracks runtime base?

2. The interfaces exported by (and beneath) IHistogram depend only on Hyracks data types, and the conversion between the various types and Double, together with its inverse, covers the Hyracks primitive types. This naturally supports type-agnostic calls from the asterix/agg layer above, as well as potential statistics requirements from a future optimizer. Given that, is it really better to integrate all the changes into asterixAGG?

https://asterix-gerrit.ics.uci.edu/#/c/806/21/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/data/skew/zipfan2.tbl
File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/data/skew/zipfan2.tbl:

Line 3: 5.1143520826504275E7 9669 51143520 20291 -1171915.9960645214 3003 42424281 291 =kO98+.DI)QN#Z
> what does this mean?
The columns are, respectively: Zipfian unsigned double, uniform unsigned int16, Zipfian unsigned long/int32, Gaussian unsigned int16/32, Zipfian double, uniform int16, Zipfian long/int32, Gaussian int16/32, and a variable-length ASCII string. We can construct histograms locally for both files and then globally merge the generated intermediate bins. The file is only there to verify the accuracy of histogram construction. We can remove it once the code is stable enough.

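To make the local-construct / global-merge flow discussed above concrete, here is a minimal Java sketch of a bounded-bin streaming histogram in the general style of the Ben-Haim and Tom-Tov algorithm. It is an illustration only: the class name `StreamingHistogramSketch` and all of its methods are hypothetical and are not the `GenericStreamingHistogram` or `IHistogram` API from this change, and whether the change uses exactly this merge rule is an assumption. The split-point routine is deliberately simplified (it places each bin's whole count at its centroid rather than interpolating between bins).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; NOT the GenericStreamingHistogram from the change under review. */
final class StreamingHistogramSketch {
    private final int maxBins;
    // centroid -> number of values summarized by that bin, sorted by centroid
    private final TreeMap<Double, Long> bins = new TreeMap<>();

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 2) {
            throw new IllegalArgumentException("maxBins must be >= 2");
        }
        this.maxBins = maxBins;
    }

    /** Local step: add one value observed on this partition. */
    void add(double value) {
        addBin(value, 1L);
    }

    /** Global step: fold the intermediate bins of another partition into this histogram. */
    void merge(StreamingHistogramSketch other) {
        for (Map.Entry<Double, Long> e : other.bins.entrySet()) {
            addBin(e.getKey(), e.getValue());
        }
    }

    private void addBin(double centroid, long count) {
        bins.merge(centroid, count, Long::sum);
        while (bins.size() > maxBins) {
            shrink();
        }
    }

    /** Replace the two closest adjacent bins with one bin at their weighted mean. */
    private void shrink() {
        Double prev = null;
        Double left = null;
        double bestGap = Double.POSITIVE_INFINITY;
        for (Double c : bins.keySet()) {
            if (prev != null && c - prev < bestGap) {
                bestGap = c - prev;
                left = prev;
            }
            prev = c;
        }
        Double right = bins.higherKey(left);
        long nl = bins.remove(left);
        long nr = bins.remove(right);
        double merged = (left * nl + right * nr) / (nl + nr);
        bins.merge(merged, nl + nr, Long::sum);
    }

    int binCount() {
        return bins.size();
    }

    long total() {
        long total = 0;
        for (long n : bins.values()) {
            total += n;
        }
        return total;
    }

    /** Simplified: returns up to parts-1 centroids at which the running count crosses k/parts of the total. */
    List<Double> splitPoints(int parts) {
        long total = total();
        List<Double> splits = new ArrayList<>();
        long seen = 0;
        int k = 1;
        for (Map.Entry<Double, Long> e : bins.entrySet()) {
            seen += e.getValue();
            while (k < parts && seen * parts >= total * (long) k) {
                splits.add(e.getKey());
                k++;
            }
        }
        return splits;
    }
}
```

In this sketch each partition would call `add` for its own tuples (the local aggregate), and a single global instance would call `merge` once per partition before asking for `splitPoints(n)` to obtain range boundaries for `n` partitions.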
--
To view, visit https://asterix-gerrit.ics.uci.edu/806
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I450d0962fbeacfb2b6ab9fae0750f025ef17ba01
Gerrit-PatchSet: 21
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Wenhai Li <lwhaym...@yahoo.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng....@gmail.com>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Preston Carman <prest...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wenhai Li <lwhaym...@yahoo.com>
Gerrit-Reviewer: Yingyi Bu <buyin...@gmail.com>
Gerrit-Reviewer: Yingyi Bu <ying...@google.com>
Gerrit-HasComments: Yes