[ https://issues.apache.org/jira/browse/HIVE-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4126:
------------------------------
Attachment: HIVE-4126.D9105.2.patch
hbutani updated the revision "HIVE-4126 [jira] remove support for lead/lag UDFs
outside of UDAF args".
- add PTF clause to grammar
- PTF Test Queries
- Query Data
- corrected data file name
- Merge branch 'hive-896' of github.com:hbutani/hive into hive-896
- windowing + hive attempt
- Hooking QueryDef to QB
- ptf source is a subQuery, not a select statement
- add windowing clauses in grammar
- fix grammar exception issue
- associate PTF nodes with corresponding Insert node, if any
- minor grammar fixes
- flesh out processing PTF tree in phase1
- associate PTFs in dest node handling in Phase1
- Classes not needed
- Merge branch 'crook' of github.com:hbutani/hive into crook
- Merge with apache hive
- handle SortBy, Having and Window clauses in Phase1
- remove ambiguities in Hive.g
- Merge branch 'crook' of https://github.com/hbutani/hive into crook
- syntactically allow a window specification in a selectItem
- tweak QuerySpec building.
- check that there is no GBy and where when deciding if a Windowing
- Merge branch 'crook' of https://github.com/hbutani/hive into crook
- Add several checks:
- QSpec to QDef (Checked qdef serialization and deserialization: it works)
- refactor the PTF ifc.
- refactor: rename annotation classes to end in Description
- refactor: move annotation classes to ql.exec package, where the
- refactor: move window functions to ql.udf.generic package
- refactor: move GenericUDFLeadLag to ql.udf.generic
- refactor: move TablFunc base classes to ql.udf.ptf
- refactor: move PTblFuncs to ql.udf.ptf
- refactor: move Order enum to ptf.query.spec package, so that i can
- refactor: remove classes in ptf.metadata package. Not needed now
- refactor: FunctionRegistry, extract FunctionInfo classes; step 1 in
- fix logic that checks for windowing specifications in Select List
- reenable ensurePTFChainHasPartitioning;
- Added aliasToAST map: To setup expressions map in PTF's output
- Added AST expression: Used to populate PTF RowResolver's expression map
- Using input operator's RowResolver to construct OI for HiveTableDef
- Use of SCRIPT DependencyType for processing rule on PTFOperator.
- Cleanup: Changed prototype of translate method in Translator. Commented
- Add utility methods:
- Add method to get operator name in PTFOperator.
- 1. Translation of QuerySpec to QueryDef
- Merge branch 'hive-896' into crook
- Following changes:
- refactor genPTFPlan:
- minor bug: clear Agg. and Distinct Agg. lists in QB ParseInfo.
- flesh out SelectDef translation:
- introduce initializeOutputOI and initializeRawInputOI to TableFunction
- When constructing the RowResolver for the Windowing or Noop PTFs:
- add the columns from the last PTF, before adding any
- 1. Change logic of how/which TableFunc is added to a QuerySpec: if query
- During QDef deserialization use the passed in inputOI as the OI of the
- for subQueries as input to PTF, construct a HiveTableSpec.
- Tests successful for queries with: windowing, lead/lag, noop, gby,
having, join with lead/lag, join with noop
- support an alias for a PTF invocation. This is needed so that a PTF
- add test for alias in ptf invocation
- translate ptf invocations in the from clause (that are not associated
- add tests that exercise generation of separate PTFOps for PTFChain and
- Fix aggregations bug: move aggregation expressions from aggregationTrees
to PTF QuerySpec if no group by clause is seen at the end of phase 1
- add support for PTF invocation in joins.
- adding tests for ptf invocation in joins
- handle mixed case aliases.
- mixed case alias test
- Create PTF Map-side RR:
- Merge branch 'hive-896' into crook
- fix Hive.g merge issue: duplicate KW_ROWS definition
- Having: Tests to support having with windowing and ptf in queries with no
group by.
- - during handleClusterOrDistributeByForWindowing invoke the
- Merge branch 'crook' of https://github.com/hbutani/hive into crook
- add tests to check
- when extracting Windowing clauses from selectList handle the case
- when extracting Windowing clauses from selectList handle the case
- More tests with UDAFs, statistical and distribution functions
- No need to specify Writable option to copy object.
- Following changes:
- Tests:
- disallow Count/Sum distinct with windowing
- refactor ptf.translate package:
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- refactor ptf.query.specification package
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- refactor ptf.query.definition package
- Get rid of ql.ptf.functions package
- move PTFSpec to hive.ql.parse package
- move PTFDef to hive.ql.plan package
- rename QueryInput Def & Spec class names to better reflect their
- refactor ptf.runtime package
- Setup PTFSpec for QB at the end of phase I for cases where it is not
already done.
- Rollback change for os.family
- Remove individual test files: all test queries are in
- get rid ptf.io package
- minor cleanup in ptf.ds package
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- remove ptf.utils.HiveUtils
- cleanup of ptf.query.translate package
- merge with hive
- cleanup and document data struct additions on SemAly, QB, QBParseInfo.
- Changed error message for negative test: ptf_negative_NoSortNoDistByClause
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- Remove NPath code
- Merge with PTF HEAD
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- Allow use of constant expressions in select clause for queries with No
GBY, No PTF and No windowing
- cleanup: get rid of WindowingTypeCheckFactory; use hive's
- check for errors from TypeCheckFactory when building
- allow Windowing invocations w/o aliases
- cleanup: move remaining ParseUtils function to PTFTranslator
- cleanup:
- cleanup: move PTFPartition to ql.exec package
- carry forward the expression mappings in the RRs constructed for PTFs
- use column position to generate internal names for PTF Op's RR.
- Resolve merge conflicts
- Normalize line endings
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- recover changes to PTFOp lost because of merge and CRLF issues
- support different UDAF invocations on the same UDAF but different
- fix range based scanner and add range based window tests
- - set the first Windowing clause encountered in a UDAF invocation as the
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- Account for empty partitions while closing the PTFOperator
- support different literal types in constant expressions in select list
- fix merge issues due to CRLF
- add tests for Partition & Order specs specified
- rules for inferring the default Partitioning Spec.
- To avoid losing changes due to CRLF issue
- Do not process queries with constants in select and let Hive handle them
- redo check if a SelectList constitutes a valid GBy query.
- Resolve merge conflicts with apache_hive
- Compare expressions trees using toStringTree() in translateOrder
- for distinct queries filter out exprs handled by windowing
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- when validating a SelectList for GBy, account for FUNCTION_DI tokens
- add test for select distinct + windowing
- get rid of WindowingException;
- cleanup ptf.utils.Utils and ptf.query.SerializationUtils
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- fix merge issue: we already had the methods in ASTNode for antlr3.4
- clean ptf.ds package; move code to PTFPersistence class
- Fix negative tests with new messages (WindowingException removed)
- add a way to disambiguate between sort expressions and fn. args
- expose the PTFInfo via the PTFDef; so the RR can be used for
- change ptf invocation so args come before table and partitioning spec.
- get rid of ptf.Constants class; add new ConfVars
- Support multi-operator function chain + tests
- fix query componentization logic.
- initial checkin of reviving NPath.
- Add more tests for query componentization
- undo inadvertent change made to .gitignore
- change to PTF ifc. A TableFunc is now also responsible for the names of
- finish NPath
- Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
- Cleanup: remove unused methods in PTFTranslator, remove equals methods
- merge with hive
- fix merge issue in SemanticAnalyzer
- Fix bug: Allow partitioning spec for functions that do not support
windowing
- remove dependency on SemanticAnalyzer in PTFTranslator.
- add OperatorType and support PTFOperator for explain plan
- Merge branch 'ptf' of github.com:hbutani/hive into ptf
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
- Resolve Merge conflicts with apache_hive/ptf_windowing
- refactored spec classes
- refactored PTFDesc; for now called PTFDesc2
- refactored translation of PTF Chain.
- refactor Windowing translation
- simplify RowResolver creation.
- refactored ptfDesc deserializer
- refactor SemanticAnalyzer:
- refactor PTFOp and function classes to use new data structs.
- When executing the WdwTblFunc:
- the first Arg of a Lead/Lag function can refer to windowingFns,
- construct PTF RR using alias specified with PTF invocation
- semAly windowingSpec was not being set on the current QB
- For windowing set the OI for output from Wdw Processing to exactly as
- when setting default PartSpec in WdwTabFn don't clear default Order
- add Windowing Exprs to qbp.destToWindowingExprs; used to filter out
- setHasWindowing not set in moveaggregationExprsToWindowingSpec
- commit with hive
- rename ranking functions in PTFTrans2
- changes to ptf_general_queries.q
- fix sort by in queries 62, 63
- remove suffix 2 from new classes
- add apache license comment to new classes.
- Change get/settFunction to get/setTFunction in PTFDesc
- Fix range condition in PTFPartition: without this change the tests do not
run from CLI
- Modify rc_file and seq_file tests to specify dist/sort condition with
windowing
- fix Deserializer bugs
- fix npath deserialization
- Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf
- remove dependency on FuncRegistry in PTFDeserializer for PTFs
- Add fix for constructing Extract Operator RR during windowing plan
generation + Add test 50 to ptf_general_queries.q
- allow wdw fn references in wdw expressions
- rewrite .q and .out file
- add apache license
- apache license headers
- rebuild RR for Noop/NoopMap even when there is no alias. Input's
- add comments to PTFOp, WdwFuncDesc
- fix lint issues
- Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
- Fix negative tests
- Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
- Column Pruner support for PTFOperator: For HIVE-4035
- minor changes and document PTF ColumnPruner
- Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
- Remove setting the hive.ptf.partition.persistence.memsize in
ptf_general_queries.q
- Merge branch 'ptf' of github.com:hbutani/hive into ptf
- merge with hive
- Resolve merge conflicts with apache_hive/ptf-windowing
- Add new tests on Lead to ptf_general_queries.q + fix names
- Resolve Merge conflicts with ptf-windowing + Renumber last 3 test cases
- Resolve newline issues
- Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf
- Merge remote-tracking branch 'hive/ptf-windowing' into ptf
- HIVE-4082: Refactor tests
- Resolve merge issues after merge with ptf-windowing
- Resolve merge conflicts with ptf-windowing
- Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf
- Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf
- Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
- HIVE-4126 [jira] remove support for lead/lag UDFs outside of UDAF args
Reviewers: JIRA, ashutoshc
REVISION DETAIL
https://reviews.facebook.net/D9105
CHANGE SINCE LAST DIFF
https://reviews.facebook.net/D9105?vs=29205&id=29415#toc
AFFECTED FILES
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
data/files/flights_tiny.txt
data/files/part.rc
data/files/part.seq
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/test/queries/clientpositive/leadlag.q
ql/src/test/queries/clientpositive/leadlag_queries.q
ql/src/test/queries/clientpositive/ptf.q
ql/src/test/queries/clientpositive/windowing.q
ql/src/test/queries/clientpositive/windowing_expressions.q
ql/src/test/results/clientpositive/leadlag.q.out
ql/src/test/results/clientpositive/leadlag_queries.q.out
ql/src/test/results/clientpositive/ptf.q.out
ql/src/test/results/clientpositive/windowing.q.out
ql/src/test/results/clientpositive/windowing_expressions.q.out
To: JIRA, ashutoshc, hbutani
> remove support for lead/lag UDFs outside of UDAF args
> -----------------------------------------------------
>
> Key: HIVE-4126
> URL: https://issues.apache.org/jira/browse/HIVE-4126
> Project: Hive
> Issue Type: Bug
> Components: PTF-Windowing
> Reporter: Harish Butani
> Assignee: Harish Butani
> Attachments: HIVE-4126.D9105.1.patch, HIVE-4126.D9105.2.patch
>
>
> Select expressions such as
> p_size - lead(p_size,1)
> are currently handled as non-aggregation expressions that are evaluated after
> all OVER clauses have been evaluated.
> Once we allow different partitions in a single select list (Jira 4041), these
> become ambiguous, since it is no longer clear which partitioning the lead/lag
> call applies to.
> The equivalent ways to do such things are:
> - use lead/lag UDAFs with expressions (support added with Jira 4081)
> - stack windowing clauses with inline queries: select lead(r,1).. from
> (select rank() as r....)...
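
The two rewrites suggested in the description can be sketched as follows. This is only an illustration under stated assumptions: it runs standard SQL window functions through Python's built-in sqlite3 module (SQLite 3.25 or later), not through Hive, so the exact HiveQL accepted by this patch may differ. The table layout, the p_mfgr and p_name columns, and the rows are made up for the example; only p_size, lead, rank and the alias r come from the issue text.

```python
import sqlite3

# Hypothetical miniature "part" table; rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (p_mfgr TEXT, p_name TEXT, p_size INTEGER)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [("m1", "a", 34), ("m1", "b", 2), ("m1", "c", 6),
     ("m2", "d", 40), ("m2", "e", 14)],
)

# Rewrite 1: give lead() its own OVER clause, so the partition and ordering
# it reads from are stated explicitly instead of being inferred.
explicit_over = conn.execute(
    """
    SELECT p_name,
           p_size - lead(p_size, 1)
               OVER (PARTITION BY p_mfgr ORDER BY p_name) AS delta
    FROM part
    ORDER BY p_mfgr, p_name
    """
).fetchall()

# Rewrite 2: stack windowing clauses with an inline query. The inner query
# computes rank() as r; the outer query applies lead() to that column.
stacked = conn.execute(
    """
    SELECT p_name, r,
           lead(r, 1) OVER (PARTITION BY p_mfgr ORDER BY p_name) AS next_r
    FROM (
        SELECT p_mfgr, p_name,
               rank() OVER (PARTITION BY p_mfgr ORDER BY p_size) AS r
        FROM part
    )
    ORDER BY p_mfgr, p_name
    """
).fetchall()

for row in explicit_over:
    print(row)
for row in stacked:
    print(row)
```

In both forms every lead() call names the window it operates on, which is the property the issue relies on once a single select list may mix several partitionings.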
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira