[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746378#comment-14746378 ]

Jimmy Xiang commented on HIVE-11139:

Filed HIVE-11834 to track this issue. Thanks Mark.

> Emit more lineage information
>
> Key: HIVE-11139
> URL: https://issues.apache.org/jira/browse/HIVE-11139
> Project: Hive
> Issue Type: Improvement
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
> Attachments: HIVE-11139.1.patch, HIVE-11139.2.patch, HIVE-11139.3.patch
>
> HIVE-1131 emits some column lineage info. But it doesn't support INSERT statements, or CTAS statements. It doesn't emit the predicate information either.
> We can enhance and use the dependency information created in HIVE-1131, generate more complete lineage info.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745937#comment-14745937 ]

Mark Grover commented on HIVE-11139:

I have a dynamic partitioning query, but at the end of the query it shows me an error message like:
{quote}
ERROR : Result schema has 2 fields, but we don't get as many dependencies
{quote}
Going through the source code led me to this commit. Was this tested to make sure it works with dynamic partitioning? Here's my query, by the way:
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
DROP TABLE IF EXISTS default.src_mark;
CREATE TABLE default.src_mark (first string, word string) PARTITIONED BY (length int) STORED AS PARQUET;
INSERT INTO TABLE default.src_mark PARTITION(length) SELECT first, word, length FROM spark_hive.src_flat;
{code}
And I verified that all the values in src_flat conform to the schema.

Also, at the very least it would be helpful for the error message to say how many dependencies there were and what their names were:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java#L251

Your thoughts would be much appreciated!
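The request in the comment above, an error message that reports the dependency count and names, could look roughly like the sketch below. This is not the code in LineageLogger: the class `LineageMismatchMessage`, the method `describeMismatch`, and the sample column names are all hypothetical and exist only to illustrate the kind of message being asked for.

```java
import java.util.Arrays;
import java.util.List;

public class LineageMismatchMessage {

    /**
     * Hypothetical helper (not part of Hive): builds an error message that
     * names both sides of the mismatch instead of only the field count.
     */
    public static String describeMismatch(List<String> schemaFields, List<String> dependencies) {
        return "Result schema has " + schemaFields.size() + " fields " + schemaFields
            + ", but got " + dependencies.size() + " dependencies " + dependencies;
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not taken from an actual Hive run.
        List<String> fields = Arrays.asList("first", "word");
        List<String> deps = Arrays.asList("first");
        System.out.println(describeMismatch(fields, deps));
    }
}
```

With both lists in the message, a reader of the log can see which output column has no matching dependency without attaching a debugger.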
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632922#comment-14632922 ]

Prasanth Jayachandran commented on HIVE-11139:

[~jxiang] Yes, you are right. Setting maxBackupIndex to 0 will truncate the file. If the aim of the appender is to not delete files, then how about using the normal FileAppender, which should never delete or roll the files? If rolling is desired, then the recommendation from log4j is to keep maxBackupIndex below 10 (for performance reasons) and use a high value for maxFileSize (on the order of GBs). A similar discussion is here:
https://jazz.net/forum/questions/150960/orgapachelog4jrollingfileappender-maxbackupindex-limit
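The behaviour this exchange converges on (backups are rotated only when maxBackupIndex is greater than 0, and with 0 the active file is simply truncated) can be modelled with the sketch below. It is a simplified stand-in written against `java.nio.file`, not log4j's RollingFileAppender code; the class `RolloverSketch`, its methods, and the file name `lineage.log` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RolloverSketch {

    /** Simplified model of a size-triggered rollover; not log4j's actual code. */
    public static void rollOver(Path log, int maxBackupIndex) throws IOException {
        if (maxBackupIndex > 0) {
            // Drop the oldest backup, then shift log.i to log.(i+1).
            Files.deleteIfExists(backup(log, maxBackupIndex));
            for (int i = maxBackupIndex - 1; i >= 1; i--) {
                Path from = backup(log, i);
                if (Files.exists(from)) {
                    Files.move(from, backup(log, i + 1), StandardCopyOption.REPLACE_EXISTING);
                }
            }
            // The active file becomes backup 1.
            Files.move(log, backup(log, 1), StandardCopyOption.REPLACE_EXISTING);
        }
        // Reopen the active file without appending. When maxBackupIndex <= 0
        // nothing was moved aside, so the old contents are discarded here.
        Files.write(log, new byte[0]);
    }

    public static Path backup(Path log, int index) {
        return Paths.get(log.toString() + "." + index);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rollover");
        Path log = dir.resolve("lineage.log");
        Files.write(log, "old entries\n".getBytes(StandardCharsets.UTF_8));

        rollOver(log, 0);
        System.out.println("size after rollover: " + Files.size(log));
        System.out.println("backup exists: " + Files.exists(backup(log, 1)));
    }
}
```

In this model a rollover with maxBackupIndex set to 0 leaves an empty active file and no backup, which is why that setting does not give "never lose log data" semantics.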
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632908#comment-14632908 ]

Jimmy Xiang commented on HIVE-11139:

[~prasanth_j], this new RFA never renames or deletes a log file. Per the javadoc of RFA, it seems that the log file is truncated and no backup file is created if maxBackupIndex = 0, right? If we can achieve the same thing with the existing RFA, that will be great. If not, we are happy to port it to log4j2 too.
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632655#comment-14632655 ]

Prasanth Jayachandran commented on HIVE-11139:

Hi [~szehon] and [~jxiang]. I am working on HIVE-11304 and in the process noticed that this jira added a new RFA, NoDeleteRollingFileAppender. I am wondering what its purpose is. If I understand correctly, it doesn't delete the old rollover files under any condition. If that's the case, similar behaviour can be obtained by setting maxBackupIndex to a negative value or 0 by default in the log4j.properties file.
http://grepcode.com/file/repo1.maven.org/maven2/log4j/log4j/1.2.17/org/apache/log4j/RollingFileAppender.java#141

The delete codepath gets triggered only when maxBackupIndex is > 0, which should get you the same behaviour of not deleting at all. If it serves a different purpose, can you please explain it? It's hard to port such custom appenders to log4j2, as the APIs are not compatible.
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611676#comment-14611676 ]

Hive QA commented on HIVE-11139:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12743197/HIVE-11139.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9134 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4473/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4473/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4473/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12743197 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611194#comment-14611194 ]

Szehon Ho commented on HIVE-11139:

+1
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611188#comment-14611188 ]

Jimmy Xiang commented on HIVE-11139:

Yeah, v2 is on RB.
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611184#comment-14611184 ]

Szehon Ho commented on HIVE-11139:

Can you update the review board?
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611171#comment-14611171 ]

Hive QA commented on HIVE-11139:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12743113/HIVE-11139.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9135 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4466/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4466/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4466/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12743113 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609129#comment-14609129 ]

Hive QA commented on HIVE-11139:

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742613/HIVE-11139.1.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4448/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4448/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4448/

Messages:
{noformat}
This message was trimmed, see log for full details
[copy] Copying 11 files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ spark-client ---
[INFO] Compiling 5 source files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes
[INFO]
[INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client ---
[INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar
[INFO] Copying guava-14.0.1.jar to /data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar
[INFO]
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client ---
[INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-client ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client ---
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.pom
[INFO]
[INFO]
[INFO] Building Hive Query Language 2.0.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql (includes = [datanucleus.log, derby.log], excludes = [])
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-exec ---
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec ---
[INFO] Executing tasks
main:
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
Generating vector expression code
Generating vector expression test code
[INFO] Executed tasks
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec ---
[INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java added.
[INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean added.
[INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java added.
[INFO]
[INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec ---
[INFO] ANTLR: Processing source directory /data/hive-ptest/working/apache-github-source-source/ql/src/java
ANTLR Parser Generator Version 3.4
org/apache/hadoop/hive/ql/parse/HiveLexer.g
org/apache/hadoop/hive/ql/parse/HiveParser.g
warning(200): IdentifiersParser.g:455:5:
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5:
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that