[ https://issues.apache.org/jira/browse/SPARK-20840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028720#comment-16028720 ]
Hyukjin Kwon edited comment on SPARK-20840 at 5/30/17 3:20 AM:
---------------------------------------------------------------

[~joshrosen] and [~srowen], I gave the suggestion above (^) a shot but failed, because Javadoc errors do not appear to be stored in the compile analysis (that route would resemble https://github.com/sbt/sbt/blob/5e585e50da7da87fb41ea4ed19e374b84a21010b/main/src/main/scala/sbt/Defaults.scala#L1380-L1388).

So I gave another, similar approach a shot: parsing the logs from Javadoc manually. A rough version is here - https://github.com/apache/spark/compare/master...HyukjinKwon:SPARK-20840?expand=1

I tested this several times after manually introducing a few Javadoc breaks, as below:

{code}
...
... [spurious errors] ...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/sql/DataFrameReader.html...
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476: error: unexpected text
[error]    * Loads a {@link Dataset[String}] storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
[error]                ^
[info] Generating .../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
... [some more actual errors] ...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-frame.html...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-summary.html...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-tree.html...
...
[info] 4 errors
[info] 100 warnings
...
[error] 4 error(s) found while generating Java documentation.
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476: error: unexpected text
[error]    * Loads a {@link Dataset[String}] storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
[error]                ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:2996: error: self-closing element not allowed
[error]    * @see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html"/>
[error]           ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3006: error: reference not found
[error]    * Convert time string to a Unix timestamp (in seconds) by casting rules to {@link TimestampType}.
[error]                                                                                       ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3271: error: reference not found
[error]    * (Scala-specific) Parses a column containing a JSON string into a {@link StructType} with the
[error]                                                                               ^
...
[info] Main Scala API documentation successful.
...
java.lang.RuntimeException: Failed to generate Java documentation from generated Java codes.
	at scala.sys.package$.error(package.scala:27)
	at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:762)
	at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:729)
	...
[error] (spark/javaunidoc:doc) Failed to generate Java documentation from generated Java codes.
[error] Total time: 95 s, completed ...
{code}

This prints the errors that probably need fixing as a kind of report at the end of the Javadoc failure; I tried not to change the existing logs being printed. The approach is basically to parse the {{\[error\] # errors}} summary and then collect that many {{: error:}} log lines, scanning the Javadoc logs in reverse order, when the task fails. This is based on my observations so far - https://github.com/apache/spark/pull/17389, https://github.com/apache/spark/pull/16307, https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013.
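The reverse-scan idea described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the actual SparkBuild.scala change; the class name, method, and exact log-line formats are my assumptions:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: read the "N error(s)" summary near the end of the
// captured Javadoc log, then walk the log backwards collecting that many
// "<file>:<line>: error: ..." lines as the final report.
class JavadocErrorParser {
    // Matches summary lines such as "[info] 4 errors" or
    // "[error] 4 error(s) found while generating Java documentation."
    private static final Pattern SUMMARY = Pattern.compile("(\\d+) errors?\\b");
    // Matches reported breaks such as
    // "[error] /path/Foo.java:476: error: unexpected text"
    private static final Pattern ERROR_LINE =
        Pattern.compile("\\[error\\] .*:\\d+: error: .*");

    static List<String> extractErrors(List<String> logLines) {
        // Scan backwards for the error-count summary.
        int count = 0;
        for (int i = logLines.size() - 1; i >= 0; i--) {
            Matcher m = SUMMARY.matcher(logLines.get(i));
            if (m.find()) {
                count = Integer.parseInt(m.group(1));
                break;
            }
        }
        // Scan backwards again, keeping the last `count` ": error:" lines
        // in their original order.
        Deque<String> errors = new ArrayDeque<>();
        for (int i = logLines.size() - 1; i >= 0 && errors.size() < count; i--) {
            if (ERROR_LINE.matcher(logLines.get(i)).matches()) {
                errors.addFirst(logLines.get(i));
            }
        }
        return List.copyOf(errors);
    }
}
```

In the real change this parsing would hang off the Unidoc task's captured logger; the sketch only shows the parsing step, which is also where the fragility lies (it trusts the summary count and the `: error:` line shape).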
That said, I am not fully sure it always parses correctly, although I expect it to work in most cases. This is not a clean solution but a hacky workaround, so I am not sure whether it is acceptable. Another way I could come up with is a custom logger, for example resembling https://github.com/playframework/playframework/blob/e80a4b41ed487df5a77e23762fb301703f9aad33/framework/src/sbt-plugin/src/sbt-test/play-sbt-plugin/play-position-mapper/project/Build.scala

What do you think about this? Otherwise, I could simply print a log pointing to this JIRA.


> Misleading spurious errors when there are Javadoc (Unidoc) breaks
> -----------------------------------------------------------------
>
>                 Key: SPARK-20840
>                 URL: https://issues.apache.org/jira/browse/SPARK-20840
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Project Infra
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>
> Currently, when there are Javadoc breaks, this seems to print warnings as errors.
> For example, the actual errors were as below in
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/consoleFull
>
> {code}
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:4: error: reference not found
> [error]  * than both {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD} and
> [error]              ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:5: error: reference not found
> [error]  * {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE} * averageSize.
> It stores the
> [error]          ^
> {code}
>
> but it also prints many errors from the generated Java code, as below:
>
> {code}
> [info] Constructing Javadoc information...
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
> [error]   public   BlacklistTracker (org.apache.spark.scheduler.LiveListenerBus listenerBus, org.apache.spark.SparkConf conf, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient, org.apache.spark.util.Clock clock) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
> [error]   public   BlacklistTracker (org.apache.spark.SparkContext sc, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:133: error: ConfigReader is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   private  org.apache.spark.internal.config.ConfigReader reader () { throw new RuntimeException(); }
> [error]            ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:138: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   <T extends java.lang.Object> org.apache.spark.SparkConf set (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:139: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   <T extends java.lang.Object> org.apache.spark.SparkConf set (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:187: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   <T extends java.lang.Object> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:188: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   <T extends java.lang.Object> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [error]   ^
> [error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:208: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [error]   org.apache.spark.SparkConf remove (org.apache.spark.internal.config.ConfigEntry<?> entry) { throw new RuntimeException(); }
> [error]
> ...
> {code}
>
> These errors are actually warnings in a successful build without Javadoc breaks, as below -
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/2908/consoleFull
>
> {code}
> [info] Constructing Javadoc information...
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
> [warn]   public   BlacklistTracker (org.apache.spark.scheduler.LiveListenerBus listenerBus, org.apache.spark.SparkConf conf, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient, org.apache.spark.util.Clock clock) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
> [warn]   public   BlacklistTracker (org.apache.spark.SparkContext sc, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:133: error: ConfigReader is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   private  org.apache.spark.internal.config.ConfigReader reader () { throw new RuntimeException(); }
> [warn]            ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:138: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   <T extends java.lang.Object> org.apache.spark.SparkConf set (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:139: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   <T extends java.lang.Object> org.apache.spark.SparkConf set (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:187: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   <T extends java.lang.Object> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:188: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   <T extends java.lang.Object> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) { throw new RuntimeException(); }
> [warn]   ^
> [warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:208: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]   org.apache.spark.SparkConf remove (org.apache.spark.internal.config.ConfigEntry<?> entry) { throw new RuntimeException(); }
> [warn]
> ...
> {code}
>
> These appear as warnings, not errors, in {{javadoc}} itself, but once we introduce a Javadoc break, sbt seems to report these other warnings as errors while generating the Javadoc.
> For example, with the Java code {{A.java}} below:
>
> {code}
> /**
>  * Hi
>  */
> public class A extends B {
> }
> {code}
>
> if we run {{javadoc}}
>
> {code}
> javadoc A.java
> {code}
>
> it produces a warning because it cannot find the symbol B, but it seems to still generate the documentation fine:
>
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:4: error: cannot find symbol
> public class A extends B {
>                        ^
>   symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 warning
> {code}
>
> However, if we have a Javadoc break in the comments, as below:
>
> {code}
> /**
>  * Hi
>  * @see B
>  */
> public class A extends B {
> }
> {code}
>
> this produces an error and a warning:
>
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:5: error: cannot find symbol
> public class A extends B {
>                        ^
>   symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> A.java:3: error: reference not found
>  * @see B
>         ^
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 error
> 1 warning
> {code}
>
> It seems {{sbt unidoc}} recognises both the errors and these warnings as {{\[error\]}} when there are breaks (the related context looks to be described in https://github.com/sbt/sbt/issues/875#issuecomment-24315400).
> Given my observations so far, it is generally okay to just fix the errors counted in the {{\[info\] # errors}} summary printed at the bottom, which are usually produced in the HTML-generation phase ({{Building tree for all the packages and classes...}}).
> Essentially, this looks like a bug in GenJavaDoc, which generates the Java code wrongly, and a bug in sbt, which fails to distinguish warnings from errors in this case.
> This message via Jenkins actually looks confusing.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)