[ 
https://issues.apache.org/jira/browse/SPARK-20840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028720#comment-16028720
 ] 

Hyukjin Kwon edited comment on SPARK-20840 at 5/30/17 3:20 AM:
---------------------------------------------------------------

[~joshrosen] and [~srowen], I took a shot at following the suggestion above (^) but failed, because the Javadoc errors do not seem to be stored in the compile analysis (that approach looks like it would resemble 
https://github.com/sbt/sbt/blob/5e585e50da7da87fb41ea4ed19e374b84a21010b/main/src/main/scala/sbt/Defaults.scala#L1380-L1388).

So, I took another shot with a similar approach (manually parsing the logs from Javadoc). A rough version is here - 
https://github.com/apache/spark/compare/master...HyukjinKwon:SPARK-20840?expand=1

I tested this several times after manually introducing a few Javadoc errors, as below:


{code}
...
... [spurious errors]
...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
[info] Generating .../spark/target/spark/target/javaunidoc/org/apache/spark/sql/DataFrameReader.html...
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476: error: unexpected text
[error]    * Loads a {@link Dataset[String}] storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
[error]              ^
[info] Generating .../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
... [some more actual errors]
...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-frame.html...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-summary.html...
[info] Generating .../spark/target/javaunidoc/org/apache/spark/ui/storage/package-tree.html...
...
[info] 4 error
[info] 100 warnings
...
[error] 4 error(s) found while generating Java documentation.
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476: error: unexpected text
[error]    * Loads a {@link Dataset[String}] storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
[error]              ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:2996: error: self-closing element not allowed
[error]    * @see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html"/>
[error]           ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3006: error: reference not found
[error]    * Convert time string to a Unix timestamp (in seconds) by casting rules to {@link TimestampType}.
[error]                                                                               ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3271: error: reference not found
[error]    * (Scala-specific) Parses a column containing a JSON string into a {@link StructType} with the
[error]                                                                        ^
...
[info] Main Scala API documentation successful.
...
java.lang.RuntimeException: Failed to generate Java documentation from generated Java codes.
        at scala.sys.package$.error(package.scala:27)
        at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:762)
        at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:729)
...
[error] (spark/javaunidoc:doc) Failed to generate Java documentation from generated Java codes.
[error] Total time: 95 s, completed ...
{code}

This prints the errors that we probably need to fix as a kind of report at the end of the Javadoc failure. I tried not to change the existing logs being printed out.

The approach is basically to parse the {{\[error\] # errors}} summary and then find that number of {{: error:}} logs in reverse order from the Javadoc logs when the task fails. This is based on my observations so far - 
https://github.com/apache/spark/pull/17389, 
https://github.com/apache/spark/pull/16307, 
https://github.com/apache/spark/pull/15999 and 
https://github.com/apache/spark/pull/16013. So, I am not entirely sure it always parses correctly, although I expect it to work in most cases.
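
Just to make the parsing step concrete, here is a rough sketch of the idea (the {{collectJavadocErrors}} helper name and the exact regex are only for illustration; they are not the code in the branch above):

{code}
// A rough sketch of the reverse-parsing idea, assuming the javadoc output has
// already been captured as a sequence of lines. `collectJavadocErrors` is a
// hypothetical helper for illustration, not the code in the linked branch.
def collectJavadocErrors(javadocLines: Seq[String]): Seq[String] = {
  // Matches the summary javadoc prints at the end, e.g. "4 errors" / "1 error".
  val summaryPattern = """(\d+)\s+errors?\s*$""".r

  val reversed = javadocLines.reverse

  // Find the error count from the trailing summary line, if any.
  val errorCount = reversed
    .flatMap(line => summaryPattern.findFirstMatchIn(line))
    .headOption
    .map(_.group(1).toInt)
    .getOrElse(0)

  // Walk backwards and keep the last `errorCount` "<file>:<line>: error: ..."
  // header lines (the source snippet and caret lines that follow each header
  // are not collected in this sketch).
  reversed
    .filter(_.contains(": error:"))
    .take(errorCount)
    .reverse
}
{code}

For the log above, this would return the four {{: error:}} header lines, starting with the {{DataFrameReader.java:476}} one.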

This is not a clean solution but rather a hacky workaround, so I am not sure if it is acceptable. Another way I could come up with is a custom logger, for example resembling 
https://github.com/playframework/playframework/blob/e80a4b41ed487df5a77e23762fb301703f9aad33/framework/src/sbt-plugin/src/sbt-test/play-sbt-plugin/play-position-mapper/project/Build.scala

What do you think about this? Otherwise, I could simply print out a log pointing to this JIRA.




> Misleading spurious errors when there are Javadoc (Unidoc) breaks
> -----------------------------------------------------------------
>
>                 Key: SPARK-20840
>                 URL: https://issues.apache.org/jira/browse/SPARK-20840
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Project Infra
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>
> Currently, when there are Javadoc breaks, warnings seem to be printed as 
> errors.
> For example, the actual errors were as below in 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/consoleFull
> {code}
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:4:
>  error: reference not found
> [error]  * than both {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD} and
> [error]                     ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:5:
>  error: reference not found
> [error]  * {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE} * 
> averageSize. It stores the
> [error]           ^
> {code}
> but it also prints many errors from the generated Java code as below:
> {code}
> [info] Constructing Javadoc information...
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117:
>  error: ExecutorAllocationClient is not public in org.apache.spark; cannot be 
> accessed from outside package
> [error]   public   BlacklistTracker 
> (org.apache.spark.scheduler.LiveListenerBus listenerBus, 
> org.apache.spark.SparkConf conf, 
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient, 
> org.apache.spark.util.Clock clock)  { throw new RuntimeException(); }
> [error]                                                                       
>                                                                              ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118:
>  error: ExecutorAllocationClient is not public in org.apache.spark; cannot be 
> accessed from outside package
> [error]   public   BlacklistTracker (org.apache.spark.SparkContext sc, 
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient)  { 
> throw new RuntimeException(); }
> [error]                                                                       
>                       ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:133:
>  error: ConfigReader is not public in org.apache.spark.internal.config; 
> cannot be accessed from outside package
> [error]   private  org.apache.spark.internal.config.ConfigReader reader ()  { 
> throw new RuntimeException(); }
> [error]                                            ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:138:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [error]    <T extends java.lang.Object> org.apache.spark.SparkConf set 
> (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)  { throw new 
> RuntimeException(); }
> [error]                                                                       
>                           ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:139:
>  error: OptionalConfigEntry is not public in 
> org.apache.spark.internal.config; cannot be accessed from outside package
> [error]    <T extends java.lang.Object> org.apache.spark.SparkConf set 
> (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value)  { 
> throw new RuntimeException(); }
> [error]                                                                       
>                           ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:187:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [error]    <T extends java.lang.Object> org.apache.spark.SparkConf 
> setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) 
>  { throw new RuntimeException(); }
> [error]                                                                       
>                                    ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:188:
>  error: OptionalConfigEntry is not public in 
> org.apache.spark.internal.config; cannot be accessed from outside package
> [error]    <T extends java.lang.Object> org.apache.spark.SparkConf 
> setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, 
> T value)  { throw new RuntimeException(); }
> [error]                                                                       
>                                    ^
> [error] 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:208:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [error]     org.apache.spark.SparkConf remove 
> (org.apache.spark.internal.config.ConfigEntry<?> entry)  { throw new 
> RuntimeException(); }
> [error]                                                               
> ...
> {code}
> These errors are actually warnings in a successful build without Javadoc 
> breaks as below - 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/2908/consoleFull
> {code}
> [info] Constructing Javadoc information...
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117:
>  error: ExecutorAllocationClient is not public in org.apache.spark; cannot be 
> accessed from outside package
> [warn]   public   BlacklistTracker 
> (org.apache.spark.scheduler.LiveListenerBus listenerBus, 
> org.apache.spark.SparkConf conf, 
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient, 
> org.apache.spark.util.Clock clock)  { throw new RuntimeException(); }
> [warn]                                                                        
>                                                                             ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118:
>  error: ExecutorAllocationClient is not public in org.apache.spark; cannot be 
> accessed from outside package
> [warn]   public   BlacklistTracker (org.apache.spark.SparkContext sc, 
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient)  { 
> throw new RuntimeException(); }
> [warn]                                                                        
>                      ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:133:
>  error: ConfigReader is not public in org.apache.spark.internal.config; 
> cannot be accessed from outside package
> [warn]   private  org.apache.spark.internal.config.ConfigReader reader ()  { 
> throw new RuntimeException(); }
> [warn]                                            ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:138:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [warn]    <T extends java.lang.Object> org.apache.spark.SparkConf set 
> (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)  { throw new 
> RuntimeException(); }
> [warn]                                                                        
>                          ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:139:
>  error: OptionalConfigEntry is not public in 
> org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]    <T extends java.lang.Object> org.apache.spark.SparkConf set 
> (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value)  { 
> throw new RuntimeException(); }
> [warn]                                                                        
>                          ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:187:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [warn]    <T extends java.lang.Object> org.apache.spark.SparkConf 
> setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) 
>  { throw new RuntimeException(); }
> [warn]                                                                        
>                                   ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:188:
>  error: OptionalConfigEntry is not public in 
> org.apache.spark.internal.config; cannot be accessed from outside package
> [warn]    <T extends java.lang.Object> org.apache.spark.SparkConf 
> setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, 
> T value)  { throw new RuntimeException(); }
> [warn]                                                                        
>                                   ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:208:
>  error: ConfigEntry is not public in org.apache.spark.internal.config; cannot 
> be accessed from outside package
> [warn]     org.apache.spark.SparkConf remove 
> (org.apache.spark.internal.config.ConfigEntry<?> entry)  { throw new 
> RuntimeException(); }
> [warn]
> ...    
> {code}
> These look like warnings, not errors, in {{javadoc}} itself, but when we 
> introduce a Javadoc break, it seems sbt reports the other warnings as errors 
> while generating the javadoc.
> For example, take the Java code {{A.java}} below:
> {code}
> /**
> * Hi
> */
> public class A extends B {
> }
> {code}
> if we run {{javadoc}}
> {code}
> javadoc A.java
> {code}
> it produces a warning because it cannot find the symbol B, but it still 
> seems to generate the documentation fine.
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:4: error: cannot find symbol
> public class A extends B {
>                        ^
>   symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 warning
> {code}
> However, if we have a javadoc break in comments as below:
> {code}
> /**
> * Hi
> * @see B
> */
> public class A extends B {
> }
> {code}
> this produces an error and a warning.
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:5: error: cannot find symbol
> public class A extends B {
>                        ^
>   symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> A.java:3: error: reference not found
> * @see B
>        ^
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 error
> 1 warning
> {code}
> It seems {{sbt unidoc}} reports both errors and warnings as {{\[error\]}} 
> when there are breaks (the related context seems to be described in 
> https://github.com/sbt/sbt/issues/875#issuecomment-24315400).
> Given my observations so far, it is generally enough to just fix the 
> {{\[info\] # errors}} reported at the bottom, which are usually produced in 
> the html-generating {{Building tree for all the packages and classes...}} 
> phase.
> Essentially, this looks like a bug in GenJavaDoc, which generates the Java 
> code incorrectly, and a bug in sbt, which fails to distinguish warnings from 
> errors in this case.
> This output via Jenkins is quite confusing.


