[ https://issues.apache.org/jira/browse/BEAM-7696?focusedWorklogId=280757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280757 ]

ASF GitHub Bot logged work on BEAM-7696:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jul/19 02:32
            Start Date: 23/Jul/19 02:32
    Worklog Time Spent: 10m 
    Work Description: yanlin-Lynn commented on pull request #9019: [BEAM-7696] Prepare files to stage also in local master of spark runner.
URL: https://github.com/apache/beam/pull/9019#discussion_r306106602
 
 

 ##########
 File path: runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
 ##########
 @@ -150,17 +152,24 @@ public String create(PipelineOptions options) {
   void setCacheDisabled(boolean value);
 
   /**
-   * Local configurations work in the same JVM and have no problems with improperly formatted files
-   * on classpath (eg. directories with .class files or empty directories). Prepare files for
-   * staging only when using remote cluster (passing the master address explicitly).
+   * Classpath contains non jar files (eg. directories with .class files or empty directories) will
+   * cause exception in running log. Though the {@link org.apache.spark.SparkContext} can handle
+   * this when running in local master, it's better not to include non-jars files in classpath.
    */
-  static void prepareFilesToStageForRemoteClusterExecution(SparkPipelineOptions options) {
-    if (!options.getSparkMaster().matches("local\\[?\\d*\\]?")) {
-      options.setFilesToStage(
-          PipelineResources.prepareFilesForStaging(
-              options.getFilesToStage(),
-              MoreObjects.firstNonNull(
-                  options.getTempLocation(), System.getProperty("java.io.tmpdir"))));
-    }
+  static void prepareFilesToStage(SparkPipelineOptions options) {
+    List<String> filesToStage =
+        options.getFilesToStage().stream()
+            .map(File::new)
+            .filter(File::exists)
 
 Review comment:
   The [PipelineResources.prepareFilesForStaging](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineResources.java#L82) method packages directories into jar files. That's why this method is called before the SparkContext is created.
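The point of the review comment can be illustrated with a minimal sketch, assuming the staging step only needs to (1) drop classpath entries that do not exist and (2) turn each directory into a jar, as the comment describes. This is not Beam's `PipelineResources.prepareFilesForStaging`; the class `StagingSketch` and its methods are hypothetical names, and details such as jar naming and manifests are simplified.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the staging step described in the review comment. It is NOT Beam's
 * PipelineResources.prepareFilesForStaging; names and behavior here are illustrative only.
 */
public class StagingSketch {

  /** Drops missing entries, packages each directory into a jar in tmpDir, keeps other files. */
  static List<String> prepareFilesToStage(List<String> classpath, Path tmpDir) throws IOException {
    List<String> toStage = new ArrayList<>();
    for (String entry : classpath) {
      File file = new File(entry);
      if (!file.exists()) {
        continue; // a missing entry cannot be staged, so skip it
      }
      if (file.isDirectory()) {
        // Directories (e.g. compiled .class output) are turned into jars before staging.
        toStage.add(packageDirectory(file.toPath(), tmpDir).toString());
      } else {
        toStage.add(file.getAbsolutePath());
      }
    }
    return toStage;
  }

  /** Writes every regular file under dir into a new jar created in tmpDir. */
  static Path packageDirectory(Path dir, Path tmpDir) throws IOException {
    Path jar = Files.createTempFile(tmpDir, dir.getFileName() + "-", ".jar");
    List<Path> files;
    try (Stream<Path> walk = Files.walk(dir)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
      for (Path f : files) {
        // Jar entry names are relative to the directory root and always use '/'.
        String name = dir.relativize(f).toString().replace(File.separatorChar, '/');
        out.putNextEntry(new JarEntry(name));
        Files.copy(f, out);
        out.closeEntry();
      }
    }
    return jar;
  }

  public static void main(String[] args) throws IOException {
    // Run the sketch on the current JVM's classpath, writing generated jars to a temp directory.
    String[] entries = System.getProperty("java.class.path").split(File.pathSeparator);
    Path tmpDir = Files.createTempDirectory("staging-sketch");
    for (String staged : prepareFilesToStage(Arrays.asList(entries), tmpDir)) {
      System.out.println(staged);
    }
  }
}
```

In this sketch, directories are converted before the resulting list is handed to anything else, so a `SparkContext` created afterwards would only receive jar files. That ordering is the reason the comment gives for preparing files first.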
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 280757)
    Time Spent: 3h 20m  (was: 3h 10m)

> Detect classpath resources contains directory cause exception
> -------------------------------------------------------------
>
>                 Key: BEAM-7696
>                 URL: https://issues.apache.org/jira/browse/BEAM-7696
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Wang Yanlin
>            Assignee: Wang Yanlin
>            Priority: Minor
>             Fix For: 2.15.0
>
>         Attachments: addJar_exception.jpg, files_contains_dir.jpg
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Running the unit test SparkPipelineStateTest.testBatchPipelineRunningState in
> IntelliJ IDEA on my Mac, I get an IllegalArgumentException in the console
> output. Checking the source code, I found that the result of
> _PipelineResources.detectClassPathResourcesToStage_ contains a directory,
> which is the cause of the exception.
> See the attached file 'addJar_exception.jpg' for details; the result of
> _PipelineResources.detectClassPathResourcesToStage_ during debugging is
> shown in the attached file 'files_contains_dir.jpg'.
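To check outside the debugger whether a classpath contains directories, a minimal sketch like the following can list the entries that are directories. It is an assumption-based illustration: it reads the standard `java.class.path` system property rather than calling _PipelineResources.detectClassPathResourcesToStage_, whose output may differ, and the class `ClasspathDirectories` and its method are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Beam) that reports classpath entries which are directories. */
public class ClasspathDirectories {

  /** Returns the entries of a path-separator-delimited classpath that are existing directories. */
  static List<String> directoryEntries(String classpath) {
    List<String> dirs = new ArrayList<>();
    for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
      // Skip empty segments (e.g. from a trailing separator) and keep only real directories.
      if (!entry.isEmpty() && new File(entry).isDirectory()) {
        dirs.add(entry);
      }
    }
    return dirs;
  }

  public static void main(String[] args) {
    // Inspect the classpath of the running JVM, e.g. when a test is launched from an IDE.
    for (String dir : directoryEntries(System.getProperty("java.class.path"))) {
      System.out.println("Directory on classpath: " + dir);
    }
  }
}
```

Any entry this prints is the kind of non-jar resource the issue describes; the sketch only reports such entries and does not repackage them.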



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
