[ https://issues.apache.org/jira/browse/BEAM-7696?focusedWorklogId=280757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280757 ]

ASF GitHub Bot logged work on BEAM-7696:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jul/19 02:32
            Start Date: 23/Jul/19 02:32
    Worklog Time Spent: 10m
      Work Description: yanlin-Lynn commented on pull request #9019: [BEAM-7696] Prepare files to stage also in local master of spark runner.
URL: https://github.com/apache/beam/pull/9019#discussion_r306106602

 ##########
 File path: runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
 ##########
 @@ -150,17 +152,24 @@ public String create(PipelineOptions options) {
    void setCacheDisabled(boolean value);

    /**
 -   * Local configurations work in the same JVM and have no problems with improperly formatted files
 -   * on classpath (eg. directories with .class files or empty directories). Prepare files for
 -   * staging only when using remote cluster (passing the master address explicitly).
 +   * Classpath contains non jar files (eg. directories with .class files or empty directories) will
 +   * cause exception in running log. Though the {@link org.apache.spark.SparkContext} can handle
 +   * this when running in local master, it's better not to include non-jars files in classpath.
     */
 -  static void prepareFilesToStageForRemoteClusterExecution(SparkPipelineOptions options) {
 -    if (!options.getSparkMaster().matches("local\\[?\\d*\\]?")) {
 -      options.setFilesToStage(
 -          PipelineResources.prepareFilesForStaging(
 -              options.getFilesToStage(),
 -              MoreObjects.firstNonNull(
 -                  options.getTempLocation(), System.getProperty("java.io.tmpdir"))));
 -    }
 +  static void prepareFilesToStage(SparkPipelineOptions options) {
 +    List<String> filesToStage =
 +        options.getFilesToStage().stream()
 +            .map(File::new)
 +            .filter(File::exists)

 Review comment:
   [PipelineResources.prepareFilesForStaging](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineResources.java#L82) packages directories into jar files.
   That is why this method is called before a SparkContext is created.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 280757)
    Time Spent: 3h 20m  (was: 3h 10m)

> Detect classpath resources contains directory cause exception
> -------------------------------------------------------------
>
>                 Key: BEAM-7696
>                 URL: https://issues.apache.org/jira/browse/BEAM-7696
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Wang Yanlin
>            Assignee: Wang Yanlin
>            Priority: Minor
>             Fix For: 2.15.0
>
>         Attachments: addJar_exception.jpg, files_contains_dir.jpg
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When I run the unit test SparkPipelineStateTest.testBatchPipelineRunningState in
> IntelliJ IDEA on my Mac, I get an IllegalArgumentException in the console
> output. I checked the source code and found that the result of
> _PipelineResources.detectClassPathResourcesToStage_ contains a directory, which
> is the cause of the exception.
> See the attached file 'addJar_exception.jpg' for details. The result of
> _PipelineResources.detectClassPathResourcesToStage_, captured while debugging,
> is shown in the attached file 'files_contains_dir.jpg'.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
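
For readers following the review thread, here is a minimal sketch of the idea discussed in the comment above: drop classpath entries that do not exist and replace each directory with a jar before the list is handed to Spark. It is not Beam's `PipelineResources` code. The class `StagingSketch`, the method `jarDirectory`, and the `tmpDir` parameter are hypothetical names chosen for illustration, and only JDK classes are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not the Beam implementation. */
final class StagingSketch {

  /** Drops missing paths and replaces each directory with a jar written under tmpDir. */
  static List<String> prepareFilesToStage(List<String> files, Path tmpDir) {
    return files.stream()
        .map(File::new)
        .filter(File::exists)
        .map(f -> f.isDirectory() ? jarDirectory(f.toPath(), tmpDir) : f.getAbsolutePath())
        .collect(Collectors.toList());
  }

  /** Packs the regular files under dir into a new jar and returns the jar's path. */
  static String jarDirectory(Path dir, Path tmpDir) {
    try {
      Path jar = Files.createTempFile(tmpDir, "staged-", ".jar");
      List<Path> regularFiles;
      try (Stream<Path> walk = Files.walk(dir)) {
        regularFiles = walk.filter(Files::isRegularFile).collect(Collectors.toList());
      }
      try (OutputStream out = Files.newOutputStream(jar);
          JarOutputStream jos = new JarOutputStream(out)) {
        for (Path p : regularFiles) {
          // Jar entry names use '/' regardless of the platform separator.
          String name = dir.relativize(p).toString().replace(File.separatorChar, '/');
          jos.putNextEntry(new JarEntry(name));
          Files.copy(p, jos);
          jos.closeEntry();
        }
      }
      return jar.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the returned list contains only regular files, so a caller that adds each entry as a jar would no longer be handed a directory. The temporary-directory choice mirrors the `java.io.tmpdir` fallback visible in the removed lines of the diff, but how the real method picks its location is not shown in this message.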