[ https://issues.apache.org/jira/browse/SPARK-22177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-22177.
-------------------------------
    Resolution: Invalid

This is not a Spark issue, but something to do with Spot. Somewhere you're feeding an empty path to some argument.

> Error running ml_ops.sh (SPOT): Can not create a Path from an empty string
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22177
>                 URL: https://issues.apache.org/jira/browse/SPARK-22177
>             Project: Spark
>          Issue Type: Question
>          Components: ML, Spark Submit, YARN
>    Affects Versions: 2.2.0
>         Environment: CentOS 7 1708
>                      Hadoop 2.6.0
>                      Scala 2.11.8
>                      SPOT 1.0
>            Reporter: Jorge Pizarro
>            Priority: Minor
>              Labels: newbie
>
> Error message when running "./ml_ops.sh 20170922 dns 1e-4".
> Complete output:
>
> [soluser@master spot-ml]$ bash -x ./ml_ops.sh 20170922 dns 1e-4
> + FDATE=20170922
> + DSOURCE=dns
> + YR=2017
> + MH=09
> + DY=22
> + [[ 8 != \8 ]]
> + [[ -z dns ]]
> + source /etc/spot.conf
> ++ UINODE=master
> ++ MLNODE=master
> ++ GWNODE=master
> ++ DBNAME=spotdb
> ++ HUSER=/user/soluser
> ++ NAME_NODE=master
> ++ WEB_PORT=50070
> ++ DNS_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
> ++ PROXY_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
> ++ FLOW_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
> ++ HPATH=/user/soluser/dns/scored_results/20170922
> ++ IMPALA_DEM=master
> ++ IMPALA_PORT=21050
> ++ LUSER=/home/soluser
> ++ LPATH=/home/soluser/ml/dns/20170922
> ++ RPATH=/home/soluser/ipython/user/20170922
> ++ LIPATH=/home/soluser/ingest
> ++ USER_DOMAIN=neosecure
> ++ SPK_EXEC=1
> ++ SPK_EXEC_MEM=1g
> ++ SPK_DRIVER_MEM=1g
> ++ SPK_DRIVER_MAX_RESULTS=200m
> ++ SPK_EXEC_CORES=2
> ++ SPK_DRIVER_MEM_OVERHEAD=100m
> ++ SPK_EXEC_MEM_OVERHEAD=100m
> ++ SPK_AUTO_BRDCST_JOIN_THR=10485760
> ++ LDA_OPTIMIZER=em
> ++ LDA_ALPHA=1.02
> ++ LDA_BETA=1.001
> ++ PRECISION=64
> ++ TOL=1e-6
> ++ TOPIC_COUNT=20
> ++ DUPFACTOR=1000
> + '[' -n 1e-4 ']'
> + TOL=1e-4
> + '[' -n '' ']'
> + MAXRESULTS=-1
> + '[' dns == flow ']'
> + '[' dns == dns ']'
> + RAWDATA_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
> + '[' '!' -z neosecure ']'
> + USER_DOMAIN_CMD='--userdomain neosecure'
> + FEEDBACK_PATH=/user/soluser/dns/scored_results/20170922/feedback/ml_feedback.csv
> + HDFS_SCORED_CONNECTS=/user/soluser/dns/scored_results/20170922/scores
> + hdfs dfs -rm -R -f /user/soluser/dns/scored_results/20170922/scores
> + spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
>     --deploy-mode cluster --driver-memory 1g
>     --conf spark.driver.maxResultSize=200m
>     --conf spark.driver.maxPermSize=512m
>     --conf spark.dynamicAllocation.enabled=true
>     --conf spark.dynamicAllocation.maxExecutors=1
>     --conf spark.executor.cores=2
>     --conf spark.executor.memory=1g
>     --conf spark.sql.autoBroadcastJoinThreshold=10485760
>     --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M'
>     --conf spark.kryoserializer.buffer.max=512m
>     --conf spark.yarn.am.waitTime=100s
>     --conf spark.yarn.am.memoryOverhead=100m
>     --conf spark.yarn.executor.memoryOverhead=100m
>     target/scala-2.11/spot-ml-assembly-1.1.jar
>     --analysis dns
>     --input /user/soluser/dns/hive/y=2017/m=09/d=22/
>     --dupfactor 1000
>     --feedback /user/soluser/dns/scored_results/20170922/feedback/ml_feedback.csv
>     --ldatopiccount 20
>     --scored /user/soluser/dns/scored_results/20170922/scores
>     --threshold 1e-4 --maxresults -1
>     --ldamaxiterations 20 --ldaalpha 1.02 --ldabeta 1.001 --ldaoptimizer em
>     --precision 64 --userdomain neosecure
> 17/09/29 13:51:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 17/09/29 13:51:56 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers
> 17/09/29 13:51:56 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
> 17/09/29 13:51:56 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
> 17/09/29 13:51:56 INFO yarn.Client: Setting up container launch context for our AM
> 17/09/29 13:51:56 INFO yarn.Client: Setting up the launch environment for our AM container
> 17/09/29 13:51:56 INFO yarn.Client: Preparing resources for our AM container
> 17/09/29 13:51:57 INFO yarn.Client: Deleted staging directory hdfs://master:9000/user/soluser/.sparkStaging/application_1506636890912_0058
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:93)
>         at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:337)
>         at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:458)
>         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:497)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1091)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> real    0m3.610s
> user    0m5.122s
> sys     0m0.369s
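The stack trace shows yarn.Client failing inside distribute()/copyFileToRemote while preparing the AM container's resources, i.e. some file path handed to spark-submit (a jar, a --files entry, or a path-valued conf) expanded to an empty string. As a hypothetical pre-flight guard (not part of spot-ml; the function name and variable list are illustrative), ml_ops.sh could verify every config-derived value before invoking spark-submit:

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of spot-ml: fail fast if any variable that
# the script forwards to spark-submit is empty or unset, so a blank value
# from /etc/spot.conf cannot reach YARN as an empty Path.

require_nonempty() {
    local name
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in $name.
        if [ -z "${!name}" ]; then
            echo "ERROR: $name is empty or unset; check /etc/spot.conf" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch with values taken from the trace above:
RAWDATA_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
HDFS_SCORED_CONNECTS=/user/soluser/dns/scored_results/20170922/scores
require_nonempty RAWDATA_PATH HDFS_SCORED_CONNECTS && echo "paths look sane"
```

A guard like this would turn the opaque IllegalArgumentException into a message naming the offending variable. (Separately, the log's "cluster with 0 NodeManagers" line suggests the YARN cluster the job is submitted to may not be healthy either.)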
> [soluser@master spot-ml]$ cat /etc/spot.conf
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements. See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License. You may obtain a copy of the License at
> #     http://www.apache.org/licenses/LICENSE-2.0
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> # node configuration
> UINODE='master'
> MLNODE='master'
> GWNODE='master'
> DBNAME='spotdb'
>
> # hdfs - base user and data source config
> HUSER='/user/soluser'
> NAME_NODE='master'
> WEB_PORT=50070
> DNS_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
> PROXY_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
> FLOW_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
> HPATH=${HUSER}/${DSOURCE}/scored_results/${FDATE}
>
> # impala config
> IMPALA_DEM=master
> IMPALA_PORT=21050
>
> # local fs base user and data source config
> LUSER='/home/soluser'
> LPATH=${LUSER}/ml/${DSOURCE}/${FDATE}
> RPATH=${LUSER}/ipython/user/${FDATE}
> LIPATH=${LUSER}/ingest
>
> # dns suspicious connects config
> USER_DOMAIN='neosecure'
> SPK_EXEC='1'
> SPK_EXEC_MEM='1g'
> SPK_DRIVER_MEM='1g'
> SPK_DRIVER_MAX_RESULTS='200m'
> SPK_EXEC_CORES='2'
> SPK_DRIVER_MEM_OVERHEAD='100m'
> SPK_EXEC_MEM_OVERHEAD='100m'
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
> LDA_OPTIMIZER='em'
> LDA_ALPHA='1.02'
> LDA_BETA='1.001'
> PRECISION='64'
> TOL='1e-6'
> TOPIC_COUNT=20
> DUPFACTOR=1000
>
> [soluser@master spot-ml]$ spark-shell
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 17/09/29 13:52:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Spark context Web UI available at http://192.168.40.158:4040
> Spark context available as 'sc' (master = spark://master:7077, app id = app-20170929135251-0000).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> :quit
>
> [soluser@master spot-ml]$ java -version
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>
> [soluser@master spot-ml]$ hdfs version
> Hadoop 2.6.0-cdh5.12.1
> Subversion http://github.com/cloudera/hadoop -r 520d8b072e666e9f21d645ca6a5219fc37535a52
> Compiled by jenkins on 2017-08-24T16:34Z
> Compiled with protoc 2.5.0
> From source with checksum de51bf9693ab9426379a1cd28142cea0
> This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.12.1.jar
> [soluser@master spot-ml]$
>
> Thanks in advance,
> Jorge Pizarro

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
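A side note on the configuration quoted above (an illustration of the failure mode, not a confirmed root cause for this report): /etc/spot.conf is a template whose paths expand ${DSOURCE}, ${YR}, ${MH}, ${DY}, and ${FDATE} at the moment it is sourced, so ml_ops.sh must set those variables first — which the bash -x trace shows it does. Sourcing the file without them yields paths with silently empty segments, one generic way an empty string ends up where a Path is expected:

```shell
#!/usr/bin/env bash
# Illustration only: what a spot.conf-style path expands to when the
# variables it references are not yet set.

HUSER='/user/soluser'
unset DSOURCE YR MH DY          # simulate sourcing the config too early
DNS_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
echo "$DNS_PATH"                # -> /user/soluser//hive/y=/m=/d=/
```

The double slash and empty key=value segments are the telltale sign; checking the expanded paths (e.g. by echoing them before the spark-submit call) is a quick way to rule this class of problem in or out.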