[jira] [Commented] (SPARK-13160) PySpark CDH 5
[ https://issues.apache.org/jira/browse/SPARK-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130572#comment-15130572 ]

David Vega commented on SPARK-13160:

I managed to attach the files.

> PySpark CDH 5
> -------------
>
>                 Key: SPARK-13160
>                 URL: https://issues.apache.org/jira/browse/SPARK-13160
>             Project: Spark
>          Issue Type: Question
>          Components: Deploy, PySpark
>    Affects Versions: 1.3.0
>            Reporter: David Vega
>         Attachments: job.properties, wordcount.py, workflow.xml
>
>
> Hi,
> I am trying to deploy a simple PySpark job on CDH 5 and it is almost impossible.
> I have tried many Oozie configurations. It is difficult to find the right documentation.
> I can't attach the configuration, so I include it here:
> * wordcount.py
> import sys
> from operator import add
> from pyspark import SparkContext
> if __name__ == "__main__":
>     if len(sys.argv) != 2:
>         print >> sys.stderr, "Usage: wordcount "
>         exit(-1)
>     sc = SparkContext(appName="PythonWordCount")
>     lines = sc.textFile(sys.argv[1], 1)
>     counts = lines.flatMap(lambda x: x.split(' ')) \
>                   .map(lambda x: (x, 1)) \
>                   .reduceByKey(add)
>     output = counts.collect()
>     for (word, count) in output:
>         print "%s: %i" % (word, count)
>     sc.stop()
> * workflow oozie
> ${jobTracker}
> ${nameNode}
> startDate
> ${firstNotNull(wf:conf("initial-date"),firstNotNull(wf:conf("dateFromFile"),"sysdate"))}
> ${jobTracker}
> ${nameNode}
> yarn
> cluster
> ${spark_job_name}
> ${spark_code_path_jar_or_py}
> --executor-memory 256m --driver-memory 256m --executor-cores 1 --num-executors 1 --conf spark.yarn.queue=default
> ${nameNode}/group/saludar.txt
> Hello World failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
>
> I can't attach the state of my jobs, so I include it here:
> Summary Metrics
> No tasks have started yet
> Tasks
> No tasks have started yet

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
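
To check what the quoted wordcount.py should print, independently of the cluster and of Oozie, the same flatMap / map / reduceByKey steps can be mimicked in plain Python. This is a minimal sketch and not Spark code: `word_count` is a hypothetical helper, the sample lines are made up, it uses Python 3 syntax (the quoted script is Python 2), and it reads an in-memory list instead of an HDFS file.

```python
from collections import Counter


def word_count(lines):
    """Count words the way the quoted job does: split each line on a single space."""
    counts = Counter()
    for line in lines:
        # Stands in for flatMap(lambda x: x.split(' ')), then
        # map(lambda x: (x, 1)) and reduceByKey(add) in the Spark script.
        counts.update(line.split(' '))
    return dict(counts)


if __name__ == "__main__":
    sample = ["hola mundo", "hola a todos"]  # made-up input, not from the report
    for word, count in sorted(word_count(sample).items()):
        print("%s: %i" % (word, count))
```

Because the split is on a single space, consecutive spaces produce empty-string "words", which is also how the quoted Spark script behaves.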
[jira] [Updated] (SPARK-13160) PySpark CDH 5
[ https://issues.apache.org/jira/browse/SPARK-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Vega updated SPARK-13160:
-------------------------------
    Attachment: workflow.xml
                wordcount.py
                job.properties
[jira] [Created] (SPARK-13160) PySpark CDH 5
David Vega created SPARK-13160:
-------------------------------

             Summary: PySpark CDH 5
                 Key: SPARK-13160
                 URL: https://issues.apache.org/jira/browse/SPARK-13160
             Project: Spark
          Issue Type: Question
          Components: Deploy, PySpark
    Affects Versions: 1.3.0
            Reporter: David Vega
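
The workflow text in this issue lost its XML tags when the mail was archived, so only the element values remain. As a rough reading aid, the sketch below rebuilds a workflow of that shape with Python's standard xml.etree.ElementTree. The element names follow the Oozie workflow and Spark action schemas as commonly documented; the schema versions (uri:oozie:workflow:0.5, uri:oozie:spark-action:0.1), the workflow name, and the node names (spark-node, fail, end) are assumptions that do not appear in the report. The property value, Spark options, argument, and kill message are copied from the report. It is not known to match the attached workflow.xml, and it is not claimed to resolve the problem.

```python
import xml.etree.ElementTree as ET


def child(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    elem = ET.SubElement(parent, tag, attrs)
    elem.text = text
    return elem


def build_workflow_xml():
    # Workflow name and schema version are assumptions, not from the report.
    wf = ET.Element("workflow-app",
                    {"name": "pyspark-wordcount-wf", "xmlns": "uri:oozie:workflow:0.5"})

    # Global section: values as they appear in the report.
    glob = child(wf, "global")
    child(glob, "job-tracker", "${jobTracker}")
    child(glob, "name-node", "${nameNode}")
    prop = child(child(glob, "configuration"), "property")
    child(prop, "name", "startDate")
    child(prop, "value",
          '${firstNotNull(wf:conf("initial-date"),'
          'firstNotNull(wf:conf("dateFromFile"),"sysdate"))}')

    # Node names below are hypothetical.
    child(wf, "start", to="spark-node")
    action = child(wf, "action", name="spark-node")
    spark = child(action, "spark", xmlns="uri:oozie:spark-action:0.1")
    child(spark, "job-tracker", "${jobTracker}")
    child(spark, "name-node", "${nameNode}")
    child(spark, "master", "yarn")
    child(spark, "mode", "cluster")
    child(spark, "name", "${spark_job_name}")
    child(spark, "jar", "${spark_code_path_jar_or_py}")
    child(spark, "spark-opts",
          "--executor-memory 256m --driver-memory 256m "
          "--executor-cores 1 --num-executors 1 --conf spark.yarn.queue=default")
    child(spark, "arg", "${nameNode}/group/saludar.txt")
    child(action, "ok", to="end")
    child(action, "error", to="fail")

    kill = child(wf, "kill", name="fail")
    child(kill, "message",
          "Hello World failed, error message[${wf:errorMessage(wf:lastErrorNode())}]")
    child(wf, "end", name="end")
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    print(build_workflow_xml())
```

The ${...} placeholders are left unresolved on purpose; in the report they are supplied by job.properties, whose contents are not shown in these messages.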