[jira] [Commented] (SPARK-13160) PySpark CDH 5

2016-02-03 Thread David Vega (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130572#comment-15130572
 ] 

David Vega commented on SPARK-13160:


I managed to attach the files.


> PySpark CDH 5
> -
>
> Key: SPARK-13160
> URL: https://issues.apache.org/jira/browse/SPARK-13160
> Project: Spark
>  Issue Type: Question
>  Components: Deploy, PySpark
>Affects Versions: 1.3.0
>Reporter: David Vega
> Attachments: job.properties, wordcount.py, workflow.xml
>






[jira] [Updated] (SPARK-13160) PySpark CDH 5

2016-02-03 Thread David Vega (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vega updated SPARK-13160:
---
Attachment: workflow.xml
wordcount.py
job.properties

> PySpark CDH 5
> -
>
> Key: SPARK-13160
> URL: https://issues.apache.org/jira/browse/SPARK-13160
> Project: Spark
>  Issue Type: Question
>  Components: Deploy, PySpark
>Affects Versions: 1.3.0
>Reporter: David Vega
> Attachments: job.properties, wordcount.py, workflow.xml
>






[jira] [Created] (SPARK-13160) PySpark CDH 5

2016-02-03 Thread David Vega (JIRA)
David Vega created SPARK-13160:
--

 Summary: PySpark CDH 5
 Key: SPARK-13160
 URL: https://issues.apache.org/jira/browse/SPARK-13160
 Project: Spark
  Issue Type: Question
  Components: Deploy, PySpark
Affects Versions: 1.3.0
Reporter: David Vega


Hi,
I am trying to deploy my simple PySpark job on CDH 5, and it has been almost impossible.
I have tried many Oozie configurations; it is difficult to find any correct documentation.
I can't attach the configuration, so I paste it here:
* wordcount.py
import sys
from operator import add

from pyspark import SparkContext


if __name__ == "__main__":
    if len(sys.argv) != 2:
        # Python 2 print syntax, matching the Python runtime used with Spark 1.3 on CDH 5
        print >> sys.stderr, "Usage: wordcount <file>"
        exit(-1)
    sc = SparkContext(appName="PythonWordCount")
    # The input path arrives as the first program argument (passed in by the Oozie action below)
    lines = sc.textFile(sys.argv[1], 1)
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print "%s: %i" % (word, count)

    sc.stop()

* workflow.xml (Oozie)


${jobTracker}
${nameNode}

startDate
${firstNotNull(wf:conf("initial-date"),firstNotNull(wf:conf("dateFromFile"),"sysdate"))}

${jobTracker}
${nameNode}
yarn
cluster
${spark_job_name}
${spark_code_path_jar_or_py}
--executor-memory 256m --driver-memory 256m --executor-cores 1 --num-executors 1 --conf spark.yarn.queue=default
${nameNode}/group/saludar.txt

Hello World failed, error message[${wf:errorMessage(wf:lastErrorNode())}]

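It looks like the mailer stripped the XML markup from the listing above, leaving only the element values. Purely as a sketch, those values map onto the standard Oozie spark-action schema roughly as follows (element names come from the uri:oozie:spark-action:0.1 XSD; the workflow-app, action, and control-node names are placeholders, not recovered from the original workflow.xml):

<workflow-app name="spark-wordcount-wf" xmlns="uri:oozie:workflow:0.5">
    <!-- Only the element values survived in the e-mail; structure and node names are reconstructed placeholders. -->
    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>startDate</name>
                <value>${firstNotNull(wf:conf("initial-date"),firstNotNull(wf:conf("dateFromFile"),"sysdate"))}</value>
            </property>
        </configuration>
    </global>

    <start to="spark-node"/>

    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- "yarn" and "cluster" are separate elements in the spark-action schema: master and deploy mode -->
            <master>yarn</master>
            <mode>cluster</mode>
            <name>${spark_job_name}</name>
            <jar>${spark_code_path_jar_or_py}</jar>
            <spark-opts>--executor-memory 256m --driver-memory 256m --executor-cores 1 --num-executors 1 --conf spark.yarn.queue=default</spark-opts>
            <arg>${nameNode}/group/saludar.txt</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hello World failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>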
I can't attach the state of my jobs either, so I copy it here:
Summary Metrics
No tasks have started yet
Tasks
No tasks have started yet
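
For completeness, the workflow above references ${jobTracker}, ${nameNode}, ${spark_job_name} and ${spark_code_path_jar_or_py}; those would normally be supplied by the attached job.properties. A hypothetical sketch with placeholder values, not the contents of the real attachment:

# All values below are placeholders; the real job.properties was attached to the issue but not quoted here.
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8032
# Puts the Oozie Spark sharelib on the action's classpath.
oozie.use.system.libpath=true
# Variables referenced from workflow.xml.
spark_job_name=PythonWordCount
spark_code_path_jar_or_py=${nameNode}/user/oozie-apps/wordcount/wordcount.py
# HDFS directory containing workflow.xml.
oozie.wf.application.path=${nameNode}/user/oozie-apps/wordcount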




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org