I have downloaded Amazon reviews for sentiment analysis from here. The file
is not particularly large (just over 500 MB) but comes in the following
format:
test.ft.txt.bz2.zip
So it is a text file that is compressed with bz2 and then zipped. Now I
would like to do all these operations in PySpark.
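One way to peel the double compression, as a minimal sketch: Spark can read .bz2 text files natively but not .zip, so strip the outer zip layer first and hand the inner .bz2 straight to sc.textFile. The filenames come from the post; the text payload below is a made-up fixture standing in for the real download, and `sc` is assumed to be a live SparkContext.

```python
import bz2
import zipfile

# Demo fixture standing in for the downloaded file: a text payload,
# bz2-compressed, then wrapped in a zip (filenames from the post).
payload = b"__label__2 Great product\n__label__1 Broke in a week\n"
with zipfile.ZipFile("test.ft.txt.bz2.zip", "w") as zf:
    zf.writestr("test.ft.txt.bz2", bz2.compress(payload))

# Spark reads .bz2 text natively but not .zip, so remove only the
# outer zip layer; the .bz2 inside stays compressed.
with zipfile.ZipFile("test.ft.txt.bz2.zip") as zf:
    zf.extract("test.ft.txt.bz2")

# Sanity check that the inner bz2 decompresses to the original text.
with bz2.open("test.ft.txt.bz2", "rt") as f:
    extracted = f.read()

# In a live PySpark session (assumes `sc` is a SparkContext):
# reviews = sc.textFile("test.ft.txt.bz2")  # bz2 decompressed on the fly
```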
If you build a normal Python egg file with the dependencies, you can
execute it like a .py file by passing it with --py-files.
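Ewan's suggestion can be sketched in two steps: bundle the project's pure-Python code into a zip (or egg), then ship it with spark-submit's --py-files flag. A minimal sketch using the stdlib; the package name mypackage and the driver script main.py are hypothetical placeholders for your own layout.

```python
import os
import zipfile

# Hypothetical project layout: mypackage/ holds the code to ship.
os.makedirs("mypackage", exist_ok=True)
with open("mypackage/__init__.py", "w") as f:
    f.write('VERSION = "0.1"\n')

# Bundle the package into a zip that spark-submit can distribute
# to the executors.
with zipfile.ZipFile("deps.zip", "w") as zf:
    for root, _, files in os.walk("mypackage"):
        for name in files:
            zf.write(os.path.join(root, name))

names = zipfile.ZipFile("deps.zip").namelist()

# Then deploy (driver script main.py is assumed to exist):
#   spark-submit --py-files deps.zip main.py
# Inside the job, `import mypackage` resolves from the shipped zip.
```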
Thanks,
Ewan
On 8 Aug 2016 3:44 p.m., pseudo oduesp <pseudo20...@gmail.com> wrote:
Hi,
How can I export a whole PySpark project as a zip from a local session to
a cluster and deploy it with spark-submit? I mean, I have a large project
with all its dependencies, and I want to create a zip containing all of
the dependencies and deploy it on the cluster.
By calling c.collect(), I see the RDD has simply been truncated to the
first 4 entries. Weirdly, this doesn't happen without calling map on b.
Any ideas?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/zip-in-pyspark-truncates-RDD-to-number-of-processors-tp8069
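For reference, a common way to sidestep surprises with rdd.zip() (which requires both RDDs to have the same number of partitions and the same number of elements per partition) is to index both sides explicitly with zipWithIndex and join on the index. A hedged sketch; the PySpark lines assume a live SparkContext `sc` and hypothetical RDDs b and c, so only the plain-Python illustration of the pairing runs here.

```python
# Hedged workaround sketch for a live SparkContext `sc` (not run here):
#
#   keyed_b = b.zipWithIndex().map(lambda kv: (kv[1], kv[0]))
#   keyed_c = c.zipWithIndex().map(lambda kv: (kv[1], kv[0]))
#   pairs = keyed_b.join(keyed_c).sortByKey().values().collect()
#
# The join pairs elements by position regardless of partitioning, so
# nothing is silently dropped. The same pairing, shown with plain
# Python lists to illustrate the shape of the output:
b_local = [x * 2 for x in range(8)]   # stand-in for b after map
c_local = list("abcdefgh")            # stand-in for c
pairs = list(zip(b_local, c_local))   # all 8 pairs, nothing truncated
```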