Working with a text file that is both compressed by bz2 followed by zip in PySpark

2024-03-04 Thread Mich Talebzadeh
I have downloaded Amazon reviews for sentiment analysis from here. The file is not particularly large (just over 500 MB) but comes in the following format: test.ft.txt.bz2.zip. So it is a text file that is compressed with bz2 and then zipped. Now I would like to do all these operations in PySpark. In
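
For a file layered like this, one possible approach is to peel off the outer zip first and only then hand the inner .bz2 to Spark. The sketch below is not from the thread; it uses only the Python standard library, and the helper names extract_bz2_from_zip and iter_lines are hypothetical. It assumes the zip holds one or more members ending in .bz2 and that the text is UTF-8.

```python
import bz2
import os
import zipfile


def extract_bz2_from_zip(zip_path, out_dir):
    """Unpack the outer zip layer and return the paths of the .bz2 members.

    Hypothetical helper: assumes the archive members of interest end in ".bz2".
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = [name for name in zf.namelist() if name.endswith(".bz2")]
        zf.extractall(out_dir, members=members)
    return [os.path.join(out_dir, name) for name in members]


def iter_lines(bz2_path, encoding="utf-8"):
    """Stream decoded text lines from a .bz2 file without loading it whole."""
    with bz2.open(bz2_path, "rt", encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")
```

Once the zip layer is gone, the extracted .bz2 path can usually be passed straight to spark.read.text(path), because Spark reads bz2-compressed text through the Hadoop compression codecs, whereas zip archives are not handled that way. iter_lines is only there for a quick local check of the contents.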

Re: zip for pyspark

2016-08-08 Thread Ewan Leith
If you build a normal Python egg file with the dependencies, you can execute that like you are executing a .py file with --py-files. Thanks, Ewan On 8 Aug 2016 3:44 p.m., pseudo oduesp <pseudo20...@gmail.com> wrote: hi, how i can export all project on pyspark like zip from local s

zip for pyspark

2016-08-08 Thread pseudo oduesp
Hi, how can I export a whole PySpark project as a zip from a local session to the cluster and deploy it with spark-submit? I mean, I have a large project with all its dependencies, and I want to create a zip containing all of the dependencies and deploy it on the cluster.
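
As a rough illustration of the packaging step asked about here, the sketch below zips a package directory so that it can be shipped with --py-files. It is an assumption-laden example rather than anything posted in the thread: the function name zip_package and the file names in the usage note are hypothetical, and it only covers pure-Python code (compiled dependencies generally need a different route).

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip package_dir so the package folder itself sits at the archive root.

    Hypothetical helper: skips compiled .pyc files and keeps paths relative
    to the parent of package_dir, so "import <package>" works from the zip.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

With hypothetical names, zip_package("myproject", "deps.zip") followed by spark-submit --py-files deps.zip main.py would make the package importable on the executors; --py-files accepts .zip, .egg and .py files.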

zip in pyspark truncates RDD to number of processors

2014-06-21 Thread madeleine
. by calling c.collect(), I see the RDD has simply been truncated to the first 4 entries. Weirdly, this doesn't happen without calling map on b. Any ideas? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/zip-in-pyspark-truncates-RDD-to-number-of-processors-tp8069