Re: unsubscribe
I have already sent it at least 10 times! I sent another one today as well!

On Tue, Oct 30, 2018 at 3:51 PM Biplob Biswas wrote:
> You need to send the email to user-unsubscr...@spark.apache.org and not
> to the user group.
>
> Thanks & Regards
> Biplob Biswas
>
> On Tue, Oct 30, 2018 at 10:59 AM Anu B Nair wrote:
>> I have been sending this Unsubscribe mail for the last few months! It never happens!
>> If anyone can help us to unsubscribe, it will be really helpful!
>>
>> On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha <mohan.palavan...@gmail.com> wrote:
Re: unsubscribe
I have been sending this Unsubscribe mail for the last few months! It never happens! If anyone can help us to unsubscribe, it will be really helpful!

On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha wrote:
Unsubscribe
Hi, I have tried every possible way to unsubscribe from this group. Can anyone help? -- Anu
unsubscribe
Java heap space OutOfMemoryError in pyspark spark-submit (Spark version: 2.2)
Hi,

I have a data set of size 10GB (for example, Test.txt). I wrote my pyspark script (Test.py) as below:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext

    spark = SparkSession.builder.appName("FilterProduct").getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    lines = spark.read.text("C:/Users/test/Desktop/Test.txt").rdd
    lines.collect()

Then I execute the script using the command below:

    spark-submit Test.py --executor-memory 15G --driver-memory 15G

Then I get an error like this:

    17/12/29 13:27:18 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/Test.txt, range: 402653184-536870912, partition values: [empty row]
    17/12/29 13:27:18 INFO CodeGenerator: Code generated in 22.743725 ms
    17/12/29 13:27:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    17/12/29 13:27:44 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)

Please let me know how to resolve this. -- Anu
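[Editor's note: two likely contributing issues in the command above, offered as a hedged suggestion. First, spark-submit treats everything after the application file as arguments to that application, so `--executor-memory 15G --driver-memory 15G` placed after Test.py is handed to the script via sys.argv and never reaches Spark. A sketch of the corrected invocation (the 15G values are the original poster's, not a recommendation):

```shell
# spark-submit options must come BEFORE the application file;
# anything after Test.py is passed to the script itself as arguments.
spark-submit --driver-memory 15G --executor-memory 15G Test.py
```

Second, even with the options applied, `lines.collect()` materializes the entire 10GB dataset in the driver JVM, which invites exactly this OutOfMemoryError; `lines.take(n)` for inspection, or keeping the work distributed (map/filter followed by a write), are the usual alternatives.]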
Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data
Hi,

Following is my pyspark code (I have attached the input sample_fpgrowth.txt and the Python script along with this mail). Even after I have called cache(), I am getting the warning: "Input data is not cached."

    from pyspark.mllib.fpm import FPGrowth
    import pyspark
    from pyspark.context import SparkContext
    from pyspark.sql.session import SparkSession

    sc = SparkContext('local')
    data = sc.textFile("sample_fpgrowth.txt")
    transactions = data.map(lambda line: line.strip().split(' ')).cache()
    model = FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)
    result = model.freqItemsets().collect()
    print(result)

I understand that it is only a warning, but I just wanted to know the reason in detail.

-- Anu

Attached sample_fpgrowth.txt:

    r z h k p
    z y x w v u t s
    s x o n r
    x z y m t s q e
    z
    x z y r q t p

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
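[Editor's note: for context on what the warned-about call computes, here is a minimal pure-Python sketch of only the first pass of FP-growth: counting which single items meet the support threshold, using the sample data attached above. It is an illustration of the counting semantics (Spark's minCount is the ceiling of minSupport times the number of records), not Spark's implementation; the `frequent_items` helper is hypothetical.]

```python
from collections import Counter
from math import ceil

# The six transactions from the attached sample_fpgrowth.txt.
transactions = [
    "r z h k p",
    "z y x w v u t s",
    "s x o n r",
    "x z y m t s q e",
    "z",
    "x z y r q t p",
]

def frequent_items(lines, min_support=0.2):
    """First pass of FP-growth: count in how many transactions each item
    appears, and keep items whose count meets ceil(min_support * n)."""
    baskets = [set(line.strip().split()) for line in lines]
    min_count = ceil(min_support * len(baskets))  # 0.2 * 6 -> 2
    counts = Counter(item for basket in baskets for item in basket)
    return {item: c for item, c in counts.items() if c >= min_count}

# 'z' appears in 5 of the 6 baskets; singletons like 'h' and 'k' are dropped.
print(frequent_items(transactions))
```

The full algorithm would then build an FP-tree over these surviving items to mine larger itemsets; this sketch only shows why minSupport=0.2 keeps an item appearing in at least 2 of the 6 baskets.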